We did away with the stats table because it's impossible to merge it, so the
revlog is canonical now. But we also want a cheap way to show the user how
much time they've spent or how many cards they've done over the day, even if
their study is split into multiple sessions. We were already storing the new
cards of a day in the top level groups, so we just expand that out to log the
other info too.
In the event of a user studying in two places on the same day without syncing,
the counts will not be accurate as they can't be merged without consulting the
revlog, which we want to avoid for performance reasons. But the graphs and
stats do not use the groups for reporting, so the inaccurate counts are only
temporary. Might need to mention this in an FAQ.
Also, since groups are now cheap to fetch, cards automatically limit
timeTaken() to the group limit, instead of relying on the calling code to do
so.
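
A minimal sketch of the capping described above. The function name, its
arguments and the per-group limit in seconds are assumptions made for
illustration; the real timeTaken() lives on the card and reads the limit from
the group config.

```python
import time

def time_taken_ms(timer_started, max_taken_secs, now=None):
    """Return milliseconds elapsed since timer_started, capped at the limit.

    timer_started and now are epoch seconds; max_taken_secs stands in for the
    group's limit (a hypothetical parameter for this sketch).
    """
    if now is None:
        now = time.time()
    elapsed_ms = int((now - timer_started) * 1000)
    # Clamp here so calling code no longer has to apply the limit itself.
    return min(elapsed_ms, max_taken_secs * 1000)
```
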
Instead of collecting the exact number of cards, we just record whether a
group has any reviews or new cards. By not needing to calculate the exact
numbers, it runs a lot faster than before.
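
To illustrate why an existence check is cheaper than an exact count, here is a
rough sketch using sqlite3. The cut-down cards table, the gid and queue column
names and the queue values (0 for new, 2 for review) are assumptions for the
example, not the actual schema.

```python
import sqlite3

def group_has_cards(db, gid, queue):
    """Return True if at least one card in group gid sits in the given queue.

    "limit 1" lets SQLite stop at the first matching row instead of
    visiting every match, as a count(*) would.
    """
    row = db.execute(
        "select 1 from cards where gid = ? and queue = ? limit 1",
        (gid, queue)).fetchone()
    return row is not None

def make_demo_db():
    # Hypothetical schema and data, used only for this illustration.
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table cards (id integer primary key, gid integer, queue integer)")
    db.executemany(
        "insert into cards values (?, ?, ?)",
        [(1, 10, 0), (2, 10, 2), (3, 20, 0)])
    return db
```
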
Also, changed the group code to ensure parents are automatically created when
a group is added.
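
A sketch of how missing parents might be created when a group is added. It
assumes group names are hierarchical paths joined by "::" and that groups are
kept in a plain dict keyed by name; both are simplifications for illustration.

```python
def ensure_parents(groups, name, sep="::"):
    """Add the group `name` and any missing ancestors to `groups`.

    groups is a dict mapping full group name -> group dict (hypothetical
    layout). Returns the list of names that had to be created, parents first.
    """
    created = []
    parts = name.split(sep)
    for i in range(1, len(parts) + 1):
        path = sep.join(parts[:i])
        if path not in groups:
            groups[path] = {"name": path}
            created.append(path)
    return created
```
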
As discussed on the forums, moving to a single collection requires moving some
deck-level configuration into groups so users can have different settings like
new cards/day for each top level item.
Also:
- store id in groups
- add mod time to gconf updates
- move the limiting code that's not specific to scheduling into groups.py
- store the current model id per top level group
- moved tags into json like previous changes, and dropped the unnecessary id
- added tags.py for a tag manager
- moved the tag utilities from utils into tags.py
Like the previous change, models have been moved from a separate DB table to
an entry in the deck. We need them for many operations including reviewing,
and it's easier to keep them in memory than half on disk with a cache that
gets cleared every time we .reset(). This means they are easily serialized as
well - previously they were part Python and part JSON, which made access
confusing.
Because the data is all pulled from JSON now, the instance methods have been
moved to the model registry. E.g.:
model.addField(...) -> deck.models.addField(model, ...).
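
A rough sketch of the registry pattern. Models are plain dicts loaded from
JSON, and the operations that used to be instance methods take the model as
an argument. The class name and the "flds" and "mod" keys are assumptions
for the example.

```python
import json
import time

class ModelManager:
    """Hypothetical registry holding models as plain dicts loaded from JSON."""

    def __init__(self, blob="{}"):
        self.models = json.loads(blob)

    def add(self, model):
        # JSON object keys are always strings, so store ids as strings.
        self.models[str(model["id"])] = model

    def addField(self, model, field):
        # The model is plain data; the behaviour lives on the registry.
        model["flds"].append(field)
        model["mod"] = int(time.time())

    def dumps(self):
        return json.dumps(self.models)
```
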
- IDs are now timestamped as with groups et al.
- The data field for plugins was also removed. Config info can be added to
deck.conf; larger data should be stored externally.
- Upgrading needs to be updated for the new model structure.
- hexifyID() now accepts strings as well, as our IDs get converted to strings
in the serialization process.
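
For illustration, a hex helper that accepts either form could look like the
following. This is a sketch under the assumption that the id is a
non-negative integer or its decimal string; the function name is made up
here.

```python
def hexify_id(id):
    """Return the id as lowercase hex, accepting an int or a decimal string."""
    # int() accepts both ints and numeric strings, so ids that came back
    # from JSON as strings need no special handling by the caller.
    return "%x" % int(id)
```
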
Rather than use a combination of id lookups on the groups table and a group
configuration cache in the scheduler, I've moved the groups and group config
into json objects on the deck table. This results in a net saving of code and
saves one or more DB lookups on each card answer, in exchange for a small
increase in deck load/save work.
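
A sketch of the trade-off: groups and their configs are read once from the
deck row, and every later lookup is a dict access rather than a query. The
column names, the "conf" key and the "newPerDay" setting are assumptions for
the example.

```python
import json
import sqlite3

def load_groups(db):
    """Read groups and group configs from the single deck row into dicts."""
    groups_json, gconf_json = db.execute(
        "select groups_json, gconf_json from deck").fetchone()
    return json.loads(groups_json), json.loads(gconf_json)

def conf_for_group(groups, gconf, gid):
    """Look up a group's config in memory; no DB access is needed."""
    group = groups[str(gid)]
    return gconf[str(group["conf"])]

def make_demo_deck():
    # Hypothetical, cut-down deck table used only for this illustration.
    db = sqlite3.connect(":memory:")
    db.execute("create table deck (groups_json text, gconf_json text)")
    groups = {1: {"name": "Default", "conf": 1}}
    gconf = {1: {"newPerDay": 20}}
    # json.dumps turns the integer keys into strings.
    db.execute("insert into deck values (?, ?)",
               (json.dumps(groups), json.dumps(gconf)))
    return db
```
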
I did a quick survey of AnkiWeb, and the vast majority of decks use fewer than
100 tags, and it's safe to assume groups will follow a similar pattern.
All groups and group configs except the default one will use integer
timestamps now, to simplify merging when syncing and importing.
defaultGroup() has been removed in favour of keeping the models up to date
(not yet done).
- should never skip recording graves, for the sake of merging
- 1.0 upgrade will fail on decks that have the same fact creation date. need
to work around this in the future
The approach of using incrementing id numbers works for syncing if we assume
the server is canonical and all other clients rewrite their ids as necessary,
but upon reflection it is not sufficient for merging decks in general, as we
have no way of knowing whether objects with the same id are actually the same
or not. So we need some way of uniquely identifying the object.
One approach would be to go back to Anki 1.0's random 64-bit numbers, but as
outlined in a previous commit such large numbers can't be handled easily in
some languages like JavaScript, and they tend to be fragmented on disk which
impacts performance. It's much better if we can keep content added at the same
time in the same place on disk, so that operations like syncing which are mainly
interested in newly added content can run faster.
Another approach is to add a separate column containing the unique id, which
is what Mnemosyne 2.0 will be doing. Unfortunately it means adding an index
for that column, leading to slower inserts and larger deck files. And if the
current sequential ids are kept, a bunch of code needs to be kept to ensure ids
don't conflict when merging.
To address the above, the plan is to use a millisecond timestamp as the id.
This ensures disk order reflects creation order, allows us to merge the id and
crt columns, avoids the need for a separate index, and saves us from worrying
about rewriting ids. There is of course a small chance that the objects to be
merged were created at exactly the same time, but this is extremely unlikely.
This commit changes models. Other objects will follow.
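
A minimal sketch of generating such an id. Bumping the value by one
millisecond when it is already taken locally is an assumption added for the
example and is not described above; the function name and arguments are also
hypothetical.

```python
import time

def timestamp_id(existing_ids, now_ms=None):
    """Return a millisecond-timestamp id that is not in existing_ids.

    existing_ids is any container of ids already in use locally. now_ms can
    be passed in to make the result deterministic.
    """
    t = int(time.time() * 1000) if now_ms is None else now_ms
    # Assumed collision handling: step forward until the id is free.
    while t in existing_ids:
        t += 1
    return t
```

Millisecond timestamps stay well below 2**53, so they remain exactly
representable as JavaScript numbers, unlike random 64-bit values.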
The initial plan was to zero the creation time and leave the cards/facts there
until we have a chance to garbage collect them on a schema change, but such an
approach won't work with deck subscriptions.
- cards in final review are first reset as rev cards so that type==queue and
they can be restored correctly
- new cards in learning have type set to 1 so they too can be restored
correctly