The full sync threshold was a hack to ensure we synced the deck in a
memory-efficient way if there was a lot of data to send. The problem is that
it's not easy for the user to predict how many changes there are, and so it
might come as a surprise to them when a sync suddenly switches to a full sync.
In order to be able to send changes in chunks rather than all at once, some
changes had to be made:
- Clients now set usn=-1 when they modify an object, which allows us to
distinguish between objects that have been modified on the server, and ones
that have been modified on the client. If we don't do this, we would have to
buffer the local changes in a temporary location before adding the server
changes.
- Before a client sends the objects to the server, it changes the usn to
maxUsn both in the payload and the local storage.
- We do deletions at the start
- To determine which card or fact is newer, we have to fetch the modification
time of the local version. We do this in batches rather than try to load the
entire list in memory.
SQLAlchemy is a great tool, but it wasn't a great fit for Anki:
- We often had to drop down to raw SQL for performance reasons.
- The DB cursors and results were wrapped, which incurred a
sizable performance hit due to introspection. Operations like fetching 50k
records from a hot cache were taking more than twice as long to complete.
- We take advantage of sqlite-specific features, so SQL language abstraction
is useless to us.
- The anki schema is quite small, so manually saving and loading objects is
not a big burden.
In the process of porting to DBAPI, I've refactored the database schema:
- App configuration data that we don't need in joins or bulk updates has been
moved into JSON objects. This simplifies serializing, and means we won't
need DB schema changes to store extra options in the future. This change
obsoletes the deckVars table.
- Renamed tables:
-- fieldModels -> fields
-- cardModels -> templates
-- fields -> fdata
- a number of attribute names have been shortened
Classes like Card, Fact & Model remain. They maintain a reference to the deck.
To write their state to the DB, call .flush().
Objects no longer have their modification time manually updated. Instead, the
modification time is updated when they are flushed. This also applies to the
deck.
Decks will now save on close, because various operations that were done at
deck load will be moved into deck close instead. Operations like undoing
buried card are cheap on a hot cache, but expensive on startup.
Programmatically you can call .close(save=False) to avoid a save and a
modification bump. This will be useful for generating due counts.
Because of the new saving behaviour, the save and save as options will be
removed from the GUI in the future.
The q/a cache and field cache generating has been centralized. Facts will
automatically rebuild the cache on flush; models can do so with
model.updateCache().
Media handling has also been reworked. It has moved into a MediaRegistry
object, which the deck holds. Refcounting has been dropped - it meant we had
to compare old and new value every time facts or models were changed, and
existed for the sole purpose of not showing errors on a missing media
download. Instead we just media.registerText(q+a) when it's updated. The
download function will be expanded to ask the user if they want to continue
after a certain number of files have failed to download, which should be an
adequate alternative. And we now add the file into the media DB when it's
copied to th emedia directory, not when the card is commited. This fixes
duplicates a user would get if they added the same media to a card twice
without adding the card.
The old DeckStorage object had its upgrade code split in a previous commit;
the opening and upgrading code has been merged back together, and put in a
separate storage.py file. The correct way to open a deck now is import anki; d
= anki.Deck(path).
deck.getCard() -> deck.sched.getCard()
same with answerCard
deck.getCard(id) returns a Card object now.
And the DB wrapper has had a few changes:
- sql statements are a more standard DBAPI:
- statement() -> execute()
- statements() -> executemany()
- called like execute(sql, 1, 2, 3) or execute(sql, a=1, b=2, c=3)
- column0 -> list
- model config is now stored as a json-serialized dict, which allows us to
quickly gather the info and allows for adding extra options more easily in
the future
- denormalize modelId into the cards table, so we can get the model scheduling
information without having to hit the facts table
- remove position - since we will handle spacing differently we don't need a
separate variable to due to define sort order
- remove lastInterval from cards; the new cram mode and review early shouldn't
need it
- successive->streak
- add new columns for learn mode
- move cram mode into new file; learn more and review early need more thought
- initial work on learn mode
- initial unit tests