The media table was originally introduced when Anki hashed media filenames,
and needed a way to remember the original filename. It also helped with:
1) getting a quick list of all media used in the deck, or the media added
since the last sync, for mobile clients
2) merging identical files with different names
But had some drawbacks:
- every operation that modifies templates, models or facts meant generating
the q/a and checking if any new media had appeared
- each entry is about 70 bytes, and some decks have 100k+ media files
So we remove the media table. We address 1) by being more intelligent about
media downloads on the mobile platform. We ask the user after a full sync if
they want to look for missing media, and they can choose not to if they know
they haven't added any. And on a partial sync, we can scan the contents of the
incoming facts for media references, and download any references we find. This
also avoids all the issues people had with media not downloading because it
was in their media folder but not in the media database.
For 2), when copying media to the media folder, if we have a duplicate
filename, we check if that file has the same md5, and avoid copying if so.
This won't merge identical content that has separate names, but instances
where users need that are rare.
Anki used random 64bit IDs for cards, facts and fields. This had some nice
properties:
- merging data in syncs and imports was simply a matter of copying each way,
as conflicts were astronomically unlikely
- it made it easy to identify identical cards and prevent them from being
reimported
But there were some negatives too:
- they're more expensive to store
- javascript can't handle numbers > 2**53, which means AnkiMobile, iAnki and
so on have to treat the ids as strings, which is slow
- simply copying data in a sync or import can lead to corruption, as while a
duplicate id indicates the data was originally the same, it may have
diverged. A more intelligent approach is necessary.
- sqlite was sorting the fields table based on the id, which meant the fields
were spread across the table, and costly to fetch
So instead, we'll move to incremental ids. In the case of model changes we'll
declare that a schema change and force a full sync to avoid having to deal
with conflicts, and in the case of cards and facts, we'll need to update the
ids on one end to merge. Identical cards can be detected by checking to see if
their id is the same and their creation time is the same.
Creation time has been added back to cards and facts because it's necessary
for sync conflict merging. That means facts.pos is not required.
The graves table has been removed. It's not necessary for schema related
changes, and dead cards/facts can be represented as a card with queue=-4 and
created=0. Because we will record schema modification time and can ensure a
full sync propagates to all endpoints, it means we can remove the dead
cards/facts on schema change.
Tags have been removed from the facts table and are represented as a field
with ord=-1 and fmid=0. Combined with the locality improvement for fields, it
means that fetching fields is not much more expensive than using the q/a
cache.
Because of the above, removing the q/a cache is a possibility now. The q and a
columns on cards has been dropped. It will still be necessary to render the
q/a on fact add/edit, since we need to record media references. It would be
nice to avoid this in the future. Perhaps one way would be the ability to
assign a type to fields, like "image", "audio", or "latex". LaTeX needs
special consider anyway, as it was being rendered into the q/a cache.
- move latex preamble into a deck var and include amsmath by default
- include the pre/postamble in the hash, so changes to the preamble result in
newly generated images
- latex now slots in to the formatQA hook to render images in the q/a
- moved call() to utils
- cache/uncache latex have been obsoleted. User can delete manually, and
images will be regenerated with a DB check