SQLAlchemy is a great tool, but it wasn't a great fit for Anki:
- We often had to drop down to raw SQL for performance reasons.
- The DB cursors and results were wrapped, which incurred a
sizable performance hit due to introspection. Operations like fetching 50k
records from a hot cache were taking more than twice as long to complete.
- We take advantage of sqlite-specific features, so SQL language abstraction
is useless to us.
- The anki schema is quite small, so manually saving and loading objects is
not a big burden.
In the process of porting to DBAPI, I've refactored the database schema:
- App configuration data that we don't need in joins or bulk updates has been
moved into JSON objects. This simplifies serializing, and means we won't
need DB schema changes to store extra options in the future. This change
obsoletes the deckVars table.
- Renamed tables:
-- fieldModels -> fields
-- cardModels -> templates
-- fields -> fdata
- a number of attribute names have been shortened
Classes like Card, Fact & Model remain. They maintain a reference to the deck.
To write their state to the DB, call .flush().
Objects no longer have their modification time manually updated. Instead, the
modification time is updated when they are flushed. This also applies to the
deck.
Decks will now save on close, because various operations that were done at
deck load will be moved into deck close instead. Operations like undoing
buried card are cheap on a hot cache, but expensive on startup.
Programmatically you can call .close(save=False) to avoid a save and a
modification bump. This will be useful for generating due counts.
Because of the new saving behaviour, the save and save as options will be
removed from the GUI in the future.
The q/a cache and field cache generating has been centralized. Facts will
automatically rebuild the cache on flush; models can do so with
model.updateCache().
Media handling has also been reworked. It has moved into a MediaRegistry
object, which the deck holds. Refcounting has been dropped - it meant we had
to compare old and new value every time facts or models were changed, and
existed for the sole purpose of not showing errors on a missing media
download. Instead we just media.registerText(q+a) when it's updated. The
download function will be expanded to ask the user if they want to continue
after a certain number of files have failed to download, which should be an
adequate alternative. And we now add the file into the media DB when it's
copied to th emedia directory, not when the card is commited. This fixes
duplicates a user would get if they added the same media to a card twice
without adding the card.
The old DeckStorage object had its upgrade code split in a previous commit;
the opening and upgrading code has been merged back together, and put in a
separate storage.py file. The correct way to open a deck now is import anki; d
= anki.Deck(path).
deck.getCard() -> deck.sched.getCard()
same with answerCard
deck.getCard(id) returns a Card object now.
And the DB wrapper has had a few changes:
- sql statements are a more standard DBAPI:
- statement() -> execute()
- statements() -> executemany()
- called like execute(sql, 1, 2, 3) or execute(sql, a=1, b=2, c=3)
- column0 -> list
Cards had developed quite a lot of cruft from incremental changes, and a
number of important attributes were stored in names that had no bearing to
their actual use.
Added:
- position, which new cards will be sorted on in the future
- flags, which is reserved for future use
Renamed:
- type to queue
- relativeDelay to type
- noCount to lapses
Removed:
- all new/young/matureEase counts; the information is in the revlog
- firstAnswered, lastDue, lastFactor, averageTime and totalTime for the same
reason
- isDue, spaceUntil and combinedDue, because they are no longer used. Spaced
cards will be implemented differently in a coming commit.
- priority
- yesCount, because it can be inferred from reps & lapses
- tags; they've been stored in facts for a long time now
Also compatibility with deck versions less than 65 has been dropped, so decks
will need to be upgraded to 1.2 before they can be upgraded by the dev code.
All shared decks are on 1.2, so this should hopefully not be a problem.
- rename to revlog
- change the pk to time, as we want an index on time, and the old multi-column
index was expensive and not useful
- remove yes/no count; they can be inferred from the ease
- remove lastFactor, as it's in the previous entry
- remove delay, it can be inferred from last entry
- remove 'next' from nextInterval and nextFactor
- rename 'thinkingTime' to 'userTime'
- rename reps to rep
- migrate old data to new table, and fix some problems in the process: ease0
-> ease1, and limit thinking time to 60 seconds as it should have been
previously
The stats table was how the early non-SQL versions of Anki kept track of
statistics, before there was a revision log. It is being removed because:
- it's not possible to show the statistics for a subset of the deck
- it can't meaningfully be copied on import/export
- it makes it harder to implement sync merging
Implications:
- graphs and deck stats roughly 1.5-3x longer than before, but we'll have the
ability to generate stats for subsections of the deck, and it's not time
critical code
- people who've been using anki since the very early days may notice a drop in
statistics, as early repetitions were recorded in the stats table but the
revlog didn't exist at that point.
- due bugs in old syncs and imports/exports, the stats and revlog may not
match numbers exactly
To remove it, the following changes have been made:
- the graphs and deck stats now generate their data entirely from the revlog
- there are no stats to keep track of how many cards we've answered, so we
pull that information from the revlog in reset()
- we remove _globalStats and _dailyStats from the deck
- we check if a day rollover has occurred using failedCutoff instead
- we remove the getStats() routine
- the ETA code is currently disabled
- timeboxing routines use repsToday instead of stats
- remove stats delete from export
- remove stats table and index in upgrade
- remove stats syncing and globalStats refresh pre-sync
- remove stats count check in fullSync check, which was redundant anyway
- update unit tests
Also:
- newCountToday -> newCount, to bring it in line with revCount&failedCount
which also reflect the currently due count
- newCount -> newAvail
- timeboxing routines renamed since the old names were confusingly similar to
refreshSession() which does something different
Todo:
- update newSeenToday & repsToday when answering a card
- reimplement eta
In various parts of the code we need to get all cards of a given category
(new, failed, etc) regardless of whether they're suspended, buried, etc. So we
store the true type in the obsolete relativeDelay column and add in index for
it, because it's cheaper than putting indices on reps & successive.