- removed 'created' column from various tables. We don't care when things like
models are created, and card creation time didn't reflect the actual time a
card was created
- facts were previously ordered by their creation date. On import, the code
would manually increment the creation time of each subsequent fact by 0.0001
seconds, and card due times were then set by adding ordinal number*0.000001
to the fact time. This was error-prone, and the number of zeros used actually
differed between parts of the code. Instead, we replace it with a 'pos'
column on facts, which increments for each new fact.
- importing should add new facts with a higher pos, but concurrent updates in
a synced deck can have multiple facts with the same pos
- due times are completely different now, and depend on the card type
- new cards have due=fact.pos or random(0, 10000)
- reviews have due set to an integer representing days since deck
creation/download
- cards in the learn queue use an integer timestamp in seconds
- many columns like modified, lastSync, factor, interval, etc. have been converted to
integer columns. They are cheaper to store (large decks can save tens of
megabytes) and faster to search on.
- cards have their group assigned on fact creation. In the future we'll add a
per-template option for a default group.
- switch to due/random order for the review queue on upgrade. Users can still
switch to the old behaviour if they want, but many people don't care what
it's set to, and due is considerably faster, which may result in a better
user experience
- instead of the old 4 settings, we move to just two, as there's no point
having separate include and exclude options for a non-overlapping set of
cards
- revGroups and newGroups are a list of groupIds to include in the queue. If
all groups are enabled, the UI should set it to an empty list rather than a
list of every available group, and groupLimit() will leave off the
constraint completely
- skip updating buried cards on startup; it's expensive and we'll do that on
deck close in the future
- add an index for groupId. Initial profiling indicates that groupId-based
selective study is considerably faster in certain scenarios
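A minimal sketch of the empty-list convention for revGroups/newGroups, assuming the limit is built as an SQL fragment appended to a card query. The name `group_limit` and its signature are hypothetical; the real groupLimit() may differ.

```python
def group_limit(column, group_ids):
    """Return an SQL fragment limiting `column` to `group_ids`.

    Hypothetical helper modelled on the behaviour described above: an empty
    list means "all groups enabled", so no constraint is emitted at all.
    """
    if not group_ids:
        return ""
    # int() guards against anything other than numeric ids reaching the SQL.
    ids = ", ".join(str(int(gid)) for gid in group_ids)
    return f"and {column} in ({ids})"


print(repr(group_limit("groupId", [])))  # ''
print(group_limit("groupId", [1, 3]))    # and groupId in (1, 3)
```

Leaving the clause off entirely, rather than listing every group id, keeps the query short and spares the database a membership test on every row.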
The 50k element deck I'm testing with now opens and builds the queue in 40ms
on a cold cache, of which 34ms is the initial deck startup and 6ms the queue
build. Adding back the undo log and backups will of course increase this, but
this is a big improvement for checking due times in the deck browser.
Users who want to study small subsections at one time (eg, "lesson 14") are
currently best served by creating lots of little decks. This is because:
- selective study is a bit cumbersome to switch between
- the graphs and statistics are for the entire deck
- selective study can be slow on mobile devices - when the list of cards to
hide/show is big, or when there are many due cards, performance can suffer
- scheduling can only be configured per deck
Groups are intended to address the above problems. All cards start off in the
same group, but they can have their group changed. Unlike tags, cards can only
be a member of a single group at one time. This allows us to divide the deck
up into a non-overlapping set of cards, which will make things like showing
due counts for a single category considerably cheaper. The user interface
might want to show something like a deck browser for decks that have more than
one group, showing due counts and allowing people to study each group
individually, or to study all at once.
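Because each card belongs to exactly one group, per-group due counts reduce to a single GROUP BY over the cards table, which an index on groupId can serve. The sketch below uses an in-memory SQLite database; the table layout and the queue value used for review cards are assumptions for illustration.

```python
import sqlite3

REVIEW_QUEUE = 2  # assumption: the queue value that marks review cards


def due_counts_by_group(db, today):
    """Map groupId -> number of review cards due on or before `today`."""
    rows = db.execute(
        "select groupId, count(*) from cards"
        " where queue = ? and due <= ? group by groupId order by groupId",
        (REVIEW_QUEUE, today),
    ).fetchall()
    return dict(rows)


db = sqlite3.connect(":memory:")
db.execute(
    "create table cards (id integer primary key, groupId integer,"
    " queue integer, due integer)"
)
db.execute("create index ix_cards_groupId on cards (groupId)")
db.executemany(
    "insert into cards (groupId, queue, due) values (?, ?, ?)",
    [(1, 2, 10), (1, 2, 12), (2, 2, 9), (2, 2, 20), (2, 0, 5)],
)
print(due_counts_by_group(db, today=12))  # {1: 2, 2: 1}
```

Here `due` is an integer day number, matching the review due times described earlier; a group whose cards are all non-review or not yet due simply has no entry.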
Instead of storing the scheduling config in the deck or the model, we move the
scheduling into a separate config table, and link that to the groups table.
That way a user can have multiple groups that all share the same scheduling
information if they want.
Deletion tracking is also now in a single table.
- limits are stored separately so we can access them quickly when checking
deck counts
- data is used to store cssCache and hexCache; these may be refactored or go
away in the future
- model config is now stored as a json-serialized dict, which allows us to
quickly gather the info and allows for adding extra options more easily in
the future
- denormalize modelId into the cards table, so we can get the model scheduling
information without having to hit the facts table
- remove position - since we will handle spacing differently, we don't need a
separate variable from due to define sort order
- remove lastInterval from cards; the new cram mode and review early shouldn't
need it
- successive->streak
- add new columns for learn mode
- move cram mode into new file; learn more and review early need more thought
- initial work on learn mode
- initial unit tests
- move most scheduling parameters from deck to models
- remove obsolete fields in deck and models
- decks->deck
- remove deck id reference in models
- move some deckVars into the deck table
- simplify deckstorage
- lock sessionhelper by default
- add models/currentModel as properties instead of ORM mappings
- remove models.tags
- remove remaining support for memory-backed databases
- use a blank string for syncName instead of null
- remove backup code; will handle in gui
- bump version to 100
- update unit tests
- tags.tag -> tags.name
- priority reset to 0 for now; will be used differently in the future
- cardTags.id removed; (tagId, cardId) is the primary key now
- cardTags.src -> cardTags.type
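Storing model config as a json-serialized dict, as listed above, means a new option only needs a default in code rather than a schema change. A minimal sketch, assuming a `models` table with a text `conf` column; the option names and defaults are hypothetical.

```python
import json
import sqlite3

# Hypothetical options and defaults. Adding a key here is enough to give
# every existing model the new option; no ALTER TABLE is needed.
DEFAULT_CONF = {"maxInterval": 36500, "exampleNewOption": False}


def save_conf(db, model_id, conf):
    db.execute(
        "update models set conf = ? where id = ?", (json.dumps(conf), model_id)
    )


def load_conf(db, model_id):
    (raw,) = db.execute(
        "select conf from models where id = ?", (model_id,)
    ).fetchone()
    conf = dict(DEFAULT_CONF)
    conf.update(json.loads(raw))  # stored values win over defaults
    return conf


db = sqlite3.connect(":memory:")
db.execute("create table models (id integer primary key, conf text not null)")
# A row written before exampleNewOption existed.
db.execute(
    "insert into models (id, conf) values (1, ?)",
    (json.dumps({"maxInterval": 365}),),
)
print(load_conf(db, 1))  # {'maxInterval': 365, 'exampleNewOption': False}
```

The whole config is fetched with one row read, which fits the goal of gathering model info quickly.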
Cards had developed quite a lot of cruft from incremental changes, and a
number of important attributes were stored under names that had no bearing on
their actual use.
Added:
- position, which new cards will be sorted on in the future
- flags, which is reserved for future use
Renamed:
- type to queue
- relativeDelay to type
- noCount to lapses
Removed:
- all new/young/matureEase counts; the information is in the revlog
- firstAnswered, lastDue, lastFactor, averageTime and totalTime for the same
reason
- isDue, spaceUntil and combinedDue, because they are no longer used. Spaced
cards will be implemented differently in a coming commit.
- priority
- yesCount, because it can be inferred from reps & lapses
- tags; they've been stored in facts for a long time now
Also, compatibility with deck versions less than 65 has been dropped, so decks
will need to be upgraded to 1.2 before they can be upgraded by the dev code.
All shared decks are on 1.2, so this should hopefully not be a problem.
- rename to revlog
- change the pk to time, as we want an index on time, and the old multi-column
index was expensive and not useful
- remove yes/no count; they can be inferred from the ease
- remove lastFactor, as it's in the previous entry
- remove delay, it can be inferred from last entry
- remove 'next' from nextInterval and nextFactor
- rename 'thinkingTime' to 'userTime'
- rename reps to rep
- migrate old data to new table, and fix some problems in the process: ease0
-> ease1, and limit thinking time to 60 seconds as it should have been
previously
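The data fixes applied while migrating to the new revlog table can be expressed as a small per-row transform. This is a sketch under assumptions: thinking time is taken to be in seconds, and treating ease 1 as a failed answer (which is how the removed yes/no counts would be recovered) is assumed rather than stated above. The function names are hypothetical.

```python
MAX_USER_TIME = 60.0  # seconds; the cap thinking time should always have had


def migrate_revlog_row(ease, thinking_time):
    """Return (ease, userTime) for a row copied into the new revlog table."""
    if ease == 0:
        ease = 1  # old ease0 answers are recorded as ease1
    user_time = min(thinking_time, MAX_USER_TIME)
    return ease, user_time


def was_correct(ease):
    """Infer the removed yes/no counts from ease (assumes ease 1 = fail)."""
    return ease > 1


print(migrate_revlog_row(0, 12.5))  # (1, 12.5)
print(migrate_revlog_row(3, 300))   # (3, 60.0)
```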
The stats table was how the early non-SQL versions of Anki kept track of
statistics, before there was a revision log. It is being removed because:
- it's not possible to show the statistics for a subset of the deck
- it can't meaningfully be copied on import/export
- it makes it harder to implement sync merging
Implications:
- graphs and deck stats take roughly 1.5-3x longer to generate than before, but
we'll have the ability to generate stats for subsections of the deck, and it's
not time-critical code
- people who've been using anki since the very early days may notice a drop in
statistics, as early repetitions were recorded in the stats table but the
revlog didn't exist at that point.
- due to bugs in old syncs and imports/exports, the stats and revlog may not
match exactly
To remove it, the following changes have been made:
- the graphs and deck stats now generate their data entirely from the revlog
- there are no stats to keep track of how many cards we've answered, so we
pull that information from the revlog in reset()
- we remove _globalStats and _dailyStats from the deck
- we check if a day rollover has occurred using failedCutoff instead
- we remove the getStats() routine
- the ETA code is currently disabled
- timeboxing routines use repsToday instead of stats
- remove stats delete from export
- remove stats table and index in upgrade
- remove stats syncing and globalStats refresh pre-sync
- remove stats count check in fullSync check, which was redundant anyway
- update unit tests
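With the stats table gone, a count like repsToday has to come from the revlog when reset() runs. A minimal sketch, assuming revlog.time holds the answer timestamp in seconds and that the current day started 24 hours before failedCutoff (the end-of-day boundary used for rollover detection); the function name is hypothetical.

```python
import sqlite3

DAY_SECONDS = 86400


def reps_today(db, failed_cutoff):
    """Count revlog entries answered during the current scheduling day."""
    day_start = failed_cutoff - DAY_SECONDS
    (count,) = db.execute(
        "select count(*) from revlog where time > ? and time <= ?",
        (day_start, failed_cutoff),
    ).fetchone()
    return count


db = sqlite3.connect(":memory:")
# time is the primary key, matching the revlog change described earlier.
db.execute(
    "create table revlog (time real primary key, cardId integer, ease integer)"
)
cutoff = 1_000_000.0
db.executemany(
    "insert into revlog values (?, ?, ?)",
    [(cutoff - 90_000, 1, 3), (cutoff - 3_600, 2, 1), (cutoff - 60, 3, 4)],
)
print(reps_today(db, cutoff))  # 2
```

Because time is the primary key, the range filter can use that index directly, which helps offset the extra cost of not having precomputed daily stats.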
Also:
- newCountToday -> newCount, to bring it in line with revCount & failedCount,
which also reflect the currently due count
- newCount -> newAvail
- timeboxing routines renamed since the old names were confusingly similar to
refreshSession() which does something different
Todo:
- update newSeenToday & repsToday when answering a card
- reimplement eta
Calculating the average on startup is expensive on mobile devices. It might be
nice to provide it as a deck option or per-model setting in the future so that
people can specify how hard their material is and have it treated accordingly.
Previously we had an index on the value field, which was very expensive for
long fields. Instead we use a separate column and take the first 8 characters
of the field value's md5sum, and index that. In decks with lots of text in
fields, it can cut the deck size by 30% or more, and many decks improve by
10-20%. Decks with only a few characters in fields may increase in size
slightly, but this is offset by the fact that we only generate a checksum for
fields that have uniqueness checking on.
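The checksum scheme above can be sketched with Python's hashlib and sqlite3. Table and column names are assumptions; the point is that the short checksum, not the full value, is indexed, and a checksum match is confirmed by comparing the full value, since 8 hex characters can collide.

```python
import hashlib
import sqlite3


def field_checksum(value):
    """First 8 hex characters of the md5 of a field value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:8]


def is_duplicate(db, field_model_id, value):
    """Find candidates via the indexed checksum, then compare fully."""
    rows = db.execute(
        "select value from fields where fieldModelId = ? and chksum = ?",
        (field_model_id, field_checksum(value)),
    ).fetchall()
    return any(row[0] == value for row in rows)


db = sqlite3.connect(":memory:")
db.execute(
    "create table fields (id integer primary key, fieldModelId integer,"
    " value text, chksum text)"
)
# The index covers only the short checksum, not the (possibly long) value.
db.execute("create index ix_fields_chksum on fields (chksum)")
db.execute(
    "insert into fields (fieldModelId, value, chksum) values (?, ?, ?)",
    (1, "neko", field_checksum("neko")),
)
print(field_checksum(""))           # d41d8cd9
print(is_duplicate(db, 1, "neko"))  # True
print(is_duplicate(db, 1, "inu"))   # False
```

In practice only fields with uniqueness checking enabled would get a checksum; others could leave chksum empty.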
Also, fixed import->update reporting the total # of available facts instead of
the number of facts that were imported.
Anki 1.0 had a similar feature but we do things a bit differently now. The
relative spacing applies only to reviews, and spaces cards according to their
interval, instead of spacing all cards the same. Any delay < 1 full day is
treated as no delay, so with the default 10% setting, reviews with an interval
< 10 days are not spaced at all. This should hopefully cut down on support
queries for people wondering why many of their cards were delayed, allows the
two settings to be documented separately, and does away with the somewhat
confusing usage of non-integer new sibling values to disable review spacing.
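A minimal sketch of the relative spacing rule, with the 10% default as a parameter. The function name is hypothetical, and whether fractional days are kept or rounded is not stated above, so this sketch leaves them as-is.

```python
def review_spacing_days(interval_days, spacing=0.1):
    """Days to delay a review card's sibling under relative spacing.

    The delay is a fraction of the card's interval (10% by default); any
    delay under one full day is treated as no delay at all.
    """
    delay = interval_days * spacing
    return 0 if delay < 1 else delay


print(review_spacing_days(9))   # 0    (0.9 days is under a full day)
print(review_spacing_days(50))  # 5.0
```

With the default setting, the threshold falls at a 10-day interval, which is why shorter-interval reviews are never spaced.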
- this fixes a state where cards failed on that future day could end up
with an earlier due date than the rest of the failed mature cards, leading
to the newly failed cards being repeated prematurely
- this leads to non-deterministic scheduling of the mature bonus fails, so
they are effectively randomized, which is probably what most users want
This works fine if the user is showing all cards, but if they have limited
reviews to certain categories, it can result in the counts going negative
because we decremented for cards which weren't actually due. Determining if a
card was actually due or not is an expensive operation, so instead we leave
the counts alone and make sure reviews will finish early if the new/rev counts
are non-zero but the queue is empty.
because field formatting is always on now, users with custom font
sizes/families set only on the card will still have to alter their templates
and either configure the fields or replace the references with triple
curly braces
- move latex preamble into a deck var and include amsmath by default
- include the pre/postamble in the hash, so changes to the preamble result in
newly generated images
- latex now slots in to the formatQA hook to render images in the q/a
- moved call() to utils
- cache/uncache latex have been obsoleted. Users can delete the images
manually, and they will be regenerated with a DB check
- media is no longer hashed, and instead stored in the db using its original
name
- when adding media, its checksum is calculated and used to look for
duplicates
- duplicate filenames will result in a number being appended to the filename
- the size column is used to count card references to media. If media is
referenced in a fact but not the question or answer, the count will be zero.
- there is no guarantee media will be listed in the media db if it is unused
on the question & answer
- if rebuildMediaDir(delete=True), then entries with zero references are
deleted, along with any unused files in the media dir.
- rebuildMediaDir() will update the internal checksums, and set the checksum
to "" if a file can't be found
- rebuildMediaDir() is a lot less destructive now, and will leave alone
directories it finds in the media folder (but not look in them either)
- rebuildMediaDir() returns more information about the state of media now
- the online and mobile clients will need to make sure that when
downloading media, entries with no checksum are treated as non-fatal and do
not abort the download process.
- the ref count is updated every time the q/a is updated - so the db should be
up to date after every add/edit/import
- since we look for media on the q/a now, card templates like '<img
src="{{{field}}}">' will work now
- the 'export original files' option is gone, as it is no longer needed
- move from per-model media URL to deckVar. downloadMissingMedia() uses this
now. Deck subscriptions will have to be updated to share media another way.
- pass deck in formatQA, as latex support is going to change
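The duplicate handling for added media can be sketched as follows. Assumptions: the media table is modelled as a dict of filename to checksum, md5 is used as the checksum, and the number is inserted before the extension (`a.jpg` becomes `a1.jpg`); none of these details is stated above, and `add_media` is a hypothetical name.

```python
import hashlib
import os


def add_media(existing, name, data):
    """Register `data` under a unique filename and return that name.

    `existing` maps filename -> checksum. Identical content is reused;
    a clashing name with different content gets a number appended.
    """
    checksum = hashlib.md5(data).hexdigest()
    for fname, csum in existing.items():
        if csum == checksum:
            return fname  # duplicate content: reuse the existing entry
    base, ext = os.path.splitext(name)
    candidate = name
    n = 1
    while candidate in existing:
        candidate = f"{base}{n}{ext}"
        n += 1
    existing[candidate] = checksum
    return candidate


media = {}
print(add_media(media, "a.jpg", b"one"))  # a.jpg
print(add_media(media, "b.jpg", b"one"))  # a.jpg (same content)
print(add_media(media, "a.jpg", b"two"))  # a1.jpg (name clash)
```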
this bypasses rebuilding the queue and other startup initialization and thus
loads the deck considerably faster. This is useful when you want to perform
operations on the deck like syncing, but don't need the ability to review
cards
- obsolete spaceUntil - it serves no useful purpose
- the old per-model spacing variables are obsolete, as the new approach
requires uniform spacing across all models for new cards
- introduce a new per-deck variable: newSpacing
- don't fill new queue if we've done today's cards
- still need to check cramming / review early
newSpacing is a time in seconds to delay introduction of sibling new cards.
It can be applied as many times as necessary as there is no harm in new cards
being delayed repeatedly. Because the default queue length is 200 and it can
take quite some time for the spaced cards to be placed in the queue again, we
use a separate array to track spaced new cards provided the configured delay
is less than 20 minutes. At times under 20 minutes this number is not a
guaranteed minimum spacing - if the new card queue is empty the spaced cards
will be flushed before checking the new queue again, as otherwise we end up
trying to fill on every repetition. The due counts no longer decrease by more
than one if the spacing is less than the due cutoff, since that confused some
users.
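A simplified sketch of the in-memory spaced-card array for newSpacing delays under 20 minutes. The class and method names are hypothetical, cards are (cardId, factId) pairs, released cards are assumed to go to the front of the new queue, and the clock is injectable so the behaviour can be tested.

```python
import time
from collections import deque

SPACED_ARRAY_LIMIT = 20 * 60  # only delays shorter than this use the array


class NewCardQueue:
    """Hypothetical new-card queue with an in-memory array of spaced siblings."""

    def __init__(self, cards, new_spacing, clock=time.time):
        self.queue = deque(cards)  # (card_id, fact_id) pairs, in study order
        self.spaced = []           # (release_time, card_id, fact_id)
        self.new_spacing = new_spacing
        self.clock = clock

    def answered(self, fact_id):
        """Hold back queued siblings of fact_id for new_spacing seconds."""
        if self.new_spacing >= SPACED_ARRAY_LIMIT:
            return  # longer delays are not handled by this sketch
        release = self.clock() + self.new_spacing
        remaining = deque()
        for card_id, fid in self.queue:
            if fid == fact_id:
                self.spaced.append((release, card_id, fid))
            else:
                remaining.append((card_id, fid))
        self.queue = remaining

    def next_card(self):
        now = self.clock()
        if self.queue:
            ready = [s for s in self.spaced if s[0] <= now]
        else:
            # Flush all spaced cards instead of refilling on every
            # repetition; the spacing is then not a guaranteed minimum.
            ready = list(self.spaced)
        self.spaced = [s for s in self.spaced if s not in ready]
        # extendleft reverses its input, so reverse first to keep order.
        self.queue.extendleft(reversed([(c, f) for _, c, f in ready]))
        return self.queue.popleft() if self.queue else None


now = [0]
q = NewCardQueue([(1, 10), (2, 10), (3, 20)], new_spacing=60, clock=lambda: now[0])
print(q.next_card())  # (1, 10)
q.answered(10)        # sibling card 2 is held back until t=60
print(q.next_card())  # (3, 20)
print(q.next_card())  # (2, 10): queue empty, so spaced cards are flushed early
```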
Review cards are now placed at the end of the current review queue, and will
never be rescheduled to a different day. The old approach had a number of
problems:
- the more card models you had, the more likely a card would be spaced
multiple times, resulting in you forgetting the card before you get a chance
to review it
- spacing was applied even if the due card was already late
- repeatedly failing one card over a period of days or weeks would also starve
the other cards of attention
- the local deck name must now match the online deck
- syncName is a hash of the current deck's path or None
- the hash is checked on deck load, and if it is different (because the deck
was copied or moved), syncing is disabled. This should prevent people from
accidentally clobbering their online decks
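The path check on deck load can be sketched like this. md5 over the absolute path is an assumption (the hash function is not specified above), and an empty or missing syncName is treated as syncing disabled, since syncName has been stored both as None and as a blank string. Both function names are hypothetical.

```python
import hashlib
import os


def make_sync_name(deck_path):
    """Hash of the deck's absolute path, stored when syncing is enabled."""
    return hashlib.md5(os.path.abspath(deck_path).encode("utf-8")).hexdigest()


def sync_allowed(sync_name, deck_path):
    """Return False if syncing is off or the deck was copied or moved."""
    if not sync_name:
        return False  # None or "": syncing was never enabled
    return sync_name == make_sync_name(deck_path)


name = make_sync_name("/home/user/decks/japanese.anki")
print(sync_allowed(name, "/home/user/decks/japanese.anki"))   # True
print(sync_allowed(name, "/home/user/backup/japanese.anki"))  # False
```

A copied deck carries the original syncName but sits at a new path, so the check fails and the copy cannot overwrite the online deck.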
When you call operations like deleteCards(), suspendCards() and so on, it is
now necessary to call deck.reset() afterwards. This allows the calling code to
delay a reset if necessary. If the calling code calls a function that says the
caller must reset, the caller should be sure to call .reset() and fetch the
current card again. Failure to do the latter will result in answerCard()
attempting to remove the card from the queue, when the queue has been cleared.
- make sure cardLimit() matches on sql statements that are broken over lines
- fix logic in getCardId()
- don't increment failed count if delay1>0 and card was mature