While writing the documentation I realized that the default templates were
somewhat overwhelming. So I've moved the default settings into the card css,
and moved the css into a separate attribute which gets combined with the
question and answer templates.
Also:
- Detect cloze references directly rather than the conditional wrapper
- Add the cloze css to the template
New/rev card mixing, collapse time and the timeboxing limit are now
collection properties. I appreciate how it could be useful to have those
settings per top-level deck in some cases, but having some settings inherited
from the top level deck makes for a confusing UI.
- allows separate review order for different decks
- makes new card and rev card handling consistent
- for users who find it confusing to have cards from different decks mixed in
and thus click on each deck in turn, they can now just select the parent
deck and have it work as expected
- for users who want their cards mixed together randomly, they can keep the
cards in a single deck
Since we rely on fixIntegrity() to garbage-collect tags and don't try to add
unused ones to the graves, after a check the tag counts will fall out of sync.
- don't send server graves back on the next sync
- make sure we update usns of models/tags/decks as well on upload
- don't die when updating decks after current deck deleted
- report counts when sanity check fails
- drop lrnCount; rename lrnRepCount to lrnCount
- on card fetch, decr count by card.left
- drop cardCounts(), rename repCounts() to just counts()
- fix lrn count bugs
Because JSON doesn't support 64 bit numbers, we need to either convert the 64
bit numbers to a string during transport, or store the ids as a string. At
base 91 a 64 bit number only takes an extra two bytes, and it means we can
dump DB results directly into JSON without having to apply any transformation.
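A minimal sketch of the base 91 idea, for illustration only. The alphabet
below (letters, digits and punctuation that needs no escaping in a JSON
string) and the names encode_id()/decode_id() are assumptions, not necessarily
what the code uses.

```python
import string

# Assumed alphabet: 52 letters, 10 digits and 29 punctuation characters,
# leaving out quotes and the backslash so no JSON escaping is needed.
ALPHABET = string.ascii_letters + string.digits + "!#$%&()*+,-./:;<=>?@[]^_`{|}~"
BASE = len(ALPHABET)  # 91


def encode_id(num):
    """Encode a non-negative integer as a base 91 string."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num:
        num, rem = divmod(num, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_id(text):
    """Reverse encode_id()."""
    value = 0
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)
    return value
```

With this alphabet the largest unsigned 64 bit value encodes to 10 characters,
compared to 8 bytes in raw binary form.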
We want the background color to fill the card area rather than only the size
of the card content, so we need to set the CSS for the outer container
instead.
Instead of having required and unique flags for every field, enforce both
requirements on the first field, and neither on the rest. This mirrors the
subject/body format people are used to in note-taking apps. The subject
defines the object being learnt, and the remaining fields represent properties
of that object.
In the past, duplicate checking served two purposes: it quickly notified the
user that they're entering the same fact twice, and it notified the user if
they'd accidentally mistyped a secondary field. The former behaviour is
important for avoiding wasted effort, and so it should be done in real time.
The latter behaviour is not essential however - a typo is not wasted effort,
and it could be fixed in a periodic 'find duplicates' function. Given that
some users ended up with sluggish decks due to the overhead caused by a large
number of facts multiplied by a large number of unique fields, this seems like
a change for the better.
This also means Anki will let you add notes as long as the first field has
been filled out. Again, this is not a big deal: Anki is still checking to make
sure one or more cards will be generated, and the user can easily add any
missing fields later.
As a bonus, this change simplifies field configuration somewhat. As the card
layout and field dialogs are a popular point of confusion, the more they can
be simplified, the better.
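The first-field rule can be sketched roughly as follows. check_note() and the
in-memory set of existing first fields are hypothetical names used only for
illustration; the real check would look the values up in the deck.

```python
def check_note(fields, seen_first_fields):
    """Return None if the note may be added, else a short problem description.

    Only the first field is checked: it must be non-empty and unique.
    The remaining fields are neither required nor checked for duplicates.
    """
    first = fields[0].strip()
    if not first:
        return "first field is empty"
    if first in seen_first_fields:
        return "first field is a duplicate"
    return None
```

Because only one value per note is compared, the check is cheap enough to run
while the user is typing.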
Instead of a separate option to hide question, embed the question into the
answer format by default. Users who don't want to see the question can remove
the question fields, and users who want a separator between the question and
answer (or not) can control it in HTML now.
Also, remove obsolete field CSS, and don't accidentally chomp a character on
upgrade.
Previously {{field}} wrapped the field in a span with the field's font
properties. This wasn't obvious, and caused frequent problems with people
trying to combine field and template text, or use field content in dictionary
links.
Now that AnkiWeb has a wizard for configuring the front & back layout, we can
just put the formatting in the template instead.
The old template handling was too complicated, and generated frequent
questions on the forums. By dropping non-active templates we can do away with
the generate cards function, and advanced users can simulate the old behaviour
by using conditional field templates.
For importing and the deck creation wizard, we need to be able to generate
thousands of cards efficiently. So instead of requiring the creation of a fact
and rendering it, we cache the required fields and cloze references in the
model.
Also, emptyAns is dropped, as people can achieve the same behaviour by adding
the required answer fields as conditional to the question.
Todo: refactor genCards() to work in bulk, handle cloze edits intelligently
(prompt to delete invalid references, create new cards as necessary)
The full sync threshold was a hack to ensure we synced the deck in a
memory-efficient way if there was a lot of data to send. The problem is that
it's not easy for the user to predict how many changes there are, and so it
might come as a surprise to them when a sync suddenly switches to a full sync.
In order to be able to send changes in chunks rather than all at once, some
changes had to be made:
- Clients now set usn=-1 when they modify an object, which allows us to
distinguish between objects that have been modified on the server, and ones
that have been modified on the client. If we don't do this, we would have to
buffer the local changes in a temporary location before adding the server
changes.
- Before a client sends the objects to the server, it changes the usn to
maxUsn both in the payload and the local storage.
- We do deletions at the start
- To determine which card or fact is newer, we have to fetch the modification
time of the local version. We do this in batches rather than try to load the
entire list in memory.
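As a rough illustration of the usn=-1 convention, the sketch below picks out
locally modified objects, stamps them with maxUsn and hands them over in
chunks. take_pending() and the dict-based objects are assumptions made for the
example, not the actual sync code.

```python
def take_pending(objects, max_usn, chunk_size):
    """Yield chunks of locally modified objects, marking each one as sent.

    Objects with usn == -1 were modified on the client since the last sync.
    """
    chunk = []
    for obj in objects:
        if obj["usn"] != -1:
            continue  # came from the server, nothing to send
        obj["usn"] = max_usn      # update the local copy
        chunk.append(dict(obj))   # the payload carries the same usn
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

Since server changes never carry usn=-1, they can be written to the same
tables while chunks are still being sent, without a temporary buffer.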
Rather than showing the user how many cards are in the learning queue, we want
to be able to show them the number of reps they have to do to clear the queue,
so they can better estimate the required time. Before we were counting up with
the grade column, but this means we can't quickly sum up the number of reps
left. So we invert it, and count down instead.
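A small sketch of why counting down helps: the number of reps left becomes a
single sum over the learning queue. The table layout, the reps_left column
name and the queue value 1 are assumptions for the example.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory cards table for the example."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table cards (id integer primary key, queue integer, "
        "reps_left integer)"
    )
    db.executemany(
        "insert into cards values (?, ?, ?)",
        [(1, 1, 3), (2, 1, 1), (3, 2, 0)],
    )
    return db


def learn_reps(db):
    """Total reps needed to clear the learning queue (assumed queue = 1)."""
    sql = "select coalesce(sum(reps_left), 0) from cards where queue = 1"
    return db.execute(sql).fetchone()[0]


def answer_learning_card(db, card_id):
    """Count down by one each time a learning card is answered."""
    db.execute(
        "update cards set reps_left = reps_left - 1 where id = ?", (card_id,)
    )
```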
I also dropped the 'first time bonus' for now. If there's enough demand for
it, it can be added back by using the flags column, instead of a dedicated
cycles column.
- keep track of rep/time counts per group, instead of just at the top level
- sort by due after retrieving learn cards
- ensure activeGroups is sorted alphabetically
- ensure new cards come in alphabetical group order
- ensure queues are refilled when empty
see the following for background discussion:
http://groups.google.com/group/ankisrs-users/browse_thread/thread/4db5e82f7dff74fb
- change sched index to the more efficient gid, queue, due
- drop the dynamic index support. as there's no q/a cache anymore, it's
cheap enough to hit the cards table directly, and we can't use the index in
its new form.
- drop order by clauses (see todo)
- ensure there's always an active group. if users want to study all groups at
once, they need to create a top level group. we do this because otherwise
the 'top level group' that's active when everything is selected is not
clear.
to do:
- new cards will appear in gid order, but the gid numbers don't reflect
alphabetical sorting. we need to change the scheduling code so that it steps
through each group in turn
- likewise for the learn queue
Use a more conservative 40MB for systems with a smaller amount of memory.
Ideally we should bump this up if we detect the running system has a decent
amount of memory.
Syncing and shared decks have conflicting priorities:
- For syncing, we need to ensure that the deck remains in a consistent state.
In the past, Anki allowed deletions to be overridden by a more recently
modified object, but this could lead to a broken deck in certain
circumstances. For example, if a user deletes a fact (and its cards) on one
side, but does something to bump a card's mod time on another side, then
when syncing the card would be brought back to life without its fact. Short
of complex code to check all the relations, we're limited to two options:
forcing a full sync when things are deleted, or ensuring objects can't come
back to life.
- When facts are shared between people, we need a way to identify if two facts
arose from the same source. We can't compare based on content, as the
content may have changed partially or completely. And we can't use the
timestamp ids because of the above restriction on bringing objects back to
life. If we did that, people could download a shared deck, decide they don't
want it, and delete it. When they later decide to add it again, it wouldn't
be possible: either nothing would be imported because of the old graves, or
the ids would have to be rewritten. If we do the latter, the facts are no
longer associated with each other, and we lose the ability to update the
deck.
So we need to give facts two IDs: one used as the primary key and for syncing,
and another 'global id' for importing/sharing. I used a 64 bit random number,
because a) it's what Anki's used in the past, so by reusing the old IDs we
don't break existing associations on upgrade, and b) it's a decent compromise
between the possibility of conflicts and performance.
Also re-added a flags column to the facts. The 'data' column is intended to
store JSON in the future for extra features without changing the schema, but
that's slow for simple state checks. Flags will be used as a bitmask.
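For illustration, a bitmask in the flags column could be handled like this.
The flag names and bit positions are made up for the example; no flags have
been assigned yet.

```python
# Hypothetical flag bits, one bit per state.
FLAG_MARKED = 1 << 0
FLAG_SUSPECT = 1 << 1


def set_flag(flags, flag):
    """Return flags with the given bit switched on."""
    return flags | flag


def clear_flag(flags, flag):
    """Return flags with the given bit switched off."""
    return flags & ~flag


def has_flag(flags, flag):
    """Cheap state check that needs no JSON parsing."""
    return bool(flags & flag)
```

The same test can be done in SQL with the & operator, so state checks can
stay inside a query.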
Upwards counts were a nice idea in theory, but when using them briefly in
practice I quickly realized they're confusing. I'll probably pull the other
option in the future.
because group deletions are likely to be a semi-common operation (esp. for new
users trying out shared material), deleting groups will no longer cause a full
sync. in order to avoid syncing issues, we now allow cards/facts/etc to point
to an invalid group, and in that case, we just treat them like they're in the
default group
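a minimal sketch of the fallback, assuming groups are kept in a dict keyed by
id and that the default group has id 1 (both are assumptions for the example):

```python
DEFAULT_GROUP_ID = 1  # assumed id of the default group


def group_for(groups, gid):
    """Return the group for gid, or the default group if gid is invalid."""
    return groups.get(gid, groups[DEFAULT_GROUP_ID])
```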
Decks now have an "update sequence number". All objects also have a USN, which
is set to the deck USN each time they are modified. When syncing, each side
sends any objects with a USN >= clientUSN. When objects are copied via sync,
they have their USNs bumped to the current serverUSN. After a sync, the USN on
both sides is set to serverUSN + 1.
This solves the failing three way test, ensures we receive all changes
regardless of clock drift, and as the revlog also has a USN now, ensures that
old revlog entries are imported properly too.
Objects retain a separate modification time, which is used for conflict
resolution, deck subscriptions/importing, and info for the user.
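The USN exchange can be sketched roughly as below. This is a simplified,
in-memory model under my own assumptions (dicts for decks and objects, "newer
mod time wins" as the only conflict rule); the real sync code has more to deal
with.

```python
def changed_since(objects, usn):
    """Return the objects whose USN is at or above the given USN."""
    return [o for o in objects.values() if o["usn"] >= usn]


def merge(target, incoming, server_usn):
    """Copy incoming objects into target, keeping the newer one on conflict."""
    for obj in incoming:
        local = target.get(obj["id"])
        if local is None or local["mod"] < obj["mod"]:
            # copied objects have their USN bumped to the server USN
            target[obj["id"]] = dict(obj, usn=server_usn)


def sync(client, server):
    """Exchange changes in both directions, then advance both deck USNs."""
    client_usn, server_usn = client["usn"], server["usn"]
    # collect both directions first so nothing is echoed straight back
    to_server = changed_since(client["objects"], client_usn)
    to_client = changed_since(server["objects"], client_usn)
    merge(server["objects"], to_server, server_usn)
    merge(client["objects"], to_client, server_usn)
    client["usn"] = server["usn"] = server_usn + 1
```

Note that nothing here compares clocks between machines to decide what to
send; the mod time is only consulted when both sides hold the same object.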
Note that if the clock is too far off, it will still cause confusion for
users, as the due counts may be different depending on the time. For this
reason it's probably a good idea to keep a limit on how far the clock can
deviate.
We still keep track of the last sync time, but only so we can determine if the
schema has changed since the last sync.
The media code needs to be updated to use USNs too.
- if we store it inside the media folder, we inadvertently bump the folder mod
time every time sqlite creates a journal file
- close/reopen the media db as the deck is closed/opened
I removed the media database in an earlier commit, but it's now necessary
again as I decided to add native media syncing to AnkiWeb.
This time, the DB is stored in the media folder rather than with the deck.
This means we avoid sending it in a full sync, and makes deck backups faster.
The DB is a cache of file modtimes and checksums. When findChanges() is
called, the code checks to see which files were added, changed or deleted
since the last time, and updates the log of changes. Because the scanning step
and log retrieval are separate, it's possible to do the scanning in the
background if the need arises.
If the DB is deleted by the user, Anki will forget any deletions, and add all
the files back to the DB the next time it's accessed.
File changes are recorded as a delete + add.
media.addFile() could be optimized in the future to log media added manually
by the user, allowing us to skip the full directory scan in cases where the
only changes were manually added media.
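A simplified sketch of the scanning step, using a plain dict in place of the
media DB. find_changes() below is an illustration with assumed names and an
assumed choice of SHA-1 for the checksum, not the actual findChanges() code.

```python
import hashlib
import os


def checksum(path):
    """Checksum of a file's contents (SHA-1 is an assumption here)."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def find_changes(folder, cache):
    """Compare folder against cache and return (added, removed) file names.

    cache maps file name -> (mtime in ns, checksum) and is updated in place.
    A changed file is reported as a removal plus an addition.
    """
    added, removed, seen = [], [], set()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        seen.add(name)
        mtime = os.stat(path).st_mtime_ns
        old = cache.get(name)
        if old is not None and old[0] == mtime:
            continue  # same mod time: skip the expensive checksum
        csum = checksum(path)
        if old is None:
            added.append(name)
        elif old[1] != csum:
            removed.append(name)
            added.append(name)
        cache[name] = (mtime, csum)
    for name in sorted(set(cache) - seen):
        removed.append(name)
        del cache[name]
    return added, removed
```

Starting with an empty cache reports every file as added, which matches what
happens when the user deletes the DB.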
Ported the sync code to the latest libanki structure. Key points:
No summary:
The old style got each side to fetch ids+mod times and required the client to
diff them and then request or bundle up the appropriate objects. Instead, we now
get each side to send all changed objects, and it's the responsibility of the
other side to decide what needs to be merged and what needs to be discarded.
This allows us to skip a separate summary step, which saves scanning tables
twice, and allows us to reduce server requests from 4 to 3.
Schema changes:
Certain operations that are difficult to merge (such as changing the number of
fields in a model, or deleting models or groups) result in a full sync. The
user is warned about it in the GUI before such schema-changing operations
execute.
Sync size:
For now, we don't try to deal with large incremental syncs. Because the cards,
facts and revlog can be large in memory (hundreds of megabytes in some cases),
they would have to be chunked for the benefit of devices with a low amount of
memory.
Currently findChanges() uses the full fact/card objects which we're planning to
send to the server. It could be rewritten to fetch a summary (just the id, mod
& rep columns) which would save some memory, and then compare against blocks
of a few hundred remote objects at a time. However, it's a bit more
complicated than that:
- If the local summary is huge it could exceed memory limits. Without a local
summary we'd have to query the db for each record, which could be a lot
slower.
- We currently accumulate a list of remote records we need to add locally.
This list also has the potential to get too big. We would need to
periodically commit the changes as we accumulate them.
- Merging a large number of changes is also potentially slow on mobile
devices.
Given the fact that certain schema-changing operations require a full sync
anyway, I think it's probably best to concentrate on a chunked full sync for
now instead, as provided the user syncs periodically it should not be easy to
hit the full sync limits except after bulk editing operations.
Chunked partial syncing should be possible to add in the future without any
changes to the deck format.
Still to do:
- deck conf merging
- full syncing
- new http proxy
As per the forum thread, the current due counts are really demotivating when
there's a backlog of cards. In an attempt to solve this, I'm trying out a new
behaviour as the default: instead of reporting all the due cards including the
backlog, the status bar will show an increasing count of cards studied that
day. Theoretically this should allow users to focus on what they've done
rather than what they have to do. The old behaviour is still there as an option.
We did away with the stats table because it's impossible to merge it, so the
revlog is canonical now. But we also want a cheap way to display to the user
how much time or how many cards they've done over the day, even if their study
is split into multiple sessions. We were already storing the new cards of a
day in the top level groups, so we just expand that out to log the other info
too.
In the event of a user studying in two places on the same day without syncing,
the counts will not be accurate as they can't be merged without consulting the
revlog, which we want to avoid for performance reasons. But the graphs and
stats do not use the groups for reporting, so the inaccurate counts are only
temporary. Might need to mention this in an FAQ.
Also, since groups are cheap to fetch now, cards now automatically limit
timeTaken() to the group limit, instead of relying on the calling code to do
so.
Instead of collecting the exact number of cards, we just record whether a
group has any reviews or new cards. By not needing to calculate the exact
numbers, it runs a lot faster than before.
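To illustrate the difference, an existence check can stop at the first
matching row, whereas a count has to visit every match. The table layout and
queue value below are assumptions for the example.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory cards table for the example."""
    db = sqlite3.connect(":memory:")
    db.execute("create table cards (id integer primary key, gid integer, "
               "queue integer, due integer)")
    db.executemany("insert into cards values (?, ?, ?, ?)",
                   [(1, 10, 2, 5), (2, 10, 2, 50), (3, 20, 2, 90)])
    return db


def has_due_reviews(db, gid, today):
    """True if the group has at least one review due (assumed queue = 2)."""
    sql = ("select 1 from cards where gid = ? and queue = 2 and due <= ? "
           "limit 1")
    return db.execute(sql, (gid, today)).fetchone() is not None
```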
Also, changed the group code to ensure parents are automatically created when
a group is added.
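A sketch of automatic parent creation, assuming group names encode the
hierarchy with a "::" separator and groups live in a dict keyed by name (both
assumptions for the example):

```python
def add_group(groups, name, sep="::"):
    """Add a group, creating any missing parents along the way."""
    parts = name.split(sep)
    for i in range(1, len(parts) + 1):
        path = sep.join(parts[:i])
        # setdefault leaves existing parents untouched
        groups.setdefault(path, {"name": path})
    return groups[name]
```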
As discussed on the forums, moving to a single collection requires moving some
deck-level configuration into groups so users can have different settings like
new cards/day for each top level item.
Also:
- store id in groups
- add mod time to gconf updates
- move the limiting code that's not specific to scheduling into groups.py
- store the current model id per top level group
- moved tags into json like previous changes, and dropped the unnecessary id
- added tags.py for a tag manager
- moved the tag utilities from utils into tags.py
Like the previous change, models have been moved from a separate DB table to
an entry in the deck. We need them for many operations including reviewing,
and it's easier to keep them in memory than half on disk with a cache that
gets cleared every time we .reset(). This means they are easily serialized as
well - previously they were part Python and part JSON, which made access
confusing.
Because the data is all pulled from JSON now, the instance methods have been
moved to the model registry. Eg:
model.addField(...) -> deck.models.addField(model, ...).
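A rough sketch of the registry pattern, with models as plain dicts that can be
dumped to JSON directly. The class and key names here are illustrative
assumptions rather than the real layout.

```python
import json


class ModelRegistry:
    """Holds all models in memory as plain, JSON-serializable dicts."""

    def __init__(self):
        self.models = {}

    def new(self, mid, name):
        model = {"id": mid, "name": name, "flds": []}
        # keys are strings because JSON object keys always are
        self.models[str(mid)] = model
        return model

    def add_field(self, model, name):
        """Replaces the old model.addField(...) instance method."""
        model["flds"].append({"name": name, "ord": len(model["flds"])})

    def dumps(self):
        return json.dumps(self.models)
```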
- IDs are now timestamped as with groups et al.
- The data field for plugins was also removed. Config info can be added to
deck.conf; larger data should be stored externally.
- Upgrading needs to be updated for the new model structure.
- HexifyID() now accepts strings as well, as our IDs get converted to strings
in the serialization process.
Rather than use a combination of id lookups on the groups table and a group
configuration cache in the scheduler, I've moved the groups and group config
into json objects on the deck table. This results in a net saving of code and
saves one or more DB lookups on each card answer, in exchange for a small
increase in deck load/save work.
I did a quick survey of AnkiWeb, and the vast majority of decks use fewer than
100 tags, and it's safe to assume groups will follow a similar pattern.
All groups and group configs except the default one will use integer
timestamps now, to simplify merging when syncing and importing.
defaultGroup() has been removed in favour of keeping the models up to date
(not yet done).
- should never skip recording graves, for the sake of merging
- 1.0 upgrade will fail on decks that have the same fact creation date. need
to work around this in the future
The approach of using incrementing id numbers works for syncing if we assume
the server is canonical and all other clients rewrite their ids as necessary,
but upon reflection it is not sufficient for merging decks in general, as we
have no way of knowing whether objects with the same id are actually the same
or not. So we need some way of uniquely identifying the object.
One approach would be to go back to Anki 1.0's random 64bit numbers, but as
outlined in a previous commit such large numbers can't be handled easily in some
languages like Javascript, and they tend to be fragmented on disk which
impacts performance. It's much better if we can keep content added at the same
time in the same place on disk, so that operations like syncing which are mainly
interested in newly added content can run faster.
Another approach is to add a separate column containing the unique id, which
is what Mnemosyne 2.0 will be doing. Unfortunately it means adding an index
for that column, leading to slower inserts and larger deck files. And if the
current sequential ids are kept, a bunch of code needs to be kept to ensure ids
don't conflict when merging.
To address the above, the plan is to use a millisecond timestamp as the id.
This ensures disk order reflects creation order, allows us to merge the id and
crt columns, avoids the need for a separate index, and saves us from worrying
about rewriting ids. There is of course a small chance that the objects to be
merged were created at exactly the same time, but this is extremely unlikely.
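A minimal sketch of generating such ids. Bumping the value by one when it is
already taken locally is my assumption about how a same-millisecond clash
within a single deck could be avoided; timestamp_id() is a hypothetical name.

```python
import time


def timestamp_id(existing, now_ms=None):
    """Return a millisecond timestamp that is not already in existing."""
    t = int(time.time() * 1000) if now_ms is None else now_ms
    while t in existing:
        t += 1  # assumed tie-break for ids created in the same millisecond
    existing.add(t)
    return t
```

A millisecond timestamp is far below 2**53, so unlike a random 64 bit number
it fits in a Javascript number without loss.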
This commit changes models. Other objects will follow.
the initial plan was to zero the creation time and leave the cards/facts there
until we have a chance to garbage collect them on a schema change, but such an
approach won't work with deck subscriptions
- cards in final review are first reset as rev cards so that type==queue and
they can be restored correctly
- new cards in learning have type set to 1 so they too can be restored
correctly
this means that content added earlier has a higher chance of appearing, but it
makes it consistent with sortCards(), and the user can sort manually if need
be