Commit graph

123 commits

Author SHA1 Message Date
Damien Elmes
ce0df2652f add utils for adding/removing tags 2011-04-28 09:24:03 +09:00
Damien Elmes
e8d6714130 make sure the failed card count reflects the cutoff 2011-04-28 09:24:03 +09:00
Damien Elmes
aacb57c196 add a countless group tree too 2011-04-28 09:24:02 +09:00
Damien Elmes
2dfdfad6f2 update license link 2011-04-28 09:24:01 +09:00
Damien Elmes
8fcc6b3085 gpl3->agpl 2011-04-28 09:24:01 +09:00
Damien Elmes
673913a2a8 fix group tree when no cards use a group 2011-04-28 09:24:01 +09:00
Damien Elmes
c8b16a0e0e accept a reload argument in q() 2011-04-28 09:24:01 +09:00
Damien Elmes
02b1494a10 improve review time data; make sure graph bounds include end points 2011-04-28 09:24:00 +09:00
Damien Elmes
e93ded1e04 update a few group configs 2011-04-28 09:23:59 +09:00
Damien Elmes
77ee8f1385 ditch useGroups 2011-04-28 09:23:59 +09:00
Damien Elmes
f75e2af195 use a single group setting 2011-04-28 09:23:59 +09:00
Damien Elmes
73625e5751 include the gid in the tree so we can tell which groups are real 2011-04-28 09:23:59 +09:00
Damien Elmes
fc96e12a0a add some randomness to lrn interval 2011-04-28 09:23:59 +09:00
Damien Elmes
495b058618 include total count in with rev+due 2011-04-28 09:23:59 +09:00
Damien Elmes
2a1355eb16 make the group tree part of the scheduler instead 2011-04-28 09:23:59 +09:00
Damien Elmes
728715ff84 counts by group 2011-04-28 09:23:59 +09:00
Damien Elmes
e547b0586a simplify undo
The undo code was using triggers and a temporary table to write out all changed rows before making a change. This made for powerful undo/redo support, but had some problems:
- creating the tables and triggers wasn't cheap, especially on mobile devices
- likewise, every data modification required writing into two separate databases, almost doubling the amount of writes required
- it was possible to leave the DB in an inconsistent state if an undoable operation was followed by a non-undoable operation that referenced it, and the user then rolled back the undoable operation.

To address these issues, we simplify undo by integrating it with the autosave changes:
- .save() can be passed a name to mark a rollback point. If the user undoes the change, any changes since the last save are lost
- autosaves happen every 5 minutes, and are pushed back on a .save(), so the maximum work a user can lose is 5 minutes.
- reviews are handled separately, so we can let the user undo multiple reviews at once
- if necessary, special cases could be added for other operations like marking

This means that if a user does two damaging operations in a row they won't be able to restore the first one, but such an event is both unlikely and covered by the backups made each time a deck is opened.
2011-04-28 09:23:59 +09:00
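
A minimal sketch of the rollback-point scheme in e547b0586a, built directly on sqlite3. The Collection class, its maybe_autosave() helper and the cards table are hypothetical, not the actual Anki API:

    import sqlite3
    import time

    class Collection:
        """Hypothetical sketch: undo = roll back to the last named save point."""
        AUTOSAVE_SECS = 5 * 60  # autosave every 5 minutes

        def __init__(self, path):
            self.db = sqlite3.connect(path)
            self.db.execute(
                "create table if not exists cards (id integer primary key, due integer)")
            self._last_save = time.time()
            self._undo_name = None

        def save(self, name=None):
            # commit pending changes; a named save marks a rollback point
            self.db.commit()
            self._last_save = time.time()
            self._undo_name = name

        def maybe_autosave(self):
            # called periodically; the timer is pushed back whenever save() runs
            if time.time() - self._last_save > self.AUTOSAVE_SECS:
                self.save()

        def undo(self):
            # discard everything since the last commit; returns the undone op's name
            if not self._undo_name:
                return None
            name, self._undo_name = self._undo_name, None
            self.db.rollback()
            return name
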
Damien Elmes
2d6e9cc186 shorten some queue options 2011-04-28 09:23:59 +09:00
Damien Elmes
c7eb4253bd set a different log type for lapsed learning 2011-04-28 09:23:58 +09:00
Damien Elmes
3c40854583 fix collapsing; make sure learning cards are put back on the heap 2011-04-28 09:23:57 +09:00
Damien Elmes
cfd4198503 add call to determine number of buttons to show; 2.5m -> 2.5mo 2011-04-28 09:23:57 +09:00
Damien Elmes
31427f0133 fix lapse card scheduling
- make sure we set a timestamp due time, and put the card back in the queue
- add a unit test for it
2011-04-28 09:23:57 +09:00
Damien Elmes
765720b928 make sure due forecast returns 0 for empty days 2011-04-28 09:23:57 +09:00
Damien Elmes
e407697fb9 fix interval calculation for lapsed cards in learning queue 2011-04-28 09:23:57 +09:00
Damien Elmes
c63b4085c6 add queueless counts; fix special field references on upgrade 2011-04-28 09:23:57 +09:00
Damien Elmes
31a548ee42 add a dirty flag
when we make changes that need to be cleaned up on exit, we mark the deck
dirty so that if we exit without saving, we can clean up on next open
2011-04-28 09:23:57 +09:00
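
A tiny sketch of the dirty-flag idea in 31a548ee42; the class, method names and the cleanup step are assumptions for illustration only:

    class Deck:
        def __init__(self):
            # would really be persisted in the deck so it survives a crash
            self.dirty = False

        def open(self):
            if self.dirty:
                self._cleanup()      # previous session exited without saving
                self.dirty = False

        def mark_dirty(self):
            # call after changes that need cleaning up on exit
            self.dirty = True

        def close(self, save=True):
            if save:
                self._cleanup()
                self.dirty = False

        def _cleanup(self):
            pass                     # e.g. unbury cards, rebuild caches
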
Damien Elmes
ed75e4bee2 refactor cloze deletions and text:field, add forecast
Previously cloze deletions were handled by copying the contents of one field
into another and applying transforms to it. This had a number of problems:
- after you add a card, you can't undo the cloze deletion
- if you spot a mistake, you have to edit it twice (or more if you have more
  than one cloze for a sentence)
- making multiple clozes requires copying & pasting the sentence multiple
  times
- this also led to much bigger decks if the sentences being cloze-deleted are
  large
- related clozes can't be spaced apart as siblings

To address these issues, we introduce the idea of cloze tags in the card
template and fields. If the template has the text:

{{cloze:1:field}}

And a field has the following contents:

{{c1::hello}}

Then the template will automatically replace that part of the text with either
occluded text, or a highlighted answer. All other clozes in the field are
displayed normally.

At the same time, we add support for text: in the template library, instead
of manually creating text: fields in the dict for every field.

Finally, add a forecast routine to get the due counts for the next week, which
is used in the GUI.
2011-04-28 09:23:56 +09:00
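
A hedged sketch of how a {{cloze:N:field}} reference might expand the {{cN::...}} markers described in ed75e4bee2; the regex and the occlusion/highlight markup are assumptions, not Anki's exact output:

    import re

    CLOZE_RE = re.compile(r"\{\{c(\d+)::(.+?)\}\}")

    def render_cloze(field_text, cloze_ord, question=True):
        """Occlude the requested cloze on the question, highlight it on the answer."""
        def repl(match):
            n, content = int(match.group(1)), match.group(2)
            if n == cloze_ord:
                return "[...]" if question else "<b>%s</b>" % content
            return content           # all other clozes are displayed normally
        return CLOZE_RE.sub(repl, field_text)

    field = "{{c1::hello}} there, {{c2::world}}"
    print(render_cloze(field, 1, question=True))   # [...] there, world
    print(render_cloze(field, 1, question=False))  # <b>hello</b> there, world
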
Damien Elmes
da0fb0c555 space exiting new cards; fix 1 fact freeze 2011-04-28 09:23:56 +09:00
Damien Elmes
8b2971f91c move time taken maximum into group conf 2011-04-28 09:23:56 +09:00
Damien Elmes
eb18460945 save creation time in deck flush, update cutoff on reset not deck load 2011-04-28 09:23:56 +09:00
Damien Elmes
ccc325f87b remove utcOffset; make it a property of crt instead 2011-04-28 09:23:56 +09:00
Damien Elmes
bd477de1a9 implement cram
still to do:
- altering intervals at cram exit
- tidying up
2011-04-28 09:23:56 +09:00
Damien Elmes
308846aa93 be consistent in review/rev, etc 2011-04-28 09:23:56 +09:00
Damien Elmes
22a72d82c6 give card types a more logical order
they are now 0=new, 1=learning, 2=due, to reflect the natural progression
2011-04-28 09:23:56 +09:00
Damien Elmes
6f5918e8cd move reset etc code into sched 2011-04-28 09:23:56 +09:00
Damien Elmes
be56912caf strip all the old cram code; add way to start cramming from deck 2011-04-28 09:23:56 +09:00
Damien Elmes
17f3cabc25 more unit tests, and unbury on close 2011-04-28 09:23:56 +09:00
Damien Elmes
5e0cc2ff5d next time reports and associated unit tests 2011-04-28 09:23:56 +09:00
Damien Elmes
9e46db5b3e update finished msg 2011-04-28 09:23:56 +09:00
Damien Elmes
0a9d032343 remove review early and learn more 2011-04-28 09:23:56 +09:00
Damien Elmes
a7ec356e0d add tests for leech handling 2011-04-28 09:23:56 +09:00
Damien Elmes
908dccc2c0 implement new review code, add unit tests
Instead of the old approach to sibling spacing, we now try to pick a due
date that has no siblings due on it.
2011-04-28 09:23:56 +09:00
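
The sibling-spacing rule in 908dccc2c0, sketched with a hypothetical helper (the shift limit is an assumption):

    def pick_due_day(ideal_due, sibling_due_days, max_shift=4):
        """Return a day at or after ideal_due on which no sibling is already due."""
        taken = set(sibling_due_days)
        for shift in range(max_shift + 1):
            if ideal_due + shift not in taken:
                return ideal_due + shift
        return ideal_due + max_shift + 1   # give up and take the next day anyway

    print(pick_due_day(10, [10, 11]))      # -> 12
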
Damien Elmes
ca0748b664 limit counts to 1000, at least for now 2011-04-28 09:23:56 +09:00
Damien Elmes
4a6b5c9105 generate the counts separately, so they're not limited by queueLimit
this has a negative performance impact on decks with many new cards or reviews
due, but we can't limit the counts to 200 without annoying people
2011-04-28 09:23:55 +09:00
Damien Elmes
14c49a50cd more learning unit tests 2011-04-28 09:23:55 +09:00
Damien Elmes
e49211c5b6 add edue to cards
The 'entry due' is the due time of a failed card before it enters the learning
queue. When the card graduates or is removed, it has its old due time
restored. We could pull this from the revlog, but it's cheaper to do it this
way.
2011-04-28 09:23:55 +09:00
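
A small sketch of the edue behaviour in e49211c5b6, assuming illustrative field names and a one-minute learning delay:

    def fail_card(card, now):
        card["edue"] = card["due"]       # entry due: the due time before learning
        card["due"] = now + 60           # e.g. show again in a minute
        card["queue"] = "learn"

    def graduate_card(card):
        card["due"] = card.pop("edue")   # restore the old due time on graduation
        card["queue"] = "review"

    card = {"due": 15065, "queue": "review"}
    fail_card(card, now=1303948800)
    graduate_card(card)
    assert card["due"] == 15065
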
Damien Elmes
17ee7757de add some group utilities 2011-04-28 09:23:55 +09:00
Damien Elmes
c17758b01c whoops, they're in milliseconds 2011-04-28 09:23:55 +09:00
Damien Elmes
cddbf02734 make sure we return an integer 2011-04-28 09:23:55 +09:00
Damien Elmes
511d6e89a1 remove progress handling code; we'll do it in the GUI or provide cb 2011-04-28 09:23:55 +09:00
Damien Elmes
201e27dd58 add timeToday() and repsToday() 2011-04-28 09:23:55 +09:00
Damien Elmes
174b13f7df speed up new card fetching, add a text factory arg to the db, fix stopMplayer() 2011-04-28 09:23:55 +09:00
Damien Elmes
98290c485d fix integrity check, move dynamic indices into scheduler
A lot of the old checks in fixIntegrity() are no longer relevant, and some of
the others may no longer be required. They can be added back in as the need
arises.
2011-04-28 09:23:54 +09:00
Damien Elmes
e6a8f7f619 move time & count code into scheduler 2011-04-28 09:23:54 +09:00
Damien Elmes
9cec4b2059 ints makes no sense in the context of failed cards; tweak learn code 2011-04-28 09:23:54 +09:00
Damien Elmes
8b46cb4daa log the fact that we're leaving the learning queue in factor 2011-04-28 09:23:54 +09:00
Damien Elmes
e4cc3e3013 revlog work
- remove revlog.py and move code into scheduler
- add a routine to log a learn repetition
- rename flags to type and set type=0 for learn mode
- add to unit test
2011-04-28 09:23:54 +09:00
Damien Elmes
3f92254e1a use the id to sort instead of requiring ordinal
This means that the default learn queue sort order doesn't need another column
in the index, but it also means that generated cards will have a higher id,
and will appear later even if they have a lower ordinal. This is probably an
infrequent issue, and a plugin which rewrites ids would probably be an
adequate solution.
2011-04-28 09:23:54 +09:00
Damien Elmes
ad68500494 updateCache -> renderQA, move some functions around 2011-04-28 09:23:54 +09:00
Damien Elmes
9c247f45bd remove q/a cache, tags in fields, rewrite remaining ids, more
Anki used random 64bit IDs for cards, facts and fields. This had some nice
properties:
- merging data in syncs and imports was simply a matter of copying each way,
  as conflicts were astronomically unlikely
- it made it easy to identify identical cards and prevent them from being
  reimported
But there were some negatives too:
- they're more expensive to store
- javascript can't handle numbers > 2**53, which means AnkiMobile, iAnki and
  so on have to treat the ids as strings, which is slow
- simply copying data in a sync or import can lead to corruption, as while a
  duplicate id indicates the data was originally the same, it may have
  diverged. A more intelligent approach is necessary.
- sqlite was sorting the fields table based on the id, which meant the fields
  were spread across the table, and costly to fetch

So instead, we'll move to incremental ids. In the case of model changes we'll
declare that a schema change and force a full sync to avoid having to deal
with conflicts, and in the case of cards and facts, we'll need to update the
ids on one end to merge. Identical cards can be detected by checking to see if
their id is the same and their creation time is the same.

Creation time has been added back to cards and facts because it's necessary
for sync conflict merging. That means facts.pos is not required.

The graves table has been removed. It's not necessary for schema related
changes, and dead cards/facts can be represented as a card with queue=-4 and
created=0. Because we will record schema modification time and can ensure a
full sync propagates to all endpoints, it means we can remove the dead
cards/facts on schema change.

Tags have been removed from the facts table and are represented as a field
with ord=-1 and fmid=0. Combined with the locality improvement for fields, it
means that fetching fields is not much more expensive than using the q/a
cache.

Because of the above, removing the q/a cache is a possibility now. The q and a
columns on cards have been dropped. It will still be necessary to render the
q/a on fact add/edit, since we need to record media references. It would be
nice to avoid this in the future. Perhaps one way would be the ability to
assign a type to fields, like "image", "audio", or "latex". LaTeX needs
special consideration anyway, as it was being rendered into the q/a cache.
2011-04-28 09:23:53 +09:00
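
The "same id and same creation time" test from 9c247f45bd suggests a merge routine along these lines; the function names and fact dicts are invented for illustration:

    def is_same_fact(local, remote):
        # identical only if both the id and the creation time match
        return local["id"] == remote["id"] and local["crt"] == remote["crt"]

    def merge_ids(local_facts, remote_facts, next_id):
        """Give an incoming fact a fresh id when its id clashes but the data diverged."""
        local_by_id = {f["id"]: f for f in local_facts}
        for fact in remote_facts:
            clash = local_by_id.get(fact["id"])
            if clash and not is_same_fact(clash, fact):
                fact["id"] = next_id
                next_id += 1
        return remote_facts, next_id

    local = [{"id": 1, "crt": 100}, {"id": 2, "crt": 200}]
    remote = [{"id": 1, "crt": 100}, {"id": 2, "crt": 250}]
    # the first fact is identical; the second diverged and gets a fresh id
    print(merge_ids(local, remote, next_id=3))
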
Damien Elmes
2f27133705 drop sqlalchemy; massive refactor
SQLAlchemy is a great tool, but it wasn't a great fit for Anki:
- We often had to drop down to raw SQL for performance reasons.
- The DB cursors and results were wrapped, which incurred a
  sizable performance hit due to introspection. Operations like fetching 50k
  records from a hot cache were taking more than twice as long to complete.
- We take advantage of sqlite-specific features, so SQL language abstraction
  is useless to us.
- The anki schema is quite small, so manually saving and loading objects is
  not a big burden.

In the process of porting to DBAPI, I've refactored the database schema:
- App configuration data that we don't need in joins or bulk updates has been
  moved into JSON objects. This simplifies serializing, and means we won't
  need DB schema changes to store extra options in the future. This change
  obsoletes the deckVars table.
- Renamed tables:
-- fieldModels -> fields
-- cardModels -> templates
-- fields -> fdata
- a number of attribute names have been shortened

Classes like Card, Fact & Model remain. They maintain a reference to the deck.
To write their state to the DB, call .flush().

Objects no longer have their modification time manually updated. Instead, the
modification time is updated when they are flushed. This also applies to the
deck.

Decks will now save on close, because various operations that were done at
deck load will be moved into deck close instead. Operations like undoing
buried cards are cheap on a hot cache, but expensive on startup.
Programmatically you can call .close(save=False) to avoid a save and a
modification bump. This will be useful for generating due counts.

Because of the new saving behaviour, the save and save as options will be
removed from the GUI in the future.

The q/a cache and field cache generating has been centralized. Facts will
automatically rebuild the cache on flush; models can do so with
model.updateCache().

Media handling has also been reworked. It has moved into a MediaRegistry
object, which the deck holds. Refcounting has been dropped - it meant we had
to compare old and new value every time facts or models were changed, and
existed for the sole purpose of not showing errors on a missing media
download. Instead we just call media.registerText(q+a) when it's updated. The
download function will be expanded to ask the user if they want to continue
after a certain number of files have failed to download, which should be an
adequate alternative. And we now add the file into the media DB when it's
copied to the media directory, not when the card is committed. This fixes
duplicates a user would get if they added the same media to a card twice
without adding the card.

The old DeckStorage object had its upgrade code split in a previous commit;
the opening and upgrading code has been merged back together, and put in a
separate storage.py file. The correct way to open a deck now is
import anki; d = anki.Deck(path).

deck.getCard() -> deck.sched.getCard()
same with answerCard
deck.getCard(id) returns a Card object now.

And the DB wrapper has had a few changes:
- sql statements are a more standard DBAPI:
 - statement() -> execute()
 - statements() -> executemany()
- called like execute(sql, 1, 2, 3) or execute(sql, a=1, b=2, c=3)
- column0 -> list
2011-04-28 09:23:53 +09:00
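
A sketch of the DB wrapper surface listed in 2f27133705: execute()/executemany() plus the column0-style list() helper, built on sqlite3. Anything beyond those three names is an assumption, not the real wrapper:

    import sqlite3

    class DB:
        def __init__(self, path):
            self._db = sqlite3.connect(path)

        def execute(self, sql, *args, **kwargs):
            # called like execute(sql, 1, 2, 3) or execute(sql, a=1, b=2, c=3)
            return self._db.execute(sql, kwargs or args)

        def executemany(self, sql, rows):
            return self._db.executemany(sql, rows)

        def list(self, sql, *args, **kwargs):
            # the old column0: the first column of every result row
            return [row[0] for row in self.execute(sql, *args, **kwargs)]

    db = DB(":memory:")
    db.execute("create table cards (id integer primary key, due integer)")
    db.executemany("insert into cards values (?, ?)", [(1, 10), (2, 20)])
    print(db.list("select id from cards where due >= :d", d=10))   # [1, 2]
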
Damien Elmes
8e40fdcb18 set the initial factor when card graduates, not when it's created 2011-04-28 09:23:29 +09:00
Damien Elmes
1d6dbf9900 rework tag handling and remove cardTags
The tags tables were initially added to speed up the loading of the browser by
speeding up two operations: gathering a list of all tags to show in the
dropdown box, and finding cards with a given tag. The former functionality is
provided by the tags table, and the latter functionality by the cardTags
table.

Selective study is handled by groups now, which perform better since they
don't require a join or subselect, and can be embedded in the index. So the
only remaining benefit of cardTags is for the browser.

Performance testing indicates that cardTags is not saving us a large amount.
It only takes us 30ms to search a 50k card table for matches with a hot cache.
On a cold cache it means the facts table has to be loaded into memory, which
roughly doubles the load time with the default settings (we need to load the
cards table too, as we're sorting the cards), but that startup time was
necessary with certain settings in the past too (sorting by fact created for
example). With groups implemented, the cost of maintaining a cache just for
initial browser load time is hard to justify.

Other changes:

- the tags table has any missing tags added to it when facts are added/edited.
  This means old tags will stick around even when no cards reference them, but
  it's much cheaper than reference counting or a separate table, and it
  simplifies updates and syncing.
- the tags table has a modified field now so we can sync it instead of
  having to scan all facts coming across in a sync
- priority field removed
- we no longer put model names or card templates into the tags table. There
  were two reasons we did this in the past: so we could cram/selective study
  them, and for plugins. Selective study uses groups now, and plugins can
  check the model's name instead (and most already do). This also does away
  with the somewhat confusing behaviour of names also being tags.
- facts have their tags as _tags now. You can get a list with tags(), but
  editing operations should use add/deleteTags() instead of manually editing
  the string.
2011-04-28 09:23:29 +09:00
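
The string-backed _tags storage with add/deleteTags() described in 1d6dbf9900 might reduce to helpers like these; the space-delimited format and function names are assumptions:

    def tags(tag_str):
        return [t for t in tag_str.split(" ") if t]

    def add_tags(tag_str, new_tags):
        current = tags(tag_str)
        merged = current + [t for t in new_tags if t not in current]
        return " %s " % " ".join(merged) if merged else ""

    def delete_tags(tag_str, dead_tags):
        kept = [t for t in tags(tag_str) if t not in set(dead_tags)]
        return " %s " % " ".join(kept) if kept else ""

    t = add_tags("", ["verb", "lesson14"])
    t = delete_tags(t, ["verb"])
    print(tags(t))   # ['lesson14']
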
Damien Elmes
55f4b9b7d0 favour integers, change due representation, fact&card ordering, more
- removed 'created' column from various tables. We don't care when things like
  models are created, and card creation time didn't reflect the actual time a
  card was created
- facts were previously ordered by their creation date. On import, the code
  would manually offset the creation time of subsequent facts by 0.0001
  seconds, and card due times were then set to the fact time plus the ordinal
  number*0.000001. This was prone to error, and the number of zeros used was
  actually different in different parts of the code. Instead, we replace it
  with a 'pos' column on facts, which increments for each new fact.
- importing should add new facts with a higher pos, but concurrent updates in
  a synced deck can have multiple facts with the same pos

- due times are completely different now, and depend on the card type
- new cards have due=fact.pos or random(0, 10000)
- reviews have due set to an integer representing days since deck
  creation/download
- cards in the learn queue use an integer timestamp in seconds

- many columns like modified, lastSync, factor, interval, etc have been converted to
  integer columns. They are cheaper to store (large decks can save 10s of
  megabytes) and faster to search for.

- cards have their group assigned on fact creation. In the future we'll add a
  per-template option for a default group.

- switch to due/random order for the review queue on upgrade. Users can still
  switch to the old behaviour if they want, but many people don't care what
  it's set to, and due is considerably faster, which may result in a better
  user experience
2011-04-28 09:23:28 +09:00
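
The three due representations in 55f4b9b7d0, sketched roughly; the helper names, the randomize flag and the 86400-second day arithmetic are illustrative assumptions:

    import random
    import time

    def new_card_due(fact_pos, randomize=False):
        # new cards: due = fact.pos, or a random slot when randomized order is on
        return random.randint(0, 10000) if randomize else fact_pos

    def review_due(interval_days, deck_crt, now=None):
        # reviews: an integer count of days since deck creation/download
        today = int(((now or time.time()) - deck_crt) // 86400)
        return today + interval_days

    def learn_due(delay_secs, now=None):
        # learning cards: an integer timestamp in seconds
        return int((now or time.time()) + delay_secs)
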
Damien Elmes
11a035e2f8 change the default rev sort order to daily random; add randomize 2011-04-28 09:23:28 +09:00
Damien Elmes
7694ff81c5 replace old selective study with new lists; fix upgrade
- instead of the old 4 settings, we move to just two, as there's no point
  having separate include and exclude options for a non-overlapping set of
  cards
- revGroups and newGroups are a list of groupIds to include in the queue. If
  all groups are enabled, the UI should set it to an empty list rather than a
  list of every available group, and groupLimit() will leave off the
  constraint completely
2011-04-28 09:23:28 +09:00
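
groupLimit() as described in 7694ff81c5 could be approximated like this (the SQL fragment and column name are assumptions); an empty list means every group is enabled, so no constraint is added:

    def group_limit(group_ids):
        if not group_ids:
            return ""                      # all groups enabled: leave the constraint off
        ids = ",".join(str(int(g)) for g in group_ids)
        return " and gid in (%s)" % ids

    print(repr(group_limit([])))           # ''
    print(repr(group_limit([1, 4, 7])))    # ' and gid in (1,4,7)'
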
Damien Elmes
4c931e0ff4 move other queue variables into qconf; discard old selective study 2011-04-28 09:23:28 +09:00
Damien Elmes
b3ee91a9d5 add index for groupId, improve startup speed
- skip updating buried cards on startup; it's expensive and we'll do that on
  deck close in the future
- add an index for groupId. Initial profiling indicates that groupId-based
  selective study is considerably faster in certain scenarios

The 50k element deck I'm testing with now opens and builds the queue in 40ms
on a cold cache, of which 34ms is the initial deck startup and 6ms the queue
build. Adding back the undo log and backups will of course increase this, but
this is a big improvement for checking due times in the deck browser.
2011-04-28 09:23:28 +09:00
Damien Elmes
bb79b0e17c add new 'groups' concept, refactor deletions
Users who want to study small subsections at one time (eg, "lesson 14") are
currently best served by creating lots of little decks. This is because:
- selective study is a bit cumbersome to switch between
- the graphs and statistics are for the entire deck
- selective study can be slow on mobile devices - when the list of cards to
  hide/show is big, or when there are many due cards, performance can suffer
- scheduling can only be configured per deck

Groups are intended to address the above problems. All cards start off in the
same group, but they can have their group changed. Unlike tags, cards can only
be a member of a single group at one time. This allows us to divide the deck
up into a non-overlapping set of cards, which will make things like showing
due counts for a single category considerably cheaper. The user interface
might want to show something like a deck browser for decks that have more than
one group, showing due counts and allowing people to study each group
individually, or to study all at once.

Instead of storing the scheduling config in the deck or the model, we move the
scheduling into a separate config table, and link that to the groups table.
That way a user can have multiple groups that all share the same scheduling
information if they want.

And deletion tracking is now in a single table.
2011-04-28 09:23:28 +09:00
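
A hedged sketch of the group/config split introduced in bb79b0e17c; the table and column names are invented for illustration, not the exact schema:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    create table gconf  (id integer primary key, name text, conf text);
    create table groups (id integer primary key, name text,
                         gcid integer references gconf(id));
    """)
    # several groups can point at the same config row and share its scheduling settings
    db.execute("insert into gconf values (1, 'Default config', '{}')")
    db.executemany("insert into groups values (?, ?, 1)",
                   [(1, "Default"), (2, "Lesson 14")])
    print(db.execute(
        "select g.name, c.name from groups g join gconf c on g.gcid = c.id").fetchall())
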
Damien Elmes
2613143fe9 improve dynamic indices, implement new queue 2011-04-28 09:23:28 +09:00
Damien Elmes
d34a76d5a0 move deck config into json ala models
- limits are stored separately so we can access them quickly when checking
  deck counts
- data is used to store cssCache and hexCache; these may be refactored or go
  away in the future
2011-04-28 09:23:28 +09:00
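
The limits-in-columns, everything-else-in-JSON layout from d34a76d5a0 might look roughly like this; the column names and sample options are assumptions:

    import json
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("create table deck (id integer primary key, "
               "newLimit integer, revLimit integer, conf text)")
    conf = {"curModel": 1, "cssCache": ""}        # rarely-joined options live in JSON
    db.execute("insert into deck values (1, 20, 100, ?)", (json.dumps(conf),))
    new_limit, conf_json = db.execute("select newLimit, conf from deck").fetchone()
    print(new_limit, json.loads(conf_json)["curModel"])   # limits stay cheap to read
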
Damien Elmes
b0b4074cbd start work on learn mode, change models, more
- model config is now stored as a json-serialized dict, which allows us to
  quickly gather the info and allows for adding extra options more easily in
  the future
- denormalize modelId into the cards table, so we can get the model scheduling
  information without having to hit the facts table
- remove position - since we will handle spacing differently, we don't need a
  separate variable to define sort order
- remove lastInterval from cards; the new cram mode and review early shouldn't
  need it
- successive->streak
- add new columns for learn mode
- move cram mode into new file; learn more and review early need more thought
- initial work on learn mode
- initial unit tests
2011-04-28 09:23:28 +09:00
Damien Elmes
4e7e8b03bc moving scheduling code into separate file, some preliminary refactoring 2011-04-28 09:23:28 +09:00