Commit graph

35 commits

Author SHA1 Message Date
Damien Elmes
ae839dd3a3 group deletion should be recursive; fix css in import 2011-10-22 10:04:32 +09:00
Damien Elmes
bf3bb9dd32 if sync server offline, abort sync tests 2011-10-01 20:44:35 +09:00
Damien Elmes
e7f416406d refactor learning
Rather than showing the user how many cards are in the learning queue, we want
to be able to show them the number of reps they have to do to clear the queue,
so they can better estimate the required time. Before we were counting up with
the grade column, but this means we can't quickly sum up the number of reps
left. So we invert it, and count down instead.

I also dropped the 'first time bonus' for now. If there's enough demand for
it, it can be added back by using the flags column, instead of a dedicated
cycles column.
2011-09-23 10:29:49 +09:00
Damien Elmes
2b34d8a948 more group/sched refactoring
- keep track of rep/time counts per group, instead of just at the top level
- sort by due after retrieving learn cards
- ensure activeGroups is sorted alphabetically
- ensure new cards come in alphabetical group order
- ensure queues are refilled when empty
2011-09-23 08:19:22 +09:00
Damien Elmes
024c42fef8 group scheduling refactor
see the following for background discussion:
http://groups.google.com/group/ankisrs-users/browse_thread/thread/4db5e82f7dff74fb

- change sched index to the more efficient gid, queue, due
- drop the dynamic index support. as there's no no q/a cache anymore, it's
  cheap enough to hit the cards table directly, and we can't use the index in
  its new form.
- drop order by clauses (see todo)
- ensure there's always an active group. if users want to study all groups at
  once, they need to create a top level group. we do this because otherwise
  the 'top level group' that's active when everything is selected is not
  clear.

to do:

- new cards will appear in gid order, but the gid numbers don't reflect
  alphabetical sorting. we need to change the scheduling code so that it steps
  through each group in turn
- likewise for the learn queue
2011-09-22 11:54:01 +09:00
Damien Elmes
ee767ff132 refactor to allow group deletions without schema mod
because group deletions are likely to be a semi-common operation (esp. for new users trying out shared material), deleting groups will no longer cause a full sync. in order to avoid syncing issues, we now allow cards/facts/etc to point to an invalid group, and in that case, we just treat them like they're in the default group
2011-09-15 01:37:30 +09:00
Damien Elmes
bc9f6e6a24 add USNs
Decks now have an "update sequence number". All objects also have a USN, which
is set to the deck USN each time they are modified. When syncing, each side
sends any objects with a USN >= clientUSN. When objects are copied via sync,
they have their USNs bumped to the current serverUSN. After a sync, the USN on
both sides is set to serverUSN + 1.

This solves the failing three way test, ensures we receive all changes
regardless of clock drift, and as the revlog also has a USN now, ensures that
old revlog entries are imported properly too.

Objects retain a separate modification time, which is used for conflict
resolution, deck subscriptions/importing, and info for the user.

Note that if the clock is too far off, it will still cause confusion for
users, as the due counts may be different depending on the time. For this
reason it's probably a good idea to keep a limit on how far the clock can
deviate.

We still keep track of the last sync time, but only so we can determine if the
schema has changed since the last sync.

The media code needs to be updated to use USNs too.
2011-09-13 21:10:21 +09:00
Damien Elmes
362ae3eee2 initial work on sync refactor
Ported the sync code to the latest libanki structure. Key points:

No summary:

The old style got each side to fetch ids+mod times and required the client to
diff them and then request or bundle up the appropriate objects. Instead, we now
get each side to send all changed objects, and it's the responsibility of the
other side to decide what needs to be merged and what needs to be discarded.
This allows us to skip a separate summary step, which saves scanning tables
twice, and allows us to reduce server requests from 4 to 3.

Schema changes:

Certain operations that are difficult to merge (such as changing the number of
fields in a model, or deleting models or groups) result in a full sync. The
user is warned about it in the GUI before such schema-changing operations
execute.

Sync size:

For now, we don't try to deal with large incremental syncs. Because the cards,
facts and revlog can be large in memory (hundreds of megabytes in some cases),
they would have to be chunked for the benefit of devices with a low amount of
memory.

Currently findChanges() uses the full fact/card objects which we're planning to
send to the server. It could be rewritten to fetch a summary (just the id, mod
& rep columns) which would save some memory, and then compare against blocks
of a few hundred remote objects at a time. However, it's a bit more
complicated than that:

- If the local summary is huge it could exceed memory limits. Without a local
  summary we'd have to query the db for each record, which could be a lot
  slower.

- We currently accumulate a list of remote records we need to add locally.
  This list also has the potential to get too big. We would need to
  periodically commit the changes as we accumulate them.

- Merging a large amount of changes is also potentially slow on mobile
  devices.

Given the fact that certain schema-changing operations require a full sync
anyway, I think it's probably best to concentrate on a chunked full sync for
now instead, as provided the user syncs periodically it should not be easy to
hit the full sync limits except after bulk editing operations.

Chunked partial syncing should be possible to add in the future without any
changes to the deck format.

Still to do:
- deck conf merging
- full syncing
- new http proxy
2011-09-08 12:50:42 +09:00
Damien Elmes
d34465c1e6 halve the leech threshold, as it only applies to rev->relearn failures now 2011-09-07 20:02:47 +09:00
Damien Elmes
8997d8cc8b track all reps & time on a per-day basis
We did away with the stats table because it's impossible to merge it, so the
revlog is canonical now. But we also want a cheap way to display to the user
how much time or how many cards they've done over the day, even if their study
is split into multiple sessions. We were already storing the new cards of a
day in the top level groups, so we just expand that out to log the other info
too.

In the event of a user studying in two places on the same day without syncing,
the counts will not be accurate as they can't be merged without consulting the
revlog, which we want to avoid for performance reasons. But the graphs and
stats do not use the groups for reporting, so the inaccurate counts are only
temporary. Might need to mention this in an FAQ.

Also, since groups are cheap to fetch now, cards now automatically limit
timeTaken() to the group limit, instead of relying on the calling code to do
so.
2011-09-07 18:48:29 +09:00
Damien Elmes
28d045feef rewrite groupCounts()
Instead of collecting the exact number of cards, we just record whether a
group has any reviews or new cards. By not needing to calculate the exact
numbers, it runs a lot faster than before.

Also, changed the group code to ensure parents are automatically created when
a group is added.
2011-09-07 03:02:07 +09:00
Damien Elmes
de8a5b69ed top level groups
As discussed on the forums, moving to a single collection requires moving some
deck-level configuration into groups so users can have different settings like
new cards/day for each top level item.

Also:
- store id in groups
- add mod time to gconf updates
- move the limiting code that's not specific to scheduling into groups.py
- store the current model id per top level group
2011-09-07 01:31:46 +09:00
Damien Elmes
c0e992618c add groups.all() 2011-08-28 14:58:22 +09:00
Damien Elmes
a9b4285959 rename a few methods for consistency 2011-08-28 13:48:17 +09:00
Damien Elmes
be5c5a2018 move tags into deck; code into separate file
- moved tags into json like previous changes, and dropped the unnecessary id
- added tags.py for a tag manager
- moved the tag utilities from utils into tags.py
2011-08-28 13:44:29 +09:00
Damien Elmes
78600e8ed6 move group code into a registry like models 2011-08-27 23:45:55 +09:00
Damien Elmes
19fd581839 change default lrn timing; leechAction doesn't need an array 2011-04-28 09:24:05 +09:00
Damien Elmes
cc0df00fe5 put cards at end when forgetting; drop support for forgetting leeches
forgetCards() needs to know the highest positioned card, and that requires a
full table scan, so it's not appropriate for part of answerCards()
2011-04-28 09:24:04 +09:00
Damien Elmes
3c269d5fba sticky fields; add forget option to leeches 2011-04-28 09:24:03 +09:00
Damien Elmes
2dfdfad6f2 update license link 2011-04-28 09:24:01 +09:00
Damien Elmes
8fcc6b3085 gpl3->agpl 2011-04-28 09:24:01 +09:00
Damien Elmes
2479913f9b add a pie graph, add average interval to ivls area 2011-04-28 09:24:00 +09:00
Damien Elmes
e93ded1e04 update a few group configs 2011-04-28 09:23:59 +09:00
Damien Elmes
2d00163323 tree grouping; add column to groups so they can remember tags 2011-04-28 09:23:59 +09:00
Damien Elmes
31427f0133 fix lapse card scheduling
- make sure we set a timestamp due time, and put the card back in the queue
- add a unit test for it
2011-04-28 09:23:57 +09:00
Damien Elmes
8b2971f91c move time taken maximum into group conf 2011-04-28 09:23:56 +09:00
Damien Elmes
9b70af678c add rescheduling and interval reset to cram; don't include already due cards 2011-04-28 09:23:56 +09:00
Damien Elmes
bd477de1a9 implement cram
still to do:
- altering intervals at cram exit
- tidying up
2011-04-28 09:23:56 +09:00
Damien Elmes
908dccc2c0 implement new review code, add unit tests
Instead of the old approach to sibling spacing, we instead try to pick a due
date that doesn't have any siblings.
2011-04-28 09:23:56 +09:00
Damien Elmes
9cec4b2059 ints makes no sense in the context of failed cards; tweak learn code 2011-04-28 09:23:54 +09:00
Damien Elmes
9c247f45bd remove q/a cache, tags in fields, rewrite remaining ids, more
Anki used random 64bit IDs for cards, facts and fields. This had some nice
properties:
- merging data in syncs and imports was simply a matter of copying each way,
  as conflicts were astronomically unlikely
- it made it easy to identify identical cards and prevent them from being
  reimported
But there were some negatives too:
- they're more expensive to store
- javascript can't handle numbers > 2**53, which means AnkiMobile, iAnki and
  so on have to treat the ids as strings, which is slow
- simply copying data in a sync or import can lead to corruption, as while a
  duplicate id indicates the data was originally the same, it may have
  diverged. A more intelligent approach is necessary.
- sqlite was sorting the fields table based on the id, which meant the fields
  were spread across the table, and costly to fetch

So instead, we'll move to incremental ids. In the case of model changes we'll
declare that a schema change and force a full sync to avoid having to deal
with conflicts, and in the case of cards and facts, we'll need to update the
ids on one end to merge. Identical cards can be detected by checking to see if
their id is the same and their creation time is the same.

Creation time has been added back to cards and facts because it's necessary
for sync conflict merging. That means facts.pos is not required.

The graves table has been removed. It's not necessary for schema related
changes, and dead cards/facts can be represented as a card with queue=-4 and
created=0. Because we will record schema modification time and can ensure a
full sync propagates to all endpoints, it means we can remove the dead
cards/facts on schema change.

Tags have been removed from the facts table and are represented as a field
with ord=-1 and fmid=0. Combined with the locality improvement for fields, it
means that fetching fields is not much more expensive than using the q/a
cache.

Because of the above, removing the q/a cache is a possibility now. The q and a
columns on cards has been dropped. It will still be necessary to render the
q/a on fact add/edit, since we need to record media references. It would be
nice to avoid this in the future. Perhaps one way would be the ability to
assign a type to fields, like "image", "audio", or "latex". LaTeX needs
special consider anyway, as it was being rendered into the q/a cache.
2011-04-28 09:23:53 +09:00
Damien Elmes
2f27133705 drop sqlalchemy; massive refactor
SQLAlchemy is a great tool, but it wasn't a great fit for Anki:
- We often had to drop down to raw SQL for performance reasons.
- The DB cursors and results were wrapped, which incurred a
  sizable performance hit due to introspection. Operations like fetching 50k
  records from a hot cache were taking more than twice as long to complete.
- We take advantage of sqlite-specific features, so SQL language abstraction
  is useless to us.
- The anki schema is quite small, so manually saving and loading objects is
  not a big burden.

In the process of porting to DBAPI, I've refactored the database schema:
- App configuration data that we don't need in joins or bulk updates has been
  moved into JSON objects. This simplifies serializing, and means we won't
  need DB schema changes to store extra options in the future. This change
  obsoletes the deckVars table.
- Renamed tables:
-- fieldModels -> fields
-- cardModels -> templates
-- fields -> fdata
- a number of attribute names have been shortened

Classes like Card, Fact & Model remain. They maintain a reference to the deck.
To write their state to the DB, call .flush().

Objects no longer have their modification time manually updated. Instead, the
modification time is updated when they are flushed. This also applies to the
deck.

Decks will now save on close, because various operations that were done at
deck load will be moved into deck close instead. Operations like undoing
buried card are cheap on a hot cache, but expensive on startup.
Programmatically you can call .close(save=False) to avoid a save and a
modification bump. This will be useful for generating due counts.

Because of the new saving behaviour, the save and save as options will be
removed from the GUI in the future.

The q/a cache and field cache generating has been centralized. Facts will
automatically rebuild the cache on flush; models can do so with
model.updateCache().

Media handling has also been reworked. It has moved into a MediaRegistry
object, which the deck holds. Refcounting has been dropped - it meant we had
to compare old and new value every time facts or models were changed, and
existed for the sole purpose of not showing errors on a missing media
download. Instead we just media.registerText(q+a) when it's updated. The
download function will be expanded to ask the user if they want to continue
after a certain number of files have failed to download, which should be an
adequate alternative. And we now add the file into the media DB when it's
copied to th emedia directory, not when the card is commited. This fixes
duplicates a user would get if they added the same media to a card twice
without adding the card.

The old DeckStorage object had its upgrade code split in a previous commit;
the opening and upgrading code has been merged back together, and put in a
separate storage.py file. The correct way to open a deck now is import anki; d
= anki.Deck(path).

deck.getCard() -> deck.sched.getCard()
same with answerCard
deck.getCard(id) returns a Card object now.

And the DB wrapper has had a few changes:
- sql statements are a more standard DBAPI:
 - statement() -> execute()
 - statements() -> executemany()
- called like execute(sql, 1, 2, 3) or execute(sql, a=1, b=2, c=3)
- column0 -> list
2011-04-28 09:23:53 +09:00
Damien Elmes
8e40fdcb18 set the initial factor when card graduates, not when it's created 2011-04-28 09:23:29 +09:00
Damien Elmes
55f4b9b7d0 favour integers, change due representation, fact&card ordering, more
- removed 'created' column from various tables. We don't care when things like
  models are created, and card creation time didn't reflect the actual time a
  card was created
- facts were previously ordered by their creation date. The code would
  manually set the creation time for subsequent facts on import by 0.0001
  seconds, and then card due times were set by adding the fact time to the
  ordinal number*0.000001. This was prone to error, and the number of zeros used
  was actually different in different parts of the code. Instead of this, we
  replace it with a 'pos' column on facts, which increments for each new fact.
- importing should add new facts with a higher pos, but concurrent updates in
  a synced deck can have multiple facts with the same pos

- due times are completely different now, and depend on the card type
- new cards have due=fact.pos or random(0, 10000)
- reviews have due set to an integer representing days since deck
  creation/download
- cards in the learn queue use an integer timestamp in seconds

- many columns like modified, lastSync, factor, interval, etc have been converted to
  integer columns. They are cheaper to store (large decks can save 10s of
  megabytes) and faster to search for.

- cards have their group assigned on fact creation. In the future we'll add a
  per-template option for a default group.

- switch to due/random order for the review queue on upgrade. Users can still
  switch to the old behaviour if they want, but many people don't care what
  it's set to, and due is considerably faster, which may result in a better
  user experience
2011-04-28 09:23:28 +09:00
Damien Elmes
bb79b0e17c add new 'groups' concept, refactor deletions
Users who want to study small subsections at one time (eg, "lesson 14") are
currently best served by creating lots of little decks. This is because:
- selective study is a bit cumbersome to switch between
- the graphs and statitics are for the entire deck
- selective study can be slow on mobile devices - when the list of cards to
  hide/show is big, or when there are many due cards, performance can suffer
- scheduling can only be configured per deck

Groups are intended to address the above problems. All cards start off in the
same group, but they can have their group changed. Unlike tags, cards can only
be a member of a single group at once time. This allows us to divide the deck
up into a non-overlapping set of cards, which will make things like showing
due counts for a single category considerably cheaper. The user interface
might want to show something like a deck browser for decks that have more than
one group, showing due counts and allowing people to study each group
individually, or to study all at once.

Instead of storing the scheduling config in the deck or the model, we move the
scheduling into a separate config table, and link that to the groups table.
That way a user can have multiple groups that all share the same scheduling
information if they want.

And deletion tracking is now in a single table.
2011-04-28 09:23:28 +09:00