Asynchronous tagging

Jed Brown

Asynchronous tagging

I'm really enjoying notmuch, thanks!  I have a minor issue and a couple
observations worth noting.

1. Changing tags (like removing inbox/unread) has really high latency.
For example, notmuch-show-advance-marking-read-and-archiving takes 2 to
4 seconds (by comparison, this is as long as a vague search returning
1000+ threads).  I have about 100k messages in a maildir on
linux-2.6.31-6, ext4, xapian-1.0.17.  I tried switching to the
development version of xapian, but the notmuch configure didn't pick it
up (maybe it would still work though).  Is this a known issue?  Is it
worth making certain notmuch.el operations asynchronous to hide this
latency?

2. I have 'notmuch new' in an offlineimap postsync hook, but
notmuch-search-refresh-view occasionally complains that another process
has the lock (since I might press '=' when 'notmuch new' is running).
Waiting a moment and trying again works fine, but it would be nice to
clean this up eventually.

3. I had initially put 'notmuch new' in a cron job (instead of
offlineimap postsync hook) and new/search would sometimes complain about
missing files in the maildir.  The first time this happened, it did not
correct itself and I ended up reimporting the database (I had moved some
things around so I could have been at fault).  Since then I have seen
these errors on a couple occasions, but they always go away upon
rerunning 'notmuch new'.  My guess is that it has to do with offlineimap
changing flags so I moved 'notmuch new' to the postsync hook and have
not seen the errors since.  If it is important that notmuch never runs
concurrently with an offlineimap sync, it should eventually go in the
docs.


Thanks again,

Jed

Jed Brown

Re: Asynchronous tagging

On Sat, 21 Nov 2009 19:35:39 +0100, Jed Brown <[hidden email]> wrote:

[...]

> 3. I had initially put 'notmuch new' in a cron job (instead of
> offlineimap postsync hook) and new/search would sometimes complain about
> missing files in the maildir.  The first time this happened, it did not
> correct itself and I ended up reimporting the database (I had moved some
> things around so I could have been at fault).  Since then I have seen
> these errors on a couple occasions, but they always go away upon
> rerunning 'notmuch new'.  My guess is that it has to do with offlineimap
> changing flags so I moved 'notmuch new' to the postsync hook and have
> not seen the errors since.  If it is important that notmuch never runs
> concurrently with an offlineimap sync, it should eventually go in the
> docs.

Actually, this popped up again.  I have a workaround, but here's the
story if you are interested.

After changing a flag in Gmail and syncing with offlineimap, I get this
in my inbox

 Today 19:18 [1/2] (null)                                   (null) (inbox unread)

And when I try to open it, the buffer is full of stderr.

  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory

It is present in any searches that contain the problem files

  $ notmuch search tag:inbox | wc
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
  Error opening /home/jed/.mail-archive/gmail-all/new/1258827595_0.20705.kunyang,U=174288,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory
       22     321    3150

but other searches are clean

  $ notmuch search to:[hidden email] | wc
       31     499    4398

Explicitly archiving the null message removes it from these queries so
the clutter is gone now, but it has to be done manually because the null
message doesn't match any search terms.


Jed

Carl Worth-2

Re: Asynchronous tagging

In reply to this post by Jed Brown
On Sat, 21 Nov 2009 19:35:39 +0100, Jed Brown <[hidden email]> wrote:
> I'm really enjoying notmuch, thanks!  I have a minor issue and a couple
> observations worth noting.

Thanks, Jed! And welcome to notmuch.

> 1. Changing tags (like removing inbox/unread) has really high latency.

Yes, this is a known bug in Xapian (it rewrites all of the indexed terms
for the email message even though you're just trying to add/remove one
term). The Xapian ticket for this is here:

        replace_document should make minimal changes to database file
        http://trac.xapian.org/ticket/250

I've looked at the code, and it looks like it's going to be easy to
fix. If anyone wants to try, here's the file to change:

        xapian-core/backends/flint/flint_database.cc

And look for:

        // FIXME - in the case where there is overlap between the new
        // termlist and the old termlist, it would be better to compare the
        // two lists, and make the minimum set of modifications required.
        // This would lead to smaller changesets for replication, and
        // probably be faster overall

So I think this might be as easy as just walking over two sorted lists
looking for differences.
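
A minimal sketch of that idea, in Python rather than Xapian's C++, assuming both term lists are sorted and free of duplicates. The function name `diff_sorted_terms` and the example terms are hypothetical and are not taken from Xapian or notmuch:

```python
def diff_sorted_terms(old_terms, new_terms):
    """Return (to_remove, to_add) for two sorted, duplicate-free term lists."""
    to_remove, to_add = [], []
    i = j = 0
    while i < len(old_terms) and j < len(new_terms):
        if old_terms[i] == new_terms[j]:
            # Term is present in both lists: nothing needs to be written.
            i += 1
            j += 1
        elif old_terms[i] < new_terms[j]:
            # Term exists only in the old list: it has to be removed.
            to_remove.append(old_terms[i])
            i += 1
        else:
            # Term exists only in the new list: it has to be added.
            to_add.append(new_terms[j])
            j += 1
    # Anything left over on either side has no counterpart in the other list.
    to_remove.extend(old_terms[i:])
    to_add.extend(new_terms[j:])
    return to_remove, to_add


# Made-up terms for illustration: two tags are dropped and one is added.
old = ["body:hello", "body:world", "tag:inbox", "tag:unread"]
new = ["body:hello", "body:world", "tag:archived"]
to_remove, to_add = diff_sorted_terms(old, new)
```

With a walk like this, the work written back is proportional to the number of changed terms rather than to the size of the whole message.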

Note that this is in the currently default "flint" backend, but the
Xapian folks are probably more interested in fixing the in-development
"chert" backend. So the patch to get upstreamed there will probably also
fix:

        xapian-core/backends/chert/chert_database.cc

(I'm hoping the fix will be the same---an identical comment exists
there.)

Also, if you want to experiment with the chert backend, compile current
Xapian source and run notmuch with XAPIAN_PREFER_CHERT=1. I haven't
tried that yet, but there are claims that a chert database can be 40%
smaller than an equivalent flint database.

> 2. I have 'notmuch new' in an offlineimap postsync hook, but
> notmuch-search-refresh-view occasionally complains that another process
> has the lock (since I might press '=' when 'notmuch new' is running).
> Waiting a moment and trying again works fine, but it would be nice to
> clean this up eventually.

Chris Wilson just contributed a patch to enable read-only usage of
notmuch while another notmuch process holds the write lock. This should
be very nice (and means that new users will be able to start playing
with notmuch even while the initial index creation is happening).

> 3. I had initially put 'notmuch new' in a cron job (instead of
> offlineimap postsync hook) and new/search would sometimes complain about
> missing files in the maildir.  The first time this happened, it did not
> correct itself and I ended up reimporting the database (I had moved some
> things around so I could have been at fault).  Since then I have seen
> these errors on a couple occasions, but they always go away upon
> rerunning 'notmuch new'.  My guess is that it has to do with offlineimap
> changing flags so I moved 'notmuch new' to the postsync hook and have
> not seen the errors since.  If it is important that notmuch never runs
> concurrently with an offlineimap sync, it should eventually go in the
> docs.

Thanks for the pointer.

Does offlineimap use tmp while it's delivering a message and then move
things to new? If so, then maybe all we need to do is fix notmuch to not
look into tmp directories?

-Carl





Karl Wiberg-2

Re: Asynchronous tagging

On Sat, Nov 21, 2009 at 9:01 PM, Carl Worth <[hidden email]> wrote:

> Does offlineimap use tmp while it's delivering a message and then move
> things to new? If so, then maybe all we need to do is fix notmuch to not
> look into tmp directories?

That's probably the right thing to do regardless---IIRC, the tmp
directory exists so that processes can put messages there while they
are writing them, and then do an atomic rename to the new (or cur)
directory.
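
A rough sketch of that delivery pattern in Python, assuming a Maildir that already has its tmp, new and cur subdirectories. The helper name `deliver` and the unique file name are made up for the example; real delivery agents generate their own unique names:

```python
import os
import tempfile


def deliver(maildir, unique_name, data):
    """Write a message into tmp/, then atomically rename it into new/."""
    tmp_path = os.path.join(maildir, "tmp", unique_name)
    new_path = os.path.join(maildir, "new", unique_name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk first
    # Rename within one filesystem is atomic, so a reader of new/ sees
    # either no file at all or the complete message.
    os.rename(tmp_path, new_path)
    return new_path


# Throwaway Maildir for demonstration purposes.
maildir = tempfile.mkdtemp()
for sub in ("tmp", "new", "cur"):
    os.mkdir(os.path.join(maildir, sub))

path = deliver(maildir, "1258826583.example", b"Subject: test\n\nbody\n")
```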

--
Karl Wiberg, [hidden email]
   subrabbit.wordpress.com
   www.treskal.com/kalle

Jed Brown

Re: Asynchronous tagging

In reply to this post by Carl Worth-2
On Sat, 21 Nov 2009 21:01:20 +0100, Carl Worth <[hidden email]> wrote:
> Yes, this is a known bug in Xapian (it rewrites all of the indexed terms
> for the email message even though you're just trying to add/remove one
> term). The Xapian ticket for this is here:
>
> replace_document should make minimal changes to database file
> http://trac.xapian.org/ticket/250

This bug report is concerned that it could require an API change; it
sounds like you think this is unnecessary.  Thanks for the detailed
explanation.

> Chris Wilson just contributed a patch to enable read-only usage of
> notmuch while another notmuch process holds the write lock.

I'm running it.

> Does offlineimap use tmp while it's delivering a message and then move
> things to new? If so, then maybe all we need to do is fix notmuch to not
> look into tmp directories?

Yes, that's how maildir is supposed to work.  Deliver to tmp, hard link
from new, then unlink in tmp.  The client should never look in tmp.
Should be very quick to fix in notmuch.
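
For illustration only, here is how a client-side scan that never looks in tmp might be written in Python. This is not notmuch's code (notmuch is written in C); `message_files` is a hypothetical helper and the file names are invented:

```python
import os
import tempfile


def message_files(maildir):
    """Yield message paths from new/ and cur/ only; tmp/ is never read."""
    for sub in ("new", "cur"):
        directory = os.path.join(maildir, sub)
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            yield os.path.join(directory, name)


# Throwaway Maildir with one file in each subdirectory.
maildir = tempfile.mkdtemp()
for sub, name in (("tmp", "partial"), ("new", "a"), ("cur", "b:2,S")):
    os.mkdir(os.path.join(maildir, sub))
    with open(os.path.join(maildir, sub, name), "w") as f:
        f.write("Subject: test\n\n")

# The half-delivered file in tmp/ does not show up in the result.
found = [os.path.relpath(p, maildir) for p in message_files(maildir)]
```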

Jed

Jan Janak-2

Re: Asynchronous tagging

In reply to this post by Carl Worth-2
On Sat, Nov 21, 2009 at 9:01 PM, Carl Worth <[hidden email]> wrote:

>> 3. I had initially put 'notmuch new' in a cron job (instead of
>> offlineimap postsync hook) and new/search would sometimes complain about
>> missing files in the maildir.  The first time this happened, it did not
>> correct itself and I ended up reimporting the database (I had moved some
>> things around so I could have been at fault).  Since then I have seen
>> these errors on a couple occasions, but they always go away upon
>> rerunning 'notmuch new'.  My guess is that it has to do with offlineimap
>> changing flags so I moved 'notmuch new' to the postsync hook and have
>> not seen the errors since.  If it is important that notmuch never runs
>> concurrently with an offlineimap sync, it should eventually go in the
>> docs.
>
> Thanks for the pointer.
>
> Does offlineimap use tmp while it's delivering a message and then move
> things to new? If so, then maybe all we need to do is fix notmuch to not
> look into tmp directories?

Yes, it does. I think all delivery agents work this way. IIRC, the
reason why messages are first written in tmp and then moved to new is
to make sure that clients do not see partially written messages.
Maildir was designed to be lock-less, so this is needed.

I get errors about missing files too. There are several reasons why
that can happen:

 1) A message is moved from one folder to another in other mail
    clients that work with the Maildir spool.

 2) A client changes the flags on a message, for example, when you
    read a message or mark it as deleted. Maildir stores flags in
    filenames.

 3) Message flags are updated on the IMAP server (for example when you
    mark a message as read in gmail). Offlineimap keeps message flags
    synchronized. If you mark a local message as read then the change
    is propagated to the IMAP server and vice versa.
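
To make the point about flags in filenames concrete, here is a small Python sketch. It assumes the usual Maildir convention that the flags follow a `:2,` suffix (for example `S` for "seen"). The helper name `split_maildir_name` is hypothetical, and the "after" name is constructed for the example rather than copied from a real sync:

```python
def split_maildir_name(filename):
    """Split a Maildir filename into (unique part, set of flag letters)."""
    unique, sep, info = filename.partition(":2,")
    flags = set(info) if sep else set()
    return unique, flags


# The first name appears in the error output earlier in this thread.
before = ("1258826583_1.20705.kunyang,U=174235,"
          "FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,")
# Hypothetical name for the same message once it has been marked as seen.
after = before + "S"

unique_before, flags_before = split_maildir_name(before)
unique_after, flags_after = split_maildir_name(after)
```

The unique part is the same in both cases, but the full filename differs, so any index that stored the old path now points at a file that no longer exists.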

 -- Jan

Carl Worth-2

Re: Asynchronous tagging

In reply to this post by Jed Brown
On Sat, 21 Nov 2009 20:45:30 +0100, Jed Brown <[hidden email]> wrote:
> Actually, this popped up again.  I have a workaround, but here's the
> story if you are interested.

Hmmm... we definitely want to fix this, so let's figure this out.

> After changing a flag in Gmail and syncing with offlineimap, I get this
> in my inbox
>
>  Today 19:18 [1/2] (null)                                   (null) (inbox unread)
>
> And when I try to open it, the buffer is full of stderr.
>
>   Error opening /home/jed/.mail-archive/gmail-all/new/1258826583_1.20705.kunyang,U=174235,FMD5=844bb96d088d057aa1b32ac1fbc67b56:2,: No such file or directory

Ah, OK. So you made a change on the Gmail side and that caused a file to
be renamed locally.

And yes, this currently makes notmuch very confused. That's a known
issue that needs to be documented better. And, even better, it needs to
be fixed (I just added a note for this to TODO).

> Explicitly archiving the null message removes it from these queries so
> the clutter is gone now, but it has to be done manually because the null
> message doesn't match any search terms.

Manually? All tag manipulation is done by search terms, so there's no
other way to remove a tag.

Or did you mean you removed the tag from within emacs? In that case, the
search term used to find the message is the message id itself. (Try
running "M-x visible-mode" from a notmuch-search view in emacs to see
what those look like.)

Meanwhile, just archiving the message won't make things perfect for
you. The document in the database pointing to the broken file is still
there. And it should still have all of its terms, so will likely show up
if you do more searches. (The "(null)" stuff you're seeing isn't because
the message is NULL---for example, notmuch was able to find the date,
etc. It's just that notmuch couldn't find the subject and authors when
it went to look for the file.)

So if GMail+offlineimap continues to shuffle your files around, you're
going to keep seeing more and more confusion like this build up.

So we really just need to teach notmuch how to handle an unstable file
store in order to be able to use it in this kind of setup.

-Carl

Jed Brown

Re: Asynchronous tagging

In reply to this post by Jan Janak-2
On Sat, 21 Nov 2009 21:50:10 +0100, Jan Janak <[hidden email]> wrote:
> I get errors about missing files too. There are several reasons why
> that can happen:
>
>  1) A message is moved from one folder to another in other mail
> clients that work with the Maildir spool.

Not a problem in my case because I currently have everything in one big
maildir (100k in one directory is a lot, but not too painful at 0.3s for
ls and 2s to stat everything).

>  2) A client changes the flags on a message, for example, when you
> read a message or mark it as deleted. Maildir stores flags in
> filenames.

This seems like a problem.  I'm not familiar with xapian, is it
necessarily an expensive operation to correct these inconsistencies?
Matching by thread id ought to be cheap.

>  3) Message flags are updated on the IMAP server (for example when you
> mark a message as read in gmail). Offlineimap keeps message flags
> synchronized.  If you mark a local message as read then the change is
> propagated to the IMAP server and vice versa.

Do you know if Offlineimap (or some similar tool) can be told not to
bother keeping flags synchronized?

Jed

Jan Janak-2

Re: Asynchronous tagging

On Sat, Nov 21, 2009 at 10:04 PM, Jed Brown <[hidden email]> wrote:
>>  3) Message flags are updated on the IMAP server (for example when you
>> mark a message as read in gmail). Offlineimap keeps message flags
>> synchronized.  If you mark a local message as read then the change is
>> propagated to the IMAP server and vice versa.
>
> Do you know if Offlineimap (or some similar tool) can be told not to
> bother keeping flags synchronized?

Try using the cmdline option -q, from offlineimap's help:

-q  Run  only quick synchronizations.   Ignore any flag updates on IMAP servers.

This kinda works, but even with this option I am still seeing missing
files if I work with my inbox in gmail. AFAIK there is currently no
easy way to prevent that.

  -- Jan

Jed Brown

Re: Asynchronous tagging

In reply to this post by Carl Worth-2
On Sat, 21 Nov 2009 22:00:13 +0100, Carl Worth <[hidden email]> wrote:
> Ah, OK. So you made a change on the Gmail side and that caused a file to
> be renamed locally.

yes

> Or did you mean you removed the tag from within emacs? In that case, the
> search term used to find the message is the message id itself. (Try
> running "M-x visible-mode" from a notmuch-search view in emacs to see
> what those look like.)

Exactly, that's what I meant by manually.  Those messages don't match a
nice generic pattern.

> Meanwhile, just archiving the message won't make things perfect for
> you. The document in the database pointing to the broken file is still
> there. And it should still have all of its terms, so will likely show up
> if you do more searches. (The "(null)" stuff you're seeing isn't because
> the message is NULL---for example, notmuch was able to find the date,
> etc. It's just that notmuch couldn't find the subject and authors when
> it went to look for the file.)

Yeah.

> So if GMail+offlineimap continues to shuffle your files around, you're
> going to keep seeing more and more confusion like this build up.
>
> So we really just need to teach notmuch how to handle an unstable file
> store in order to be able to use it in this kind of setup.

This seems unavoidable with maildir in the presence of any
synchronization, or use of a different client.

An ugly, but possible solution would be to mirror the entire maildir via
hard links with whatever naming scheme you like.  You then have a stable
link to the file and can resolve changing names in the real maildir.
This eats up a lot of inodes.
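
A quick Python sketch of the hard-link idea, under the assumption that the mirror lives on the same filesystem as the maildir (hard links cannot cross filesystems). The names `mirror` and `stable-0001` are made up:

```python
import os
import tempfile


def mirror(maildir_file, mirror_dir, stable_name):
    """Hard-link a maildir file under a name that will never change."""
    stable_path = os.path.join(mirror_dir, stable_name)
    if not os.path.exists(stable_path):
        os.link(maildir_file, stable_path)
    return stable_path


# Throwaway directories standing in for the maildir and its mirror.
base = tempfile.mkdtemp()
cur_dir = os.path.join(base, "cur")
mirror_dir = os.path.join(base, "mirror")
os.mkdir(cur_dir)
os.mkdir(mirror_dir)

original = os.path.join(cur_dir, "msg1:2,")
with open(original, "w") as f:
    f.write("Subject: test\n\nbody\n")

stable = mirror(original, mirror_dir, "stable-0001")

# A sync tool changes a flag, which renames the file in the maildir.
renamed = os.path.join(cur_dir, "msg1:2,S")
os.rename(original, renamed)

# The stable link still reads the same content, and comparing inode
# numbers identifies which maildir name it currently corresponds to.
with open(stable) as f:
    content = f.read()
same_inode = os.stat(stable).st_ino == os.stat(renamed).st_ino
```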

Jed

Carl Worth-2

Re: Asynchronous tagging

In reply to this post by Jed Brown
On Sat, 21 Nov 2009 22:04:50 +0100, Jed Brown <[hidden email]> wrote:
> >  2) A client changes the flags on a message, for example, when you
> > read a message or mark it as deleted. Maildir stores flags in
> > filenames.
>
> This seems like a problem.  I'm not familiar with xapian, is it
> necessarily an expensive operation to correct these inconsistencies?

There's not really anything Xapian-specific here. It should be a
relatively easy change to make notmuch do the right thing here. It just
happens that the original author/user of notmuch isn't using anything
that changes his filenames---so I hadn't noticed. :-)

> Matching by thread id ought to be cheap.

Naturally. And that's of course exactly what notmuch does. So in my
usage, the only time "notmuch new" sees a Message-ID that it has seen
before, is when it encounters a duplicate copy of a message. So the code
currently just ignores it.

Mikhail wrote a patch:

        [hidden email]

that does the simple thing in this case of just noticing whether the old
filename has since been removed, and in this case updating the document
to the new filename.
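
The following Python sketch only illustrates the behaviour described above; it is not the patch itself. It assumes a simple in-memory dictionary from Message-ID to path in place of the real database, and `record_file` is a hypothetical name:

```python
import os
import tempfile


def record_file(index, message_id, path):
    """Update a {message_id: path} index and report what happened."""
    old_path = index.get(message_id)
    if old_path is None:
        index[message_id] = path
        return "added"
    if old_path == path:
        return "unchanged"
    if not os.path.exists(old_path):
        # The old file is gone, so treat the new path as a rename.
        index[message_id] = path
        return "renamed"
    # The old file still exists: this is a second copy of the message.
    return "duplicate"


# Throwaway directory with one message file.
maildir = tempfile.mkdtemp()
old_path = os.path.join(maildir, "msg1:2,")
new_path = os.path.join(maildir, "msg1:2,S")
with open(old_path, "w") as f:
    f.write("Message-ID: <one@example.invalid>\n\n")

index = {}
first = record_file(index, "<one@example.invalid>", old_path)
os.rename(old_path, new_path)  # e.g. a sync tool marks the message as seen
second = record_file(index, "<one@example.invalid>", new_path)
```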

The problem he ran into is that renames aren't updating mtimes and the
current "notmuch new" has an optimization to not even look at files
unless their mtime is newer than the mtime last seen for the directory
they are in.

So some investigation is needed to see how important that optimization
is and, if it is important, whether there's another way to keep the
performance while being able to support renames. (Or, alternatively,
we could let the user configure an option saying, "I need to support
renames even if that means that notmuch new is a bit slower.")

-Carl

Keith Packard

Re: Asynchronous tagging

On Sat, 21 Nov 2009 23:46:44 +0100, Carl Worth <[hidden email]> wrote:

> So some investigation is needed to see how important that optimization
> is and, if it is important, whether there's another way to keep the
> performance while being able to support renames. (Or, alternatively,
> we could let the user configure an option saying, "I need to support
> renames even if that means that notmuch new is a bit slower.")

I'd suggest that the best way to make this more efficient would be to
capture directory contents (along with the directory mtime) and use that
to detect changes. If we assume that mail messages are never changed, we
could use that to avoid stat'ing files in directories too.
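
One way this could look, sketched in Python under the assumption that a snapshot is simply the directory mtime plus the set of names it contains. All function names here are hypothetical, and pairing names by the part before `:2,` is an extra assumption about Maildir naming, not something stated above:

```python
import os
import tempfile


def snapshot(directory):
    """Capture the directory's mtime and the set of names it contains."""
    return os.stat(directory).st_mtime_ns, frozenset(os.listdir(directory))


def diff_names(old_names, new_names):
    """Return (added, removed) names between two listings."""
    return new_names - old_names, old_names - new_names


def pair_renames(added, removed):
    """Match removed and added names that share the part before ':2,'."""
    by_unique = {name.partition(":2,")[0]: name for name in removed}
    renames = {}
    for name in added:
        unique = name.partition(":2,")[0]
        if unique in by_unique:
            renames[by_unique[unique]] = name
    return renames


# Throwaway directory with two messages.
directory = tempfile.mkdtemp()
for name in ("msg1:2,", "msg2:2,"):
    with open(os.path.join(directory, name), "w") as f:
        f.write("Subject: test\n\n")

old_mtime, old_names = snapshot(directory)
os.rename(os.path.join(directory, "msg1:2,"),
          os.path.join(directory, "msg1:2,S"))
new_mtime, new_names = snapshot(directory)

# A real scanner could skip the comparison when the directory mtime is
# unchanged, subject to the filesystem's timestamp granularity.
added, removed = diff_names(old_names, new_names)
renames = pair_renames(added, removed)
```

No message file is opened or stat'ed here; only the directory listing is compared.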

--
[hidden email]
