Xapian commits unexpectedly slow

classic Classic list List threaded Threaded
3 messages Options
Matthew Schauer Matthew Schauer
Reply | Threaded
Open this post in threaded view
|

Xapian commits unexpectedly slow

Greetings,

I've been trying to migrate about 25K e-mails to Notmuch, and I'm seeing
some frustrating performance characteristics that don't seem to match
with the experience others report.  I'm dumping messages from
Thunderbird in batches and then running `notmuch new` to add each batch
to the database.  The indexing performance remains okay, at more than
200 per second, but after Notmuch has reported it's finished indexing,
it hangs for as much as several minutes before exiting.  A stack trace
confirms that this is Xapian committing the database, with most of the
time seemingly spent in `fdatasync`.  The time spent grows with the size
of the database, not the number of e-mails being imported, which means
this will remain a problem during day-to-day usage.

Has nobody else had a problem like this?  Is my setup just weird?  I'm
using Notmuch 0.29.3 from the Arch community repository, with Xapian
1.4.14 also from Arch repositories.  I am using a spinning-platter hard
disk, but I find it hard to believe that an SSD is required to get
Xapian to perform well at this scale.

Please let me know if you have any performance pointers or can help me
investigate this further.  Many thanks!


Matthew Schauer
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: Xapian commits unexpectedly slow

Matthew Schauer <[hidden email]> writes:

> Greetings,
>
> I've been trying to migrate about 25K e-mails to Notmuch, and I'm seeing
> some frustrating performance characteristics that don't seem to match
> with the experience others report.

25,000 messages should really not be a strain, spinning rust or no.

> I'm dumping messages from
> Thunderbird in batches and then running `notmuch new` to add each batch
> to the database.  The indexing performance remains okay, at more than
> 200 per second, but after Notmuch has reported it's finished indexing,
> it hangs for as much as several minutes before exiting.  A stack trace
> confirms that this is Xapian committing the database, with most of the
> time seemingly spent in `fdatasync`.  The time spent grows with the size
> of the database, not the number of e-mails being imported, which means
> this will remain a problem during day-to-day usage.

It would be interesting if you could report the results of running the
notmuch performance test suite (under performance-test/ in the source).

The other thing I'm curious about is the actual size of the
database. This varies a lot, but in the past pathological performance
has sometimes been linked to indexing things that should not be,
bloating the database.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Matthew Schauer Matthew Schauer
Reply | Threaded
Open this post in threaded view
|

Re: Xapian commits unexpectedly slow


On 12/28/19 5:21 PM, David Bremner wrote:

> Matthew Schauer <[hidden email]> writes:
>
>> Greetings,
>>
>> I've been trying to migrate about 25K e-mails to Notmuch, and I'm seeing
>> some frustrating performance characteristics that don't seem to match
>> with the experience others report.
>
> 25,000 messages should really not be a strain, spinning rust or no.
>
>> I'm dumping messages from
>> Thunderbird in batches and then running `notmuch new` to add each batch
>> to the database.  The indexing performance remains okay, at more than
>> 200 per second, but after Notmuch has reported it's finished indexing,
>> it hangs for as much as several minutes before exiting.  A stack trace
>> confirms that this is Xapian committing the database, with most of the
>> time seemingly spent in `fdatasync`.  The time spent grows with the size
>> of the database, not the number of e-mails being imported, which means
>> this will remain a problem during day-to-day usage.
>
> It would be interesting if you could report the results of running the
> notmuch performance test suite (under performance-test/ in the source).

Nifty!  Here are the results -- I assume you know how to interpret them
better than I do:

T00-new.sh: Testing notmuch new                         [0.4 large]
                        Wall(s) Usr(s)  Sys(s)  Res(K)  In/Out(512B)
  Initial notmuch new   1163.05 854.26  45.97   444304  2343120/13645200
  notmuch new #2        2.23    0.02    0.03    9384    2144/8
  notmuch new #3        0.01    0.01    0.00    9460    0/8
  notmuch new #4        0.01    0.01    0.00    9428    0/8
  notmuch new #5        0.01    0.00    0.00    9468    0/8
  notmuch new #6        0.01    0.01    0.00    9692    0/8
  new (52374 mv)        1351.01 537.75  235.45  959524  1027288/8531616
  new (52374 mv back)   834.15  489.27  213.97  967040  184/4754016
  new (52374 cp)        747.23  284.03  105.51  941992  0/4007120

T01-dump-restore.sh: Testing dump and restore           [0.4 large]
                        Wall(s) Usr(s)  Sys(s)  Res(K)  In/Out(512B)
  load nmbug tags       32.64   4.16    3.97    12744   776/38968
  dump *                5.02    4.81    0.18    26256   8/27928
  restore *             5.94    5.43    0.48    9728    0/0

T02-tag.sh: Testing tagging                             [0.4 large]
                        Wall(s) Usr(s)  Sys(s)  Res(K)  In/Out(512B)
  tag * +new_tag        611.53  305.67  229.54  111304  0/1840208
  tag * +existing_tag   0.05    0.01    0.00    9340    96/0
  tag * -existing_tag   513.58  242.77  215.88  36252   0/1937792
  tag * -missing_tag    0.02    0.00    0.01    9332    0/0

T03-reindex.sh: Testing reindexing                      [0.4 large]
                        Wall(s) Usr(s)  Sys(s)  Res(K)  In/Out(512B)
  reindex *             1893.02 590.11  150.22  392180  7572912/4620792
  reindex *             853.85  440.58  115.60  337320  3072648/2512376
  reindex *             629.36  415.50  107.50  337188  1501448/2507848

T04-thread-subquery.sh: Testing thread subqueries       [0.4 large]
                        Wall(s) Usr(s)  Sys(s)  Res(K)  In/Out(512B)
  search thread:{} ...  28.38   8.25    1.49    94304   278064/144
  search thread:{} ...  11.25   5.26    0.63    94300   81520/144
  search thread:{} ...  3.24    2.94    0.29    94284   0/144

> The other thing I'm curious about is the actual size of the
> database. This varies a lot, but in the past pathological performance
> has sometimes been linked to indexing things that should not be,
> bloating the database.

Here are some relevant lines from my import process, showing how long
Notmuch thinks it's taking, how long it's actually taking, and the size
of the database after each import.  The sizes seem reasonable to me.
For comparison, the maildir itself is 4.5 GiB after all of the imports.

Processed 1057 total files in 1s (759 files/sec.).
notmuch new  1.48s user 0.08s system 52% cpu 2.982 total
6.0M    .notmuch

Processed 1669 total files in 3s (438 files/sec.).
notmuch new  3.95s user 0.19s system 63% cpu 6.516 total
27M     .notmuch

Processed 3338 total files in 9s (359 files/sec.).
notmuch new  9.73s user 0.44s system 71% cpu 14.288 total
71M     .notmuch

Processed 7547 total files in 24s (304 files/sec.).
notmuch new  23.82s user 0.97s system 83% cpu 29.521 total
167M    .notmuch

Processed 8224 total files in 39s (210 files/sec.).
notmuch new  35.72s user 1.96s system 52% cpu 1:12.08 total
334M    .notmuch

Processed 9530 total files in 39s (239 files/sec.).
notmuch new  35.10s user 1.88s system 74% cpu 49.630 total
519M    .notmuch

Processed 6029 total files in 46s (129 files/sec.).
notmuch new  26.42s user 1.88s system 24% cpu 1:54.35 total
641M    .notmuch

Processed 6387 total files in 38s (167 files/sec.).
notmuch new  24.29s user 1.69s system 13% cpu 3:10.35 total
706M    .notmuch

Processed 3113 total files in 10s (308 files/sec.).
notmuch new  10.65s user 0.82s system 6% cpu 2:53.25 total
725M    .notmuch

Processed 410 total files in 1s (344 files/sec.).
notmuch new  1.21s user 0.32s system 1% cpu 1:57.68 total
726M    .notmuch

Processed 317 total files in almost no time.
notmuch new  1.09s user 0.34s system 1% cpu 2:22.09 total
727M    .notmuch
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch