consistent database corruption with notmuch new

David Bremner-2

Re: out of memory on idle machine

David Bremner <[hidden email]> writes:

> Gregor Zattler <[hidden email]> writes:

> I don't have any /cur directories in my version. I do have a few (3 or 4) /tmp
> directories that are apparently not indexed. That's a bit mysterious,
> but nothing on the scale of what you are seeing.

Not so mysterious as it turns out. The tmp dirs in question had
filenames duplicated elsewhere in maildir (a violation of the maildir
spec), and were ignored by notmuch because they were in tmp/.  It
doesn't seem like either of these issues is relevant to your situation.

As a kind of desperation move, you could try bisecting your mailstore,
to see how small of a set of messages you can duplicate the problem
with.
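
One way the bisection could be organised is to keep a list of maildirs and repeatedly split it in half. This is only a sketch: `split_list` and the list-file layout are hypothetical, and nothing here calls notmuch itself.

```shell
# Split a list of maildir paths (one per line) into two halves.
# split_list is a hypothetical helper; it writes <list>.a and <list>.b.
split_list() {
    # $1: file containing one maildir path per line
    total=$(wc -l < "$1" | tr -d ' ')
    half=$(( (total + 1) / 2 ))
    head -n "$half" "$1" > "$1.a"            # first half (rounded up)
    tail -n +"$((half + 1))" "$1" > "$1.b"   # remainder
}
```

Each half would then be copied into a scratch mail root and indexed with `notmuch new` (for example with `NOTMUCH_CONFIG` pointing at a throwaway configuration); whichever half still shows the problem is split again.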

d
_______________________________________________
notmuch mailing list -- [hidden email]
To unsubscribe send an email to [hidden email]
Olly Betts

Re: out of memory on idle machine

In reply to this post by David Bremner-2
On Thu, Feb 11, 2021 at 06:53:27AM -0400, David Bremner wrote:
> At this point I don't really have any good ideas, so I'm waiting for
> results from the 1.4.18 trial.

I've uploaded a backport, but it's the first backport of xapian-core to
buster so it'll need manual approval.  Hopefully that'll happen over the
weekend, but it could take longer.

Cheers,
    Olly

Gregor Zattler

Re: out of memory on idle machine

In reply to this post by Olly Betts
Hi Olly, notmuch and xapian developers,

sorry for the late answer; I had problems with the test system.
* Olly Betts <[hidden email]> [09. Feb. 2021]:
> On Wed, Feb 03, 2021 at 07:59:43AM -0400, David Bremner wrote:
>> Gregor Zattler <[hidden email]> writes:
>>> A Xapian exception occurred finding message: Db block overwritten - are there multiple writers?.
>>
>> I have included the Xapian list in copy in case that message rings a
>> bell.
>
> There was a bug fixed in 1.4.7 which incorrectly resulted in this error
> message, but it seems from the quoted text you're using 1.4.11.

yes.

>> I guess you know there are not multiple writers in your setup.
>
> There's a lock file locked by fcntl() which protects against multiple
> writers, so someone/something would need to have deleted that
> behind Xapian's back, or else a bug somewhere in the locking code stack.

There is no other writer.  The system is used only for this
test atm, the emails are on a dedicated data partition.

> (Aside from that bug, probably the most common case here over time has
> been that someone deleted the lock file thinking it's "stale", but it's
> not the mere presence of the file that means the lock is held.  It's
> not at all frequent, but perhaps we should adjust this message to better
> reflect that.)
>
> Have you tried xapian-check on this database?

not this time.  Just wait.  I have different databases from
different runs of notmuch new, "xapian" being identical
with "xapian-3".


grfz@mic:~/Mail/.notmuch$ for name in xapian  xapian-3 xapian-2 xapian-1 ; do echo "===== $name"; xapian-check $name ; done > /tmp/xapian-checks 2>&1

results in:

===== xapian
docdata:
blocksize=8K items=577 firstunused=10 revision=4 levels=1 root=6
B-tree checked okay
docdata table structure checked OK

termlist:
blocksize=8K items=68006 firstunused=20302 revision=4 levels=2 root=687
xapian-check: DatabaseError: Block 17959: Used block also in freelist
===== xapian-3
docdata:
blocksize=8K items=577 firstunused=10 revision=4 levels=1 root=6
B-tree checked okay
docdata table structure checked OK

termlist:
blocksize=8K items=68006 firstunused=20302 revision=4 levels=2 root=687
xapian-check: DatabaseError: Block 17959: Used block also in freelist
===== xapian-2
docdata:
blocksize=8K items=577 firstunused=10 revision=3 levels=1 root=6
B-tree checked okay
docdata table structure checked OK

termlist:
blocksize=8K items=68006 firstunused=20302 revision=3 levels=2 root=8651
B-tree checked okay
termlist table structure checked OK

postlist:
blocksize=8K items=1971726 firstunused=25177 revision=3 levels=2 root=6651
B-tree checked okay
postlist table structure checked OK

position:
blocksize=8K items=9589984 firstunused=52204 revision=3 levels=2 root=18102
B-tree checked okay
position table structure checked OK

spelling:
Lazily created, and not yet used.

synonym:
Lazily created, and not yet used.

No errors found
===== xapian-1
docdata:
blocksize=8K items=371 firstunused=4 revision=2 levels=1 root=2
B-tree checked okay
docdata table structure checked OK

termlist:
blocksize=8K items=32426 firstunused=8650 revision=2 levels=2 root=687
B-tree checked okay
termlist table structure checked OK

postlist:
blocksize=8K items=562665 firstunused=6650 revision=2 levels=2 root=380
B-tree checked okay
postlist table structure checked OK

position:
blocksize=8K items=4359147 firstunused=18099 revision=2 levels=2 root=377
B-tree checked okay
position table structure checked OK

spelling:
Lazily created, and not yet used.

synonym:
Lazily created, and not yet used.

No errors found



No problems are reported for the earlier databases, although
those runs resulted in reindexing of almost all of the emails.
In the case of the corrupted xapian database, not all files are
checked.
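
A long combined log like the one above can be condensed to one line per database. This is a sketch that relies only on the strings visible in the output ("===== name", "No errors found", "DatabaseError"); `summarize_checks` is a hypothetical helper name.

```shell
# Print "<database>: ok|error|incomplete" for each "===== name" section
# of a saved xapian-check log such as /tmp/xapian-checks.
summarize_checks() {
    # $1: path to the saved log
    awk '
        /^===== /          { if (name != "") print name ": " status
                             name = $2; status = "incomplete" }
        /^No errors found/ { status = "ok" }
        /DatabaseError/    { status = "error" }
        END                { if (name != "") print name ": " status }
    ' "$1"
}
```

Run as `summarize_checks /tmp/xapian-checks`, this would report the sections that ended in a DatabaseError as "error" and those that ended in "No errors found" as "ok".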


This is not fixable:
grfz@mic:~/Mail/.notmuch$ cp -a xapian xapian-checked
grfz@mic:~/Mail/.notmuch$ xapian-check xapian-checked F
docdata:
B-tree checked okay
docdata table structure checked OK

termlist:
xapian-check: DatabaseError: Block 17959: Used block also in freelist

grfz@mic:~/Mail/.notmuch$ xapian-check xapian-checked
docdata:
blocksize=8K items=577 firstunused=10 revision=4 levels=1 root=6
B-tree checked okay
docdata table structure checked OK

termlist:
blocksize=8K items=68006 firstunused=20302 revision=4 levels=2 root=687
xapian-check: DatabaseError: Block 17959: Used block also in freelist


>> Olly Betts mentioned in a different thread that he will build a version
>> of xapian 1.4.18 for buster backports, so trying with that is probably a
>> good step when it is available.
>
> Yes - 1.4.18 packages are now in Debian testing, so hopefully I can get
> this done soon.

OK, I'll wait for that.

>> % xapian-delve -1 -A XDIRECTORY ~/Mail/.notmuch/xapian | sort -u > delve.txt
>
> FWIW, the output should be sorted and unique already (sorted by byte
> order, so equivalent to `LC_ALL=C sort`).

ok, thanks.
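
For reference, byte-order sorting and a check that a file is already sorted and unique might look like the sketch below. `byte_sort` and `is_sorted_unique` are hypothetical helper names; only standard `sort` options are used.

```shell
# Sort stdin by raw byte value, independent of the user's locale.
byte_sort() {
    LC_ALL=C sort
}

# Succeed only if the file is already in byte order with no duplicate lines.
is_sorted_unique() {
    # $1: file to check; -c checks order, -u additionally rejects duplicates
    LC_ALL=C sort -c -u "$1" 2>/dev/null
}
```

In byte order all uppercase ASCII letters sort before lowercase ones, so `printf '%s\n' b A a B | byte_sort` yields A, B, a, b. Running `is_sorted_unique delve.txt` would confirm the claim without re-sorting.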

Ciao, Gregor
--
 -... --- .-. . -.. ..--.. ...-.-

Gregor Zattler

Re: out of memory on idle machine

In reply to this post by Olly Betts
Hi Olly, David, xapian and notmuch developers,
* Olly Betts <[hidden email]> [12. Feb. 2021]:
> On Thu, Feb 11, 2021 at 06:53:27AM -0400, David Bremner wrote:
>> At this point I don't really have any good ideas, so I'm waiting for
>> results from the 1.4.18 trial.
>
> I've uploaded a backport, but it's the first backport of xapian-core to
> buster so it'll need manual approval.  Hopefully that'll happen over the
> weekend, but it could take longer.

I installed version 1.4.18 and the errors are the same.  I
will take David's advice and try to bisect my mail store.
This will take some time.  I'll report back.


Ciao, Gregor
--
 -... --- .-. . -.. ..--.. ...-.-

Gregor Zattler

bug: chokes on long directory names (was: Re: out of memory on idle machine)

In reply to this post by David Bremner-2
Hi David, Olly, notmuch and xapian developers,
* David Bremner <[hidden email]> [11. Feb. 2021]:
> David Bremner <[hidden email]> writes:
> As a kind of desperation move, you could try bisecting your mailstore,
> to see how small of a set of messages you can duplicate the problem
> with.

this I did, somehow.  I found the culprit: It's a maildir
with one single mail in it.  The name of the maildir is
exceptionally long [because it is generated from a List-Id:
header] and the mail arrived on the very day my notmuch
database got corrupted.  This maildir alone causes every
subsequent notmuch new to rescan all (?) files.

Then I tried to only index this maildir, it showed the same
strange re-indexing but even when running notmuch new for a
while in a loop (>1000 times), the database showed no
corruption.

When instead I shorten the name of the maildir to three
characters with the very same email file in it, nothing
happens, it indexes the file once and not again.

Then I lengthened the name of the file instead of the
directory, and even with the longest possible filename (or
path?)

/home/grfz/Mail/nuk/new/1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no16076414734160.14_2.no

notmuch has no problem indexing this and does not reindex it
in the next run.


So notmuch or xapian (I don't know) chokes on extremely long
directory names.  I consider this to be a bug.



My scripts create these long names from List-Id and the
like.  The one which triggered the problems is from an online
shop:

[hidden email]/

Since, as I tested, this can be reproduced with the simplest
of emails in a maildir with an extremely long name, I do not
attach the maildir in question.  But if anyone wants it I
can send it.
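
A scratch maildir of this kind could be set up roughly as follows. This is a sketch under assumptions: `make_long_maildir` is a hypothetical helper, the name is just a run of "z" characters, and the message headers and body are placeholders.

```shell
# Create a maildir whose name is N characters long, with one trivial message.
# Prints the full path of the new maildir.
make_long_maildir() {
    # $1: mail root, $2: length of the directory name
    name=$(awk -v n="$2" 'BEGIN { for (i = 0; i < n; i++) printf "z" }')
    mkdir -p "$1/$name/cur" "$1/$name/new" "$1/$name/tmp" || return 1
    # Placeholder message; none of these values come from the thread.
    cat > "$1/$name/new/1.test" <<EOF
From: sender@example.com
To: recipient@example.com
Subject: long directory name test
Message-Id: <long-dir-test@example.com>
Date: Mon, 01 Jan 2001 00:00:00 +0000

test body
EOF
    printf '%s\n' "$1/$name"
}
```

Running `notmuch new` twice against such a root (with a throwaway configuration) should then show whether the second run rescans the message.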



I then had a look at other long directory names and there is
another one which also triggers the problem, it also has
only one email in it and arrived on 12th of January:

[hidden email]


Since I removed both on my laptop, notmuch new works again,
yeah!  Now I will have a look at my .procmailrc.
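
One possible way for a delivery script to keep generated folder names short is to cap their length and append a checksum so that distinct long names stay distinct. This is purely an illustration: `cap_name` and the limit passed to it are hypothetical, and it is not taken from the scripts mentioned above.

```shell
# Return the name unchanged if it is short enough; otherwise truncate it
# and append "-<cksum>" so the result is exactly the maximum length.
cap_name() {
    # $1: raw name (e.g. derived from a List-Id header)
    # $2: maximum length (should be comfortably larger than 11)
    if [ "${#1}" -le "$2" ]; then
        printf '%s\n' "$1"
    else
        sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
        keep=$(( $2 - ${#sum} - 1 ))
        printf '%s-%s\n' "$(printf '%s' "$1" | cut -c "1-$keep")" "$sum"
    fi
}
```

For example, `cap_name "$listid" 100` would leave ordinary list names alone and shorten only the pathological ones.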

Thanks for your attention, thanks for notmuch and for xapian,
Gregor

--
 -... --- .-. . -.. ..--.. ...-.-

David Bremner-2

[PATCH] test: add known broken test for long directory bug

In [1] Gregor Zattler explained the results of his hard work
tracking down a bug in notmuch with long directories. This test
duplicates the bug.

[1]: id:[hidden email]
---
 test/T050-new.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/test/T050-new.sh b/test/T050-new.sh
index 76bda959..f84dc2b0 100755
--- a/test/T050-new.sh
+++ b/test/T050-new.sh
@@ -339,6 +339,20 @@ test_expect_code 1 "NOTMUCH_NEW --debug 2>&1"
 
 notmuch config set new.tags $OLDCONFIG
 
+test_begin_subtest "Long directory names don't cause rescan"
+test_subtest_known_broken
+name=$(printf 'z%.0s' {1..234})
+generate_message [dir]=$name
+NOTMUCH_NEW  > OUTPUT
+notmuch new  >> OUTPUT
+rm -r ${MAIL_DIR}/${name}
+notmuch new >> OUTPUT
+cat <<EOF > EXPECTED
+Added 1 new message to the database.
+No new mail.
+No new mail. Removed 1 message.
+EOF
+test_expect_equal_file EXPECTED OUTPUT
 
 test_begin_subtest "Xapian exception: read only files"
 chmod u-w ${MAIL_DIR}/.notmuch/xapian/*.*
--
2.30.2

David Bremner-2

Re: bug: chokes on long directory names (was: Re: out of memory on idle machine)

In reply to this post by Gregor Zattler
Gregor Zattler <[hidden email]> writes:

> Hi David, Olly, notmuch and xapian developers,
> * David Bremner <[hidden email]> [11. Feb. 2021]:
>> David Bremner <[hidden email]> writes:
>> As a kind of desperation move, you could try bisecting your mailstore,
>> to see how small of a set of messages you can duplicate the problem
>> with.
>
> this I did, somehow.  I found the culprit: It's a maildir
> with one single mail in it.  The name of the maildir is
> exceptionally long [because it is generated from a List-Id:
> header] and the mail arrived on the very day my notmuch
> database got corrupted.  This maildir alone causes every
> subsequent notmuch new to rescan all (?) files.

Hi Gregor;

I am very impressed with your persistence. I suspect it is a bug in
notmuch. I don't know all the details yet, but in the normal case the
directory name is added to the database prefixed with XDIRECTORY. I
noticed this isn't happening for directory names of 234 characters or
longer. That is roughly the Xapian term limit of 245 characters in
total. I'm not sure why there is a discrepancy of one character, but the
main point is that notmuch is probably improperly ignoring an error from
Xapian when adding these overlong terms.
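
The arithmetic can be checked with a small sketch. It assumes, for illustration only, that the term is simply the prefix followed by the directory name; the real term construction in notmuch may differ. `term_length` and `fits_limit` are hypothetical helpers, and 245 is the figure quoted above.

```shell
# Limit quoted in the message above.
TERM_LIMIT=245

# Length of a would-be term: prefix immediately followed by the path.
term_length() {
    # $1: term prefix (e.g. XDIRECTORY), $2: directory name
    term="$1$2"
    printf '%s\n' "${#term}"
}

# Succeed when the combined term is no longer than TERM_LIMIT.
fits_limit() {
    [ "$(term_length "$1" "$2")" -le "$TERM_LIMIT" ]
}
```

Under that assumption a 234-character name with the 10-character XDIRECTORY prefix gives 244, one short of the quoted limit. That is consistent with the one-character discrepancy noted above but does not explain it.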

Thanks again for the debugging, I suspect I would never have found this
bug on my own.

David

Tomi Ollila-2

Re: [PATCH] test: add known broken test for long directory bug

In reply to this post by David Bremner-2
On Wed, Mar 17 2021, David Bremner wrote:

> In [1] Gregor Zattler explained the results of his hard work
> tracking down a bug in notmuch with long directories. This test
> duplicates the bug.
>
> [1]: id:[hidden email]
> ---
>  test/T050-new.sh | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/test/T050-new.sh b/test/T050-new.sh
> index 76bda959..f84dc2b0 100755
> --- a/test/T050-new.sh
> +++ b/test/T050-new.sh
> @@ -339,6 +339,20 @@ test_expect_code 1 "NOTMUCH_NEW --debug 2>&1"
>  
>  notmuch config set new.tags $OLDCONFIG
>  
> +test_begin_subtest "Long directory names don't cause rescan"
> +test_subtest_known_broken
> +name=$(printf 'z%.0s' {1..234})

could do printf -v name 'z%.0s' {1..234}

> +generate_message [dir]=$name
> +NOTMUCH_NEW  > OUTPUT
> +notmuch new  >> OUTPUT

2 spaces in lines above

apart from those 2 spaces lgtm.

Tomi
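
For readers unfamiliar with the idiom: printf reuses its format string until all arguments are consumed, and `%.0s` prints zero characters of each argument, so every argument contributes exactly one "z". A minimal sketch follows; `repeat_z` is a hypothetical helper, and `seq` stands in for the bash brace expansion `{1..234}` so that the snippet also runs in a plain POSIX shell.

```shell
# Repeat the letter 'z' a given number of times.
repeat_z() {
    # $1: repetition count (must be at least 1; with no arguments printf
    #     would still emit the format once)
    # Word splitting of the seq output is intentional: each number becomes
    # one printf argument, and each argument triggers one pass of the format.
    printf 'z%.0s' $(seq 1 "$1")
}

name=$(repeat_z 234)
printf '%s\n' "${#name}"   # prints 234
```

In bash, `printf -v name ...` assigns the result directly to the variable and so avoids the command substitution.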

> +rm -r ${MAIL_DIR}/${name}
> +notmuch new >> OUTPUT
> +cat <<EOF > EXPECTED
> +Added 1 new message to the database.
> +No new mail.
> +No new mail. Removed 1 message.
> +EOF
> +test_expect_equal_file EXPECTED OUTPUT
>  
>  test_begin_subtest "Xapian exception: read only files"
>  chmod u-w ${MAIL_DIR}/.notmuch/xapian/*.*
> --
> 2.30.2

David Bremner-2

Re: [PATCH] test: add known broken test for long directory bug

Tomi Ollila <[hidden email]> writes:

> On Wed, Mar 17 2021, David Bremner wrote:
>
> could do printf -v name 'z%.0s' {1..234}
>
>> +generate_message [dir]=$name
>> +NOTMUCH_NEW  > OUTPUT
>> +notmuch new  >> OUTPUT
>
> 2 spaces in lines above
>
> apart from those 2 spaces lgtm.
>
> Tomi

Thanks. Pushed with those changes.

d

David Bremner-2

[PATCH] lib/n_d_index_file: check return value from _n_m_add_filename

In reply to this post by David Bremner-2
Ignoring this return value seems like a bad idea in general, and in
particular it has been hiding one or more bugs related to handling
long directory names.
---

This is not a fix for the aforementioned bugs, but it at least makes
clear part of the problem.  The XDDIRENTRYnnnnn: terms are not checked
for length in the same way as XDIRECTORY terms. It isn't clear the
same hashing strategy will work, as the XDDIRENTRY terms are used to
create lists of child directories. It may be the best we can do is
enforce a limit on the length of path elements in trees indexed by
notmuch.

 lib/add-message.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/add-message.cc b/lib/add-message.cc
index 485debad..0c34d318 100644
--- a/lib/add-message.cc
+++ b/lib/add-message.cc
@@ -529,7 +529,9 @@ notmuch_database_index_file (notmuch_database_t *notmuch,
     goto DONE;
  }
 
- _notmuch_message_add_filename (message, filename);
+ ret = _notmuch_message_add_filename (message, filename);
+ if (ret)
+    goto DONE;
 
  if (is_new || is_ghost) {
     _notmuch_message_add_term (message, "type", "mail");
--
2.30.2
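
Until such a limit is enforced, a user could look for suspicious directories in a mail store with something like the sketch below. `find_long_dirs` is a hypothetical helper and the threshold is the caller's choice; it assumes directory names contain no newlines.

```shell
# List directories under a mail root whose final path element is longer
# than a given number of bytes.
find_long_dirs() {
    # $1: mail root, $2: maximum allowed length of a single path element
    # LC_ALL=C makes awk's length() count bytes rather than characters.
    find "$1" -type d | LC_ALL=C awk -F/ -v max="$2" 'length($NF) > max'
}
```

For example, `find_long_dirs ~/Mail 200` would print any directory whose own name exceeds 200 bytes.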

David Bremner-2

Re: [PATCH] lib/n_d_index_file: check return value from _n_m_add_filename

David Bremner <[hidden email]> writes:

> Ignoring this return value seems like a bad idea in general, and in
> particular it has been hiding one or more bugs related to handling
> long directory names.
> ---
>
> This is not a fix for the aforementioned bugs, but it at least makes
> clear part of the problem.  The XDDIRENTRYnnnnn: terms are not checked
> for length in the same way as XDIRECTORY terms. It isn't clear the
> same hashing strategy will work, as the XDDIRENTRY terms are used to
> create lists of child directories. It may be the best we can do is
> enforce a limit on the length of path elements in trees indexed by
> notmuch.

Applied to master.

d