talloc_abort in notmuch_thread_get_tags () when db has been modified

Gaute Hope

talloc_abort in notmuch_thread_get_tags () when db has been modified

Hi,

a user of astroid [0] ran into an issue [1] (full trace in the issue) where
reading a long query causes a talloc_abort in notmuch_thread_get_tags
(). 'notmuch new' is running at the same time, and most likely a thread
in the query has been modified since the query was done. Note that a
notmuch_thread_get_authors () call returns NULL without causing a full
crash. The code causing the crash is:

```
    notmuch_tags_t * tags;
    const char * tag;

    for (tags = notmuch_thread_get_tags (nm_thread); // the abort happens inside this call
         notmuch_tags_valid (tags);
         notmuch_tags_move_to_next (tags))
    {
      tag = notmuch_tags_get (tags); // tag belongs to tags
    }

    // or db.cc:508 in astroid/src.
```

while:

```
    const char * auths = notmuch_thread_get_authors (nm_thread);
```

returns `NULL`, but does not crash.

Is there a way for me to handle this from the application side?
Admittedly I do keep query objects around for a while
(astroid/src/thread_index.cc:141), but in this case the issue would
probably occur anyway since it simply takes a long time to read the
query.

Regards, Gaute

[0] https://github.com/gauteh/astroid
[1] https://github.com/gauteh/astroid/issues/64
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

Gaute Hope <[hidden email]> writes:

> Hi,
>
> a user of astroid [0] ran into an issue [1] (full trace in the issue) where
> reading a long query causes a talloc_abort in notmuch_thread_get_tags
> (). 'notmuch new' is running at the same time, and most likely a thread
> in the query has been modified since the query was done. Note that a
> notmuch_thread_get_authors () call returns NULL without causing a full
> crash. The code causing the crash is:
>
> ```
>     for (tags = notmuch_thread_get_tags (nm_thread);
>          notmuch_tags_valid (tags);
>          notmuch_tags_move_to_next (tags))
>     {
>       tag = notmuch_tags_get (tags); // tag belongs to tags
>     }
>
>     // or db.cc:508 in astroid/src.
> ```
>

The most likely cause of such a crash looks to me like nm_thread is NULL
or corrupted when passed in to get_tags. It's used without checking as a
talloc context, and that call to talloc never returns.
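
To illustrate the failure mode described above: a hierarchical allocator typically stamps each chunk header with a magic value and validates the parent context before allocating a child. The following is a toy model under that assumption, not talloc's actual implementation; the namespace `toy`, the functions `alloc`, `poison`, `is_valid_context` and `release`, and the magic constant are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Toy model only: NOT talloc's real layout or API. All names are hypothetical.
namespace toy {

const std::uint32_t kMagic = 0xA110C8EDu;

struct Header {
  std::uint32_t magic;  // kMagic while the chunk is considered live
  Header *parent;       // owning context, or nullptr for a root chunk
};

// Map a user pointer back to the header stored just in front of it.
inline Header *header_of (void *ptr) {
  return static_cast<Header *> (ptr) - 1;
}

// A context is usable if it is non-NULL and its header still carries the magic.
inline bool is_valid_context (void *ctx) {
  return ctx != nullptr && header_of (ctx)->magic == kMagic;
}

// Allocate `size` bytes as a child of `ctx`; nullptr means "no parent".
// A non-NULL context that fails validation is treated as fatal.
inline void *alloc (void *ctx, std::size_t size) {
  if (ctx != nullptr && !is_valid_context (ctx))
    std::abort ();  // the "should not happen" path: no error code is returned
  void *raw = std::malloc (sizeof (Header) + size);
  if (raw == nullptr) return nullptr;
  Header *h = new (raw) Header{kMagic, ctx ? header_of (ctx) : nullptr};
  return h + 1;
}

// Simulate a stale handle: the memory stays allocated, but the chunk is no
// longer a valid context. (A real free would also release the memory.)
inline void poison (void *ptr) { header_of (ptr)->magic = 0; }

// Release a chunk obtained from alloc(). Children are not tracked here.
inline void release (void *ptr) { std::free (header_of (ptr)); }

}  // namespace toy
```

In this model, a call such as `toy::alloc (stale_ctx, n)` never returns to its caller once the context has been poisoned, so checking a return value further up the call chain cannot help.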
_______________________________________________
Gaute Hope

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

David Bremner writes on January 18, 2016 13:25:
> The most likely cause of such a crash looks to me like nm_thread is NULL
> or corrupted when passed in to get_tags. It's used without checking as a
> talloc context, and that call to talloc never returns.
>

OK, I'll look into it further. I do check whether nm_thread is NULL,
though; the preceding code is as follows
(astroid/src/modes/thread_index/thread_index.cc:258):

```
    for (;
         notmuch_threads_valid (threads);
         notmuch_threads_move_to_next (threads)) {

      notmuch_thread_t  * thread;
      thread = notmuch_threads_get (threads);

      if (thread == NULL) {
        log << error << "ti: error: could not get thread." << endl;
        throw database_error ("ti: could not get thread (is NULL)");
      }

      /* test for revision discarded */
      const char * ti = notmuch_thread_get_thread_id (thread);
      if (ti == NULL) {
        log << error << "ti: revision discarded, trying to reopen." << endl;
        reopen_tries++;
        refresh (all, current_thread + count, false);
        return;
      }


      NotmuchThread *t = new NotmuchThread (thread); // get_tags is inside here

      notmuch_thread_destroy (thread);

```

(Note that there is a bit of code there that tries to determine whether
the db is still valid or needs to be re-opened.)
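
The reopen-and-retry idea in the excerpt above can be sketched in isolation as a bounded retry loop. This is only an illustration under assumptions: `LoadResult`, `load_with_reopen`, and the two callbacks are hypothetical names, and the real astroid code calls its own `refresh ()` rather than anything shown here.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result of one attempt to read the query.
enum class LoadResult { Ok, Stale };

// Call `load`; whenever it reports a stale database, call `reopen` and try
// again. At most `max_tries` attempts are made. Returns true if a load
// eventually succeeded.
inline bool load_with_reopen (const std::function<LoadResult ()> &load,
                              const std::function<void ()> &reopen,
                              int max_tries) {
  for (int attempt = 1; attempt <= max_tries; ++attempt) {
    if (load () == LoadResult::Ok)
      return true;
    if (attempt < max_tries)
      reopen ();  // no point reopening after the final attempt
  }
  return false;
}
```

Bounding the number of attempts keeps a database that is being modified continuously from turning the reload into an endless loop. Note that this only helps when the stale state is reported through a return value; it cannot intercept an abort inside the library.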

- g
_______________________________________________
Gaute Hope

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

Gaute Hope writes on January 18, 2016 13:45:
> David Bremner writes on January 18, 2016 13:25:
>> The most likely cause of such a crash looks to me like nm_thread is NULL
>> or corrupted when passed in to get_tags. It's used without checking as a
>> talloc context, and that call to talloc never returns.
>>
>
> Ok, I'll check some further. I am checking whether nm_thread is NULL
> though, [...]

Hi,

The stack trace that I get is as follows:

```
                Stack trace of thread 15719:
                #0  0x00007fc80cd9f2a8 raise (libc.so.6)
                #1  0x00007fc80cda072a abort (libc.so.6)
                #2  0x00007fc80c95889c n/a (libtalloc.so.2)
                #3  0x00007fc80c95a02d talloc_named_const (libtalloc.so.2)
                #4  0x00007fc814d674c5 _notmuch_string_list_create (libnotmuch.so.4)
                #5  0x00007fc814d75f32 notmuch_thread_get_tags (libnotmuch.so.4)
                #6  0x00000000004757cb _ZN7Astroid13NotmuchThread8get_tagsEP15_notmuch_thread (astroid)

```

This happens when I:
1) start a long-running query that loads in the background, and
2) modify the db enough for the query to be invalidated.

As far as I can see, there is _no_ way to catch this error without
completely crashing the application. I would have to isolate this code
in a separate process or trap SIGABRT (which is certainly messy).
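
For reference, the "separate process" option could look roughly like the sketch below, assuming a POSIX system. `run_isolated`, `Outcome`, and the two workload functions are hypothetical names; the workloads merely stand in for the code that calls into the library.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class Outcome { Ok, Failed, Aborted, ForkError };

// Run `work` in a forked child so that an abort() inside it cannot take
// down the caller. `work` returns 0 on success.
inline Outcome run_isolated (int (*work) ()) {
  pid_t pid = fork ();
  if (pid < 0) return Outcome::ForkError;
  if (pid == 0) {
    // Child: report the result through the exit status only.
    _exit (work () == 0 ? 0 : 1);
  }
  int status = 0;
  pid_t r;
  do {
    r = waitpid (pid, &status, 0);
  } while (r < 0 && errno == EINTR);  // retry if interrupted by a signal
  if (r < 0) return Outcome::ForkError;
  if (WIFSIGNALED (status) && WTERMSIG (status) == SIGABRT)
    return Outcome::Aborted;
  if (WIFEXITED (status) && WEXITSTATUS (status) == 0)
    return Outcome::Ok;
  return Outcome::Failed;
}

// Hypothetical workloads standing in for the library call.
inline int healthy_work () { return 0; }
inline int aborting_work () { std::abort (); }
```

Here `run_isolated (aborting_work)` yields `Outcome::Aborted` while the parent keeps running. A real application would still need a pipe or similar channel to get the thread data back from the child, and forking from a multi-threaded GUI process has pitfalls of its own, which is part of why this route is messy.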

Best regards, Gaute

_______________________________________________
David Bremner

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

Gaute Hope <[hidden email]> writes:

> as far as I can see, there is _no_ way to catch this error without
> completely crashing the application. I would have to isolate this code
> in a separate process or trap SIGABRT (which is certainly messy).

I'm not sure what you expect libnotmuch to do here. There's a fatal
"should not happen" error in the memory allocator; it isn't really the
sort of thing one can recover from. It's also not in code we control.

Of course _why_ this error is happening could still be notmuch's
fault. Can you reproduce the problem under valgrind?

d
_______________________________________________
Gaute Hope

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

David Bremner writes on March 7, 2016 13:01:

> Gaute Hope <[hidden email]> writes:
>
>> as far as I can see, there is _no_ way to catch this error without
>> completely crashing the application. I would have to isolate this code
>> in a separate process or trap SIGABRT (which is certainly messy).
>
> I'm not sure what you expect libnotmuch to do here. There's a fatal
> "should not happen" error in the memory allocator; it isn't really the
> sort of thing one can recover from. It's also not in code we control.
>
> Of course _why_ this error is happening could still be notmuch's
> fault. Can you reproduce the problem under valgrind?

Hi again,

For future reference: Attached is C++ test code that demonstrates the problem
(at least on my setup). It is part of the astroid test suite.

The test code must be adapted to your _test_ notmuch db.

To pick this up again: the issue has started cropping up more
frequently, and I currently can't see a way to anticipate or recover
from it in an application that uses the notmuch library. There seems to
be an XapianError, which may or may not be handled by notmuch.

Regards, Gaute

Attachment: test_notmuch_standalone.cc (6K)
_______________________________________________
David Bremner

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

Gaute Hope <[hidden email]> writes:

> David Bremner writes on March 7, 2016 13:01:
>> Gaute Hope <[hidden email]> writes:
>>
>> Of course _why_ this error is happening could still be notmuch's
>> fault. Can you reproduce the problem under valgrind?
>

> Hi again,
>
> For future reference: Attached is C++ test code that demonstrates the problem
> (at least on my setup). It is part of the astroid test suite.
>

And did you try running this under valgrind?

> The test-code must be adapted to your _test_ notmuch db.
>
> To pick up on this again, this issue started cropping up more frequently
> again, and I can't see a way currently to anticipate or recover from
> this from a user application of the notmuch library. There seems to be
> an XapianError, which may or may not be handled by notmuch.

Previously you only reported a talloc error. Do you have a new stacktrace?
_______________________________________________
Gaute Hope

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

David Bremner writes on February 17, 2017 13:28:

> Gaute Hope <[hidden email]> writes:
>
>> David Bremner writes on March 7, 2016 13:01:
>>> Gaute Hope <[hidden email]> writes:
>>>
>>> Of course _why_ this error is happening could still be notmuch's
>>> fault. Can you reproduce the problem under valgrind?
>>
>
>> Hi again,
>>
>> For future reference: Attached is C++ test code that demonstrates the problem
>> (at least on my setup). It is part of the astroid test suite.
>>
>
> And did you try running this under valgrind?
>

```
$ valgrind test/test_notmuch_standalone
==9543== Memcheck, a memory error detector
==9543== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==9543== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==9543== Command: test/test_notmuch_standalone
==9543==
db: running test query..
query: *, approx: 10 threads.
thread id to change: 0000000000000002, thread no: 3
restarting query..
moving to thread: 2
tags: unread
tags: inbox
continue loading..
threads != NULL
terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'
==9543==
==9543== Process terminating with default action of signal 6 (SIGABRT): dumping core
==9543==    at 0xE46E04F: raise (in /usr/lib/libc-2.24.so)
==9543==    by 0xE46F479: abort (in /usr/lib/libc-2.24.so)
==9543==    by 0xD7494EC: __gnu_cxx::__verbose_terminate_handler() (vterminate.cc:95)
==9543==    by 0xD7472A5: __cxxabiv1::__terminate(void (*)()) (eh_terminate.cc:47)
==9543==    by 0xD7472F0: std::terminate() (eh_terminate.cc:57)
==9543==    by 0xD747507: __cxa_throw (eh_throw.cc:87)
==9543==    by 0xEEB987D: ??? (in /usr/lib/libxapian.so.30.2.0)
==9543==    by 0xEEBC4A7: ??? (in /usr/lib/libxapian.so.30.2.0)
==9543==    by 0xEEBEE14: ??? (in /usr/lib/libxapian.so.30.2.0)
==9543==    by 0xEEBF0B7: ??? (in /usr/lib/libxapian.so.30.2.0)
==9543==    by 0xEEBFF77: ??? (in /usr/lib/libxapian.so.30.2.0)
==9543==    by 0xEE9539A: ??? (in /usr/lib/libxapian.so.30.2.0)
==9543==
==9543== HEAP SUMMARY:
==9543==     in use at exit: 332,606 bytes in 1,171 blocks
==9543==   total heap usage: 28,503 allocs, 27,332 frees, 3,835,392 bytes allocated
==9543==
==9543== LEAK SUMMARY:
==9543==    definitely lost: 232 bytes in 1 blocks
==9543==    indirectly lost: 285 bytes in 2 blocks
==9543==      possibly lost: 8,577 bytes in 93 blocks
==9543==    still reachable: 323,432 bytes in 1,074 blocks
==9543==                       of which reachable via heuristic:
==9543==                         newarray           : 1,536 bytes in 16 blocks
==9543==         suppressed: 0 bytes in 0 blocks
==9543== Rerun with --leak-check=full to see details of leaked memory
==9543==
==9543== For counts of detected and suppressed errors, rerun with: -v
==9543== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Aborted (core dumped)
```

>> The test-code must be adapted to your _test_ notmuch db.
>>
>> To pick up on this again, this issue started cropping up more frequently
>> again, and I can't see a way currently to anticipate or recover from
>> this from a user application of the notmuch library. There seems to be
>> an XapianError, which may or may not be handled by notmuch.
>
> Previously you only reported a talloc error. Do you have a new stacktrace?
>

Yes, although I am unsure whether it is the same problem.

```

(gdb) r
Starting program: /home/gaute/dev/mail/notm/astroid/test/test_notmuch_standalone
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
db: running test query..
query: *, approx: 10 threads.
thread id to change: 0000000000000002, thread no: 3
restarting query..
moving to thread: 2
tags: unread
tags: inbox
continue loading..
threads != NULL
terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'

Program received signal SIGABRT, Aborted.
0x00007fffee46b04f in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007fffee46b04f in raise () at /usr/lib/libc.so.6
#1  0x00007fffee46c47a in abort () at /usr/lib/libc.so.6
#2  0x00007fffef2624ed in __gnu_cxx::__verbose_terminate_handler() ()
    at /build/gcc-multilib/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007fffef2602a6 in __cxxabiv1::__terminate(void (*)()) (handler=<optimized out>)
    at /build/gcc-multilib/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007fffef2602f1 in std::terminate() ()
    at /build/gcc-multilib/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007fffef260508 in __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) (obj=0xb0eff0, tinfo=0x7fffede100b8 <typeinfo for Xapian::DatabaseModifiedError>, dest=0x7fffedaa7e30 <Xapian::DatabaseModifiedError::~DatabaseModifiedError()>)
    at /build/gcc-multilib/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x00007fffedac587e in  () at /usr/lib/libxapian.so.30
#7  0x00007fffedac84a8 in  () at /usr/lib/libxapian.so.30
#8  0x00007fffedacae15 in  () at /usr/lib/libxapian.so.30
#9  0x00007fffedacb0b8 in  () at /usr/lib/libxapian.so.30
#10 0x00007fffedacbf78 in  () at /usr/lib/libxapian.so.30
#11 0x00007fffedaa139b in  () at /usr/lib/libxapian.so.30
#12 0x00007fffeda55e3c in Xapian::Document::termlist_begin() const ()
    at /usr/lib/libxapian.so.30
#13 0x00007ffff716e11b in _notmuch_message_ensure_metadata(_notmuch_message*) (message=message@entry=0xb11430) at lib/message.cc:331
#14 0x00007ffff716e989 in notmuch_message_get_thread_id(notmuch_message_t*) (message=message@entry=0xb11430) at lib/message.cc:536
#15 0x00007ffff7174685 in _notmuch_thread_create(void*, notmuch_database_t*, unsigned int, notmuch_doc_id_set_t*, notmuch_string_list_t*, notmuch_exclude_t, notmuch_sort_t) (ctx=0xb0f270, notmuch=0xaf62c0, seed_doc_id=3, match_set=match_set@entry=0xb0ef28, exclude_terms=0xabb060, omit_excluded=NOTMUCH_EXCLUDE_TRUE, sort=NOTMUCH_SORT_NEWEST_FIRST) at lib/thread.cc:456
#16 0x00007ffff71715bd in notmuch_threads_get(notmuch_threads_t*) (threads=0xb0ef10) at lib/query.cc:532
#17 0x00000000005f85a5 in main() () at test/test_notmuch_standalone.cc:191
```
_______________________________________________
David Bremner

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

Gaute Hope <[hidden email]> writes:

> threads != NULL
> terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'

Yeah, that looks like a different problem. But it _should_ be something
we can catch in libnotmuch.
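
As a rough illustration of what "catching it in the library" means: a C-style entry point can wrap the throwing C++ code in a try block and translate the exception into a status code, so that nothing propagates into the caller and triggers `std::terminate`. The sketch below is self-contained and uses stand-in names only; `FakeDatabaseModifiedError`, `fake_status_t`, and both functions are hypothetical and are not the real Xapian or notmuch types.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins; not the real Xapian or notmuch types.
struct FakeDatabaseModifiedError : std::runtime_error {
  FakeDatabaseModifiedError () : std::runtime_error ("database modified") {}
};

enum fake_status_t {
  FAKE_STATUS_SUCCESS = 0,
  FAKE_STATUS_DB_MODIFIED,
  FAKE_STATUS_UNKNOWN_ERROR
};

// Internal C++ code that may throw, as a backend read might.
inline int read_term_count (bool db_modified) {
  if (db_modified) throw FakeDatabaseModifiedError ();
  return 2;
}

// C-style entry point: every exception is converted to a status code, so
// none can escape into a caller that has no handler for it.
inline fake_status_t fake_get_term_count (bool db_modified,
                                          int *count_out) noexcept {
  try {
    *count_out = read_term_count (db_modified);
    return FAKE_STATUS_SUCCESS;
  } catch (const FakeDatabaseModifiedError &) {
    return FAKE_STATUS_DB_MODIFIED;  // caller may reopen and retry
  } catch (...) {
    return FAKE_STATUS_UNKNOWN_ERROR;
  }
}
```

With a status code in hand, the application can decide to reopen the database and rerun the query instead of being terminated.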

d
_______________________________________________
Gaute Hope

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

David Bremner writes on February 17, 2017 16:35:
> Gaute Hope <[hidden email]> writes:
>
>> threads != NULL
>> terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'
>
> Yeah, that looks like a different problem. But it _should_ be something
> we can catch in libnotmuch.

For reference: I'm getting several reports of this or a similar error:

  * https://github.com/astroidmail/astroid/issues/414
  * https://github.com/astroidmail/astroid/blob/master/tests/test_notmuch_standalone.cc

Presently, I cannot reproduce it myself, but it seems to be fairly
consistent for the users it affects.

Not sure if this is talloc_abort() anymore.

Regards, Gaute

_______________________________________________
Gaute Hope

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

Gaute Hope writes on November 3, 2017 11:45:

> David Bremner writes on February 17, 2017 16:35:
>> Gaute Hope <[hidden email]> writes:
>>
>>> threads != NULL
>>> terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'
>>
>> Yeah, that looks like a different problem. But it _should_ be something
>> we can catch in libnotmuch.
>
> For reference: I'm getting several reports of this or a similar error:
>
>   * https://github.com/astroidmail/astroid/issues/414
>   * https://github.com/astroidmail/astroid/blob/master/tests/test_notmuch_standalone.cc
>
> Presently, I cannot reproduce it myself, but it seems to be fairly
> consistent for the users it affects.
>
> Not sure if this is talloc_abort() anymore.

Actually, at this point this seems to be caused by different GMime
versions being used for the binary and the notmuch library.

Regards, Gaute

_______________________________________________
David Bremner

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

Gaute Hope <[hidden email]> writes:

>> Not sure if this is talloc_abort() anymore.
>
> Actually, at this point this seems to be caused by different GMime
> versions being used for the binary and the notmuch library.
>
> Regards, Gaute

OK, that sounds like not-a-notmuch-bug.

d
_______________________________________________
Gaute Hope

Re: talloc_abort in notmuch_thread_get_tags () when db has been modified

David Bremner writes on November 3, 2017 13:18:

> Gaute Hope <[hidden email]> writes:
>
>>> Not sure if this is talloc_abort() anymore.
>>
>> Actually, at this point this seems to be caused by different GMime
>> versions being used for the binary and the notmuch library.
>>
>> Regards, Gaute
>
> OK, that sounds like not-a-notmuch-bug.

Yes, most definitely not.

- gaute
