segfault using python bindings

classic Classic list List threaded Threaded
14 messages Options
dcepelik dcepelik
Reply | Threaded
Open this post in threaded view
|

segfault using python bindings

Hello Notmuch devs,

I'm facing an issue trying to use the Python bindings. This trivial
piece of code segfaults:

    import notmuch

    database = notmuch.Database()
    threads = database.create_query('tag:inbox and not tag:killed').search_threads()
   
    for t in threads:
        print("Thread:", t)
        msgs = t.get_toplevel_messages()
        for m in msgs:
            print("Message:", m)
        msgs = t.get_toplevel_messages()
        next(msgs)

The problem is triggered by the call to next. Doing this instead works:

    database = notmuch.Database()

    threads = database.create_query('tag:inbox and not tag:killed').search_threads()
    for t in threads:
        print("Thread:", t)
        msgs = t.get_toplevel_messages()
        for m in msgs:
            print("Message:", m)

    threads = database.create_query('tag:inbox and not tag:killed').search_threads()
    for t in threads:
        print("Thread:", t)
        msgs = t.get_toplevel_messages()
        for m in msgs:
            print("Message:", m)

It seems that the problem is caused by calling get_toplevel_messages
twice on the same Threads object.

I've been able to narrow the problem down using gdb. The first few
frames of the stack-trace are:

    #0  0x000055555557da90 in  ()
    #1  0x00007ffff665db5a in Xapian::Document::Internal::get_value[abi:cxx11](unsigned int) const () at /usr/lib/libxapian.so.30
    #2  0x00007ffff665db91 in Xapian::Document::get_value[abi:cxx11](unsigned int) const () at /usr/lib/libxapian.so.30
    #3  0x00007ffff6e3165a in notmuch_message_get_header(notmuch_message_t*, char const*)
        (message=0x5555556e6920, header=0x7ffff7195f78 "from") at lib/message.cc:549
    #4  0x00007ffff6efb1c8 in ffi_call_unix64 () at /usr/lib/libffi.so.6

The () seems to denote C++'s tuple class:

    (gdb) frame 0
    #0  0x000055555557da90 in ?? ()
    (gdb) lis
    1 // <tuple> -*- C++ -*-
    2
    3 // Copyright (C) 2007-2018 Free Software Foundation, Inc.
    4 //
    5 // This file is part of the GNU ISO C++ Library.  This library is free
    6 // software; you can redistribute it and/or modify it under the
    7 // terms of the GNU General Public License as published by the
    8 // Free Software Foundation; either version 3, or (at your option)
    9 // any later version.
    10

Searching further, I've arrived at the following piece of Xapian code
(xapian-core/backends/documentinternal.h):

    390     std::string get_value(Xapian::valueno slot) const {
    391         if (values) {
    392             auto i = values->find(slot);
    393             if (i != values->end())
    394                 return i->second;
    395             return std::string();
    396         }
    397
    398         return fetch_value(slot);
    399     }

Since the invalid dereference indicates a tuple, I suspect the crash
stems from the use `second' on line 394. (The (->) dereference likely
does not cause the crash since otherwise we wouldn't arrive at the
tuple.)

When I use alot with this version of notmuch/python bindings, it works
just fine, but I suspect that's because alot wraps all the provided
objects to avoid this sort of bugs (and allow for repeated iteration
over Threads objects, etc), and hence avoid calling get_toplevel_messages()
twice.

Does the problem lie with my fundamental misunderstanding of how the
bindings work, or is this a bug?

Linux x1 4.18.16-arch1-1-ARCH #1 SMP PREEMPT Sat Oct 20 22:06:45 UTC 2018 x86_64 GNU/Linux
Python 3.7.1
built from upstream @ 7f726c6 (AUR: notmuch-git 3:0.28.2.7.g7f726c6e-1)


                                            Regards, David

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (499 bytes) Download Attachment
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

David Čepelík <[hidden email]> writes:

> Hello Notmuch devs,
>
> I'm facing an issue trying to use the Python bindings. This trivial
> piece of code segfaults:
>
>     import notmuch

I don't remember the details [1], but there are known conflicts between
recent versions of python3 and the way the notmuch python bindings
manage memory. So it could be that. There was also an initiative to
rewrite at (python3 only?) version of the bindings that did not have
this problem. I haven't heard much about that recently.


[1]: There is some discussion in the list archives.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Gaute Hope Gaute Hope
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

David Bremner writes on November 11, 2018 21:16:
> [1]: There is some discussion in the list archives.

See id:[hidden email]

Regards, Gaute

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Floris Bruynooghe Floris Bruynooghe
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

In reply to this post by David Bremner-2
Hi,

On Sun 11 Nov 2018 at 16:16 -0400, David Bremner wrote:

> David Čepelík <[hidden email]> writes:
>
>> Hello Notmuch devs,
>>
>> I'm facing an issue trying to use the Python bindings. This trivial
>> piece of code segfaults:
>>
>>     import notmuch
>
> I don't remember the details [1], but there are known conflicts between
> recent versions of python3 and the way the notmuch python bindings
> manage memory. So it could be that. There was also an initiative to
> rewrite at (python3 only?) version of the bindings that did not have
> this problem. I haven't heard much about that recently.

These are at https://github.com/flub/notmuch/tree/cffi/bindings/python-cffi

I'm not really convinced of the way forward last time it was discussed
on how to get them merged into notmuch itself so have failed to put in
the not insignificant effort.

I've since wondered if just getting them standalone on pypi is perhaps a
useful service in the mean time as it's relatively little effort.  And
if there eventually is a desire again to get them merged in some way
that could still be done.


Cheers,
Floris
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Brian May-2 Brian May-2
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

Floris Bruynooghe <[hidden email]> writes:

> I've since wondered if just getting them standalone on pypi is perhaps a
> useful service in the mean time as it's relatively little effort.  And
> if there eventually is a desire again to get them merged in some way
> that could still be done.

Standalone on pypi would be my preferred option.

It is defacto Python standard to refer to all dependancies in something
like requirements.txt or Pipfile from pypi.
--
Brian May <[hidden email]>
https://linuxpenguins.xyz/brian/
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

In reply to this post by Floris Bruynooghe
Floris Bruynooghe <[hidden email]> writes:

>
> These are at https://github.com/flub/notmuch/tree/cffi/bindings/python-cffi
>
> I'm not really convinced of the way forward last time it was discussed
> on how to get them merged into notmuch itself so have failed to put in
> the not insignificant effort.
>
> I've since wondered if just getting them standalone on pypi is perhaps a
> useful service in the mean time as it's relatively little effort.  And
> if there eventually is a desire again to get them merged in some way
> that could still be done.
>

What effort are you referring to specifically? Integration with the
notmuch test suite?
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

In reply to this post by Brian May-2
Brian May <[hidden email]> writes:

> Floris Bruynooghe <[hidden email]> writes:
>
>> I've since wondered if just getting them standalone on pypi is perhaps a
>> useful service in the mean time as it's relatively little effort.  And
>> if there eventually is a desire again to get them merged in some way
>> that could still be done.
>
> Standalone on pypi would be my preferred option.
>
> It is defacto Python standard to refer to all dependancies in something
> like requirements.txt or Pipfile from pypi.

As I mentioned last time this was discussed, the python bindings are
currently more or less a core part of notmuch as both the test
suite and developement need them.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Daniel Kahn Gillmor Daniel Kahn Gillmor
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

In reply to this post by David Bremner-2
On Fri 2018-11-16 06:27:12 -0400, David Bremner wrote:

> Floris Bruynooghe <[hidden email]> writes:
>
>> These are at https://github.com/flub/notmuch/tree/cffi/bindings/python-cffi
>>
>> I'm not really convinced of the way forward last time it was discussed
>> on how to get them merged into notmuch itself so have failed to put in
>> the not insignificant effort.
>>
>> I've since wondered if just getting them standalone on pypi is perhaps a
>> useful service in the mean time as it's relatively little effort.  And
>> if there eventually is a desire again to get them merged in some way
>> that could still be done.
>
> What effort are you referring to specifically? Integration with the
> notmuch test suite?
My recollection is that the main question was about supporting the old
python interface with the new bindings, so that consumers would have a
smooth upgrade path.  Is that not right?

Floris, i really appreciate the work you put in here, and i'd love to
see notmuch be able to adopt it directly.   can we figure out what is
needed to take these changes?

       --dkg

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (233 bytes) Download Attachment
Floris Bruynooghe Floris Bruynooghe
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

In reply to this post by David Bremner-2
On Fri 16 Nov 2018 at 06:29 -0400, David Bremner wrote:

> Brian May <[hidden email]> writes:
>
>> Floris Bruynooghe <[hidden email]> writes:
>>
>>> I've since wondered if just getting them standalone on pypi is perhaps a
>>> useful service in the mean time as it's relatively little effort.  And
>>> if there eventually is a desire again to get them merged in some way
>>> that could still be done.
>>
>> Standalone on pypi would be my preferred option.
>>
>> It is defacto Python standard to refer to all dependancies in something
>> like requirements.txt or Pipfile from pypi.
>
> As I mentioned last time this was discussed, the python bindings are
> currently more or less a core part of notmuch as both the test
> suite and developement need them.

Sure, I think pypi publishing is orthogonal to this however.  Either or
both versions of the bindings could be published on pypi in addition to
being in the main repo.  As Brian mentions it would improve
discoverability and improves integration on the python side.  There's
even tooling to bundle the library these days with the manylinux1
wheels.  So there's no need to stop anyone who'd like to do this.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Floris Bruynooghe Floris Bruynooghe
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

In reply to this post by Daniel Kahn Gillmor
On Fri 16 Nov 2018 at 07:15 -0500, Daniel Kahn Gillmor wrote:

> On Fri 2018-11-16 06:27:12 -0400, David Bremner wrote:
>> Floris Bruynooghe <[hidden email]> writes:
>>
>>> These are at https://github.com/flub/notmuch/tree/cffi/bindings/python-cffi
>>>
>>> I'm not really convinced of the way forward last time it was discussed
>>> on how to get them merged into notmuch itself so have failed to put in
>>> the not insignificant effort.
>>>
>>> I've since wondered if just getting them standalone on pypi is perhaps a
>>> useful service in the mean time as it's relatively little effort.  And
>>> if there eventually is a desire again to get them merged in some way
>>> that could still be done.
>>
>> What effort are you referring to specifically? Integration with the
>> notmuch test suite?
>
> My recollection is that the main question was about supporting the old
> python interface with the new bindings, so that consumers would have a
> smooth upgrade path.  Is that not right?

That's indeed what I was referring to, integration with the test suite
is fine as was discussed last time imho.

> Floris, i really appreciate the work you put in here, and i'd love to
> see notmuch be able to adopt it directly.   can we figure out what is
> needed to take these changes?

Thanks.  I think mainly when the technical approach was discussed [0] no
actual users of the current Python API weighed in with if they'd be
interested in a migration of the API and if so, how it might work for
them.  So while the gradual approach described there is technically
somewhat nice I have no idea if anyone would benefit from it, or whether
the benefits outweigh all the work involved.

As I was recently thinking however, maybe there's nothing wrong with new
bindings being published as a 3rd party package on pypi.  It'd make it
more discoverable and if people start to adopt it maybe there'd be more
demand for integrating it back with more clarity over how smooth a
transition path needs to be.

Also lastly an apology.  I could have done more to move this forward,
but I simply haven't found^Wmade the time for it.

Cheers,
Floris


[0] id:[hidden email]
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

In reply to this post by Floris Bruynooghe
Floris Bruynooghe <[hidden email]> writes:

>> As I mentioned last time this was discussed, the python bindings are
>> currently more or less a core part of notmuch as both the test
>> suite and developement need them.
>
> Sure, I think pypi publishing is orthogonal to this however.  Either or
> both versions of the bindings could be published on pypi in addition to
> being in the main repo.  As Brian mentions it would improve
> discoverability and improves integration on the python side.  There's
> even tooling to bundle the library these days with the manylinux1
> wheels.  So there's no need to stop anyone who'd like to do this.

Well, I agree with all that (and did in the previous thread too). But
the context was Florian's idea of publishing on pypi instead of/before
integrating with notmuch. That's of course his right to do, but my main
(selfish) interest is in having python bindings shipping with notmuch
that work properly with recent python3. I guess even having a separate
set of incompatible python3 only bindings would be better than the
current situation. We could just ship the two bindings in parallel,
deprecate the python2 bindings, and give people a year or so to
transition.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Dirk Van Haerenborgh Dirk Van Haerenborgh
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

In reply to this post by dcepelik
Hi,

I've encountered something very similar just today. But not with python,
but my rust bindings. I could very easily reproduce it using:

for (messages = notmuch_thread_get_messages (thread);
notmuch_messages_valid (messages);
notmuch_messages_move_to_next (messages))
{
notmuch_message_t *message = notmuch_messages_get (messages);
const char *mid = notmuch_message_get_message_id(message);
fprintf(stdout, "Message: %s\n", mid);
notmuch_message_destroy(message);
}

for (messages = notmuch_thread_get_messages (thread);
notmuch_messages_valid (messages);
notmuch_messages_move_to_next (messages))
{
notmuch_message_t *message = notmuch_messages_get (messages);
const char *mid = notmuch_message_get_message_id(message);
fprintf(stdout, "Message: %s\n", mid);
notmuch_message_destroy(message);
}

This is not a typo. I deliberately duplicated that bit.
The second time it runs that loop, it will always segfault, unless you
omit the first 'notmuch_message_destroy' call.

Given your example, I suspect the Python3 bindings to do something very
similar. I would think that this is the correct way to use the API?

Kind regards,
-Dirk


On Fri, 09 Nov 2018 15:49:13 +0100, David Čepelík wrote:

> Hello Notmuch devs,
>
> I'm facing an issue trying to use the Python bindings. This trivial
> piece of code segfaults:
>
> import notmuch
>
> database = notmuch.Database()
> threads = database.create_query('tag:inbox and not
> tag:killed').search_threads()
>
> for t in threads:
> print("Thread:", t)
> msgs = t.get_toplevel_messages()
> for m in msgs:
> print("Message:", m)
> msgs = t.get_toplevel_messages()
> next(msgs)
>
> The problem is triggered by the call to next. Doing this instead works:
>
> database = notmuch.Database()
>
> threads = database.create_query('tag:inbox and not
> tag:killed').search_threads()
> for t in threads:
> print("Thread:", t)
> msgs = t.get_toplevel_messages()
> for m in msgs:
> print("Message:", m)
>
> threads = database.create_query('tag:inbox and not
> tag:killed').search_threads()
> for t in threads:
> print("Thread:", t)
> msgs = t.get_toplevel_messages()
> for m in msgs:
> print("Message:", m)
>
> It seems that the problem is caused by calling get_toplevel_messages
> twice on the same Threads object.
>
> I've been able to narrow the problem down using gdb. The first few
> frames of the stack-trace are:
>
> #0 0x000055555557da90 in ()
> #1 0x00007ffff665db5a in
> Xapian::Document::Internal::get_value[abi:cxx11](unsigned int) const
> () at /usr/lib/libxapian.so.30 #2 0x00007ffff665db91 in
> Xapian::Document::get_value[abi:cxx11](unsigned int) const () at
> /usr/lib/libxapian.so.30 #3 0x00007ffff6e3165a in
> notmuch_message_get_header(notmuch_message_t*, char const*)
> (message=0x5555556e6920, header=0x7ffff7195f78 "from") at
> lib/message.cc:549
> #4 0x00007ffff6efb1c8 in ffi_call_unix64 () at /usr/lib/libffi.so.6
>
> The () seems to denote C++'s tuple class:
>
> (gdb) frame 0 #0 0x000055555557da90 in ?? ()
> (gdb) lis 1 // <tuple> -*- C++ -*-
> 2
> 3 // Copyright (C) 2007-2018 Free Software Foundation, Inc.
> 4 //
> 5 // This file is part of the GNU ISO C++ Library. This library is
> free 6 // software; you can redistribute it and/or modify it under
> the 7 // terms of the GNU General Public License as published by
the

> 8 // Free Software Foundation; either version 3, or (at your option)
> 9 // any later version.
> 10
>
> Searching further, I've arrived at the following piece of Xapian code
> (xapian-core/backends/documentinternal.h):
>
> 390 std::string get_value(Xapian::valueno slot) const {
> 391 if (values) {
> 392 auto i = values->find(slot);
> 393 if (i != values->end())
> 394 return i->second;
> 395 return std::string();
> 396 }
> 397 398 return fetch_value(slot);
> 399 }
>
> Since the invalid dereference indicates a tuple, I suspect the crash
> stems from the use `second' on line 394. (The (->) dereference likely
> does not cause the crash since otherwise we wouldn't arrive at the
> tuple.)
>
> When I use alot with this version of notmuch/python bindings, it works
> just fine, but I suspect that's because alot wraps all the provided
> objects to avoid this sort of bugs (and allow for repeated iteration
> over Threads objects, etc), and hence avoid calling
> get_toplevel_messages()
> twice.
>
> Does the problem lie with my fundamental misunderstanding of how the
> bindings work, or is this a bug?
>
> Linux x1 4.18.16-arch1-1-ARCH #1 SMP PREEMPT Sat Oct 20 22:06:45 UTC
> 2018 x86_64 GNU/Linux Python 3.7.1 built from upstream @ 7f726c6 (AUR:
> notmuch-git 3:0.28.2.7.g7f726c6e-1)
>
>
> Regards, David
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

Dirk Van Haerenborgh <[hidden email]> writes:


> This is not a typo. I deliberately duplicated that bit.
> The second time it runs that loop, it will always segfault, unless you
> omit the first 'notmuch_message_destroy' call.
>
> Given your example, I suspect the Python3 bindings to do something very
> similar. I would think that this is the correct way to use the API?
>
> Kind regards,
> -Dirk

Unless I misremember / misunderstand something, calling
notmuch_message_destroy is not a good idea, unless you are going to
recreate the thread object. The thread has an internal list of messages,
which you are freeing.

Possibly the documentation could be improved on this point.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Dirk Van Haerenborgh-2 Dirk Van Haerenborgh-2
Reply | Threaded
Open this post in threaded view
|

Re: segfault using python bindings

Thanks,

The documentation regarding lifetime of pointers is indeed at some points rather limited.
Is there ever a need for calling thread/message_destroy?

From what I understood, lifetime is like this:
Database > Query > Threads > Thread > Messages > Message

I assumed that destroying a 'Message' would only invalidate it for the 'Messages' iterator. Given that you can't destroy a message without invalidating it for the Thread itself, it seems weird that the documentation states that the 'Message' is only valid for the lifetime of the iterator.

Can someone clarify this for me? It would simplify things a lot in notmuch-rs.

-Dirk

On Mon, 19 Nov 2018 at 00:34, David Bremner <[hidden email]> wrote:
Dirk Van Haerenborgh <[hidden email]> writes:


> This is not a typo. I deliberately duplicated that bit.
> The second time it runs that loop, it will always segfault, unless you
> omit the first 'notmuch_message_destroy' call.
>
> Given your example, I suspect the Python3 bindings to do something very
> similar. I would think that this is the correct way to use the API?
>
> Kind regards,
> -Dirk

Unless I misremember / misunderstand something, calling
notmuch_message_destroy is not a good idea, unless you are going to
recreate the thread object. The thread has an internal list of messages,
which you are freeing.

Possibly the documentation could be improved on this point.

d

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch