Crash with Python bindings

classic Classic list List threaded Threaded
38 messages Options
12
Konrad Hinsen Konrad Hinsen
Reply | Threaded
Open this post in threaded view
|

Crash with Python bindings

Hi everyone,

I have been writing quite a few Python scripts for notmuch before
running into a strange bug. Here is a minimal script producing it:

--------------------------------------------------
from notmuch import Query, Database

def foo(bar):
     pass

db = Database()
q = Query(db, "*")
db.close()
--------------------------------------------------

Running this script (Python 3.5, MacOS X) yields:

[1]    22478 abort      pydev3 Temp/notmuch_test.py

The crash actually happens *after* db.close(), when the Python
interpreter exists, and therefore I suspect that no data is lost, but I
hesitate to use scripts with that behavior in production use.

The strange part is that what causes the crash is the presence of the
function foo(), even though it is never called. Remove foo and the
script runs fine. It is also necessary to create a Query object. The
combination of a function definition (any) and the creation of a Query
object yields the crash. This looks like a memory management issue, but
I didn't explore it any further.

Cheers,
   Konrad.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

Konrad Hinsen <[hidden email]> writes:

> Hi everyone,
>
> I have been writing quite a few Python scripts for notmuch before
> running into a strange bug. Here is a minimal script producing it:
>
> --------------------------------------------------
> from notmuch import Query, Database
>
> def foo(bar):
>      pass
>
> db = Database()
> q = Query(db, "*")
> db.close()

Do you really call the constructor without a path? Or are you censoring
the script for some reason?

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Konrad Hinsen Konrad Hinsen
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

David Bremner <[hidden email]> writes:

>> from notmuch import Query, Database
>>
>> def foo(bar):
>>      pass
>>
>> db = Database()
>> q = Query(db, "*")
>> db.close()
>
> Do you really call the constructor without a path? Or are you censoring
> the script for some reason?

No path means path=None, which stands for the path from
~/.notmuch-config. That's exactly what I want. Is there some reason not
to rely on this mechanism?

Cheers,
  Konrad.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Konrad Hinsen Konrad Hinsen
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

In reply to this post by Konrad Hinsen
Hi Justus,

> So I guess what happens is that Python3 changed how the interpreter
> environment is torn down and they actually destroy the 'q' object.  If
> that is so, then your data is indeed safe.

That reminds me of a recent change in object finalization in Python 3:

  https://www.python.org/dev/peps/pep-0442/

I'll look into that, thanks for the gdb exploration!

Cheers,
  Konrad.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

In reply to this post by Konrad Hinsen
Konrad Hinsen <[hidden email]> writes:

> David Bremner <[hidden email]> writes:
>
>>> from notmuch import Query, Database
>>>
>>> def foo(bar):
>>>      pass
>>>
>>> db = Database()
>>> q = Query(db, "*")
>>> db.close()
>>
>> Do you really call the constructor without a path? Or are you censoring
>> the script for some reason?
>
> No path means path=None, which stands for the path from
> ~/.notmuch-config. That's exactly what I want. Is there some reason not
> to rely on this mechanism?

Oh sorry, I'm (obviously) not that familiar with the python bindings.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
W. Trevor King W. Trevor King
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

In reply to this post by Konrad Hinsen
On Tue, Jan 12, 2016 at 10:41:57AM +0100, Konrad Hinsen wrote:

> --------------------------------------------------
> from notmuch import Query, Database
>
> def foo(bar):
>      pass
>
> db = Database()
> q = Query(db, "*")
> db.close()
> --------------------------------------------------
>
> Running this script (Python 3.5, MacOS X) yields:
>
> [1]    22478 abort      pydev3 Temp/notmuch_test.py
> …
> The strange part is that what causes the crash is the presence of the
> function foo(), even though it is never called. Remove foo and the
> script runs fine. It is also necessary to create a Query object.
Adding some more data-points, I see the same results with Python 3.4.3
on Gentoo, although stderr just gets “Aborted”.  All permutations seem
to work on Python 3.3.5.

Cheers,
Trevor

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (836 bytes) Download Attachment
W. Trevor King W. Trevor King
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

In reply to this post by Konrad Hinsen
On Tue, Jan 12, 2016 at 03:23:46PM +0100, Konrad Hinsen wrote:

> Hi Justus,
>
> > So I guess what happens is that Python3 changed how the
> > interpreter environment is torn down and they actually destroy the
> > 'q' object.  If that is so, then your data is indeed safe.
>
> That reminds me of a recent change in object finalization in Python
> 3:
>
>   https://www.python.org/dev/peps/pep-0442/
I'm pretty sure that is what's going on.  The PEP landed in Python
3.4, explaining why I don't see this issue in 3.3 [1].  And it has
[2]:

   In particular, this PEP obsoletes the current guideline that
   "objects with a __del__ method should not be part of a reference
   cycle".

I'm not sure what the best way is to fix __del__ in our Python
bindings, but if you manage them explicitly:

  db = Database()
  q = Query(db, "*")
  del q
  db.close()
  del db

you can avoid the abort (which happens when q.__del__ is called after
db.__del__).  We could make that sort of cleanup easier with context
managers for Query objects (we have them for databases since [3]), and
they look like the only object that keep an internal database
reference:

  with Database() as db:
    with Query(db, "*") as q:
      # do something with q
    db.close()

Cheers,
Trevor

[1]: id:[hidden email]
[2]: https://www.python.org/dev/peps/pep-0442/#impact
[3]: 36ce7e3c (python: implement the context manager protocol for
     database objects, 2012-02-15, v0.12)

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (836 bytes) Download Attachment
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

In reply to this post by David Bremner-2
David Bremner <[hidden email]> writes:

>> No path means path=None, which stands for the path from
>> ~/.notmuch-config. That's exactly what I want. Is there some reason not
>> to rely on this mechanism?
>
> Oh sorry, I'm (obviously) not that familiar with the python bindings.

Nothing to do with Konrad's crash, but I consider the fact that the
python bindings read ~/.notmuch-config to be a kind of layering
violation, since that file belongs to the CLI, while the bindings are
supposed to provide access to libnotmuch. Whether this is a real problem
or just an aesthetic one, I'm not sure, but I thought I'd mention it
since we are thinking of various config related issues. Obviously the
location of the database is not one of the things it makes sense to
store in the database. I can imagine scenarios where the bindings might
be usable without the CLI, but they seem fairly artificial so far, since
it seems like almost everyone needs notmuch-new / notmuch-insert.

d

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
W. Trevor King W. Trevor King
Reply | Threaded
Open this post in threaded view
|

Binding access to ~/.notmuch-config (was: Crash with Python bindings)

On Tue, Jan 12, 2016 at 03:03:15PM -0400, David Bremner wrote:
> Nothing to do with Konrad's crash, but I consider the fact that the
> python bindings read ~/.notmuch-config to be a kind of layering
> violation, since that file belongs to the CLI, while the bindings
> are supposed to provide access to libnotmuch.

I think of ~/.notmuch-config as being shared between all client code,
and in that view it makes sense to have both the CLI and Python
bindings (and other bindings) access it to figure out how to configure
their library access calls.  Having a separate config file for each
client to point at the default database path seems like more trouble
than it's worth, as does adding a library function for “reach into
some local config and return the default database path”.

Cheers,
Trevor

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (836 bytes) Download Attachment
Konrad Hinsen Konrad Hinsen
Reply | Threaded
Open this post in threaded view
|

Re: Binding access to ~/.notmuch-config

On 12/01/16 20:13, W. Trevor King wrote:

> On Tue, Jan 12, 2016 at 03:03:15PM -0400, David Bremner wrote:
>> Nothing to do with Konrad's crash, but I consider the fact that the
>> python bindings read ~/.notmuch-config to be a kind of layering
>> violation, since that file belongs to the CLI, while the bindings
>> are supposed to provide access to libnotmuch.
>
> I think of ~/.notmuch-config as being shared between all client code,
> and in that view it makes sense to have both the CLI and Python
> bindings (and other bindings) access it to figure out how to configure

I agree. I see notmuch as a collection of CLI tools, some of which are
part of the distribution and others are written by myself for my
specific needs. I'd like them all to share a single configuration file.
In fact, I'd love to be able to add sections specific to my Python scripts.

Cheers,
   Konrad.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: Binding access to ~/.notmuch-config

Konrad Hinsen <[hidden email]> writes:

> I agree. I see notmuch as a collection of CLI tools, some of which are
> part of the distribution and others are written by myself for my
> specific needs. I'd like them all to share a single configuration file.
> In fact, I'd love to be able to add sections specific to my Python scripts.

Yes, I understand that it's convenient, but the current set up is not
really very robust.  The python bindings are making assumptions about a
file format that has never been documented, and whose stability has
never been promised. It's roughly like calling private functions not
part of the published API. Sometimes it's the only way to do something,
but that doesn't mean it won't break. I think we'll eventually want to
provide some config related API, but having having arbitrary programs
reading and writing this file isn't it, IMHO.  In particular upcoming
changes may move some configuration items out of this file and into a
library level configuration API.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
W. Trevor King W. Trevor King
Reply | Threaded
Open this post in threaded view
|

Re: Binding access to ~/.notmuch-config

On Wed, Jan 13, 2016 at 08:25:10AM -0400, David Bremner wrote:

> Konrad Hinsen writes:
> > I agree. I see notmuch as a collection of CLI tools, some of which
> > are part of the distribution and others are written by myself for
> > my specific needs. I'd like them all to share a single
> > configuration file.  In fact, I'd love to be able to add sections
> > specific to my Python scripts.
>
> Yes, I understand that it's convenient, but the current set up is
> not really very robust…  In particular upcoming changes may move
> some configuration items out of this file and into a library level
> configuration API.
I think you mean [1].  And that's fine, since Python scripts, etc.,
can use that API to set and access config settings, be they standard
or script-specific (as far as I can tell, I haven't reviewed that
series in detail).  Docs on any standard settings would be nice, but I
guess they'd land as features moved out of ~/.notmuch-config and into
the database.  The only setting that can't move into the database (as
you pointed out earlier [2]) is the path to the database.  That's
currently all the Python bindings extract now, and making that route
the officially blessed way to find the default database path makes
sense to me.

Cheers,
Trevor

[1]: id:[hidden email]
     http://thread.gmane.org/gmane.mail.notmuch.general/21643
[2]: id:[hidden email]
     http://article.gmane.org/gmane.mail.notmuch.general/21639

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (836 bytes) Download Attachment
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

In reply to this post by W. Trevor King
"W. Trevor King" <[hidden email]> writes:

> you can avoid the abort (which happens when q.__del__ is called after
> db.__del__).  We could make that sort of cleanup easier with context
> managers for Query objects (we have them for databases since [3]), and
> they look like the only object that keep an internal database
> reference:
>
>   with Database() as db:
>     with Query(db, "*") as q:
>       # do something with q
>     db.close()
>

I'm reminded [1] that this problem still exists. If noone has any idea
of a fix, should we document one of the workarounds?

     
[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=893057
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Justus Winter-3 Justus Winter-3
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

David Bremner <[hidden email]> writes:

> "W. Trevor King" <[hidden email]> writes:
>
>> you can avoid the abort (which happens when q.__del__ is called after
>> db.__del__).  We could make that sort of cleanup easier with context
>> managers for Query objects (we have them for databases since [3]), and
>> they look like the only object that keep an internal database
>> reference:
>>
>>   with Database() as db:
>>     with Query(db, "*") as q:
>>       # do something with q
>>     db.close()
So while this shouldn't crash of course, this code is wrong.  The
context manager closes the database, so doing db.close() at the end of
the block is superfluous.

> I'm reminded [1] that this problem still exists. If noone has any idea
> of a fix, should we document one of the workarounds?

I don't remember the details, but the different semantics of garbage
collection and talloc was problematic.  In essence, every wrapping
Python object must keep a reference to its parent (as in parent in the
talloc hierarchy).

The bug report [1] sounds like that the crash happens at interpreter
shutdown, where iirc the objects destructors are called in arbitrary
order.


Justus

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (497 bytes) Download Attachment
Floris Bruynooghe Floris Bruynooghe
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

In reply to this post by David Bremner-2
Hi all,

David Bremner <[hidden email]> writes:

> "W. Trevor King" <[hidden email]> writes:
>
>> you can avoid the abort (which happens when q.__del__ is called after
>> db.__del__).  We could make that sort of cleanup easier with context
>> managers for Query objects (we have them for databases since [3]), and
>> they look like the only object that keep an internal database
>> reference:
>>
>>   with Database() as db:
>>     with Query(db, "*") as q:
>>       # do something with q
>>     db.close()
>>
>
> I'm reminded [1] that this problem still exists. If noone has any idea
> of a fix, should we document one of the workarounds?

This is exactly what I have fixed in my alternative bindings which I
created around the end of last year [0].  So we do have an idea of how
to fix this, at the time I said I do believe that it's possible to also
do this for the existing bindings even though it is a lot of work.
After some talking between dkg and me we got to a way forward which
proposed this, but I must admit that after messing a little with getting
a pytest run integrated into the notmuch test-suite instead of using tox
I lost momentum on the project and didn't advance any further.

If someone can hook pytest runs with various python versions into the
notmuch test suit I'd be very much obliged and probably have another go
at this as it's still an interesting problem and gives a nice way
forward.

Cheers,
Floris

[0] https://github.com/flub/notmuch/tree/cffi/bindings/python-cffi
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Daniel Kahn Gillmor Daniel Kahn Gillmor
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

On Fri 2018-03-16 19:30:37 +0100, Floris Bruynooghe wrote:
> If someone can hook pytest runs with various python versions into the
> notmuch test suit I'd be very much obliged and probably have another go
> at this as it's still an interesting problem and gives a nice way
> forward.

I don't really know what this request means -- so maybe that means that
i'm not the right person for the task, which is fine.

but it's also possible that the right person for the task *also* doesn't
know what you're asking for, so if you could elaborate a bit further
i think that would be super helpful :)

thanks for looking into this further!

  --dkg
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Floris Bruynooghe Floris Bruynooghe
Reply | Threaded
Open this post in threaded view
|

Re: Crash with Python bindings

Daniel Kahn Gillmor <[hidden email]> writes:

> On Fri 2018-03-16 19:30:37 +0100, Floris Bruynooghe wrote:
>> If someone can hook pytest runs with various python versions into the
>> notmuch test suit I'd be very much obliged and probably have another go
>> at this as it's still an interesting problem and gives a nice way
>> forward.
>
> I don't really know what this request means -- so maybe that means that
> i'm not the right person for the task, which is fine.
>
> but it's also possible that the right person for the task *also* doesn't
> know what you're asking for, so if you could elaborate a bit further
> i think that would be super helpful :)

Fair enough :)
Here a somewhat more long-form version of this:

Before even attempting to refactor the existing bindings to use cffi as
a backend instead off ctypes and/or adding the changes needed to track
the lifetime of objects correctly I would like to be able to write full
unitttest-level tests for the bindings to be able to guarantee that no
user-level APIs are broken.  In my version of the bindings I did this
the traditional Python way: using pytest for writing unittest and using
tox to invoke the tests for the various supported versions of python.

One of the feedback items I got from the patch I sent last time was that
the project would be reluctant to adopt this and would like to avoid
virtualenv and pip with their behaviour of downloading things over the
network.  Instead wishing for it to use a system python which should
have the available tools already installed (i.e. pytest).  And all this
just integrated in the existing test suite.

So my last attempt at this looks like I made a test/T391-pytest.sh file
with the idea of running a subtest for each python version, with the
subtest being a ``pythonX.Y -m pytest bindings/python/tests/`` so it'd
run the entire test.  To be nice this also needs to be hooked up so that
the subtests get skipped when a python version is not available, or is
missing pytest itself.

So while trying to figure this out is where I got distracted last time
and started working more on other things.


Kind Regards,
Floris
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Justus Winter-3 Justus Winter-3
Reply | Threaded
Open this post in threaded view
|

New Python bindings (was: Crash with Python bindings)

In reply to this post by Floris Bruynooghe
Hi Floris :)

Floris Bruynooghe <[hidden email]> writes:

> This is exactly what I have fixed in my alternative bindings which I
> created around the end of last year [0].  So we do have an idea of how
> to fix this, at the time I said I do believe that it's possible to also
> do this for the existing bindings even though it is a lot of work.
> After some talking between dkg and me we got to a way forward which
> proposed this, but I must admit that after messing a little with getting
> a pytest run integrated into the notmuch test-suite instead of using tox
> I lost momentum on the project and didn't advance any further.

I'm sorry that I didn't speak up when you announced your work.  I'm
actually excited about a new set of bindings for Python.  I agree with
using cffi.  I briefly looked at the code, and I believe it is much
nicer than what we currently have.

I trust that it works fine with Python 3, does it?

The testsuite cannot depend on pulling stuff from the net simply because
build servers typically do not have access to it.  That is a hard
requirement.

I don't remember what kind of tests there are for the current bindings
(are there any...?), but shouldn't these just continue to work for the
time being?


Thanks,
Justus

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (497 bytes) Download Attachment
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

pytest integration for the notmuch test suite

Here's one approach. A given pytest "file" can be embedded in a normal
(for us) test script.  As I write this, it occurs to me you might be
thinking of embedding unit tests in the bindings source files; that
would be easy to add, something along the lines of

test_begin_subtest "python bindings embedded unit tests"
test_expect_success "${NOTMUCH_PYTEST} ${NOTMUCH_SRCDIR}/bindings/python/notmuch"

You could also run one source file of tests with

test_begin_subtest "python bindings foo tests"
test_expect_success "${NOTMUCH_PYTEST} ${NOTMUCH_SRCDIR}/bindings/python/notmuch/test_foo.py"

that would give a less granular result, at the cost of more boilerplate

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/3] configure: check for pytest binary

This is to support future use of pytest in the test suite
---
 configure | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/configure b/configure
index b177b141..ab45878d 100755
--- a/configure
+++ b/configure
@@ -62,6 +62,7 @@ CXXFLAGS=${CXXFLAGS:-\$(CFLAGS)}
 LDFLAGS=${LDFLAGS:-}
 XAPIAN_CONFIG=${XAPIAN_CONFIG:-}
 PYTHON=${PYTHON:-}
+PYTEST=${PYTEST:-}
 
 # We don't allow the EMACS or GZIP Makefile variables inherit values
 # from the environment as we do with CC and CXX above. The reason is
@@ -118,6 +119,8 @@ Other environment variables can be used to control configure itself,
  library. [$XAPIAN_CONFIG]
  PYTHON Name of python command to use in
  configure and the test suite.
+        PYTEST Name of pytest command to use in
+                        the test suite.
 
 Additionally, various options can be specified on the configure
 command line.
@@ -571,6 +574,24 @@ if [ $have_python -eq 0 ]; then
     errors=$((errors + 1))
 fi
 
+pytest=""
+if [ $have_python -eq 1 ]; then
+    printf "Checking for pytest... "
+    have_pytest=0
+
+    for name in ${PYTEST} pytest-3 pytest pytest-2; do
+        if command -v $name > /dev/null; then
+            have_pytest=1
+            pytest=$name
+            printf "Yes (%s).\n" $pytest
+            break
+        fi
+    done
+    if [ $have_pytest -eq 0 ]; then
+        printf "No (some tests may be skipped).\n"
+    fi
+fi
+
 printf "Checking for valgrind development files... "
 if pkg-config --exists valgrind; then
     printf "Yes.\n"
@@ -1234,6 +1255,9 @@ NOTMUCH_HAVE_MAN=$((have_sphinx))
 # Name of python interpreter
 NOTMUCH_PYTHON=${python}
 
+# Name of pytest runner
+NOTMUCH_PYTEST=${pytest}
+
 # Are the ruby development files (and ruby) available? If not skip
 # building/testing ruby bindings.
 NOTMUCH_HAVE_RUBY_DEV=${have_ruby_dev}
--
2.16.2

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
12