Update on python-cffi bindings

classic Classic list List threaded Threaded
5 messages Options
Floris Bruynooghe Floris Bruynooghe
Reply | Threaded
Open this post in threaded view
|

Update on python-cffi bindings

Hi all,

Thanks for all the feedback on an early post of my CFFI-based libnotmuch
Python bindings.  I've now completed these somewhat more and they now
have most of the functionality.  Here's what's new since last time:

- All tests pass on Python 3.5, 3.6 and pypy3.5

  I could probably add 3.4 as well, but I got the impression from people
  this was less important.  Likewise it seemed there was a more-or-less
  consensus that missing python 2.7 is no big deal.

- properties supported as a mapping

  This behaves as a normal mapping, with one extra method:
  getall(prefix='', *, exact=False).  Which returns an iterator rather
  then just one item.

- queries and threads

  There's now a Database.messages(query, *,
                                  omit_excluded=EXCLUDE.TRUE,
                                  sort=SORT.UNSORTED,
                                  exclude=tags=None)
  function.  It behaves as a generator, i.e. an iterator is returned.
  Likewise for Database.threads().  Also Database.count_messages() and
  Database.count_threads() all have the same signature (well, they
  return ints).

- Message.frozen() as a context manager

  with msg.frozen():
      msg.clear()
      raise Exception('oops')

  just works, that's less obvious to make this work then it looks but
  seemed worth it to look native in Python.

I've pushed all changes to github.com/flub/notmuch on the cffi branch.
I'd be keen to hear more feedback from other people.

Things still to do:

- Nice docs, there's comprehensive docstrings now but sphinx docs are
  clearly superior.

- A few API bits, e.g. Database.upgrade() and some others.  They're
  probably less important and I'll likely only do them as people request
  them.

- Improve finalization.  It'd be nice to give users direct access to
  .destroy() methods to free the memory, or even turn everything into
  (optional) context managers which would automatically call .destroy():

  with db.threads('tag:foo') as threads:
      for thread in threads:
          pass

  This is probably nice for long-running applications which may want to
  be somewhat careful about their memory.  But finalizers need to be
  implemented with a slightly more extended version of
  weakref.finalizer() for this to be safe.

Let me know what you think.  I'd also be happy to re-post a patch, but
it's huge and also the commits not very clean currently.  Please let me
know the best way to proceed.

Floris
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Daniel Kahn Gillmor Daniel Kahn Gillmor
Reply | Threaded
Open this post in threaded view
|

Re: Update on python-cffi bindings

Hi Floris--

On Sun 2017-12-17 19:08:18 +0100, Floris Bruynooghe wrote:

> Thanks for all the feedback on an early post of my CFFI-based libnotmuch
> Python bindings.  I've now completed these somewhat more and they now
> have most of the functionality.  Here's what's new since last time:

thanks for this work!  I appreciate and share your goals of fixing the
memory management models so that notmuch objects can't incur the crashes
i've heard reported, and i also appreciate your attention to performance
concerns on different python platforms (e.g. making sure things are
performant on both CPython and PyPy).

I confess that i haven't read the series in full, but i have two main
concerns that i'd generally use to evaluate proposals like this:

 0) how much does the API change?  that is, if we're expecting existing
    users of the notmuch python bindings to adopt this new approach, how
    much pain are we putting them through?  How much of an effort has
    been made to reduce that pain, and do we have a clear and
    comprehensive porting guide?

 1) how accessible is the new implementation for contributions from
    other developers?  For example, a transition to a highly idiomatic
    style of python that no other developers would be able to improve
    would put a large maintenance burden on you.


Do you have any thoughts about these questions?

For example, for point 0, have you tried to run alot or some other
python-based notmuch MUA against this newer python binding?  what
changes were necessary?

For example, for point 1, can you show me how i would (as a fellow notmuch
contributor) create a patch to add support for some of the recent (post
0.25.3) changes to notmuch in the python interface?

Also, the old python bindings are actually directly exercised by the
notmuch test suite.  i see you've adopted the tox.ini convention to
trigger a run of pytest, but that's not how the current test suite
works.  Do you think notmuch needs to adopt tox in order to use this
series, or do you think you could integrate the testing of this module
into the currently-existing test suite?

the more things that this series changes at once, the harder it is to
push for its integration.  bite-size, piecemeal changes will have an
easier path to wider adoption.

thanks for this work!  I look forward to us solving the problems you're
working on it this series.  I hope we can get there :)

       --dkg

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (847 bytes) Download Attachment
Floris Bruynooghe Floris Bruynooghe
Reply | Threaded
Open this post in threaded view
|

Re: Update on python-cffi bindings

Daniel Kahn Gillmor <[hidden email]> writes:

> Hi Floris--
>
> On Sun 2017-12-17 19:08:18 +0100, Floris Bruynooghe wrote:
>
> i've heard reported, and i also appreciate your attention to performance
> concerns on different python platforms (e.g. making sure things are
> performant on both CPython and PyPy).

Oh btw, I can happily report this has paid off.  These bindings perform
much better on PyPy while performing slightly worse on CPython.  I
haven't proven this but my guess is most of the performance loss on
CPython is due to the memory management which unfortunately involves a
lot of calls on each object destruction, also each object needs to be
destroyed individually so when destroying a parent all children will get
destroyed first which is a waste from libnotmuch's point of view.  This
can be improved on by using a weakref.finalizer-modeled approach, but
is some more work.

> I confess that i haven't read the series in full, but i have two main
> concerns that i'd generally use to evaluate proposals like this:
>
>  0) how much does the API change?  that is, if we're expecting existing
>     users of the notmuch python bindings to adopt this new approach, how
>     much pain are we putting them through?  How much of an effort has
>     been made to reduce that pain, and do we have a clear and
>     comprehensive porting guide?

The API changes a lot and there is no easy migration.  And history has
shown that's a terrible way to get something new adopted.  Last time I
suggested a possible multi-tiered approach (maybe not as explicit):

1 I think it's possible to move the memory management technique to the
  existing API without API breakage.  It would still allow users to call
  functions in the wrong order etc, but that's not any regression.

2 It's probably possible to either switch the existing API to use CFFI
  or create a drop-in replacement for it based on CFFI.  The benefit
  here is two-fold: users get better PyPy performance and I believe it
  becomes easier to maintain the bindings, e.g. all you need to do to
  call notmuch_database_get_path is
  capi.ffi.string(capi.lib.notmuch_database_get_path(dbptr)) (see
  https://github.com/flub/notmuch/blob/cffi/bindings/python-cffi/notdb/_database.py#L263)
  for an actual example).  But this really depends on what everyone else
  here that maintains the Python bindings here thinks.  I'd encourage to
  have a look at the CFFI implementation to get an idea of this.

3 As last step I still think providing the more idiomatic bindings is
  useful, especially for new users.  It does take the burden of
  correctly calling the C functions somewhat more.  This could then
  either treat a notmuch_cffi as a lower level API which more directly
  maps the C API or it could call C directly as it does now.  I'm not
  currently sure on which is more feasible or better here.

  An additional thing that could be done here to ease migration is to
  allow creating the new objects from the old ones allowing a codebase
  to gradually adopt the newer API where appropriate.  E.g.:

     db = notmuch.Database(...)
     msg = db.find_message(...)
     new_db = notdb.Database.from_notmuch(db)
     new_msg = notdb.Message.from_notmuch(msg)
     print('Tags not on msg: {}'.format(new_db.tags - new_msg.tags))

  This would rely on the existing API to migrate to CFFI, otherwise it
  could still be possible but would be very hairy.

So do people reckon following this path would be more desirable and
worth the extra effort?  Would there be a willingness to change the
exising notmuch API to a CFFI implementation?  I didn't go down this
path yet as last time there was no feedback on this suggestion while
there was some moderate curiosity for a more idiomatic API.

>  1) how accessible is the new implementation for contributions from
>     other developers?  For example, a transition to a highly idiomatic
>     style of python that no other developers would be able to improve
>     would put a large maintenance burden on you.

- For the CFFI-part I believe this is easier then the existing ctypes as
  I tried to say above.

- For exposing completely new APIs, sure building the
  notdb.ImmutableTagSet and MutableTagSet was not straight forward,
  likewise for the PropertiesMap.  Many other things are much easier
  though.  One possible nice way to alleviate this with the idea of the
  existing notmuch API being the lower-level layer nearly mimicking the
  C-API directly.  That way adding new APIs there is more or less
  straight forward and there is less time pressure to add them to the
  higher level API.  Especially if mixing the APIs is supported.


> Do you have any thoughts about these questions?
>
> For example, for point 0, have you tried to run alot or some other
> python-based notmuch MUA against this newer python binding?  what
> changes were necessary?
>
> For example, for point 1, can you show me how i would (as a fellow notmuch
> contributor) create a patch to add support for some of the recent (post
> 0.25.3) changes to notmuch in the python interface?

I think I've answered these somewhat already.  If you think it would be
useful to see a real example of one of the recent patches against what I
have now I could create this.  Just let me know.

> Also, the old python bindings are actually directly exercised by the
> notmuch test suite.  i see you've adopted the tox.ini convention to
> trigger a run of pytest, but that's not how the current test suite
> works.  Do you think notmuch needs to adopt tox in order to use this
> series, or do you think you could integrate the testing of this module
> into the currently-existing test suite?

I must admit I didn't look too much at the existing tests untill just
now.  If I understand correctly the existing tests are in
T390-python.sh.  In this case I'd say the tests I added are a lot more
thorough.  The reason I added tox.ini is to easily test against multiple
python versions, i.e. CPython 3.5, 3.6 & PyPy 3.5.  If I had to guess at
the best way to integrate I'd say it's probably best to create a
TXXX-python-pytest-pyXY.sh for each supported python version and lose
tox.ini.  I'd suggest for those tests to be a simplified version of what
tox does: create a virtualenv and run pytest inside of it.

I'm open to other suggestions though, I'm not very familiar with the
notmuch testing.


Let me know what you all think,
Floris
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Daniel Kahn Gillmor Daniel Kahn Gillmor
Reply | Threaded
Open this post in threaded view
|

Re: Update on python-cffi bindings

On Thu 2017-12-21 12:30:39 +0100, Floris Bruynooghe wrote:

> The API changes a lot and there is no easy migration.  And history has
> shown that's a terrible way to get something new adopted.  Last time I
> suggested a possible multi-tiered approach (maybe not as explicit):
>
> 1 I think it's possible to move the memory management technique to the
>   existing API without API breakage.  It would still allow users to call
>   functions in the wrong order etc, but that's not any regression.
>
> 2 It's probably possible to either switch the existing API to use CFFI
>   or create a drop-in replacement for it based on CFFI.  The benefit
>   here is two-fold: users get better PyPy performance and I believe it
>   becomes easier to maintain the bindings, e.g. all you need to do to
>   call notmuch_database_get_path is
>   capi.ffi.string(capi.lib.notmuch_database_get_path(dbptr)) (see
>   https://github.com/flub/notmuch/blob/cffi/bindings/python-cffi/notdb/_database.py#L263)
>   for an actual example).  But this really depends on what everyone else
>   here that maintains the Python bindings here thinks.  I'd encourage to
>   have a look at the CFFI implementation to get an idea of this.
these two steps on their own seem like they give us:

  * better and safer memory management

  * better performance on PyPy

  * arguably, easier maintenance of the bindings.

These seem like unambiguous wins, and there is no downside -- people
using the notmuch module can just upgrade to the new version and it's
done.  I'd love to see these changes happen.

The only thing it doesn't do is provide more idiomatic bindings for new
users, which you describe as:

> 3 As last step I still think providing the more idiomatic bindings is
>   useful, especially for new users.  It does take the burden of
>   correctly calling the C functions somewhat more.  This could then
>   either treat a notmuch_cffi as a lower level API which more directly
>   maps the C API or it could call C directly as it does now.  I'm not
>   currently sure on which is more feasible or better here.
>
>   An additional thing that could be done here to ease migration is to
>   allow creating the new objects from the old ones allowing a codebase
>   to gradually adopt the newer API where appropriate.  E.g.:
>
>      db = notmuch.Database(...)
>      msg = db.find_message(...)
>      new_db = notdb.Database.from_notmuch(db)
>      new_msg = notdb.Message.from_notmuch(msg)
>      print('Tags not on msg: {}'.format(new_db.tags - new_msg.tags))
>
>   This would rely on the existing API to migrate to CFFI, otherwise it
>   could still be possible but would be very hairy.
I'm not sure i understand this last sentence.  Are you saying that step
3 depends on step 2 happening?

I'm not sure about the name "notdb" but i don't mind the idea of there
being some other "more-pythonic" notmuch bindings.  New users would
likely prefer it, and that'd be fine.

> - For exposing completely new APIs, sure building the
>   notdb.ImmutableTagSet and MutableTagSet was not straight forward,
>   likewise for the PropertiesMap.  Many other things are much easier
>   though.  One possible nice way to alleviate this with the idea of the
>   existing notmuch API being the lower-level layer nearly mimicking the
>   C-API directly.  That way adding new APIs there is more or less
>   straight forward and there is less time pressure to add them to the
>   higher level API.  Especially if mixing the APIs is supported.

I think this is in line with the approach described above.  I like it.

> I must admit I didn't look too much at the existing tests untill just
> now.  If I understand correctly the existing tests are in
> T390-python.sh.  In this case I'd say the tests I added are a lot more
> thorough.  The reason I added tox.ini is to easily test against multiple
> python versions, i.e. CPython 3.5, 3.6 & PyPy 3.5.  If I had to guess at
> the best way to integrate I'd say it's probably best to create a
> TXXX-python-pytest-pyXY.sh for each supported python version and lose
> tox.ini.  I'd suggest for those tests to be a simplified version of what
> tox does: create a virtualenv and run pytest inside of it.

I personally would rather not deal with virtualenvs during development
if we don't need them.

Take a look at test_python() in test/test-lib.sh to see how the python
tests are currently handled -- it's pretty simple, relies on the
configured version of notmuch for testing, and it would be great to keep
the simplicity.

I think the approach you've outlined above sounds really good, and i
would be happy to see it happen for notmuch.  It strikes me as much more
useful and less scary than a hard cutover or overhaul that requires
adoption of a new API from the ground up.

Thanks for continuing to push on this, Floris!

         --dkg

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (847 bytes) Download Attachment
Floris Bruynooghe Floris Bruynooghe
Reply | Threaded
Open this post in threaded view
|

Re: Update on python-cffi bindings

Hi all,

Daniel Kahn Gillmor <[hidden email]> writes:

> On Thu 2017-12-21 12:30:39 +0100, Floris Bruynooghe wrote:
>> The API changes a lot and there is no easy migration.  And history has
>> shown that's a terrible way to get something new adopted.  Last time I
>> suggested a possible multi-tiered approach (maybe not as explicit):
>>
>> 1 I think it's possible to move the memory management technique to the
>>   existing API without API breakage.  It would still allow users to call
>>   functions in the wrong order etc, but that's not any regression.
>>
>> 2 It's probably possible to either switch the existing API to use CFFI
>>   or create a drop-in replacement for it based on CFFI.  The benefit
>>   here is two-fold: users get better PyPy performance and I believe it
>>   becomes easier to maintain the bindings, e.g. all you need to do to
>>   call notmuch_database_get_path is
>>   capi.ffi.string(capi.lib.notmuch_database_get_path(dbptr)) (see
>>   https://github.com/flub/notmuch/blob/cffi/bindings/python-cffi/notdb/_database.py#L263)
>>   for an actual example).  But this really depends on what everyone else
>>   here that maintains the Python bindings here thinks.  I'd encourage to
>>   have a look at the CFFI implementation to get an idea of this.
>
> these two steps on their own seem like they give us:
>
>   * better and safer memory management
>
>   * better performance on PyPy
>
>   * arguably, easier maintenance of the bindings.
>
> These seem like unambiguous wins, and there is no downside -- people
> using the notmuch module can just upgrade to the new version and it's
> done.  I'd love to see these changes happen.
>
> The only thing it doesn't do is provide more idiomatic bindings for new
> users, which you describe as:
>
>> 3 As last step I still think providing the more idiomatic bindings is
>>   useful, especially for new users.  It does take the burden of
>>   correctly calling the C functions somewhat more.  This could then
>>   either treat a notmuch_cffi as a lower level API which more directly
>>   maps the C API or it could call C directly as it does now.  I'm not
>>   currently sure on which is more feasible or better here.
>>
>>   An additional thing that could be done here to ease migration is to
>>   allow creating the new objects from the old ones allowing a codebase
>>   to gradually adopt the newer API where appropriate.  E.g.:
>>
>>      db = notmuch.Database(...)
>>      msg = db.find_message(...)
>>      new_db = notdb.Database.from_notmuch(db)
>>      new_msg = notdb.Message.from_notmuch(msg)
>>      print('Tags not on msg: {}'.format(new_db.tags - new_msg.tags))
>>
>>   This would rely on the existing API to migrate to CFFI, otherwise it
>>   could still be possible but would be very hairy.
>
> I'm not sure i understand this last sentence.  Are you saying that step
> 3 depends on step 2 happening?

The mixing of a new API with the existing API would be possible if we
can successfully migrate the existing notmuch API to use CFFI.  If not
it may theoretically be possible but could be a lot of pain.

> I'm not sure about the name "notdb"

I'm in no way attached to this name, just needed something to work with.
Naming things is hard, if anyone has a different idea please share.

> but i don't mind the idea of there
> being some other "more-pythonic" notmuch bindings.  New users would
> likely prefer it, and that'd be fine.
>
>> - For exposing completely new APIs, sure building the
>>   notdb.ImmutableTagSet and MutableTagSet was not straight forward,
>>   likewise for the PropertiesMap.  Many other things are much easier
>>   though.  One possible nice way to alleviate this with the idea of the
>>   existing notmuch API being the lower-level layer nearly mimicking the
>>   C-API directly.  That way adding new APIs there is more or less
>>   straight forward and there is less time pressure to add them to the
>>   higher level API.  Especially if mixing the APIs is supported.
>
> I think this is in line with the approach described above.  I like it.

I would really like to hear from some of the existing python binding
users on all of the above and what they thing of this approach.  Is this
work worth it to you?  Will it make you more or less likely to want to
migrate to more pythonic bindings?

In particular I'm not that convinced in the suitability of the existing
library as a low-level binding as it currently sits somewhere above
basic wrapping of libnotmuch.  Anyway, I would appreciate any feedback
on the proposed APIs and way forward.


Cheers,
Floris
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch