[PATCH] python: open messages in binary mode

Florian Klink

[PATCH] python: open messages in binary mode

Currently, notmuch's get_message_parts() opens the file in text mode and passes
the file object to email.message_from_file(fp). If the email contains bytes that
cannot be decoded with the default encoding (for example, 8-bit ISO-8859 text
under a UTF-8 locale), reading might fail inside email.parser with the following exception:

  File "/usr/lib/python3.6/site-packages/notmuch/message.py", line 591, in get_message_parts
    email_msg = email.message_from_binary_file(fp)
  File "/usr/lib/python3.6/email/__init__.py", line 62, in message_from_binary_file
    return BytesParser(*args, **kws).parse(fp)
  File "/usr/lib/python3.6/email/parser.py", line 110, in parse
    return self.parser.parse(fp, headersonly)
  File "/usr/lib/python3.6/email/parser.py", line 54, in parse
    data = fp.read(8192)
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1865: invalid continuation byte

To fix this, read file in binary mode and pass to
email.message_from_binary_file(fp).

Signed-off-by: Florian Klink <[hidden email]>
---
 bindings/python/notmuch/message.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index cce377d0..531b22d0 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -587,8 +587,8 @@ class Message(Python3StringMixIn):
 
     def get_message_parts(self):
         """Output like notmuch show"""
-        fp = open(self.get_filename())
-        email_msg = email.message_from_file(fp)
+        fp = open(self.get_filename(), 'rb')
+        email_msg = email.message_from_binary_file(fp)
         fp.close()
 
         out = []
--
2.14.1
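
For illustration, the failure and the fix can be reproduced with the standard library alone, without notmuch. This is a minimal Python 3 sketch, not code from the patch: the sample message, the helper names parse_text_mode() and parse_binary_mode(), and the explicit encoding="utf-8" (used so the result does not depend on the locale) are all assumptions.

```python
import email
import os
import tempfile

# Hypothetical sample message: the 8-bit ISO-8859-2 body is not valid UTF-8.
RAW = (
    b"From: sender@example.org\n"
    b"Subject: ISO-8859-2 encoded message\n"
    b"Content-Type: text/plain; charset=iso-8859-2\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Czech word tu\xe8\xf2\xe1\xe8\xe8\xed\n"
)


def parse_text_mode(path):
    # Text mode decodes while reading; UTF-8 is forced here (an assumption)
    # so that the outcome is independent of the locale.
    with open(path, encoding="utf-8") as fp:
        return email.message_from_file(fp)


def parse_binary_mode(path):
    # Binary mode hands the raw bytes to the email parser.
    with open(path, "rb") as fp:
        return email.message_from_binary_file(fp)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(RAW)
    path = tmp.name

try:
    try:
        parse_text_mode(path)
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True
    msg = parse_binary_mode(path)
finally:
    os.unlink(path)

print(text_mode_failed)
print(msg["Subject"])
```

Text mode raises on the first byte that is not valid UTF-8, while binary mode leaves charset handling to the email package.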

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2

Re: [PATCH] python: open messages in binary mode

Florian Klink <[hidden email]> writes:

> To fix this, read file in binary mode and pass to
> email.message_from_binary_file(fp).
>

Thanks for the patch, but notmuch is not (yet) python3 only. Apparently
that function has only been available since python 3.2. I'm not sure if/when
we'll drop python 2.7 support, but not without deprecating it for a few releases.

Also, since compatibility is a bit tricky here, it would be great to
have a test. See test/T390-python.sh for some examples.

d
_______________________________________________
Gaute Hope

Re: [PATCH] python: open messages in binary mode

David Bremner writes on August 25, 2017 0:11:
> Florian Klink <[hidden email]> writes:
>
>> To fix this, read file in binary mode and pass to
>> email.message_from_binary_file(fp).
>>
>
> Thanks for the patch, but notmuch is not (yet) python3 only. Apparently
> that function has only been available since python 3.2. I'm not sure if/when
> we'll drop python 2.7 support, but not without deprecating it for a few releases.

Is there anyone still exclusively on Python 2.7? Perhaps the time is
ripe for starting that process? Encoding compatibility is an unholy mess
to maintain for one Python distro.

Is any of alot, afew, etc still on Python 2 only?

Regards, Gaute

_______________________________________________
Florian Klink

Re: [PATCH] python: open messages in binary mode

>>that function has only been available since python 3.2. I'm not sure if/when
>>we'll drop python 2.7 support, but not without deprecating it for a few releases.

>Is there anyone still exclusively on Python 2.7? Perhaps the time is
>ripe for starting that process? Encoding compatibility is an unholy
>mess to maintain for one Python distro.

If Python 2 doesn't have email.message_from_binary_file(), the bug I'm running
into might not really be fixable on Python 2 anyway. Maybe it's possible to
open the file in binary mode on Python 2 and pass it to
email.message_from_file(), though. I will tinker around a bit this evening and
let you know.

>Is any of alot, afew, etc still on Python 2 only?

afew works on both Python 2 and 3
alot seems to currently be Python 2 only (at least the Travis runs are), but it
looks like they are thinking about moving to Python 3 and dropping Python 2:
https://github.com/pazz/alot/issues/1047#issuecomment-300713819

Florian
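
The idea floated above (binary mode on both versions, with the older parser entry point on Python 2) could be sketched with feature detection instead of a version check. This is only an illustration under stated assumptions: parse_message_file() is a hypothetical helper, not part of the notmuch bindings, and the sample message is made up.

```python
import email
import os
import tempfile


def parse_message_file(path):
    """Parse the message stored at `path`, reading it as bytes."""
    # Prefer message_from_binary_file() (Python 3.2+). On Python 2 it does
    # not exist, so fall back to message_from_file(), which works there
    # because a file opened with 'rb' yields plain byte strings.
    parse = getattr(email, "message_from_binary_file",
                    email.message_from_file)
    with open(path, "rb") as fp:
        return parse(fp)


# Usage with a throwaway file containing 8-bit ISO-8859-2 text.
raw = (b"Subject: ISO-8859-2 encoded message\n"
       b"Content-Type: text/plain; charset=iso-8859-2\n"
       b"Content-Transfer-Encoding: 8bit\n"
       b"\n"
       b"tu\xe8\xf2\xe1\xe8\xe8\xed\n")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
try:
    parsed = parse_message_file(tmp.name)
finally:
    os.unlink(tmp.name)
print(parsed["Subject"])
```

Looking the function up with getattr() avoids importing sys and keeps the open() call identical on both Python versions.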
_______________________________________________
Florian Klink

[PATCH v2 1/2] python: open messages in binary mode

In reply to this post by David Bremner-2
Currently, notmuch's get_message_parts() opens the file in text mode and passes
the file object to email.message_from_file(fp). If the email contains bytes that
cannot be decoded with the default encoding (for example, 8-bit ISO-8859 text
under a UTF-8 locale), reading might fail inside email.parser with the following exception:

  File "/usr/lib/python3.6/site-packages/notmuch/message.py", line 591, in get_message_parts
    email_msg = email.message_from_binary_file(fp)
  File "/usr/lib/python3.6/email/__init__.py", line 62, in message_from_binary_file
    return BytesParser(*args, **kws).parse(fp)
  File "/usr/lib/python3.6/email/parser.py", line 110, in parse
    return self.parser.parse(fp, headersonly)
  File "/usr/lib/python3.6/email/parser.py", line 54, in parse
    data = fp.read(8192)
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1865: invalid continuation byte

To fix this, read file in binary mode and pass to
email.message_from_binary_file(fp).

Unfortunately, Python 2 doesn't support
email.message_from_binary_file(fp), so keep using
email.message_from_file(fp) there.

Signed-off-by: Florian Klink <[hidden email]>
---
 bindings/python/notmuch/message.py | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index cce377d0..d5b98e4f 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -41,6 +41,7 @@ from .tag import Tags
 from .filenames import Filenames
 
 import email
+import sys
 
 
 class Message(Python3StringMixIn):
@@ -587,8 +588,11 @@ class Message(Python3StringMixIn):
 
     def get_message_parts(self):
         """Output like notmuch show"""
-        fp = open(self.get_filename())
-        email_msg = email.message_from_file(fp)
+        fp = open(self.get_filename(), 'rb')
+        if sys.version_info[0] < 3:
+            email_msg = email.message_from_file(fp)
+        else:
+            email_msg = email.message_from_binary_file(fp)
         fp.close()
 
         out = []
--
2.14.1

_______________________________________________
Florian Klink

[PATCH v2 2/2] T390-python: add test for get_message_parts and special characters

This imports a message with ISO-8859-2 encoded characters, then opens
the database using the Python bindings. We iterate over all message
parts, then print the message ID.

Signed-off-by: Florian Klink <[hidden email]>
Signed-off-by: Andreas Rammhold <[hidden email]>
---
 test/T390-python.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/test/T390-python.sh b/test/T390-python.sh
index a9a61145..5921cac9 100755
--- a/test/T390-python.sh
+++ b/test/T390-python.sh
@@ -56,5 +56,22 @@ grep '^[0-9a-f]' OUTPUT > INITIAL_OUTPUT
 test_begin_subtest "output of count matches test code"
 notmuch count --lastmod '*' | cut -f2-3 > OUTPUT
 test_expect_equal_file INITIAL_OUTPUT OUTPUT
+add_message '[content-type]="text/plain; charset=iso-8859-2"' \
+            '[content-transfer-encoding]=8bit' \
+            '[subject]="ISO-8859-2 encoded message"' \
+            "[body]=$'Czech word tu\350\362\341\350\350\355 means pinguin\'s.'" # ISO-8859-2 characters are generated by shell's escape sequences
+test_begin_subtest "Add ISO-8859-2 encoded message, call get_message_parts"
+test_python <<EOF
+import notmuch
+db = notmuch.Database(mode=notmuch.Database.MODE.READ_ONLY)
+q_new = notmuch.Query(db, 'ISO-8859-2 encoded message')
+for m in q_new.search_messages():
+    for mp in m.get_message_parts():
+      continue
+    print(m.get_message_id())
+EOF
+
+notmuch search --sort=oldest-first --output=messages "tučňáččí" | sed s/^id:// > EXPECTED
+test_expect_equal_file EXPECTED OUTPUT
 
 test_done
--
2.14.1
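
Outside the notmuch test harness, the property this test relies on can be checked with the standard library alone: the octal escapes \350\362\341\350\350\355 are the ISO-8859-2 bytes of "tučňáččí". The following Python 3 sketch is an illustration only, not part of the test suite; the sample message and the variable names are assumptions.

```python
import email

# Hypothetical message with the same headers and body bytes as the test;
# \xe8\xf2\xe1\xe8\xe8\xed equals the shell escapes \350\362\341\350\350\355.
RAW = (
    b"Subject: ISO-8859-2 encoded message\n"
    b"Content-Type: text/plain; charset=iso-8859-2\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Czech word tu\xe8\xf2\xe1\xe8\xe8\xed\n"
)

msg = email.message_from_bytes(RAW)

decoded = []
for part in msg.walk():
    if part.get_content_maintype() == "text":
        # Use the charset declared in Content-Type; fall back to ASCII.
        charset = part.get_content_charset() or "ascii"
        # decode=True undoes the transfer encoding and returns bytes.
        decoded.append(part.get_payload(decode=True).decode(charset))

print(decoded[0])
```

Because the parser receives bytes, the charset declared in the message decides how the body is decoded, not the locale of the reading process.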

_______________________________________________
David Bremner-2

Re: [PATCH v2 1/2] python: open messages in binary mode

In reply to this post by Florian Klink
Florian Klink <[hidden email]> writes:

> Currently, notmuch's get_message_parts() opens the file in text mode and passes
> the file object to email.message_from_file(fp). If the email contains bytes that
> cannot be decoded with the default encoding (for example, 8-bit ISO-8859 text
> under a UTF-8 locale), reading might fail inside email.parser with the following exception:
>

merged series to master. Thanks for the fix. BTW, I noticed the bug only
happens with python3.

d
_______________________________________________
Florian Klink

Re: [PATCH v2 1/2] python: open messages in binary mode

>merged series to master. Thanks for the fix. BTW, I noticed the bug only
>happens with python3.

Thanks for merging :-)
Yes, most distributions still symlink /usr/bin/python to python2 - maybe that's
the reason why a lot of code still runs on python 2…
_______________________________________________
Tomi Ollila-2

Re: [PATCH v2 1/2] python: open messages in binary mode

On Mon, Oct 02 2017, Florian Klink wrote:

>>merged series to master. Thanks for the fix. BTW, I noticed the bug only
>>happens with python3.
>
> Thanks for merging :-)
> Yes, most distributions still symlink /usr/bin/python to python2 - maybe that's
> the reason why a lot of code still runs on python 2…

In Windows environments one often sees just python2 :(

In macOS environments one often sees just python2 :(

In CentOS/RHEL one has to pick python3 (3.4) from EPEL, and python3
did not seem to work out of the box after installing (had to do
ln -s python34 python3 ) :(


Tomi