web interface to notmuch

Matthew Lear

web interface to notmuch

Hello all. A little side project at work involves me trying to put together part of a knowledge share system where users can query and search email stored and indexed centrally (by offlineimap & notmuch). My intention is to provide a means to support multiple concurrent read-only accesses to the notmuch database from users' web browsers so they can query and search mail.

Consider a few different email addresses being plugged into various systems, all receiving email on different topics. I'd like to build an application with a web frontend that runs on the same server that fetches and indexes the mail, and thus presents a web interface for searching all of the mail using notmuch.

notmuch-web has not seen much development for a few years.
noservice looks pretty nifty but I'm a little unsure of the status and if it's missing anything fundamental.

I think my requirements are pretty basic:

* Read-only access
* Search and display mail only (no sending), including html mails
* Freeform entry of search terms in accordance with notmuch-search-terms(7).
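
The three requirements above can be sketched as a thin wrapper around the `notmuch search` command line. This is only a minimal sketch under stated assumptions: the function names `build_search_command`, `summarize` and `search` are hypothetical, the default limit of 50 is arbitrary, and it assumes a `notmuch` binary on the PATH that is already configured for the central mail store.

```python
import json
import subprocess


def build_search_command(terms, limit=50):
    """Build a `notmuch search` invocation for user-supplied search terms."""
    # An argument list (no shell) keeps the free-form query away from shell
    # parsing; "--" stops a query that starts with "-" being read as an option.
    return ["notmuch", "search", "--format=json",
            "--limit=%d" % int(limit), "--", terms]


def summarize(raw_json):
    """Reduce notmuch's JSON thread list to the fields a results page needs."""
    return [
        {"thread": t["thread"], "subject": t["subject"],
         "authors": t["authors"], "tags": t["tags"]}
        for t in json.loads(raw_json)
    ]


def search(terms, limit=50):
    """Run the query and return summaries; searching only reads the database."""
    out = subprocess.run(build_search_command(terms, limit), check=True,
                         capture_output=True, text=True).stdout
    return summarize(out)
```

Calling `search("tag:inbox and from:matt")` would return one dictionary per matching thread, including its tags.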

Would anybody have any ideas about the best way to undertake such a project?

notmuch-web and noservice definitely look like they could be leveraged, but I don't know if I'd be better off trying to construct something from the ground up that is better tailored to my requirements (which are much more modest than what either of the above were intended to fulfil).

A standalone app would be preferred rather than having to rely on a web server, although I'm not picky about infrastructure. Web-based programming is not my forte, so I'd also appreciate any feedback on implementation and on currently available open source web frameworks that could be used.

Many thanks,
--  Matt


_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Brian Sniffen-2

Re: web interface to notmuch

I put together something like this, visible at
https://github.com/briansniffen/notmuch/tree/nmweb/contrib/notmuch-web

It's not much of a service.  I am pretty sure it is exploitable---that
content in text/html parts of messages can do Bad Things to your
session.

I haven't thought nearly hard enough about how it will deal with
multiple users.

But it's < 250 lines of Python, so perhaps you can adapt it to what you
need.  It uses web.py, so you *could* run it standalone, but you'll
probably be happier with Apache or nginx or something in front of it,
handling TLS termination and that sort of thing.
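
To illustrate the "standalone, but behind a proxy" arrangement: the sketch below is not nmweb's code. It swaps in the standard library's `wsgiref` server in place of web.py, and the names `app` and `serve`, the placeholder body and port 8080 are assumptions. The point is only that the application listens on loopback, so Apache or nginx is the sole public listener and can terminate TLS.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Tiny read-only WSGI application: it answers GET and nothing else."""
    if environ.get("REQUEST_METHOD") != "GET":
        start_response("405 Method Not Allowed", [("Allow", "GET")])
        return [b""]
    body = b"nmweb placeholder\n"
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve(host="127.0.0.1", port=8080):
    """Serve on loopback only; a reverse proxy in front handles TLS."""
    with make_server(host, port, app) as httpd:
        httpd.serve_forever()
```

Running `serve()` starts the server; the proxy would then forward requests to `http://127.0.0.1:8080/`.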

Its only approach to sending mail is generating mailto: links that will
open in whatever client the user has configured.

-Brian

Daniel Kahn Gillmor

Re: web interface to notmuch

On Thu 2017-10-19 11:01:53 -0400, Brian Sniffen wrote:
> I put together something like this, visible at
> https://github.com/briansniffen/notmuch/tree/nmweb/contrib/notmuch-web
>
> It's not much of a service.  I am pretty sure it is exploitable---that
> content in text/html parts of messages can do Bad Things to your
> session.

I think this is the crux of the problem, right?  I was noticing the
other day that notmuch's own mail archives are published in pipermail,
which is *absolutely terrible* compared to dealing with a mailstore with
notmuch as a frontend.  I'd love to be able to expose the archive to the
public this way.

Assuming that you had a sanitize_this_html_part() function available to
you, do you think it would be possible to make this safe?  Have you
considered proposing it for inclusion in contrib upstream?

     --dkg

Brian Sniffen-2

Re: web interface to notmuch

> On Oct 19, 2017, at 12:55 PM, Daniel Kahn Gillmor <[hidden email]> wrote:
>
>> On Thu 2017-10-19 11:01:53 -0400, Brian Sniffen wrote:
>> I put together something like this, visible at
>> https://github.com/briansniffen/notmuch/tree/nmweb/contrib/notmuch-web
>>
>> It's not much of a service.  I am pretty sure it is exploitable---that
>> content in text/html parts of messages can do Bad Things to your
>> session.
>
> I think this is the crux of the problem, right?  I was noticing the
> other day that notmuch's own mail archives are published in pipermail,
> which is *absolutely terrible* compared to dealing with a mailstore with
> notmuch as a frontend.  I'd love to be able to expose the archive to the
> public this way.
>
> Assuming that you had a sanitize_this_html_part() function available to
> you, do you think it would be possible to make this safe?  Have you
> considered proposing it for inclusion in contrib upstream?

I don’t think they can be sanitized. Web tech moves so fast. But maybe they can be isolated. GMail uses a separate domain for the content from the UI; I have hopes about response headers and iframe attributes.

Also, if the whole site’s static—not just the nmweb part—you probably can’t hurt much.
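
A rough sketch of what "response headers and iframe attributes" might look like in practice. None of this is taken from nmweb; the names `ISOLATION_HEADERS` and `sandboxed_frame` and the exact policy string are assumptions. The headers are meant for a response that serves a single message part from its own URL, and the helper embeds a part in a sandboxed frame.

```python
import html

# Assumed policy: no scripts, no remote fetches (so no tracking images),
# only inline styles and data: images, and no Referer leakage.
ISOLATION_HEADERS = [
    ("Content-Security-Policy",
     "sandbox; default-src 'none'; img-src data:; style-src 'unsafe-inline'"),
    ("X-Content-Type-Options", "nosniff"),
    ("Referrer-Policy", "no-referrer"),
]


def sandboxed_frame(message_html):
    """Wrap an untrusted text/html part in a sandboxed iframe."""
    # An empty sandbox attribute disables scripts and forms and gives the
    # framed document its own opaque origin, separate from the UI.
    escaped = html.escape(message_html, quote=True)
    return '<iframe sandbox="" srcdoc="%s"></iframe>' % escaped
```

Isolation of this kind limits what a message can do to the surrounding page; it does not by itself clean up the message content.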
Daniel Kahn Gillmor

Re: web interface to notmuch

On Thu 2017-10-19 16:00:33 -0400, Brian Sniffen wrote:
> I don’t think they can be sanitized. Web tech moves so fast.

well, there are at least a handful of python modules that claim to do
some sort of sanitization.

in debian alone, we have at least:

   python3-django-html-sanitizer
   python3-feedparser
   python3-bleach
   python3-w3lib

so, one approach would be to just adopt one of them, and then it's their
fault if it breaks :)

I'm not saying it's a great approach, but it seems better than the
current situation where no sanitization is done at all.
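
As a rough illustration of "just adopt one of them", here is a minimal sketch that assumes the bleach module (python3-bleach in Debian) is installed. The allowlist and the name `sanitize_html_part` are hypothetical, and the fallback of escaping everything when bleach is missing is an assumption of this sketch, not something proposed in the thread.

```python
import html

try:
    import bleach  # third-party sanitizer; optional in this sketch
except ImportError:
    bleach = None

# Hypothetical allowlist; tune it to the mail that actually needs displaying.
ALLOWED_TAGS = {"a", "b", "blockquote", "br", "em", "i", "li", "ol", "p",
                "pre", "strong", "ul"}
ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}


def sanitize_html_part(markup):
    """Return markup that is safer to embed in a page."""
    if bleach is None:
        # Without a sanitizer, escape everything so that no markup survives.
        return html.escape(markup)
    # bleach escapes tags and drops attributes that are not on the allowlist.
    return bleach.clean(markup, tags=ALLOWED_TAGS,
                        attributes=ALLOWED_ATTRIBUTES)
```

With either branch, a `<script>` element in the input comes out as inert escaped text.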

> But maybe they can be isolated. GMail uses a separate domain for the
> content from the UI; I have hopes about response headers and iframe
> attributes.

That's an interesting approach too, though it doesn't isolate message A
from message B, which is a distinct concern.  The worry isn't just that
the content could take over the UI, right?

Maybe isolation and sanitization can be used in combination?  even if
neither of them are perfect, it'd be a damn sight better than pipermail
:P

> Also, if the whole site’s static—not just the nmweb part—you probably
> can’t hurt much.

depends on what kind of harm you're talking about -- i think the privacy
harms are potentially pretty serious.  The public library is static, but
if reading one book meant that you ended up reporting on your future
reading habits (of any book) to some unknown third party, that would be
pretty bad.

       --dkg

W. Trevor King

Re: web interface to notmuch

In reply to this post by Matthew Lear
I really like the
public-inbox [1] approach to archival.  Since Gmane died, Git (and a
few other projects [2]) have also been using the author's hosted
version.  I haven't looked at the backing code in a while, but it's
live Perl, not a static site.  It uses Xapian for search (like
notmuch), but I was unable to talk Eric into using notmuch directly
because of our lack of Perl bindings [3].  Still, it's a pretty
similar idea, and it may be a good fit for you.  Previous discussion
on this list in [4,5].

Cheers,
Trevor

[1]: https://public-inbox.org/README.html
[2]: https://public-inbox.org/hosted.html
[3]: https://public-inbox.org/meta/20141027005553.GA10990@.../
[4]: id:[hidden email]
[5]: id:[hidden email]

--
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

Jani Nikula

Re: web interface to notmuch

In reply to this post by Daniel Kahn Gillmor
On Thu, 19 Oct 2017, Daniel Kahn Gillmor <[hidden email]> wrote:

> On Thu 2017-10-19 11:01:53 -0400, Brian Sniffen wrote:
>> I put together something like this, visible at
>> https://github.com/briansniffen/notmuch/tree/nmweb/contrib/notmuch-web
>>
>> It's not much of a service.  I am pretty sure it is exploitable---that
>> content in text/html parts of messages can do Bad Things to your
>> session.
>
> I think this is the crux of the problem, right?  I was noticing the
> other day that notmuch's own mail archives are published in pipermail,
> which is *absolutely terrible* compared to dealing with a mailstore with
> notmuch as a frontend.  I'd love to be able to expose the archive to the
> public this way.

For the list archive, we could restrict to displaying text/plain only.

BR,
Jani.
Daniel Kahn Gillmor

Re: web interface to notmuch

On Sat 2017-10-21 23:00:00 +0300, Jani Nikula wrote:
> For the list archive, we could restrict to displaying text/plain only.

and text/x-diff, surely :)

But yeah, good point.

Brian, what do you think about such a constraint?  would that make your
implementation safe enough to put on the public Internet for a read-only
archive?
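
The constraint discussed here (render text/plain and text/x-diff, ignore everything else) could look roughly like the sketch below. It uses only the standard library `email` package; the names `SAFE_TYPES` and `render_safe_parts` are hypothetical, and this is not how nmweb is implemented.

```python
import html
from email import message_from_string

SAFE_TYPES = {"text/plain", "text/x-diff"}


def render_safe_parts(raw_message):
    """Yield an HTML-escaped <pre> block for each allowlisted part."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.is_multipart() or part.get_content_type() not in SAFE_TYPES:
            continue  # containers, text/html, images etc. are never rendered
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset("ascii")
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # the message names a charset Python lacks
            text = payload.decode("ascii", errors="replace")
        # Escaping means "<" in a plain-text body is shown, not parsed.
        yield "<pre>%s</pre>" % html.escape(text)
```

Because the text is escaped rather than sanitized as HTML, a body containing shell redirection such as `cat < file` is displayed in full.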

    --dkg

Vladimir Panteleev-2

Re: web interface to notmuch

Hi,

Sorry to barge in, I noticed this thread and thought I'd try to have a
go at setting up a test DFeed instance.

Here it is:

http://dfeed-notmuch.k3.1azy.net/

There is some more info on the help page:

http://dfeed-notmuch.k3.1azy.net/help

Posting is supported, but it is (intentionally) unconfigured for now.

What do you think?

--
Best regards,
  Vladimir
Matthew Lear

Re: web interface to notmuch

In reply to this post by Daniel Kahn Gillmor
Thanks for doing that, and thanks to all for the feedback and input so far. For the interface I want to set up, I'd like to be able to enter notmuch search syntax in an input box and to see the tags applied to messages.
The interface presented by the current version of notmuch-web ticks a lot of boxes for me. Being able to enter free-form search syntax quickly (or ideally, to select from a list of favourite or predefined searches) and get the results back quickly is critical to how I'd like the interface to be used. I see that the DFeed instance has an advanced search facility, but it's a few clicks away.
I've not looked seriously at the other suggestions in this thread so far, though.
Cheers,
--  Matt


Brian Sniffen-2

Re: web interface to notmuch

In reply to this post by Daniel Kahn Gillmor
That's inspiring!  Now there's a demo of nmweb at

https://nmweb.evenmere.org/


It's possible to get it to dump the whole mbox by clicking through the
obvious links; please consider exploring at
https://nmweb.evenmere.org/search/monkey instead.  There are not many
monkeys in the inbox.

-Brian

Daniel Kahn Gillmor

Re: web interface to notmuch

On Wed 2017-10-25 18:03:01 -0400, Brian Sniffen wrote:
> That's inspiring!  Now there's a demo of nmweb at
>
> https://nmweb.evenmere.org/

this is very nice, Brian.

Your URL highlighter seems a bit trigger-happy though:

   https://nmweb.evenmere.org/show/8760s7zr47.fsf%40zancas.localnet

I don't think bremner was trying to link to http://index.cc !

> It's possible to get it to dump the whole mbox by clicking through the
> obvious links; please consider exploring at
> https://nmweb.evenmere.org/search/monkey instead.

this is interesting because it shows me threads where some messages have
monkey in them, but i can't tell which messages actually have the
relevant search term.  Maybe it could highlight the found messages?

Also, once i'm looking at one message, i don't see an easy way to go
"next" in the thread.

see: you show off a cool trick, you get requests for more cool tricks :)

> There are not many monkeys in the inbox.

speak for your own inbox, please.

      --dkg

Brian Sniffen-2

Re: web interface to notmuch

Daniel Kahn Gillmor <[hidden email]> writes:

> On Wed 2017-10-25 18:03:01 -0400, Brian Sniffen wrote:
>> That's inspiring!  Now there's a demo of nmweb at
>>
>> https://nmweb.evenmere.org/
>
> this is very nice, Brian.

Thanks!  The part I'm happiest about is the speed: this is as fast as I
remember gmail being.  The Secret Ingredient is HTTP chunked encoding,
accessed through web.py's generators, and careful page design---almost
every byte from the server is renderable as it arrives, and later bytes
never disrupt placement of earlier objects.
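
The idea in the paragraph above, sketched in plain Python rather than copied from nmweb: the page is produced by a generator, so a server that streams the pieces (for example with chunked encoding) can send the head before the search has returned anything. The name `stream_results` and the markup are assumptions; hooking the generator up to web.py or another framework is left out.

```python
import html


def stream_results(query, threads):
    """Yield a results page piece by piece instead of building it first."""
    # The page head goes out before any search result has been produced.
    yield "<!DOCTYPE html><title>%s</title><h1>%s</h1><ul>\n" % (
        html.escape(query), html.escape(query))
    for thread in threads:  # `threads` may itself be a lazy iterator
        # Each row is complete markup, so the browser can lay it out on
        # arrival and nothing sent later moves it.
        yield "<li>%s</li>\n" % html.escape(thread)
    yield "</ul>\n"
```

Passing a lazy iterator of subjects as `threads` keeps the whole pipeline incremental: no row is computed until the previous one has been handed to the server.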

> Your URL highlighter seems a bit trigger-happy though:
>
>    https://nmweb.evenmere.org/show/8760s7zr47.fsf%40zancas.localnet
>
> I don't think bremner was trying to link to http://index.cc !

As a wise soul once told me, use a library and then blame them.  This is
the Mozilla Bleach library, used for both sanitizing text/html parts and
for linkifying text/plain parts.  But since that supports filtering:
sure, this can only linkify things starting with 'http[s]://'
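
For illustration only, the "only linkify things starting with http[s]://" rule can be written with a regular expression from the standard library. This deliberately swaps in `re` for Bleach's linkifier, so it is not the code nmweb uses; the names `URL_RE` and `linkify_plain` are hypothetical.

```python
import html
import re

# Only explicit http:// or https:// URLs qualify; bare names such as
# "index.cc" stay plain text.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def linkify_plain(text):
    """HTML-escape plain text and wrap explicit http(s) URLs in links."""
    out, pos = [], 0
    for m in URL_RE.finditer(text):
        url = m.group(0).rstrip(".,;:!?)")  # drop trailing punctuation
        end = m.start() + len(url)
        out.append(html.escape(text[pos:m.start()]))
        out.append('<a href="%s" rel="noreferrer">%s</a>'
                   % (html.escape(url, quote=True), html.escape(url)))
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

A sentence mentioning both `index.cc` and `https://notmuchmail.org/` would get exactly one link, for the second.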

>> It's possible to get it to dump the whole mbox by clicking through the
>> obvious links; please consider exploring at
>> https://nmweb.evenmere.org/search/monkey instead.
>
> this is interesting because it shows me threads where some messages have
> monkey in them, but i can't tell which messages actually have the
> relevant search term.  Maybe it could highlight the found messages?

Very careful examination would have shown that the em-dashes between
author and subject were red for matches.  Now matches are in italics.

> Also, once i'm looking at one message, i don't see an easy way to go
> "next" in the thread.

Yup.  The thread object isn't accessible by then: it existed in the
scope of the search query, and is gone by the time we show the message.
get_replies isn't available.  So what's the alternative?
get_thread_id(), search for that thread id, identify this message *in*
that thread id, and then link to the next message with a "next" link?
While doing it, why not show the thread structure at the bottom of the
message, I guess.
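
The alternative described above (look up the thread, find this message in it, link to its neighbours) might be sketched as follows. This is an assumption-laden illustration, not nmweb's code: it uses the `notmuch search` command line instead of the Python bindings' `get_thread_id()`, the names `thread_query_command` and `neighbours` are hypothetical, and "oldest first" is a simplification of real thread order.

```python
def thread_query_command(thread_id):
    """CLI call that lists a thread's message ids, oldest first."""
    return ["notmuch", "search", "--output=messages",
            "--sort=oldest-first", "thread:%s" % thread_id]


def neighbours(message_ids, current):
    """Return (previous, next) ids around `current`; None at either end."""
    i = message_ids.index(current)  # raises ValueError if not in the thread
    prev_id = message_ids[i - 1] if i > 0 else None
    next_id = message_ids[i + 1] if i + 1 < len(message_ids) else None
    return prev_id, next_id
```

The same list of ids could also be rendered as the thread listing at the bottom of the message page.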

With bleach integrated (all of five lines), I think this is safe enough
to let random notmuch users run it.  The worst they'll do is expose
their mailstore on tcp/8080.  Any interest in taking this into the
upstream contrib directory?

-Brian
Daniel Kahn Gillmor

Re: web interface to notmuch

On Fri 2017-10-27 00:04:21 -0400, Brian Sniffen wrote:
> Thanks!  The part I'm happiest about is the speed:

amen, it feels very lightweight.

> Very careful examination would have shown that the em-dashes between
> author and subject were red for matches.  Now matches are in italics.

cool.  perhaps assigning a class to those elements and stashing some CSS
would make that easier for folks to experiment with (and probably reduce
the bytecount transferred)?

or would that hurt the rendering time for some reason i'm unaware of?  i
haven't thought about these mechanics as much as you have.

> Yup.  The thread object isn't accessible by then: it existed in the
> scope of the search query, and is gone by the time we show the message.
> get_replies isn't available.  So what's the alternative?
> get_thread_id(), search for that thread id, identify this message *in*
> that thread id, and then link to the next message with a "next" link?
> While doing it, why not show the thread structure at the bottom of the
> message, I guess.

yep, i think that's right.

> With bleach integrated (all of five lines), I think this is safe enough
> to let random notmuch users run it.  The worst they'll do is expose
> their mailstore on tcp/8080.  Any interest in taking this into the
> upstream contrib directory?

Yes, i think this should move into contrib/ upstream.  And we should
think about what might be the appropriate way to package it for debian,
too.

        --dkg

Daniel Kahn Gillmor

Re: web interface to notmuch

In reply to this post by Brian Sniffen-2
On Fri 2017-10-27 00:04:21 -0400, Brian Sniffen wrote:
> With bleach integrated (all of five lines), I think this is safe enough
> to let random notmuch users run it.

hm, bleach might be a little too aggressive.

jrollins just pointed toward:

https://nmweb.evenmere.org/show/87innmvvam.fsf%40ligo.caltech.edu

which i'm pretty sure had actual content initially
(id:[hidden email]) but it starts with stdin
redirection (using a left angle bracket) and then the rest of the
message is gone :/

        --dkg

Matthew Lear

Re: web interface to notmuch

In reply to this post by Daniel Kahn Gillmor

I've had a play with this this morning. It's great! The speed and page loading efficiency are fantastic. It would be really nice if we could go next/previous in the thread (yes, I know I'm complaining about one extra mouse click). Also, if I select a date via the drop-down, I need to delete the timestamp that appears before searching; otherwise there is a Xapian error.
This is definitely a candidate solution for me, though.
Thanks Brian!
Cheers,
  Matt


Brian Sniffen-2

Re: web interface to notmuch

In reply to this post by Daniel Kahn Gillmor
Daniel Kahn Gillmor <[hidden email]> writes:

> On Fri 2017-10-27 00:04:21 -0400, Brian Sniffen wrote:
>> With bleach integrated (all of five lines), I think this is safe enough
>> to let random notmuch users run it.
>
> hm, bleach might be a little too aggressive.
>
> jrollins just pointed toward:
>
> https://nmweb.evenmere.org/show/87innmvvam.fsf%40ligo.caltech.edu

That's fixed in 53403ecd, and there are some examples of bleach on a rope
at
https://nmweb.evenmere.org/show/20141107190321.GL23609%40odin.tremily.us

The mbox URL is linkified, the many other link-like texts aren't.


Next/prev links are at the bottom, and a thread listing.  I haven't
thought through how to get the body delivered immediately, but speed
seems acceptable.  Next up, some style revisions---and I'd love
proposals for something that looks less awful, or at least makes the
interface more clear.  UI design is a strong anti-specialty for me.
Matthew Lear

Re: web interface to notmuch

I've been running this today, standalone on localhost port 80 with the built-in CherryPy web server, against my mail store. First impressions are that it's terrific :-) As my intended 'target' mail store will be geared towards an 'internal work stuff knowledge collection', lots of emails contain HTML links to intranet pages and sites. I can adapt the bleach usage to suit (or just remove it), but along the way of searching and viewing mail, I've encountered quite a few UnicodeEncodeError exceptions. An example backtrace looks like this:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/web/application.py", line 239, in process
    return self.handle()
  File "/usr/lib/python2.7/dist-packages/web/application.py", line 230, in handle
    return self._delegate(fn, self.fvars, args)
  File "/usr/lib/python2.7/dist-packages/web/application.py", line 420, in _delegate
    return handle_class(cls)
  File "/usr/lib/python2.7/dist-packages/web/application.py", line 396, in handle_class
    return tocall(*args)
  File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 153, in GET
    sprefix=webprefix)
  File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 989, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 754, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "templates/show.html", line 1, in top-level template code
    {% extends "base.html" %}
  File "templates/base.html", line 32, in top-level template code
    {% block content %}
  File "templates/show.html", line 12, in block "content"
    {% for part in format_message(m.get_filename(),mid): %}{{ part|safe }}{% endfor %}
  File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 245, in format_message_walk
    tags=safe_tags).encode(part.get_content_charset('ascii')))
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 1141: ordinal not in range(256)

127.0.0.1:60968 - - [31/Oct/2017 17:00:02] "HTTP/1.1 GET /show/[hidden email]" - 500 Internal Server Error

I'm no Python expert, but from a quick google it would seem that this kind of exception is caused by not using UTF-8.

Brian - do you think something needs modifying in nmweb.py to cater for this type of thing, or is this somehow related to my own mailstore? (I'm not sure why that would be, as my messages haven't been modified.)
Cheers,
--  Matt


Tomas Nordin

Re: web interface to notmuch

Hi Matthew

Sorry for just chiming in here out of the blue. I don't really know
anything about the code you are discussing, but I have some experience
with Python.

Matthew Lear <[hidden email]> writes:

> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 239, in
> process
>     return self.handle()
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 230, in
> handle
>     return self._delegate(fn, self.fvars, args)
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 420, in
> _delegate
>     return handle_class(cls)
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 396, in
> handle_class
>     return tocall(*args)
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 153,
> in GET
>     sprefix=webprefix)
>   File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 989,
> in render
>     return self.environment.handle_exception(exc_info, True)
>   File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 754,
> in handle_exception
>     reraise(exc_type, exc_value, tb)
>   File "templates/show.html", line 1, in top-level template code
>     {% extends "base.html" %}
>   File "templates/base.html", line 32, in top-level template code
>     {% block content %}
>   File "templates/show.html", line 12, in block "content"
>     {% for part in format_message(m.get_filename(),mid): %}{{ part|safe
> }}{% endfor %}
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 245,
> in format_message_walk
>     tags=safe_tags).encode(part.get_content_charset('ascii')))

My guess is that part.get_content_charset looks up the charset declared
for this message part, falling back to 'ascii' if none is found. Here it
returns a latin-1 charset, which is then used to encode the output.
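
A minimal sketch of that lookup, using the standard library email package.
The two messages are made up for illustration; the headers of the message
that actually failed are not shown in this thread, so the ISO-8859-1
charset here is an assumption.

```python
import codecs
from email import message_from_string

# Hypothetical part that declares a charset in its Content-Type header.
declared = message_from_string(
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "\n"
    "<p>hello</p>\n"
)
# Hypothetical part with no charset parameter at all.
undeclared = message_from_string(
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
)

# The declared charset is returned, coerced to lower case.
charset_declared = declared.get_content_charset('ascii')
# With no charset parameter, the fallback argument is returned instead.
charset_fallback = undeclared.get_content_charset('ascii')

# 'iso-8859-1' and 'latin-1' are aliases for the same Python codec.
same_codec = (codecs.lookup(charset_declared).name
              == codecs.lookup('latin-1').name)

print(charset_declared)
print(charset_fallback)
print(same_codec)
```

If the lookup behaves like this on the failing message, the function is
doing its job and the declared charset simply cannot represent the text.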

> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c'
> in position 1141: ordinal not in range(256)

Here is an interactive Python session to reproduce it:

>>> u = u'\u201c'
>>> u
u'\u201c'
>>> type(u)
<type 'unicode'> # (un-encoded)
>>> u.encode('utf-8')
'\xe2\x80\x9c'   # encoding to utf-8 works fine
>>> print u.encode('utf-8')
“
>>> print u.encode('latin-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)

The character cannot be encoded in latin-1. So one should check that
the function that determines the encoding is doing its job properly and,
if it is, blame the charset information in the message.

Just my 2 cents

Best regards
--
Tomas
Brian Sniffen

Re: web interface to notmuch

In reply to this post by Matthew Lear
> just remove it), but along the way of searching and viewing mail, I've
> encountered quite a few occurrences of failing to UnicodeEncode. An example
> backtrace looks like this:
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 239, in
> process
>     return self.handle()
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 230, in
> handle
>     return self._delegate(fn, self.fvars, args)
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 420, in
> _delegate
>     return handle_class(cls)
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 396, in
> handle_class
>     return tocall(*args)
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 153,
> in GET
>     sprefix=webprefix)
>   File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 989,
> in render
>     return self.environment.handle_exception(exc_info, True)
>   File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 754,
> in handle_exception
>     reraise(exc_type, exc_value, tb)
>   File "templates/show.html", line 1, in top-level template code
>     {% extends "base.html" %}
>   File "templates/base.html", line 32, in top-level template code
>     {% block content %}
>   File "templates/show.html", line 12, in block "content"
>     {% for part in format_message(m.get_filename(),mid): %}{{ part|safe
> }}{% endfor %}
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 245,
> in format_message_walk
>     tags=safe_tags).encode(part.get_content_charset('ascii')))
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in
> position 1141: ordinal not in range(256)
>
> 127.0.0.1:60968 - - [31/Oct/2017 17:00:02] "HTTP/1.1 GET /show/
> [hidden email]" -
> 500 Internal Server Error
>
> I'm no Python expert, but from a quick google it would seem like the cause
> of such an exception is related to not using utf-8.

Neat.  So to get there, this has to be a text/html part.  It has to have
been decoded, either with the declared content type or with ascii.  If a
\u201c (left double quote) showed up, it didn't get decoded as
ascii---and indeed, it looks like the content-type specifies latin-1.
But now when we try to encode back, using the same latin-1, it fails?
That's really neat.
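
The asymmetry can be sketched in a few lines of Python 3 (the thread itself
runs Python 2.7). A latin-1 decode can only ever yield code points 0..255,
so U+201C must have arrived some other way. The cp1252 route below is one
hypothetical way, chosen only for illustration; the thread does not
establish what actually happened to this message.

```python
# Decode every possible byte value as latin-1 and find the largest
# code point that comes out.
all_bytes = bytes(range(256))
decoded = all_bytes.decode('latin-1')
max_code_point = max(ord(ch) for ch in decoded)

# Hypothetical alternative route: in Windows-1252, byte 0x93 is the
# left double quotation mark.
from_cp1252 = b'\x93'.decode('cp1252')

# Encoding that character back to latin-1 is what fails.
try:
    from_cp1252.encode('latin-1')
    round_trips = True
except UnicodeEncodeError:
    round_trips = False

print(max_code_point)
print(from_cp1252 == '\u201c')
print(round_trips)
```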

> Brian - do you think something needs modifying in nmweb.py to cater for
> this type of thing, or is this somehow related my own mailstore (not sure
> why that would be as my messages haven't been modified).

Lots of mail has busted encoding.  I've done some defensive work against
that---look at decodeAnyway and shed a tear for purity---but clearly not
enough.  Can you send me a message that causes the problem?

In the meantime, I think line 245 ought to be, appropriately indented:

    tags=safe_tags).encode(part.get_content_charset('ascii'),
    'xmlcharrefreplace'))
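
A small sketch of what the 'xmlcharrefreplace' error handler does, with a
made-up HTML string standing in for the rendered part. Characters the
target charset cannot represent become numeric character references, which
a browser renders correctly in HTML output.

```python
# Made-up sample text: curly quotes (not in latin-1) and an e-acute
# (in latin-1).
text = u'<p>\u201cquoted\u201d caf\xe9</p>'

# The default strict mode raises, as in the traceback above.
strict_failed = False
try:
    text.encode('latin-1')
except UnicodeEncodeError:
    strict_failed = True

# With the error handler, only the unencodable characters are replaced.
escaped_latin1 = text.encode('latin-1', 'xmlcharrefreplace')
# The same handler also covers the 'ascii' fallback charset.
escaped_ascii = text.encode('ascii', 'xmlcharrefreplace')

print(strict_failed)
print(escaped_latin1)
print(escaped_ascii)
```

This keeps the declared charset for everything it can represent, so it only
makes sense for parts that are served as HTML.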

Thanks for the report---investigating it showed me that the search box
doesn't tolerate that character either.

-Brian