[PATCH] lib: add support for thread:<message-id> queries

classic Classic list List threaded Threaded
8 messages Options
Jani Nikula Jani Nikula
Reply | Threaded
Open this post in threaded view
|

[PATCH] lib: add support for thread:<message-id> queries

Add support for querying threads using message-ids in addition to
thread-ids. The main benefit is that thread queries via message-ids
are portable across databases, re-indexing, and thread joining, while
thread ids can be somewhat transient. A thread:<message-id> query can
be shared between notmuch users, while a thread:<thread-id> query may
cease to work after the regular delivery of a message that joins
threads.

What previously required:

$ notmuch search $(notmuch search --output=threads id:<message-id>)

can now be reduced to:

$ notmuch search thread:<message-id>

We limit the query to message-ids that have @ to optimize regular
thread-id queries and to avoid collisions with thread-ids which are
guaranteed (or which we can guarantee) to not contain @. This seems
reasonable, as valid message-ids will have @.

The performance penalty for regular thread-id queries seems to be
neglible, and thread queries via message-ids seem to be about 10%
slower than the direct thread-id query counterpart.
---
 doc/man7/notmuch-search-terms.rst |  9 ++++---
 lib/Makefile.local                |  1 +
 lib/database.cc                   |  6 ++++-
 lib/thread-fp.cc                  | 53 +++++++++++++++++++++++++++++++++++++++
 lib/thread-fp.h                   | 39 ++++++++++++++++++++++++++++
 5 files changed, 104 insertions(+), 4 deletions(-)
 create mode 100644 lib/thread-fp.cc
 create mode 100644 lib/thread-fp.h

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index b27f31f7545c..f3723ceb959b 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -52,7 +52,7 @@ indicate user-supplied values):
 
 -  id:<message-id>
 
--  thread:<thread-id>
+-  thread:<message-id> or thread:<thread-id>
 
 -  folder:<maildir-folder>
 
@@ -101,10 +101,13 @@ For **id:**, message ID values are the literal contents of the
 Message-ID: header of email messages, but without the '<', '>'
 delimiters.
 
-The **thread:** prefix can be used with the thread ID values that are
+The **thread:** prefix can be used with either the message ID of any
+of the messages in the thread or the thread ID values that are
 generated internally by notmuch (and do not appear in email messages).
 These thread ID values can be seen in the first column of output from
-**notmuch search**
+**notmuch search**. The thread ID values are transient. Thread queries
+by message ID are only available if notmuch is built with **Xapian
+Field Processors** (see below).
 
 The **path:** prefix searches for email messages that are in
 particular directories within the mail store. The directory must be
diff --git a/lib/Makefile.local b/lib/Makefile.local
index 8aa03891d775..123ef04c64a9 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -58,6 +58,7 @@ libnotmuch_cxx_srcs = \
  $(dir)/query-fp.cc      \
  $(dir)/config.cc \
  $(dir)/regexp-fields.cc \
+ $(dir)/thread-fp.cc \
  $(dir)/thread.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
diff --git a/lib/database.cc b/lib/database.cc
index 35c66939bdcf..82d2dcf22aa0 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -21,6 +21,7 @@
 #include "database-private.h"
 #include "parse-time-vrp.h"
 #include "query-fp.h"
+#include "thread-fp.h"
 #include "regexp-fields.h"
 #include "string-util.h"
 
@@ -258,7 +259,8 @@ prefix_t prefix_table[] = {
     { "directory", "XDIRECTORY", NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry", "XFDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry", "XDDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
-    { "thread", "G", NOTMUCH_FIELD_EXTERNAL },
+    { "thread", "G", NOTMUCH_FIELD_EXTERNAL |
+ NOTMUCH_FIELD_PROCESSOR },
     { "tag", "K", NOTMUCH_FIELD_EXTERNAL |
  NOTMUCH_FIELD_PROCESSOR },
     { "is", "K", NOTMUCH_FIELD_EXTERNAL |
@@ -317,6 +319,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
     fp = (new DateFieldProcessor())->release ();
  else if (STRNCMP_LITERAL(prefix->name, "query") == 0)
     fp = (new QueryFieldProcessor (*notmuch->query_parser, notmuch))->release ();
+ else if (STRNCMP_LITERAL(prefix->name, "thread") == 0)
+    fp = (new ThreadFieldProcessor (notmuch))->release ();
  else
     fp = (new RegexpFieldProcessor (prefix->name, prefix->flags,
     *notmuch->query_parser, notmuch))->release ();
diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
new file mode 100644
index 000000000000..f15aabba18c0
--- /dev/null
+++ b/lib/thread-fp.cc
@@ -0,0 +1,53 @@
+/* thread-fp.cc - "thread:" field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2017 Jani Nikula
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ */
+
+#include "database-private.h"
+#include "thread-fp.h"
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+Xapian::Query
+ThreadFieldProcessor::operator() (const std::string & thread)
+{
+    std::string term_prefix = _find_prefix ("thread");
+    std::string thread_id;
+    notmuch_message_t *message = NULL;
+
+    /*
+     * Only support thread queries via message-ids that have @. The
+     * reason for this is twofold: First, it's an optimization for
+     * regular thread-id queries. Second, and more importantly, it
+     * prevents (potentially malicious) message-id and thread-id
+     * collisions. Regular thread-ids are guaranteed to not have @ in
+     * them, and we only lose the ability to query threads by invalid
+     * message-ids that don't have @ in them.
+     */
+    if (strchr (thread.c_str (), '@'))
+ notmuch_database_find_message (notmuch, thread.c_str (), &message);
+
+    if (message) {
+ thread_id = notmuch_message_get_thread_id (message);
+ notmuch_message_destroy (message);
+    } else {
+ thread_id = thread;
+    }
+
+    return Xapian::Query (term_prefix + thread_id);
+}
+#endif
diff --git a/lib/thread-fp.h b/lib/thread-fp.h
new file mode 100644
index 000000000000..3f88a5831b6b
--- /dev/null
+++ b/lib/thread-fp.h
@@ -0,0 +1,39 @@
+/* thread-fp.h - thread field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2017 Jani Nikula
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ */
+
+#ifndef NOTMUCH_THREAD_FP_H
+#define NOTMUCH_THREAD_FP_H
+
+#include <xapian.h>
+#include "notmuch.h"
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+class ThreadFieldProcessor : public Xapian::FieldProcessor {
+ protected:
+    notmuch_database_t *notmuch;
+
+ public:
+    ThreadFieldProcessor (notmuch_database_t *notmuch_)
+ : notmuch(notmuch_) { };
+
+    Xapian::Query operator()(const std::string & str);
+};
+#endif
+#endif /* NOTMUCH_THREAD_FP_H */
--
2.11.0

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Daniel Kahn Gillmor Daniel Kahn Gillmor
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] lib: add support for thread:<message-id> queries

On Mon 2017-10-16 22:07:54 +0300, Jani Nikula wrote:

> Add support for querying threads using message-ids in addition to
> thread-ids. The main benefit is that thread queries via message-ids
> are portable across databases, re-indexing, and thread joining, while
> thread ids can be somewhat transient. A thread:<message-id> query can
> be shared between notmuch users, while a thread:<thread-id> query may
> cease to work after the regular delivery of a message that joins
> threads.
>
> What previously required:
>
> $ notmuch search $(notmuch search --output=threads id:<message-id>)
>
> can now be reduced to:
>
> $ notmuch search thread:<message-id>
>
> We limit the query to message-ids that have @ to optimize regular
> thread-id queries and to avoid collisions with thread-ids which are
> guaranteed (or which we can guarantee) to not contain @. This seems
> reasonable, as valid message-ids will have @.
>
> The performance penalty for regular thread-id queries seems to be
> neglible, and thread queries via message-ids seem to be about 10%
> slower than the direct thread-id query counterpart.
The rationale for this patch really good.  This is functionality we
should make available directly without the shell substitution
shenanigans described above, and i like the elegance of determining
whether it's a message-id or a thread id based on the presence of an '@'
symbol.

i have read the patch and it seems reasonable to me, though i'm not
particularly confident in my ability to catch any subtle bugs in logic
about manipulating Xapian.

I think we should adopt this proposal, though i welcome additional
review from people with more Xapian-fu than myself.

       --dkg

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

signature.asc (847 bytes) Download Attachment
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] lib: add support for thread:<message-id> queries

In reply to this post by Jani Nikula
Jani Nikula <[hidden email]> writes:


> +    if (strchr (thread.c_str (), '@'))
> + notmuch_database_find_message (notmuch, thread.c_str (), &message);
> +
> +    if (message) {
> + thread_id = notmuch_message_get_thread_id (message);
> + notmuch_message_destroy (message);
> +    } else {
> + thread_id = thread;
> +    }
> +
> +    return Xapian::Query (term_prefix + thread_id);
> +}

The other field processors throw Xapian::QueryParserErrors in case bad
things happen (e.g. failure to read from the database). This seems to be
better than nothing as it allows transmitting some detail to the
user. See e.g. "regex error reporting" in test/T650-regexp-query.sh. And
of course, speaking of tests, this should have one.

d

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Mark Walters Mark Walters
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] lib: add support for thread:<message-id> queries

In reply to this post by Jani Nikula

Hi

On Mon, 16 Oct 2017, Jani Nikula <[hidden email]> wrote:
> Add support for querying threads using message-ids in addition to
> thread-ids. The main benefit is that thread queries via message-ids
> are portable across databases, re-indexing, and thread joining, while
> thread ids can be somewhat transient. A thread:<message-id> query can
> be shared between notmuch users, while a thread:<thread-id> query may
> cease to work after the regular delivery of a message that joins
> threads.

I like the idea.

Just one thought -- is any form of recursion allowed in this sort of
operator so could we have thread:'some query' which would expand to all
threads matching the query? I am guessing this may not be possible now
but would it be worth requiring the thread form to be
thread:id:<message id> rather than thread:<message id>?

Best wishes

Mark


_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] lib: add support for thread:<message-id> queries

Mark Walters <[hidden email]> writes:

> Hi
>
> On Mon, 16 Oct 2017, Jani Nikula <[hidden email]> wrote:
>> Add support for querying threads using message-ids in addition to
>> thread-ids. The main benefit is that thread queries via message-ids
>> are portable across databases, re-indexing, and thread joining, while
>> thread ids can be somewhat transient. A thread:<message-id> query can
>> be shared between notmuch users, while a thread:<thread-id> query may
>> cease to work after the regular delivery of a message that joins
>> threads.
>
> I like the idea.
>
> Just one thought -- is any form of recursion allowed in this sort of
> operator so could we have thread:'some query' which would expand to all
> threads matching the query? I am guessing this may not be possible now
> but would it be worth requiring the thread form to be
> thread:id:<message id> rather than thread:<message id>?

In id:[hidden email] I used thread:{} to
introduce an arbitrary query. That syntax seems ok to me, and doesn't
conflict with Jani's patch. I think Jani mentioned at the time that
thread:{id:foo} was a too much typing.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Jani Nikula Jani Nikula
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] lib: add support for thread:<message-id> queries

In reply to this post by David Bremner-2
On Tue, 17 Oct 2017, David Bremner <[hidden email]> wrote:

> Jani Nikula <[hidden email]> writes:
>
>
>> +    if (strchr (thread.c_str (), '@'))
>> + notmuch_database_find_message (notmuch, thread.c_str (), &message);
>> +
>> +    if (message) {
>> + thread_id = notmuch_message_get_thread_id (message);
>> + notmuch_message_destroy (message);
>> +    } else {
>> + thread_id = thread;
>> +    }
>> +
>> +    return Xapian::Query (term_prefix + thread_id);
>> +}
>
> The other field processors throw Xapian::QueryParserErrors in case bad
> things happen (e.g. failure to read from the database). This seems to be
> better than nothing as it allows transmitting some detail to the
> user. See e.g. "regex error reporting" in test/T650-regexp-query.sh. And
> of course, speaking of tests, this should have one.

I'm not sure what you're suggesting. Do you mean I should open-code
notmuch_database_find_message() or notmuch_message_get_thread_id() to
get at the exceptions thrown within them?

BR,
Jani.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] lib: add support for thread:<message-id> queries

Jani Nikula <[hidden email]> writes:

> On Tue, 17 Oct 2017, David Bremner <[hidden email]> wrote:
>> Jani Nikula <[hidden email]> writes:
>>
>> The other field processors throw Xapian::QueryParserErrors in case bad
>> things happen (e.g. failure to read from the database). This seems to be
>> better than nothing as it allows transmitting some detail to the
>> user. See e.g. "regex error reporting" in test/T650-regexp-query.sh. And
>> of course, speaking of tests, this should have one.
>
> I'm not sure what you're suggesting. Do you mean I should open-code
> notmuch_database_find_message() or notmuch_message_get_thread_id() to
> get at the exceptions thrown within them?
>
> BR,
> Jani.

nothing so fancy. Consider the following bit of query-fp.cc

    status = notmuch_database_get_config (notmuch, key.c_str (), &expansion);
    if (status) {
        throw Xapian::QueryParserError ("error looking up key" + name);
    }


That could no doubt be improved to report the specifics of `status`, but
it's already better than silently ignoring the error, as this will
eventually be reported back to library clients. I agree the exception ->
error-code -> exception -> error-code  path is pretty gross, but thats
the price we pay for mostly pretending we're not writing C++.


_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Jani Nikula Jani Nikula
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] lib: add support for thread:<message-id> queries

On Sat, 21 Oct 2017, David Bremner <[hidden email]> wrote:

> Jani Nikula <[hidden email]> writes:
>
>> On Tue, 17 Oct 2017, David Bremner <[hidden email]> wrote:
>>> Jani Nikula <[hidden email]> writes:
>>>
>>> The other field processors throw Xapian::QueryParserErrors in case bad
>>> things happen (e.g. failure to read from the database). This seems to be
>>> better than nothing as it allows transmitting some detail to the
>>> user. See e.g. "regex error reporting" in test/T650-regexp-query.sh. And
>>> of course, speaking of tests, this should have one.
>>
>> I'm not sure what you're suggesting. Do you mean I should open-code
>> notmuch_database_find_message() or notmuch_message_get_thread_id() to
>> get at the exceptions thrown within them?
>>
>> BR,
>> Jani.
>
> nothing so fancy. Consider the following bit of query-fp.cc
>
>     status = notmuch_database_get_config (notmuch, key.c_str (), &expansion);
>     if (status) {
> throw Xapian::QueryParserError ("error looking up key" + name);
>     }
>
>
> That could no doubt be improved to report the specifics of `status`, but
> it's already better than silently ignoring the error, as this will
> eventually be reported back to library clients. I agree the exception ->
> error-code -> exception -> error-code  path is pretty gross, but thats
> the price we pay for mostly pretending we're not writing C++.

Okay. Seems like we are in agreement we want this feature, so I'll fix
that up and write some tests when I have a moment.

Thanks,
Jani.
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch