Thread subqueries

David Bremner-2

Thread subqueries

This is the first non-WIP version of this series. It adds a small
optimization (something like a 10% speedup on SSD), and some
documentation and tests.


_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

David Bremner-2

[PATCH 1/4] lib: add thread subqueries.

This change allows queries of the form

 thread:{from:me} and thread:{from:jian} and not thread:{from:dave}

This is still somewhat brute-force, but it's a big improvement over
both the shell script solution and the previous proposal [1], because it
does not build the whole thread structure just to generate a
query. A further potential optimization is to replace the calls to
notmuch with more specialized Xapian code; in particular it's not
likely that reading all of the message metadata is a win here.

[1]: id:[hidden email]
---
 lib/Makefile.local           |  3 +-
 lib/database.cc              |  6 +++-
 lib/thread-fp.cc             | 67 ++++++++++++++++++++++++++++++++++++
 lib/thread-fp.h              | 42 ++++++++++++++++++++++
 test/T585-thread-subquery.sh | 46 +++++++++++++++++++++++++
 5 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 lib/thread-fp.cc
 create mode 100644 lib/thread-fp.h
 create mode 100755 test/T585-thread-subquery.sh

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 8aa03891..5dc057c0 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -58,7 +58,8 @@ libnotmuch_cxx_srcs = \
  $(dir)/query-fp.cc      \
  $(dir)/config.cc \
  $(dir)/regexp-fields.cc \
- $(dir)/thread.cc
+ $(dir)/thread.cc \
+ $(dir)/thread-fp.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
 
diff --git a/lib/database.cc b/lib/database.cc
index 02444e09..9cf8062c 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -21,6 +21,7 @@
 #include "database-private.h"
 #include "parse-time-vrp.h"
 #include "query-fp.h"
+#include "thread-fp.h"
 #include "regexp-fields.h"
 #include "string-util.h"
 
@@ -258,7 +259,8 @@ prefix_t prefix_table[] = {
     { "directory", "XDIRECTORY", NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry", "XFDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry", "XDDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
-    { "thread", "G", NOTMUCH_FIELD_EXTERNAL },
+    { "thread", "G", NOTMUCH_FIELD_EXTERNAL |
+ NOTMUCH_FIELD_PROCESSOR },
     { "tag", "K", NOTMUCH_FIELD_EXTERNAL |
  NOTMUCH_FIELD_PROCESSOR },
     { "is", "K", NOTMUCH_FIELD_EXTERNAL |
@@ -317,6 +319,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
     fp = (new DateFieldProcessor())->release ();
  else if (STRNCMP_LITERAL(prefix->name, "query") == 0)
     fp = (new QueryFieldProcessor (*notmuch->query_parser, notmuch))->release ();
+ else if (STRNCMP_LITERAL(prefix->name, "thread") == 0)
+    fp = (new ThreadFieldProcessor (*notmuch->query_parser, notmuch))->release ();
  else
     fp = (new RegexpFieldProcessor (prefix->name, prefix->flags,
     *notmuch->query_parser, notmuch))->release ();
diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
new file mode 100644
index 00000000..dd292bf6
--- /dev/null
+++ b/lib/thread-fp.cc
@@ -0,0 +1,67 @@
+/* thread-fp.cc - "thread:" field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2018 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <[hidden email]>
+ */
+
+#include "database-private.h"
+#include "thread-fp.h"
+#include <iostream>
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+
+Xapian::Query
+ThreadFieldProcessor::operator() (const std::string & str)
+{
+    notmuch_status_t status;
+    const char *thread_prefix = _find_prefix ("thread");
+
+    if (str.at (0) == '{') {
+ if (str.length () > 1 && str.at (str.size () - 1) == '}') {
+    std::string subquery_str = str.substr (1, str.size () - 2);
+    notmuch_query_t *subquery = notmuch_query_create (notmuch, subquery_str.c_str ());
+    notmuch_messages_t *messages;
+    std::set<std::string> terms;
+
+    if (! subquery)
+ throw Xapian::QueryParserError ("failed to create subquery for '" + subquery_str + "'");
+
+    status = notmuch_query_search_messages (subquery, &messages);
+    if (status)
+ throw Xapian::QueryParserError ("failed to search messages for '" + subquery_str + "'");
+
+    for (; notmuch_messages_valid (messages); notmuch_messages_move_to_next (messages)) {
+ std::string term = thread_prefix;
+ notmuch_message_t *message;
+ message = notmuch_messages_get (messages);
+ term += notmuch_message_get_thread_id (message);
+ terms.insert (term);
+    }
+    return Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ());
+ } else {
+    throw Xapian::QueryParserError ("missing } in '" + str + "'");
+ }
+    } else {
+ /* literal thread id */
+ std::string term = thread_prefix + str;
+ return Xapian::Query (term);
+    }
+
+}
+#endif
diff --git a/lib/thread-fp.h b/lib/thread-fp.h
new file mode 100644
index 00000000..13725978
--- /dev/null
+++ b/lib/thread-fp.h
@@ -0,0 +1,42 @@
+/* thread-fp.h - thread field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2017 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <[hidden email]>
+ */
+
+#ifndef NOTMUCH_THREAD_FP_H
+#define NOTMUCH_THREAD_FP_H
+
+#include <xapian.h>
+#include "notmuch.h"
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+class ThreadFieldProcessor : public Xapian::FieldProcessor {
+ protected:
+    Xapian::QueryParser &parser;
+    notmuch_database_t *notmuch;
+
+ public:
+    ThreadFieldProcessor (Xapian::QueryParser &parser_, notmuch_database_t *notmuch_)
+ : parser(parser_), notmuch(notmuch_) { };
+
+    Xapian::Query operator()(const std::string & str);
+};
+#endif
+#endif /* NOTMUCH_THREAD_FP_H */
diff --git a/test/T585-thread-subquery.sh b/test/T585-thread-subquery.sh
new file mode 100755
index 00000000..71ced149
--- /dev/null
+++ b/test/T585-thread-subquery.sh
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2018 David Bremner
+#
+
+test_description='test of searching by using thread subqueries'
+
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_email_corpus
+
+test_begin_subtest "Basic query that matches no messages"
+count=$(notmuch count from:keithp and to:keithp)
+test_expect_equal 0 "$count"
+
+test_begin_subtest "Same query against threads"
+notmuch search thread:{from:keithp} and thread:{to:keithp} | notmuch_search_sanitize > OUTPUT
+cat<<EOF > EXPECTED
+thread:XXX   2009-11-18 [7/7] Lars Kellogg-Stedman, Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "Mix thread and non-threads query"
+notmuch search thread:{from:keithp} and to:keithp | notmuch_search_sanitize > OUTPUT
+cat<<EOF > EXPECTED
+thread:XXX   2009-11-18 [1/7] Lars Kellogg-Stedman| Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "Compound subquery"
+notmuch search 'thread:"{from:keithp and date:2009}" and thread:{to:keithp}' | notmuch_search_sanitize > OUTPUT
+cat<<EOF > EXPECTED
+thread:XXX   2009-11-18 [7/7] Lars Kellogg-Stedman, Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "Syntax/quoting error in subquery"
+notmuch search 'thread:{from:keithp and date:2009} and thread:{to:keithp}' 1>OUTPUT 2>&1
+cat<<EOF > EXPECTED
+notmuch search: A Xapian exception occurred
+A Xapian exception occurred parsing query: missing } in '{from:keithp'
+Query string was: thread:{from:keithp and date:2009} and thread:{to:keithp}
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
--
2.17.0

David Bremner-2

[PATCH 2/4] perf-test: add simple test for thread subqueries

In reply to this post by David Bremner-2
This is not a particularly sensible query, but thread:{date:2010} is a
good way to generate fairly large intermediate queries.
---
 performance-test/T04-thread-subquery.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)
 create mode 100755 performance-test/T04-thread-subquery.sh

diff --git a/performance-test/T04-thread-subquery.sh b/performance-test/T04-thread-subquery.sh
new file mode 100755
index 00000000..665d5a64
--- /dev/null
+++ b/performance-test/T04-thread-subquery.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+test_description='thread subqueries'
+
+. $(dirname "$0")/perf-test-lib.sh || exit 1
+
+time_start
+
+time_run "search thread:{} ..." "notmuch search thread:{date:2010} and thread:{from:linus}"
+time_run "search thread:{} ..." "notmuch search thread:{date:2010} and thread:{from:linus}"
+time_run "search thread:{} ..." "notmuch search thread:{date:2010} and thread:{from:linus}"
+
+time_done
--
2.17.0

David Bremner-2

[PATCH 3/4] lib: define specialized get_thread_id for use in thread subquery

In reply to this post by David Bremner-2
The observation is that we are only using the messages to get their
thread_id, which is kind of a pessimal access pattern for the current
notmuch_message_get_thread_id.
---
 lib/message.cc        | 17 +++++++++++++++++
 lib/notmuch-private.h |  4 ++++
 lib/thread-fp.cc      |  2 +-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/lib/message.cc b/lib/message.cc
index d5db89b6..b2067076 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -318,6 +318,23 @@ _notmuch_message_get_term (notmuch_message_t *message,
     return value;
 }
 
+/*
+ * For special applications where we only want the thread id, reading
+ * in all metadata is a heavy I/O penalty.
+ */
+const char *
+_notmuch_message_get_thread_id_only (notmuch_message_t *message)
+{
+
+    Xapian::TermIterator i = message->doc.termlist_begin ();
+    Xapian::TermIterator end = message->doc.termlist_end ();
+
+    message->thread_id = _notmuch_message_get_term (message, i, end,
+    _find_prefix ("thread"));
+    return message->thread_id;
+}
+
+
 static void
 _notmuch_message_ensure_metadata (notmuch_message_t *message, void *field)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 1093429c..4598577f 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -537,6 +537,10 @@ _notmuch_message_database (notmuch_message_t *message);
 
 void
 _notmuch_message_remove_unprefixed_terms (notmuch_message_t *message);
+
+const char *
+_notmuch_message_get_thread_id_only(notmuch_message_t *message);
+
 /* sha1.c */
 
 char *
diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
index dd292bf6..661d00dd 100644
--- a/lib/thread-fp.cc
+++ b/lib/thread-fp.cc
@@ -50,7 +50,7 @@ ThreadFieldProcessor::operator() (const std::string & str)
  std::string term = thread_prefix;
  notmuch_message_t *message;
  message = notmuch_messages_get (messages);
- term += notmuch_message_get_thread_id (message);
+ term += _notmuch_message_get_thread_id_only (message);
  terms.insert (term);
     }
     return Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ());
--
2.17.0

David Bremner-2

[PATCH 4/4] doc: document thread subqueries

In reply to this post by David Bremner-2
Mention both performance and quoting issues.
---
 doc/man7/notmuch-search-terms.rst | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 248444e3..ec999eed 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -83,6 +83,22 @@ thread:<thread-id>
     messages). These thread ID values can be seen in the first column
     of output from **notmuch search**
 
+thread:{<notmuch query>}
+    If notmuch is built with **Xapian Field Processors** (see below),
+    threads may be searched for indirectly by providing an arbitrary
+    notmuch query in **{}**. For example, the following returns
+    threads containing a message from mallory and one (not neccesarily
+    the same message) with Subject containing the word "crypto".
+
+    ::
+
+       % notmuch search 'thread:"{from:mallory}" and thread:"{subject:crypto}"'
+
+    The performance of such queries can vary wildly. To understand
+    this, the user should think of the query **thread:{<something>}**
+    as expanding to all of the thread IDs which match **<something>**;
+    notmuch then performs a second search using the expanded query.
+
 path:<directory-path> or path:<directory-path>/** or path:/<regex>/
     The **path:** prefix searches for email messages that are in
     particular directories within the mail store. The directory must
@@ -277,8 +293,8 @@ Quoting
 -------
 
 Double quotes are also used by the notmuch query parser to protect
-boolean terms or regular expressions containing spaces or other
-special characters, e.g.
+boolean terms, regular expressions, or subqueries containing spaces or
+other special characters, e.g.
 
 ::
 
@@ -288,12 +304,17 @@ special characters, e.g.
 
    folder:"/^.*/(Junk|Spam)$/"
 
+::
+
+   thread:"{from:mallory and date:2009}"
+
 As with phrases, you need to protect the double quotes from the shell
 e.g.
 
 ::
 
    % notmuch search 'folder:"/^.*/(Junk|Spam)$/"'
+   % notmuch search 'thread:"{from:mallory and date:2009}" and thread:{to:mallory}'
 
 DATE AND TIME SEARCH
 ====================
@@ -435,6 +456,7 @@ Currently the following features require field processor support:
 - non-range date queries, e.g. "date:today"
 - named queries e.g. "query:my_special_query"
 - regular expression searches, e.g. "subject:/^\\[SPAM\\]/"
+- thread subqueries, e.g. "thread:{from:bob}"
 
 SEE ALSO
 ========
--
2.17.0

Jani Nikula

Re: [PATCH 1/4] lib: add thread subqueries.

In reply to this post by David Bremner-2
On Sat, 05 May 2018, David Bremner <[hidden email]> wrote:

> This change allows queries of the form
>
>  thread:{from:me} and thread:{from:jian} and not thread:{from:dave}
>
> This is still somewhat brute-force, but it's a big improvement over
> both the shell script solution and the previous proposal [1], because it
> does not build the whole thread structure just to generate a
> query. A further potential optimization is to replace the calls to
> notmuch with more specialized Xapian code; in particular it's not
> likely that reading all of the message metadata is a win here.
>
> [1]: id:[hidden email]
> ---
>  lib/Makefile.local           |  3 +-
>  lib/database.cc              |  6 +++-
>  lib/thread-fp.cc             | 67 ++++++++++++++++++++++++++++++++++++
>  lib/thread-fp.h              | 42 ++++++++++++++++++++++
>  test/T585-thread-subquery.sh | 46 +++++++++++++++++++++++++
>  5 files changed, 162 insertions(+), 2 deletions(-)
>  create mode 100644 lib/thread-fp.cc
>  create mode 100644 lib/thread-fp.h
>  create mode 100755 test/T585-thread-subquery.sh
>
> diff --git a/lib/Makefile.local b/lib/Makefile.local
> index 8aa03891..5dc057c0 100644
> --- a/lib/Makefile.local
> +++ b/lib/Makefile.local
> @@ -58,7 +58,8 @@ libnotmuch_cxx_srcs = \
>   $(dir)/query-fp.cc      \
>   $(dir)/config.cc \
>   $(dir)/regexp-fields.cc \
> - $(dir)/thread.cc
> + $(dir)/thread.cc \
> + $(dir)/thread-fp.cc
>  
>  libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
>  
> diff --git a/lib/database.cc b/lib/database.cc
> index 02444e09..9cf8062c 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -21,6 +21,7 @@
>  #include "database-private.h"
>  #include "parse-time-vrp.h"
>  #include "query-fp.h"
> +#include "thread-fp.h"
>  #include "regexp-fields.h"
>  #include "string-util.h"
>  
> @@ -258,7 +259,8 @@ prefix_t prefix_table[] = {
>      { "directory", "XDIRECTORY", NOTMUCH_FIELD_NO_FLAGS },
>      { "file-direntry", "XFDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
>      { "directory-direntry", "XDDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
> -    { "thread", "G", NOTMUCH_FIELD_EXTERNAL },
> +    { "thread", "G", NOTMUCH_FIELD_EXTERNAL |
> + NOTMUCH_FIELD_PROCESSOR },
>      { "tag", "K", NOTMUCH_FIELD_EXTERNAL |
>   NOTMUCH_FIELD_PROCESSOR },
>      { "is", "K", NOTMUCH_FIELD_EXTERNAL |
> @@ -317,6 +319,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
>      fp = (new DateFieldProcessor())->release ();
>   else if (STRNCMP_LITERAL(prefix->name, "query") == 0)
>      fp = (new QueryFieldProcessor (*notmuch->query_parser, notmuch))->release ();
> + else if (STRNCMP_LITERAL(prefix->name, "thread") == 0)
> +    fp = (new ThreadFieldProcessor (*notmuch->query_parser, notmuch))->release ();
>   else
>      fp = (new RegexpFieldProcessor (prefix->name, prefix->flags,
>      *notmuch->query_parser, notmuch))->release ();
> diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
> new file mode 100644
> index 00000000..dd292bf6
> --- /dev/null
> +++ b/lib/thread-fp.cc
> @@ -0,0 +1,67 @@
> +/* thread-fp.cc - "thread:" field processor glue
> + *
> + * This file is part of notmuch.
> + *
> + * Copyright © 2018 David Bremner
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 3 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see https://www.gnu.org/licenses/ .
> + *
> + * Author: David Bremner <[hidden email]>
> + */
> +
> +#include "database-private.h"
> +#include "thread-fp.h"
> +#include <iostream>
> +
> +#if HAVE_XAPIAN_FIELD_PROCESSOR
> +
> +Xapian::Query
> +ThreadFieldProcessor::operator() (const std::string & str)
> +{
> +    notmuch_status_t status;
> +    const char *thread_prefix = _find_prefix ("thread");
> +
> +    if (str.at (0) == '{') {
> + if (str.length () > 1 && str.at (str.size () - 1) == '}') {

IIUC .length() and .size() are the same thing, but it's confusing to see
them both used on the same line.

Nitpick, I always favor dealing with error cases first, so you can do
the happy day scenario with less indent. So I'd check the opposite,
throw the error, and continue without the else. YMMV.

Otherwise, LGTM.
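
For illustration, the error-first ordering suggested above might look roughly like the following. This is a standalone sketch, not the code that was pushed: `ThreadArg` and `parse_thread_arg` are hypothetical names, `std::invalid_argument` stands in for `Xapian::QueryParserError`, and an empty string is treated as a literal id here rather than hitting `str.at (0)`. It also uses `.size ()` throughout instead of mixing it with `.length ()`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result type: what the "thread:" argument turned out to be.
struct ThreadArg {
    bool is_subquery;   // true for "{...}", false for a literal thread id
    std::string text;   // subquery text without braces, or the thread id
};

// Hypothetical helper, not part of notmuch. Only classifies the argument;
// running the subquery and building the OR of thread terms is left out.
ThreadArg
parse_thread_arg (const std::string &str)
{
    // Literal thread id: nothing to unwrap.
    if (str.empty () || str.front () != '{')
        return { false, str };

    // Error case next, so the happy path below needs no else branch.
    if (str.size () < 2 || str.back () != '}')
        throw std::invalid_argument ("missing } in '" + str + "'");

    return { true, str.substr (1, str.size () - 2) };
}
```

With this shape the body that runs the subquery sits at a single indent level after the two early exits.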

> +    std::string subquery_str = str.substr (1, str.size () - 2);
> +    notmuch_query_t *subquery = notmuch_query_create (notmuch, subquery_str.c_str ());
> +    notmuch_messages_t *messages;
> +    std::set<std::string> terms;
> +
> +    if (! subquery)
> + throw Xapian::QueryParserError ("failed to create subquery for '" + subquery_str + "'");
> +
> +    status = notmuch_query_search_messages (subquery, &messages);
> +    if (status)
> + throw Xapian::QueryParserError ("failed to search messages for '" + subquery_str + "'");
> +
> +    for (; notmuch_messages_valid (messages); notmuch_messages_move_to_next (messages)) {
> + std::string term = thread_prefix;
> + notmuch_message_t *message;
> + message = notmuch_messages_get (messages);
> + term += notmuch_message_get_thread_id (message);
> + terms.insert (term);
> +    }
> +    return Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ());
> + } else {
> +    throw Xapian::QueryParserError ("missing } in '" + str + "'");
> + }
> +    } else {
> + /* literal thread id */
> + std::string term = thread_prefix + str;
> + return Xapian::Query (term);
> +    }
> +
> +}
> +#endif
> diff --git a/lib/thread-fp.h b/lib/thread-fp.h
> new file mode 100644
> index 00000000..13725978
> --- /dev/null
> +++ b/lib/thread-fp.h
> @@ -0,0 +1,42 @@
> +/* thread-fp.h - thread field processor glue
> + *
> + * This file is part of notmuch.
> + *
> + * Copyright © 2017 David Bremner
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 3 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see https://www.gnu.org/licenses/ .
> + *
> + * Author: David Bremner <[hidden email]>
> + */
> +
> +#ifndef NOTMUCH_THREAD_FP_H
> +#define NOTMUCH_THREAD_FP_H
> +
> +#include <xapian.h>
> +#include "notmuch.h"
> +
> +#if HAVE_XAPIAN_FIELD_PROCESSOR
> +class ThreadFieldProcessor : public Xapian::FieldProcessor {
> + protected:
> +    Xapian::QueryParser &parser;
> +    notmuch_database_t *notmuch;
> +
> + public:
> +    ThreadFieldProcessor (Xapian::QueryParser &parser_, notmuch_database_t *notmuch_)
> + : parser(parser_), notmuch(notmuch_) { };
> +
> +    Xapian::Query operator()(const std::string & str);
> +};
> +#endif
> +#endif /* NOTMUCH_THREAD_FP_H */
> diff --git a/test/T585-thread-subquery.sh b/test/T585-thread-subquery.sh
> new file mode 100755
> index 00000000..71ced149
> --- /dev/null
> +++ b/test/T585-thread-subquery.sh
> @@ -0,0 +1,46 @@
> +#!/usr/bin/env bash
> +#
> +# Copyright (c) 2018 David Bremner
> +#
> +
> +test_description='test of searching by using thread subqueries'
> +
> +. $(dirname "$0")/test-lib.sh || exit 1
> +
> +add_email_corpus
> +
> +test_begin_subtest "Basic query that matches no messages"
> +count=$(notmuch count from:keithp and to:keithp)
> +test_expect_equal 0 "$count"
> +
> +test_begin_subtest "Same query against threads"
> +notmuch search thread:{from:keithp} and thread:{to:keithp} | notmuch_search_sanitize > OUTPUT
> +cat<<EOF > EXPECTED
> +thread:XXX   2009-11-18 [7/7] Lars Kellogg-Stedman, Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
> +EOF
> +test_expect_equal_file EXPECTED OUTPUT
> +
> +test_begin_subtest "Mix thread and non-threads query"
> +notmuch search thread:{from:keithp} and to:keithp | notmuch_search_sanitize > OUTPUT
> +cat<<EOF > EXPECTED
> +thread:XXX   2009-11-18 [1/7] Lars Kellogg-Stedman| Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
> +EOF
> +test_expect_equal_file EXPECTED OUTPUT
> +
> +test_begin_subtest "Compound subquery"
> +notmuch search 'thread:"{from:keithp and date:2009}" and thread:{to:keithp}' | notmuch_search_sanitize > OUTPUT
> +cat<<EOF > EXPECTED
> +thread:XXX   2009-11-18 [7/7] Lars Kellogg-Stedman, Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
> +EOF
> +test_expect_equal_file EXPECTED OUTPUT
> +
> +test_begin_subtest "Syntax/quoting error in subquery"
> +notmuch search 'thread:{from:keithp and date:2009} and thread:{to:keithp}' 1>OUTPUT 2>&1
> +cat<<EOF > EXPECTED
> +notmuch search: A Xapian exception occurred
> +A Xapian exception occurred parsing query: missing } in '{from:keithp'
> +Query string was: thread:{from:keithp and date:2009} and thread:{to:keithp}
> +EOF
> +test_expect_equal_file EXPECTED OUTPUT
> +
> +test_done
> --
> 2.17.0
>

Jani Nikula

Re: [PATCH 3/4] lib: define specialized get_thread_id for use in thread subquery

In reply to this post by David Bremner-2
On Sat, 05 May 2018, David Bremner <[hidden email]> wrote:
> The observation is that we are only using the messages to get their
> thread_id, which is kind of a pessimal access pattern for the current
> notmuch_message_get_thread_id.

LGTM.

> ---
>  lib/message.cc        | 17 +++++++++++++++++
>  lib/notmuch-private.h |  4 ++++
>  lib/thread-fp.cc      |  2 +-
>  3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/lib/message.cc b/lib/message.cc
> index d5db89b6..b2067076 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -318,6 +318,23 @@ _notmuch_message_get_term (notmuch_message_t *message,
>      return value;
>  }
>  
> +/*
> + * For special applications where we only want the thread id, reading
> + * in all metadata is a heavy I/O penalty.
> + */
> +const char *
> +_notmuch_message_get_thread_id_only (notmuch_message_t *message)
> +{
> +
> +    Xapian::TermIterator i = message->doc.termlist_begin ();
> +    Xapian::TermIterator end = message->doc.termlist_end ();
> +
> +    message->thread_id = _notmuch_message_get_term (message, i, end,
> +    _find_prefix ("thread"));
> +    return message->thread_id;
> +}
> +
> +
>  static void
>  _notmuch_message_ensure_metadata (notmuch_message_t *message, void *field)
>  {
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 1093429c..4598577f 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -537,6 +537,10 @@ _notmuch_message_database (notmuch_message_t *message);
>  
>  void
>  _notmuch_message_remove_unprefixed_terms (notmuch_message_t *message);
> +
> +const char *
> +_notmuch_message_get_thread_id_only(notmuch_message_t *message);
> +
>  /* sha1.c */
>  
>  char *
> diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
> index dd292bf6..661d00dd 100644
> --- a/lib/thread-fp.cc
> +++ b/lib/thread-fp.cc
> @@ -50,7 +50,7 @@ ThreadFieldProcessor::operator() (const std::string & str)
>   std::string term = thread_prefix;
>   notmuch_message_t *message;
>   message = notmuch_messages_get (messages);
> - term += notmuch_message_get_thread_id (message);
> + term += _notmuch_message_get_thread_id_only (message);
>   terms.insert (term);
>      }
>      return Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ());
> --
> 2.17.0
>

Jani Nikula

Re: [PATCH 4/4] doc: document thread subqueries

In reply to this post by David Bremner-2
On Sat, 05 May 2018, David Bremner <[hidden email]> wrote:

> Mention both performance and quoting issues.
> ---
>  doc/man7/notmuch-search-terms.rst | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
> index 248444e3..ec999eed 100644
> --- a/doc/man7/notmuch-search-terms.rst
> +++ b/doc/man7/notmuch-search-terms.rst
> @@ -83,6 +83,22 @@ thread:<thread-id>
>      messages). These thread ID values can be seen in the first column
>      of output from **notmuch search**
>  
> +thread:{<notmuch query>}
> +    If notmuch is built with **Xapian Field Processors** (see below),
> +    threads may be searched for indirectly by providing an arbitrary
> +    notmuch query in **{}**. For example, the following returns
> +    threads containing a message from mallory and one (not neccesarily

neccesarily typo.

Otherwise LGTM.

> +    the same message) with Subject containing the word "crypto".
> +
> +    ::
> +
> +       % notmuch search 'thread:"{from:mallory}" and thread:"{subject:crypto}"'
> +
> +    The performance of such queries can vary wildly. To understand
> +    this, the user should think of the query **thread:{<something>}**
> +    as expanding to all of the thread IDs which match **<something>**;
> +    notmuch then performs a second search using the expanded query.
> +
>  path:<directory-path> or path:<directory-path>/** or path:/<regex>/
>      The **path:** prefix searches for email messages that are in
>      particular directories within the mail store. The directory must
> @@ -277,8 +293,8 @@ Quoting
>  -------
>  
>  Double quotes are also used by the notmuch query parser to protect
> -boolean terms or regular expressions containing spaces or other
> -special characters, e.g.
> +boolean terms, regular expressions, or subqueries containing spaces or
> +other special characters, e.g.
>  
>  ::
>  
> @@ -288,12 +304,17 @@ special characters, e.g.
>  
>     folder:"/^.*/(Junk|Spam)$/"
>  
> +::
> +
> +   thread:"{from:mallory and date:2009}"
> +
>  As with phrases, you need to protect the double quotes from the shell
>  e.g.
>  
>  ::
>  
>     % notmuch search 'folder:"/^.*/(Junk|Spam)$/"'
> +   % notmuch search 'thread:"{from:mallory and date:2009}" and thread:{to:mallory}'
>  
>  DATE AND TIME SEARCH
>  ====================
> @@ -435,6 +456,7 @@ Currently the following features require field processor support:
>  - non-range date queries, e.g. "date:today"
>  - named queries e.g. "query:my_special_query"
>  - regular expression searches, e.g. "subject:/^\\[SPAM\\]/"
> +- thread subqueries, e.g. "thread:{from:bob}"
>  
>  SEE ALSO
>  ========
> --
> 2.17.0
>

David Bremner-2

Re: Thread subqueries

In reply to this post by David Bremner-2
David Bremner <[hidden email]> writes:

> This is the first non-WIP version of this series. It adds a small
> optimization (something like a 10% speedup on SSD), and some
> documentation and tests.

pushed to master, with Jani's suggestions.

d

Gaute Hope

Re: Thread subqueries


On Mon, 7 May 2018 at 14:09, David Bremner <[hidden email]> wrote:
> David Bremner <[hidden email]> writes:
>
>> This is the first non-WIP version of this series. It adds a small
>> optimization (something like a 10% speedup on SSD), and some
>> documentation and tests.
>
> pushed to master, with Jani's suggestions.

Looking forward to testing this! Great effort!

Gaute


Daniel Kahn Gillmor

Re: Thread subqueries

In reply to this post by David Bremner-2
On Mon 2018-05-07 09:09:35 -0300, David Bremner wrote:
> David Bremner <[hidden email]> writes:
>
>> This is the first non-WIP version of this series. It adds a small
>> optimization (something like a 10% speedup on SSD), and some
>> documentation and tests.
>
> pushed to master, with Jani's suggestions.

this is awesome.  thank you for pushing it forward!

I'm testing it out now and i am having trouble getting it to be properly
generic when the subquery has multiple terms.

0 dkg@alice:~$ notmuch count 'date:1month..now tag:dkg'
258
0 dkg@alice:~$ notmuch count 'thread:{date:1month..now tag:dkg}'
notmuch count: A Xapian exception occurred
A Xapian exception occurred parsing query: missing } in '{date:1month..now'
Query string was: thread:{date:1month..now tag:dkg}
1 dkg@alice:~$

What i really want is of course something like:

    thread:{date:1month..now tag:dkg} tag:inbox

to find all the replies to threads i've recently participated in, but
that fails with the same error.

What am i missing?

   --dkg

David Bremner-2

Re: Thread subqueries

Daniel Kahn Gillmor <[hidden email]> writes:

> 0 dkg@alice:~$ notmuch count 'thread:{date:1month..now tag:dkg}'
> notmuch count: A Xapian exception occurred
> A Xapian exception occurred parsing query: missing } in '{date:1month..now'
> Query string was: thread:{date:1month..now tag:dkg}
> 1 dkg@alice:~$

Pretty sure what you want here is

        $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

There is some related discussion in QUOTING in notmuch-search-terms(7),
and the thread:{} examples there all use double quoting so they still work
if the terms are replaced by terms with spaces.
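
For anyone assembling such queries from a script, here is a minimal Python
sketch of the quoting advice above. The helper name thread_subquery is
hypothetical (it is not part of notmuch or its bindings), and the sketch
assumes the subquery itself contains no double quotes, since nested quoting
is not covered in this thread. shlex.quote is from the Python standard
library and protects the double quotes from the shell.

```python
import shlex


def thread_subquery(subquery):
    """Wrap a subquery as thread:"{...}" so it stays a single term.

    Hypothetical helper for illustration only; not a notmuch API.
    """
    if '"' in subquery:
        # Nested double quotes would need extra escaping rules that are
        # not discussed here, so this sketch refuses them.
        raise ValueError("subquery with double quotes is not handled here")
    return 'thread:"{' + subquery + '}"'


# Threads recently participated in, restricted to messages in the inbox.
query = thread_subquery("date:1month..now tag:dkg") + " tag:inbox"

# Quote the whole query for the shell so the double quotes survive.
command = "notmuch count " + shlex.quote(query)
print(command)
```

The printed command wraps the query in single quotes, matching the form of
the command shown above.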

d



Daniel Kahn Gillmor

Re: Thread subqueries

On Fri 2018-05-11 07:15:41 -0300, David Bremner wrote:

> Daniel Kahn Gillmor <[hidden email]> writes:
>
>> 0 dkg@alice:~$ notmuch count 'thread:{date:1month..now tag:dkg}'
>> notmuch count: A Xapian exception occurred
>> A Xapian exception occurred parsing query: missing } in '{date:1month..now'
>> Query string was: thread:{date:1month..now tag:dkg}
>> 1 dkg@alice:~$
>
> Pretty sure what you want here is
>
>         $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

Thanks, yes, that's it.  I still find the quoting/assembling rules for
notmuch queries non-intuitive, but maybe i'll wrap my head around them
some day.  I certainly don't have any specific suggestions for
improvement.

This is a really useful feature, much appreciated!

        --dkg

Tomi Ollila-2

Re: Thread subqueries

On Fri, May 11 2018, Daniel Kahn Gillmor wrote:

> On Fri 2018-05-11 07:15:41 -0300, David Bremner wrote:
>> Daniel Kahn Gillmor <[hidden email]> writes:
>>
>>> 0 dkg@alice:~$ notmuch count 'thread:{date:1month..now tag:dkg}'
>>> notmuch count: A Xapian exception occurred
>>> A Xapian exception occurred parsing query: missing } in '{date:1month..now'
>>> Query string was: thread:{date:1month..now tag:dkg}
>>> 1 dkg@alice:~$
>>
>> Pretty sure what you want here is
>>
>>         $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

question: how do these differ (processing-wise):

         $ notmuch count  'thread:"date:1month..now tag:dkg"'
         $ notmuch count  'thread:{date:1month..now tag:dkg}'
         $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

understanding the reasons behind these might help to use these in desired
ways (or we could just say use "{...}" to get this to work).

Tomi

> Thanks, yes, that's it.  I still find the quoting/assembling rules for
> notmuch queries non-intuitive, but maybe one day i'll wrap my head
> around them some day.  I certainly don't have any specific suggestions
> for improvement.
>
> This is a really useful feature, much appreciated!
>
>         --dkg

David Bremner-2

Re: Thread subqueries

Tomi Ollila <[hidden email]> writes:
>
> question: how do these differ (processing-wise):
>
>          $ notmuch count  'thread:"date:1month..now tag:dkg"'

The thread field processor receives the string "date:1month..now tag:dkg"
(without the quotes), which it treats as a thread ID; that ID doesn't
match anything.

>          $ notmuch count  'thread:{date:1month..now tag:dkg}'

The thread field processor receives the string "{date:1month..now"
(without quotes) because the top-level query parser splits at spaces
unless prevented by "". It treats this string as syntactically invalid,
rather than silently dropping the second term.

>          $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

The t.f.p. receives the string "{date:1month..now tag:dkg}" (without
quotes). It notes the first and last character, and triggers a subquery
expansion.
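
As a rough illustration of what "subquery expansion" means, here is a toy
Python model of the two-step search described in the documentation patch:
the subquery is expanded to the IDs of the matching threads, and a second
search then selects every message in those threads. The in-memory message
list, its field names, and the function names are all made up for this
sketch; none of this is notmuch's actual implementation.

```python
# Toy in-memory mail store; fields and values are invented for illustration.
MESSAGES = [
    {"id": "m1", "thread": "0001", "from": "mallory", "subject": "crypto"},
    {"id": "m2", "thread": "0001", "from": "alice", "subject": "Re: crypto"},
    {"id": "m3", "thread": "0002", "from": "bob", "subject": "lunch"},
]


def search(predicate):
    """Return the messages matching predicate (stand-in for a real search)."""
    return [m for m in MESSAGES if predicate(m)]


def expand_thread_subquery(predicate):
    """Step 1: run the subquery and collect the IDs of matching threads."""
    return {m["thread"] for m in search(predicate)}


def thread_subquery_search(predicate):
    """Step 2: search again for every message in any of those threads."""
    thread_ids = expand_thread_subquery(predicate)
    return search(lambda m: m["thread"] in thread_ids)


# Toy equivalent of thread:{from:mallory}
hits = thread_subquery_search(lambda m: m["from"] == "mallory")
print([m["id"] for m in hits])
```

In this model the cost of the second search grows with the number of thread
IDs produced by the first, which is one way to picture why the performance
of such queries can vary.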

The thing to keep in mind is that we have no control over the top level
"tokenization" by Xapian, except for using "".
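
The three cases can be mimicked with a small Python sketch. It is a toy
model under stated assumptions, not Xapian's parser or the code in
lib/thread-fp.cc: split_top_level only splits on spaces outside double
quotes, and classify_thread_term only looks at the first and last
characters of what remains. Both function names are hypothetical.

```python
def split_top_level(query):
    """Split on spaces, except inside double quotes (toy tokenizer)."""
    tokens, current, in_quotes = [], "", False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
            current += ch
        elif ch == " " and not in_quotes:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def classify_thread_term(token):
    """Mimic how the thread field processor treats its argument."""
    prefix = "thread:"
    if not token.startswith(prefix):
        return "not a thread term"
    value = token[len(prefix):]
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]  # the processor sees the string without quotes
    if value.startswith("{"):
        if value.endswith("}"):
            return "subquery: " + value[1:-1]
        return "error: missing } in '" + value + "'"
    return "thread id: " + value


for q in ['thread:"date:1month..now tag:dkg"',
          'thread:{date:1month..now tag:dkg}',
          'thread:"{date:1month..now tag:dkg}"']:
    first = split_top_level(q)[0]
    print(q, "->", classify_thread_term(first))
```

In this model only the unquoted form is split into two tokens, so the
thread term ends at the space and the closing brace is lost, which is the
same failure reported in the error message quoted earlier in the thread.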