v3 of regexp search for mid/folder/path

David Bremner-2

v3 of regexp search for mid/folder/path

No sooner posted than I realized it had a bug: the previous version
compared against the prefixed term so anchored searches failed.
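
To make the failure mode concrete, here is a minimal standalone sketch, not the notmuch code itself. It assumes std::regex (extended grammar) as a stand-in for the POSIX regexec call used in the library, and matches_after_prefix is a hypothetical helper name. Passing prefix_len = 0 imitates the earlier behaviour of matching against the whole prefixed term.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: test `pattern` against a stored term after skipping
// its first `prefix_len` characters (the term prefix, e.g. "XFOLDER:").
// With prefix_len == 0 the prefix is part of the subject string, so a
// pattern anchored with '^' can never line up with the user-visible value.
bool matches_after_prefix(const std::string &term,
                          std::size_t prefix_len,
                          const std::string &pattern)
{
    if (prefix_len > term.size())
        return false;

    // std::regex stands in for regcomp/regexec here (an assumption).
    const std::regex re(pattern, std::regex::extended);
    return std::regex_search(term.begin() + static_cast<std::ptrdiff_t>(prefix_len),
                             term.end(), re);
}
```

For the term "XFOLDER:bad" and the pattern "^bad$", the call with prefix_len 0 finds no match, while the call with the length of "XFOLDER:" does. An unanchored "bad" matches either way, which is why only anchored searches were affected.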

I've also included some tests for the new features in this version.

Below is an interdiff against v1

diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 26b22fe2..084bc8c0 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -156,12 +156,17 @@ RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix,
 Xapian::Query
 RegexpFieldProcessor::operator() (const std::string & str)
 {
-    if (str.size () == 0)
- return Xapian::Query(Xapian::Query::OP_AND_NOT,
+    if (str.empty ()) {
+ if (options & NOTMUCH_FIELD_PROBABILISTIC) {
+    return Xapian::Query(Xapian::Query::OP_AND_NOT,
      Xapian::Query::MatchAll,
      Xapian::Query (Xapian::Query::OP_WILDCARD, term_prefix));
+ } else {
+    return Xapian::Query (term_prefix);
+ }
+    }
 
-    if (str.length() > 0 && str.at (0) == '/') {
+    if (str.at (0) == '/') {
  if (str.length() > 1 && str.at (str.size () - 1) == '/'){
     std::string regexp_str = str.substr(1,str.size () - 2);
     if (slot != Xapian::BAD_VALUENO) {
@@ -174,7 +179,8 @@ RegexpFieldProcessor::operator() (const std::string & str)
  compile_regex(regexp, regexp_str.c_str ());
  for (Xapian::TermIterator it = notmuch->xapian_db->allterms_begin (term_prefix);
      it != notmuch->xapian_db->allterms_end (); ++it) {
-    if (regexec (&regexp, (*it).c_str (), 0, NULL, 0) == 0)
+    if (regexec (&regexp, (*it).c_str () + term_prefix.size(),
+ 0, NULL, 0) == 0)
  terms.push_back(*it);
  }
  return Xapian::Query (Xapian::Query::OP_OR, terms.begin(), terms.end());
diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index 27fc9ab9..b7bdda11 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -2,13 +2,54 @@
 test_description='regular expression searches'
 . ./test-lib.sh || exit 1
 
-add_email_corpus
-
-
 if [ $NOTMUCH_HAVE_XAPIAN_FIELD_PROCESSOR -eq 0 ]; then
     test_done
 fi
 
+add_message '[dir]=bad' '[subject]="To the bone"'
+add_message '[dir]=.' '[subject]="Top level"'
+add_message '[dir]=bad/news' '[subject]="Bears"'
+mkdir -p "${MAIL_DIR}/duplicate/bad/news"
+cp "$gen_msg_filename" "${MAIL_DIR}/duplicate/bad/news"
+
+add_message '[dir]=things' '[subject]="These are a few"'
+add_message '[dir]=things/favorite' '[subject]="Raindrops, whiskers, kettles"'
+add_message '[dir]=things/bad' '[subject]="Bites, stings, sad feelings"'
+
+test_begin_subtest "empty path:// search"
+notmuch search 'path:""' > EXPECTED
+notmuch search 'path:/^$/' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "empty folder:// search"
+notmuch search 'folder:""' > EXPECTED
+notmuch search 'folder:/^$/' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "unanchored folder:// specification"
+output=$(notmuch search folder:/bad/ | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bears (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bites, stings, sad feelings (inbox unread)"
+
+test_begin_subtest "anchored folder:// search"
+output=$(notmuch search 'folder:/^bad$/' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)"
+
+test_begin_subtest "unanchored path:// specification"
+output=$(notmuch search path:/bad/ | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bears (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bites, stings, sad feelings (inbox unread)"
+
+test_begin_subtest "anchored path:// search"
+output=$(notmuch search 'path:/^bad$/' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)"
+
+# Use "standard" corpus from here on.
+rm -rf $MAIL_DIR
+add_email_corpus
+
 notmuch search --output=messages from:cworth > cworth.msg-ids
 
 # these headers will generate no document terms
@@ -120,4 +161,15 @@ thread:XXX   2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older ver
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest "unanchored tag search"
+notmuch search tag:signed or tag:inbox > EXPECTED
+notmuch search tag:/i/ > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+notmuch tag +testsi '*'
+test_begin_subtest "anchored tag search"
+notmuch search tag:signed > EXPECTED
+notmuch search tag:/^si/ > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2

[patch v3 1/2] lib: Add regexp searching for mid: prefix

The bulk of the change is passing in the field options to the regexp
field processor, so that we can properly handle the
fallback (non-regexp case).
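
As a rough illustration of why the options are needed, the fallback decision can be sketched without Xapian as below. The enum values, the fallback struct and plain_value_fallback are all hypothetical names for this sketch; the real code uses notmuch_field_flag_t and returns a Xapian::Query.

```cpp
#include <string>

// Illustrative stand-ins for the notmuch field flags (bit values assumed).
enum field_flags : unsigned {
    FIELD_NO_FLAGS = 0,
    FIELD_PROBABILISTIC = 1u << 0
};

// What the non-regexp branch would hand on: either text for the query
// parser (probabilistic fields) or one exact term (boolean fields).
enum class fallback_kind { parsed_text, exact_term };

struct fallback {
    fallback_kind kind;
    std::string value;
};

fallback plain_value_fallback(unsigned options,
                              const std::string &term_prefix,
                              const std::string &str)
{
    if (options & FIELD_PROBABILISTIC) {
        // Quote values containing a space so they are parsed as a phrase.
        std::string query_str =
            (str.find(' ') != std::string::npos) ? '"' + str + '"' : str;
        return { fallback_kind::parsed_text, query_str };
    }
    // Boolean prefix: the value is looked up verbatim as prefix + value.
    return { fallback_kind::exact_term, term_prefix + str };
}
```

With no probabilistic flag, a value such as yoom under the prefix Q becomes the single term Qyoom, so mid:yoom matches only a message whose id is exactly yoom.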
---
 lib/database.cc           |  6 ++++--
 lib/regexp-fields.cc      | 36 +++++++++++++++++++++++++-----------
 lib/regexp-fields.h       |  4 +++-
 test/T650-regexp-query.sh | 16 ++++++++++++++++
 4 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 5bc131a3..49b3849c 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -262,7 +262,8 @@ prefix_t prefix_table[] = {
     { "tag", "K", NOTMUCH_FIELD_EXTERNAL },
     { "is", "K", NOTMUCH_FIELD_EXTERNAL },
     { "id", "Q", NOTMUCH_FIELD_EXTERNAL },
-    { "mid", "Q", NOTMUCH_FIELD_EXTERNAL },
+    { "mid", "Q", NOTMUCH_FIELD_EXTERNAL |
+ NOTMUCH_FIELD_PROCESSOR },
     { "path", "P", NOTMUCH_FIELD_EXTERNAL },
     { "property", "XPROPERTY", NOTMUCH_FIELD_EXTERNAL },
     /*
@@ -313,7 +314,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
  else if (STRNCMP_LITERAL(prefix->name, "query") == 0)
     fp = (new QueryFieldProcessor (*notmuch->query_parser, notmuch))->release ();
  else
-    fp = (new RegexpFieldProcessor (prefix->name, *notmuch->query_parser, notmuch))->release ();
+    fp = (new RegexpFieldProcessor (prefix->name, prefix->flags,
+    *notmuch->query_parser, notmuch))->release ();
 
  /* we treat all field-processor fields as boolean in order to get the raw input */
  notmuch->query_parser->add_boolean_prefix (prefix->name, fp);
diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 1651677c..7ae55e70 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -135,13 +135,21 @@ static inline Xapian::valueno _find_slot (std::string prefix)
  return NOTMUCH_VALUE_FROM;
     else if (prefix == "subject")
  return NOTMUCH_VALUE_SUBJECT;
+    else if (prefix == "mid")
+ return NOTMUCH_VALUE_MESSAGE_ID;
     else
  throw Xapian::QueryParserError ("unsupported regexp field '" + prefix + "'");
 }
 
-RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix, Xapian::QueryParser &parser_, notmuch_database_t *notmuch_)
- : slot (_find_slot (prefix)), term_prefix (_find_prefix (prefix.c_str ())),
-  parser (parser_), notmuch (notmuch_)
+RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix,
+    notmuch_field_flag_t options_,
+    Xapian::QueryParser &parser_,
+    notmuch_database_t *notmuch_)
+ : slot (_find_slot (prefix)),
+  term_prefix (_find_prefix (prefix.c_str ())),
+  options (options_),
+  parser (parser_),
+  notmuch (notmuch_)
 {
 };
 
@@ -161,16 +169,22 @@ RegexpFieldProcessor::operator() (const std::string & str)
     throw Xapian::QueryParserError ("unmatched regex delimiter in '" + str + "'");
  }
     } else {
- /* TODO replace this with a nicer API level triggering of
- * phrase parsing, when possible */
- std::string query_str;
+ if (options & NOTMUCH_FIELD_PROBABILISTIC) {
+    /* TODO replace this with a nicer API level triggering of
+     * phrase parsing, when possible */
+    std::string query_str;
 
- if (str.find (' ') != std::string::npos)
-    query_str = '"' + str + '"';
- else
-    query_str = str;
+    if (str.find (' ') != std::string::npos)
+ query_str = '"' + str + '"';
+    else
+ query_str = str;
 
- return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
+    return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
+ } else {
+    /* Boolean prefix */
+    std::string term = term_prefix + str;
+    return Xapian::Query (term);
+ }
     }
 }
 #endif
diff --git a/lib/regexp-fields.h b/lib/regexp-fields.h
index a4ba7ad8..d5f93445 100644
--- a/lib/regexp-fields.h
+++ b/lib/regexp-fields.h
@@ -65,11 +65,13 @@ class RegexpFieldProcessor : public Xapian::FieldProcessor {
  protected:
     Xapian::valueno slot;
     std::string term_prefix;
+    notmuch_field_flag_t options;
     Xapian::QueryParser &parser;
     notmuch_database_t *notmuch;
 
  public:
-    RegexpFieldProcessor (std::string prefix, Xapian::QueryParser &parser_, notmuch_database_t *notmuch_);
+    RegexpFieldProcessor (std::string prefix, notmuch_field_flag_t options,
+  Xapian::QueryParser &parser_, notmuch_database_t *notmuch_);
 
     ~RegexpFieldProcessor () { };
 
diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index 9599c104..27fc9ab9 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -104,4 +104,20 @@ Query string was: from:/unbalanced[/
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest "empty mid search"
+notmuch search --output=messages mid:yoom > OUTPUT
+cp /dev/null EXPECTED
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "non-empty mid regex search"
+notmuch search --output=messages mid:/yoom/ > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "combine regexp mid and subject"
+notmuch search  subject:/-C/ and mid:/y..m/ | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
--
2.11.0

David Bremner-2

[patch v3 2/2] lib: Add regexp expansion for tags and paths

From a ui perspective this looks similar to what was already provided
for from, subject, and mid, but the implimentation is quite
different. It uses the database's list of terms to construct a term
based query equivalent to the passed regular expression.
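
The expansion step can be sketched in isolation roughly as follows. This is only an approximation under stated assumptions: a plain vector of strings stands in for the database's term list, std::regex (extended grammar) stands in for POSIX regexec, and expand_regexp is a hypothetical name. The real code then ORs the collected terms into a single Xapian query.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Collect every stored term that carries `term_prefix` and whose remainder
// (the part after the prefix) matches `pattern`. The result is the list of
// terms that would be combined with OR to form the final query.
std::vector<std::string> expand_regexp(const std::vector<std::string> &all_terms,
                                       const std::string &term_prefix,
                                       const std::string &pattern)
{
    const std::regex re(pattern, std::regex::extended);
    std::vector<std::string> matched;

    for (const std::string &term : all_terms) {
        // Skip terms belonging to other prefixes.
        if (term.compare(0, term_prefix.size(), term_prefix) != 0)
            continue;
        // Match against the unprefixed value so anchors behave as expected.
        if (std::regex_search(term.begin() + static_cast<std::ptrdiff_t>(term_prefix.size()),
                              term.end(), re))
            matched.push_back(term);
    }
    return matched;
}
```

Given the terms Kinbox, Ksigned, Ktestsi and Kunread, the pattern "^si" under the prefix K selects only Ksigned, whereas "i" selects Kinbox, Ksigned and Ktestsi.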
---
 lib/database.cc           | 12 ++++++----
 lib/regexp-fields.cc      | 32 +++++++++++++++++++++-----
 test/T650-regexp-query.sh | 58 ++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 49b3849c..5b13f541 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -259,12 +259,15 @@ prefix_t prefix_table[] = {
     { "file-direntry", "XFDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry", "XDDIRENTRY", NOTMUCH_FIELD_NO_FLAGS },
     { "thread", "G", NOTMUCH_FIELD_EXTERNAL },
-    { "tag", "K", NOTMUCH_FIELD_EXTERNAL },
-    { "is", "K", NOTMUCH_FIELD_EXTERNAL },
+    { "tag", "K", NOTMUCH_FIELD_EXTERNAL |
+ NOTMUCH_FIELD_PROCESSOR },
+    { "is", "K", NOTMUCH_FIELD_EXTERNAL |
+        NOTMUCH_FIELD_PROCESSOR },
     { "id", "Q", NOTMUCH_FIELD_EXTERNAL },
     { "mid", "Q", NOTMUCH_FIELD_EXTERNAL |
  NOTMUCH_FIELD_PROCESSOR },
-    { "path", "P", NOTMUCH_FIELD_EXTERNAL },
+    { "path", "P", NOTMUCH_FIELD_EXTERNAL|
+ NOTMUCH_FIELD_PROCESSOR },
     { "property", "XPROPERTY", NOTMUCH_FIELD_EXTERNAL },
     /*
      * Unconditionally add ':' to reduce potential ambiguity with
@@ -272,7 +275,8 @@ prefix_t prefix_table[] = {
      * letters. See Xapian document termprefixes.html for related
      * discussion.
      */
-    { "folder", "XFOLDER:", NOTMUCH_FIELD_EXTERNAL },
+    { "folder", "XFOLDER:", NOTMUCH_FIELD_EXTERNAL |
+ NOTMUCH_FIELD_PROCESSOR },
 #if HAVE_XAPIAN_FIELD_PROCESSOR
     { "date", NULL, NOTMUCH_FIELD_EXTERNAL |
  NOTMUCH_FIELD_PROCESSOR },
diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 7ae55e70..084bc8c0 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -138,7 +138,7 @@ static inline Xapian::valueno _find_slot (std::string prefix)
     else if (prefix == "mid")
  return NOTMUCH_VALUE_MESSAGE_ID;
     else
- throw Xapian::QueryParserError ("unsupported regexp field '" + prefix + "'");
+ return Xapian::BAD_VALUENO;
 }
 
 RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix,
@@ -156,15 +156,35 @@ RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix,
 Xapian::Query
 RegexpFieldProcessor::operator() (const std::string & str)
 {
-    if (str.size () == 0)
- return Xapian::Query(Xapian::Query::OP_AND_NOT,
+    if (str.empty ()) {
+ if (options & NOTMUCH_FIELD_PROBABILISTIC) {
+    return Xapian::Query(Xapian::Query::OP_AND_NOT,
      Xapian::Query::MatchAll,
      Xapian::Query (Xapian::Query::OP_WILDCARD, term_prefix));
+ } else {
+    return Xapian::Query (term_prefix);
+ }
+    }
 
     if (str.at (0) == '/') {
- if (str.at (str.size () - 1) == '/'){
-    RegexpPostingSource *postings = new RegexpPostingSource (slot, str.substr(1,str.size () - 2));
-    return Xapian::Query (postings->release ());
+ if (str.length() > 1 && str.at (str.size () - 1) == '/'){
+    std::string regexp_str = str.substr(1,str.size () - 2);
+    if (slot != Xapian::BAD_VALUENO) {
+ RegexpPostingSource *postings = new RegexpPostingSource (slot, regexp_str);
+ return Xapian::Query (postings->release ());
+    } else {
+ std::vector<std::string> terms;
+ regex_t regexp;
+
+ compile_regex(regexp, regexp_str.c_str ());
+ for (Xapian::TermIterator it = notmuch->xapian_db->allterms_begin (term_prefix);
+     it != notmuch->xapian_db->allterms_end (); ++it) {
+    if (regexec (&regexp, (*it).c_str () + term_prefix.size(),
+ 0, NULL, 0) == 0)
+ terms.push_back(*it);
+ }
+ return Xapian::Query (Xapian::Query::OP_OR, terms.begin(), terms.end());
+    }
  } else {
     throw Xapian::QueryParserError ("unmatched regex delimiter in '" + str + "'");
  }
diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index 27fc9ab9..b7bdda11 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -2,13 +2,54 @@
 test_description='regular expression searches'
 . ./test-lib.sh || exit 1
 
-add_email_corpus
-
-
 if [ $NOTMUCH_HAVE_XAPIAN_FIELD_PROCESSOR -eq 0 ]; then
     test_done
 fi
 
+add_message '[dir]=bad' '[subject]="To the bone"'
+add_message '[dir]=.' '[subject]="Top level"'
+add_message '[dir]=bad/news' '[subject]="Bears"'
+mkdir -p "${MAIL_DIR}/duplicate/bad/news"
+cp "$gen_msg_filename" "${MAIL_DIR}/duplicate/bad/news"
+
+add_message '[dir]=things' '[subject]="These are a few"'
+add_message '[dir]=things/favorite' '[subject]="Raindrops, whiskers, kettles"'
+add_message '[dir]=things/bad' '[subject]="Bites, stings, sad feelings"'
+
+test_begin_subtest "empty path:// search"
+notmuch search 'path:""' > EXPECTED
+notmuch search 'path:/^$/' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "empty folder:// search"
+notmuch search 'folder:""' > EXPECTED
+notmuch search 'folder:/^$/' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "unanchored folder:// specification"
+output=$(notmuch search folder:/bad/ | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bears (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bites, stings, sad feelings (inbox unread)"
+
+test_begin_subtest "anchored folder:// search"
+output=$(notmuch search 'folder:/^bad$/' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)"
+
+test_begin_subtest "unanchored path:// specification"
+output=$(notmuch search path:/bad/ | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bears (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Bites, stings, sad feelings (inbox unread)"
+
+test_begin_subtest "anchored path:// search"
+output=$(notmuch search 'path:/^bad$/' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; To the bone (inbox unread)"
+
+# Use "standard" corpus from here on.
+rm -rf $MAIL_DIR
+add_email_corpus
+
 notmuch search --output=messages from:cworth > cworth.msg-ids
 
 # these headers will generate no document terms
@@ -120,4 +161,15 @@ thread:XXX   2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older ver
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest "unanchored tag search"
+notmuch search tag:signed or tag:inbox > EXPECTED
+notmuch search tag:/i/ > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+notmuch tag +testsi '*'
+test_begin_subtest "anchored tag search"
+notmuch search tag:signed > EXPECTED
+notmuch search tag:/^si/ > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
--
2.11.0

David Bremner-2

Re: v3 of regexp search for mid/folder/path

David Bremner <[hidden email]> writes:

> No sooner posted than I realized it had a bug: the previous version
> compared against the prefixed term so anchored searches failed.
>
> I've also included some tests for the new features in this version.
>
> Below is an interdiff against v1

Gauteh reported success with these patches on IRC. Anyone want more time
to review?

d
Tomi Ollila-2

Re: v3 of regexp search for mid/folder/path

On Sun, May 07 2017, David Bremner wrote:

> David Bremner <[hidden email]> writes:
>
>> No sooner posted than I realized it had a bug: the previous version
>> compared against the prefixed term so anchored searches failed.
>>
>> I've also included some tests for the new features in this version.
>>
>> Below is an interdiff against v1
>
> Gauteh reported success with these patches on IRC. Anyone want more time
> to review?

Nope, but fix s/implimentation/implementation/ in 2/2 commit message :D

Tomi

>
> d
David Bremner-2

Re: v3 of regexp search for mid/folder/path

Tomi Ollila <[hidden email]> writes:

> On Sun, May 07 2017, David Bremner wrote:
>>
>> Gauteh reported success with these patches on IRC. Anyone want more time
>> to review?
>
> Nope, but fix s/implimentation/implementation/ in 2/2 commit message :D

Done and pushed to master. Ispell provided my morning amusement by
suggesting I replace ui with Uzi.

d