Quantcast

fixes for regexp field processors

classic Classic list List threaded Threaded
7 messages Options
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

fixes for regexp field processors

This series fixes two bugs introduces in notmuch 0.24

1) queries of the form  'from:foo*' and 'subject:bar*' no longer work (the * is just silently dropped)
2) queries of the form 'from:""' and 'subject:""' crashes with an uncaught exception.

Unless there objections, I'll probably roll these patches (along with
perhaps the two memory fixes recently posted) into a bugfix release
0.24.1. Tentative release date early next week.

d


_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 1/4] test: add known broken test for empty from: and subject: query

These queries currently fail with field processors enabled because the
code expects a non-empty string.
---
 test/T650-regexp-query.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index df48ab82..b1d84439 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -11,6 +11,15 @@ fi
 
 notmuch search --output=messages from:cworth > cworth.msg-ids
 
+test_begin_subtest "empty from: search"
+test_subtest_known_broken
+notmuch search --output=messages 'from:""' and from:cworth > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "empty subject: search"
+test_subtest_known_broken
+notmuch search --output=messages 'subject:""' and from:cworth > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
 test_begin_subtest "regexp from search, case sensitive"
 notmuch search --output=messages from:/carl/ > OUTPUT
 test_expect_equal_file /dev/null OUTPUT
--
2.11.0

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 2/4] lib: handle empty string in regexp field processors

In reply to this post by David Bremner-2
The non-field processor behaviour is is convert the corresponding
queries into a search for the unprefixed terms. This yields pretty
surprising results so decided to match any message.
---
 lib/regexp-fields.cc      | 3 +++
 test/T650-regexp-query.sh | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 8e740a81..42239a66 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -148,6 +148,9 @@ RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix, Xapian::QueryPar
 Xapian::Query
 RegexpFieldProcessor::operator() (const std::string & str)
 {
+    if (str.size () == 0)
+ return Xapian::Query::MatchAll;
+
     if (str.at (0) == '/') {
  if (str.at (str.size () - 1) == '/'){
     RegexpPostingSource *postings = new RegexpPostingSource (slot, str.substr(1,str.size () - 2));
diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index b1d84439..049477b4 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -12,12 +12,10 @@ fi
 notmuch search --output=messages from:cworth > cworth.msg-ids
 
 test_begin_subtest "empty from: search"
-test_subtest_known_broken
 notmuch search --output=messages 'from:""' and from:cworth > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
 
 test_begin_subtest "empty subject: search"
-test_subtest_known_broken
 notmuch search --output=messages 'subject:""' and from:cworth > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
 test_begin_subtest "regexp from search, case sensitive"
--
2.11.0

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 3/4] test: add known broken tests wildcard search in from and subject

In reply to this post by David Bremner-2
This was broken by the addition of regexp searching. The detection of
wildcards is not currently done in the recursive call to parse_query,
because of quoting issues.
---
 test/T650-regexp-query.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index 049477b4..ba4a64e0 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -18,6 +18,16 @@ test_expect_equal_file cworth.msg-ids OUTPUT
 test_begin_subtest "empty subject: search"
 notmuch search --output=messages 'subject:""' and from:cworth > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "xapian wildcard search for from:"
+test_subtest_known_broken
+notmuch search --output=messages 'from:cwo*' > OUTPUT
+test_expect_equal_file cworth.msg-ids OUTPUT
+
+test_begin_subtest "xapian wildcard search for subject:"
+test_subtest_known_broken
+test_expect_equal $(notmuch count 'subject:count*') 1
+
 test_begin_subtest "regexp from search, case sensitive"
 notmuch search --output=messages from:/carl/ > OUTPUT
 test_expect_equal_file /dev/null OUTPUT
--
2.11.0

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[PATCH 4/4] lib: only trigger phrase processing for regexp fields when needed

In reply to this post by David Bremner-2
The argument is that if the string passed to the field processor has
no spaces, then the added quotes won't have any benefit except for
disabling wildcards. But disabling wildcards doesn't seem very useful
in the normal Xapian query parser, since they're stripped before
generating terms anyway. It does mean that the query 'from:"foo*"' will
not be precisely equivalent to 'from:foo' as it is for the non
field-processor version.
---
 lib/regexp-fields.cc      | 10 ++++++++--
 test/T650-regexp-query.sh |  2 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 42239a66..08c6ccb5 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -161,8 +161,14 @@ RegexpFieldProcessor::operator() (const std::string & str)
     } else {
  /* TODO replace this with a nicer API level triggering of
  * phrase parsing, when possible */
- std::string quoted='"' + str + '"';
- return parser.parse_query (quoted, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
+ std::string query_str;
+
+ if (str.find (' ') != std::string::npos)
+    query_str = '"' + str + '"';
+ else
+    query_str = str;
+
+ return parser.parse_query (query_str, NOTMUCH_QUERY_PARSER_FLAGS, term_prefix);
     }
 }
 #endif
diff --git a/test/T650-regexp-query.sh b/test/T650-regexp-query.sh
index ba4a64e0..c2b99d1b 100755
--- a/test/T650-regexp-query.sh
+++ b/test/T650-regexp-query.sh
@@ -20,12 +20,10 @@ notmuch search --output=messages 'subject:""' and from:cworth > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
 
 test_begin_subtest "xapian wildcard search for from:"
-test_subtest_known_broken
 notmuch search --output=messages 'from:cwo*' > OUTPUT
 test_expect_equal_file cworth.msg-ids OUTPUT
 
 test_begin_subtest "xapian wildcard search for subject:"
-test_subtest_known_broken
 test_expect_equal $(notmuch count 'subject:count*') 1
 
 test_begin_subtest "regexp from search, case sensitive"
--
2.11.0

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 2/4] lib: handle empty string in regexp field processors

In reply to this post by David Bremner-2
David Bremner <[hidden email]> writes:

> The non-field processor behaviour is is convert the corresponding
> queries into a search for the unprefixed terms. This yields pretty
> surprising results so decided to match any message.
> ---
>  lib/regexp-fields.cc      | 3 +++
>  test/T650-regexp-query.sh | 2 --
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
> index 8e740a81..42239a66 100644
> --- a/lib/regexp-fields.cc
> +++ b/lib/regexp-fields.cc
> @@ -148,6 +148,9 @@ RegexpFieldProcessor::RegexpFieldProcessor (std::string prefix, Xapian::QueryPar
>  Xapian::Query
>  RegexpFieldProcessor::operator() (const std::string & str)
>  {
> +    if (str.size () == 0)
> + return Xapian::Query::MatchAll;
> +

For things like file:, it actually makes more sense to return
Xapian::Query(term_prefix). I'm leaning towards something like the following

    if (str.size () == 0) {
       if (options & NOTMUCH_FIELD_PROBABILISTIC)
           return Xapian::Query::MatchAll;
       else
           return Xapian::Query(term_prefix);
    }
 

but this patch can stay as is for now, as there are not yet any boolean
fields being matched (mid would be the first).

I thought a bit about unconditionalling returning Xapian::Query
(term_prefix), but I think for the probabilistic (term based) fields,
we'll never add terms for only the prefix, i.e. and empty subject would
just not add any XSUBJECT terms. So that would effectively match no
messages.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [PATCH 3/4] test: add known broken tests wildcard search in from and subject

In reply to this post by David Bremner-2
David Bremner <[hidden email]> writes:

> This was broken by the addition of regexp searching. The detection of
> wildcards is not currently done in the recursive call to parse_query,
> because of quoting issues.

Patches 3 and 4 pushed to release and master.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Loading...