[PATCH RFC] index: add body: search query term

William Casarin

[PATCH RFC] index: add body: search query term

This adds the ability to search specifically on the message body.

For example:

    notmuch search tag:notmuch and body:PATCH

Signed-off-by: William Casarin <[hidden email]>
---

Hey there,

I'm looking to add the ability to search specifically on the body. I
was poking around in the indexer, added these lines and reindexed a
few tags. It appears to work!

I was just wondering if there's anything I'm missing? That seemed a
bit too easy. I noticed there are some NOTMUCH_FIELD_* flags whose
purpose I'm not sure of.

If anyone has Xapian knowledge and could shed some light on what the
next steps might be, if any, I'd appreciate it.

Thanks!
Will


 lib/database.cc | 3 +++
 lib/index.cc    | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..0b085b21 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -297,6 +297,9 @@ prefix_t prefix_table[] = {
     { "subject", "XSUBJECT", NOTMUCH_FIELD_EXTERNAL |
  NOTMUCH_FIELD_PROBABILISTIC |
  NOTMUCH_FIELD_PROCESSOR},
+    { "body", "XBODY", NOTMUCH_FIELD_EXTERNAL |
+ NOTMUCH_FIELD_PROBABILISTIC |
+ NOTMUCH_FIELD_PROCESSOR},
 };
 
 static void
diff --git a/lib/index.cc b/lib/index.cc
index 3f694387..299b8770 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -506,7 +506,7 @@ _index_mime_part (notmuch_message_t *message,
     body = (char *) g_byte_array_free (byte_array, false);
 
     if (body) {
- _notmuch_message_gen_terms (message, NULL, body);
+ _notmuch_message_gen_terms (message, "body", body);
 
  free (body);
     }
--
2.19.0

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner

Re: [PATCH RFC] index: add body: search query term

William Casarin <[hidden email]> writes:

>
>  lib/database.cc | 3 +++
>  lib/index.cc    | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index 9cf8062c..0b085b21 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -297,6 +297,9 @@ prefix_t prefix_table[] = {
>      { "subject", "XSUBJECT", NOTMUCH_FIELD_EXTERNAL |
>   NOTMUCH_FIELD_PROBABILISTIC |
>   NOTMUCH_FIELD_PROCESSOR},
> +    { "body", "XBODY", NOTMUCH_FIELD_EXTERNAL |
> + NOTMUCH_FIELD_PROBABILISTIC |
> + NOTMUCH_FIELD_PROCESSOR},
>  };
>  
>  static void
> diff --git a/lib/index.cc b/lib/index.cc
> index 3f694387..299b8770 100644
> --- a/lib/index.cc
> +++ b/lib/index.cc
> @@ -506,7 +506,7 @@ _index_mime_part (notmuch_message_t *message,
>      body = (char *) g_byte_array_free (byte_array, false);
>  
>      if (body) {
> - _notmuch_message_gen_terms (message, NULL, body);
> + _notmuch_message_gen_terms (message, "body", body);
>  
>   free (body);
>      }
> --

I think you'll find you broke non-prefixed queries. Does the test suite
still pass? If so, we need more tests. Anyway, if you add a second set
of terms, I'd be interested in how much this bloats the index. Ideally
measured with the performance corpus, so we can all reproduce the
experiment.

d
William Casarin

Re: [PATCH RFC] index: add body: search query term

David Bremner <[hidden email]> writes:

> William Casarin <[hidden email]> writes:

> I think you'll find you broke non-prefixed queries. Does the test suite
> still pass? If so, we need more tests.

Yeah, they seem to pass. But you're right, something seems a bit off:

    ./notmuch count subject:github or body:github and tag:notmuch
    3271

    ./notmuch count github and tag:notmuch
    665

> of terms I'd be interested in how much this bloats the index. Ideally with
> the performance corpus so we can all reproduce the experiment.

Sounds good, I was wondering about that as well.

I wonder if it's all worth the effort though, since a workaround could
be:

    notmuch search <query> and not subject:<query>

If it's too annoying to have a body prefix, due to index bloat or
performance issues, would doing something hacky such as translating
'body:<query>' to '<query> and not subject:<query>' make sense?

Will

--
https://jb55.com
William Casarin

Re: [PATCH RFC] index: add body: search query term

William Casarin <[hidden email]> writes:

> I wonder if it's all worth the effort though, since a workaround could
> be:
>
>     notmuch search <query> and not subject:<query>
>
> If it's too annoying to have a body prefix, due to index bloat or
> performance issues, would doing something hacky such as translating
> 'body:<query>' to '<query> and not subject:<query>' make sense?

Thinking about this some more, this is not exactly the same: the
workaround would explicitly exclude messages whose subject matches,
whereas a body query wouldn't care what the subject was.

--
https://jb55.com