how to search for Morse code?

15 messages

Gregor Zattler

how to search for Morse code?

Hello,

Today I searched for emails containing

-... --- .-. . -.. ..--.. ...-.-

I tried

notmuch search "-... --- .-. . -.. ..--.. ...-.-"

and

notmuch search '-... --- .-. . -.. ..--.. ...-.-'

and even

notmuch search '"-... --- .-. . -.. ..--.. ...-.-"'

and also with double dashes in front of the search term:

notmuch search -- "-... --- .-. . -.. ..--.. ...-.-"


All these searches produce

notmuch search: A Xapian exception occurred
A Xapian exception occurred parsing query: Unknown range operation
Query string was: "-... --- .-. . -.. ..--.. ...-.-"


Is it possible to search for emails containing my supposedly
funny signature?


Obviously this is not much of a problem for me, but perhaps I hit
a hidden bug?


Ciao; Gregor
--
 -... --- .-. . -.. ..--.. ...-.-

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Ben Oliver

Re: how to search for Morse code?

On 18-07-23 14:20:41, Gregor Zattler wrote:
>Hello,
>
>today I searched for emails containing
>
>-... --- .-. . -.. ..--.. ...-.-
>

Heh

I suppose the problem is that Xapian won't take two periods ".." even in
quotes.

I asked on their IRC about how to escape it but it's quiet
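The error text points the same way. Xapian's query parser reads `..` as range syntax (the `start..end` form notmuch uses for `date:` ranges), and the Morse string contains several `..` runs. Here is a minimal sh sketch of a pre-flight check. `query_has_range_syntax` is a hypothetical helper and is not part of notmuch:

```shell
#!/bin/sh
# Hypothetical helper (not part of notmuch): succeed if the query string
# contains "..", which Xapian's query parser treats as range syntax.
query_has_range_syntax() {
  case "$1" in
    *..*) return 0 ;;   # in a case pattern, "." is an ordinary character
    *)    return 1 ;;
  esac
}

morse='-... --- .-. . -.. ..--.. ...-.-'
if query_has_range_syntax "$morse"; then
  echo "warning: query contains '..'; Xapian may parse it as a range" >&2
fi
```

This only flags the pattern. It does not make such a query searchable.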

Ben Oliver

Re: how to search for Morse code?

On 18-07-23 15:16:07, Ben Oliver wrote:

>On 18-07-23 14:20:41, Gregor Zattler wrote:
>>Hello,
>>
>>today I searched for emails containing
>>
>>-... --- .-. . -.. ..--.. ...-.-
>>
>
>Heh
>
>I suppose the problem is that xapian won't take two periods ".." even
>in quotes.
>
>I asked on their IRC about how to escape it but it's quiet

So it seems like Morse code would not be indexed, which makes sense.

Sorry!
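One rough way to see why is to approximate term splitting by lowercasing the text and treating every run of non-alphanumeric characters as a separator. This is an assumption for illustration only, because Xapian's real TermGenerator has more rules. Under that approximation the signature produces no terms at all, so the index holds nothing to match against. `approx_terms` is a made-up helper name:

```shell
#!/bin/sh
# Crude approximation of Xapian term splitting (NOT the real TermGenerator):
# lowercase, then turn every run of non-alphanumerics into a single space.
approx_terms() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]' ' ' |
    sed 's/^ //; s/ $//'
}

sig='-... --- .-. . -.. ..--.. ...-.-'
printf 'signature terms: [%s]\n' "$(approx_terms "$sig")"   # empty: only dots and dashes
printf 'greeting terms:  [%s]\n' "$(approx_terms 'Ciao; Gregor')"
```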

Gregor Zattler

how to search for hyphenated words? (was: how to search for Morse code?)

In reply to this post by Gregor Zattler
Hello,
* Gregor Zattler <[hidden email]> [2018-07-23; 14:20]:
> today I searched for emails containing
>
> -... --- .-. . -.. ..--.. ...-.-

Today I searched for emails containing "org-notmuch" (which
supports Org links to notmuch searches), e.g. with

notmuch search org-notmuch
notmuch search -- org-notmuch
notmuch search -- "org-notmuch"
notmuch search -- '"org-notmuch"'
notmuch search -- '+"org-notmuch"'
notmuch search -- org ADJ/1 notmuch

All these resulted in very many hits, most or all of which do not
contain the string "org-notmuch"; one such email was, e.g.,

id:[hidden email]


How would one search for hyphenated words with notmuch?

Ciao; Gregor
--
 -... --- .-. . -.. ..--.. ...-.-

Carl Worth

Re: how to search for hyphenated words? (was: how to search for Morse code?)

Hi Gregor,

The trick here is that when notmuch is indexing body text it feeds it
into a Xapian function that parses the text by finding "terms" in the
text. And this parser considers both punctuation and whitespace as
separators between terms.

So your messages are not being indexed in a way to let you distinguish
between "org notmuch" and "org-notmuch".

(Of note, the query parser applies the same parsing to your query---so
that even when you think you're typing an exact phrase like
"org-notmuch" that gets parsed into separate terms "org" and "notmuch"
for searching.)
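To make that concrete, here is a crude approximation of the splitting: lowercase the text, then split on every non-alphanumeric run. This is an assumption, not Xapian's real tokenizer, and `approx_terms` is a hypothetical name. A hyphen, a space, a dot and a slash all leave the same two terms behind:

```shell
#!/bin/sh
# Crude approximation of Xapian term splitting (NOT the real TermGenerator).
approx_terms() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]' ' ' |
    sed 's/^ //; s/ $//'
}

# Every spelling below reduces to the terms "org" "notmuch".
for s in 'org-notmuch' 'org notmuch' 'org.notmuch' 'ORG/Notmuch'; do
  printf '%-12s -> %s\n' "$s" "$(approx_terms "$s")"
done
```

Because the index only records "org" followed by "notmuch", no query can tell these spellings apart.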

> all these resulted in very many hits most or all of which do not
> contain the string "org-notmuch", one found email was e.g.
>
> id:[hidden email]

That message does contain the following:

   +test_emacs '(notmuch-tree "id:[hidden email]")
   +           (notmuch-test-wait)

Where you will notice that there's a term "org" followed (after some
punctuation and whitespace separators) by a term "notmuch".

> How would one search for hyphenated words with notmuch?

You would need to arrange to have the indexer consider the hyphen as a
letter-like character to be made part of terms. Or be extra clever and
index something like "notmuch-test-wait" in multiple ways (such as a
single term "notmuch-test-wait" as well as three adjacent terms
"notmuch", "test", and "wait" as notmuch is doing currently).

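Below is a minimal sketch of the second idea. It is purely hypothetical: it is not how notmuch or Xapian index text, and `dual_terms` is a made-up name. The sketch keeps each hyphenated token whole as one term and also emits its parts. Term positions, which a real phrase index would need, are ignored here.

```shell
#!/bin/sh
# Hypothetical indexer sketch: emit each hyphenated compound as a single
# term AND as its hyphen-separated parts. Positions are ignored.
dual_terms() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]-' '\n' |            # split on anything but letters, digits, '-'
    while IFS= read -r tok; do
      # drop hyphens at token edges ("-", "---" and the like leave nothing)
      tok=$(printf '%s\n' "$tok" | sed 's/^-*//; s/-*$//')
      [ -n "$tok" ] || continue
      printf '%s\n' "$tok"                 # whole token (compound or plain word)
      case "$tok" in
        *-*) printf '%s\n' "$tok" | tr -s '-' '\n' ;;   # plus its parts
      esac
    done
}

dual_terms 'Wait with notmuch-test-wait.'
```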
-Carl

David Bremner

Re: how to search for hyphenated words? (was: how to search for Morse code?)

In reply to this post by Gregor Zattler
Gregor Zattler <[hidden email]> writes:

>
> How would one search for hyphenated words with notmuch?
>

In special cases, explained in notmuch-search-terms(7), one can use
regexp searches, which are slower, but don't drop punctuation.
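A rough illustration of the difference follows. It rests on two assumptions. First, notmuch's regex search is modelled as a POSIX extended regex applied to the raw header value, approximated here with `grep -E`. Second, ordinary search is modelled as comparing term sequences, approximated by lowercasing and splitting on non-alphanumerics. The helper names are hypothetical.

```shell
#!/bin/sh
# Crude approximation of Xapian term splitting (NOT the real TermGenerator).
approx_terms() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]' ' ' | sed 's/^ //; s/ $//'
}
# Phrase-style match: do the terms of $2 appear adjacent, in order, in $1?
phrase_match() {
  case " $(approx_terms "$1") " in
    *" $(approx_terms "$2") "*) return 0 ;;
    *) return 1 ;;
  esac
}
# Regex-style match on the raw string: punctuation is preserved.
regex_match() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

for subj in 'Re: org-notmuch links' 'Re: org notmuch links'; do
  if phrase_match "$subj" 'org-notmuch'; then p=yes; else p=no; fi
  if regex_match "$subj" 'org-notmuch'; then r=yes; else r=no; fi
  printf '%-24s phrase:%s regex:%s\n' "$subj" "$p" "$r"
done
```

In notmuch itself the regex goes between slashes after the prefix, in something like `subject:"/org-notmuch/"`. Check notmuch-search-terms(7) for the exact syntax and quoting rules.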

d

Gregor Zattler

Re: how to search for hyphenated words? (was: how to search for Morse code?)

Hi David, notmuch developers,
* David Bremner <[hidden email]> [2019-03-10; 20:22]:
> Gregor Zattler <[hidden email]> writes:
>> How would one search for hyphenated words with notmuch?
>>
>
> In special cases, explained in notmuch-search-terms(7), one can use
> regexp searches, which are slower, but don't drop punctuation.

Thanks, this works for the subject: field, which helps a lot.

Regexes do not work on the body of messages and I assume they
will not work with the upcoming "body:" field?


Thanks for your attention, Gregor


David Bremner

Re: how to search for hyphenated words? (was: how to search for Morse code?)

In reply to this post by David Bremner-2
Gregor Zattler <[hidden email]> writes:

> Hi David, notmuch developers,
> * David Bremner <[hidden email]> [2019-03-10; 20:22]:
>> Gregor Zattler <[hidden email]> writes:
>>> How would one search for hyphenated words with notmuch?
>>>
>>
>> In special cases, explained in notmuch-search-terms(7), one can use
>> regexp searches, which are slower, but don't drop punctuation.
>
> thanks, this works for the subject: field, which helps a lot.
>
> Regexes do not work on the body of messages and I assume they
> will not work with the upcoming "body:" field?

That's correct.

d

Matt Armstrong

Re: how to search for hyphenated words? (was: how to search for Morse code?)

In reply to this post by Carl Worth-2
Carl Worth <[hidden email]> writes:

> Hi Gregor,
>
> The trick here is that when notmuch is indexing body text it feeds it
> into a Xapian function that parses the text by finding "terms" in the
> text. And this parser considers both punctuation and whitespace as
> separators between terms.

I notice that Xapian supports something called "phrase searches",
documented as:

  "A phrase surrounded with double quotes ("") matches documents
  containing that exact phrase. Hyphenated words are also treated as
  phrases, as are cases such as filenames and email addresses
  (e.g. /etc/passwd or [hidden email])."

I assume that this particular Xapian feature is unavailable in notmuch?
If so, I wonder whether enabling it has ever been considered.

Being able to "drop down" to do things like exact phrase matches is one
reason why I use notmuch, because the precision sometimes matters.  I
currently do this by fetching the mail message itself and using
old-school mail processing tools on the message file.

David Bremner

Re: how to search for hyphenated words? (was: how to search for Morse code?)

Matt Armstrong <[hidden email]> writes:

> Carl Worth <[hidden email]> writes:
>
>> Hi Gregor,
>>
>> The trick here is that when notmuch is indexing body text it feeds it
>> into a Xapian function that parses the text by finding "terms" in the
>> text. And this parser considers both punctuation and whitespace as
>> separators between terms.
>
> I notice that Xapian supports something called "phrase searches",
> documented as:
>
>   "A phrase surrounded with double quotes ("") matches documents
>   containing that exact phrase. Hyphenated words are also treated as
>   phrases, as are cases such as filenames and email addresses
>   (e.g. /etc/passwd or [hidden email])."
>
> I assume that this particular Xapian feature is unavailable in notmuch?
> If so, I wonder if enabling has ever been considered?

It is enabled, and documented in notmuch-search-terms(7). Unfortunately
I don't think it's related to the original request. The mention of
hyphenated words is about the input to the query parser, not
(necessarily) the retrieved text.

d

Gregor Zattler

Re: how to search for hyphenated words? (was: how to search for Morse code?)

Hi David, Matt, Carl, notmuch developers,
* David Bremner <[hidden email]> [2019-03-11; 22:13]:

> Matt Armstrong <[hidden email]> writes:
>> Carl Worth <[hidden email]> writes:
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>>   "A phrase surrounded with double quotes ("") matches documents
>>   containing that exact phrase. Hyphenated words are also treated as
>>   phrases, as are cases such as filenames and email addresses
>>   (e.g. /etc/passwd or [hidden email])."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not the
> (necessarily) the retrieved text.

What I do not understand is that it doesn't matter whether I search for

org-notmuch

or

"org-notmuch"

'"org-notmuch"'

or even

org ADJ/1 notmuch

$ notmuch count --output=messages '"org-notmuch"'
581
$ notmuch count --output=messages 'org-notmuch'
581
$ notmuch count --output=messages org-notmuch
581
$ notmuch count --output=messages org ADJ/1 notmuch
581

A typical example of a matched message is the attached one.
Somehow the search matches the address of this very mailing list
in the body of the email (I assume).


But obviously there are many more emails with this address in
them:

$ notmuch count --output=messages '[hidden email]'
27396
$ notmuch count --output=messages '"[hidden email]"'
27396

Or with a naive search (no decoding of possible base64 encoded
parts) there are

$ find /home/grfz/Mail/~ml/[hidden email] /home/grfz/Mail/~ml/[hidden email]* -type f -print0 | xargs -0r grep -l -- '[hidden email]' | xargs -IXXXX sh -c "cat XXXX | sed -e '1,/^$/ d' | grep -c [hidden email] " | egrep -c "1|2|3|4|5|6|7|8|9"
16795

emails with the address at least once in the body.


Therefore I wonder why notmuch matches 581 messages.



A naive search for org-notmuch on the files (no decoding of
possible base64 encoded parts) only shows 79 files (77 unique
emails):

$ mkdir -vp /tmp/kolp/{cur,new,tmp}

$ find /home/grfz/Mail/~ml/[hidden email] /home/grfz/Mail/~ml/[hidden email]* -type f -print0 | xargs -0r grep -l -- 'org-notmuch' | xargs ln -vs --target-directory=/tmp/kolp/cur/ | wc -l
79


Therefore I wonder why notmuch matches 581 messages, not 16795
messages or 77 messages.


Somehow these numbers do not fit!?


Ciao; Gregor
--
 -... --- .-. . -.. ..--.. ...-.-

Date: Thu, 28 Dec 2017 21:04:52 -0500
From: Maxim Cournoyer <[hidden email]>
To: [hidden email]
Subject: Re: Gnus and emails sent by me
----------------------------------------------------------
Date: Thu, 28 Dec 2017 22:00:56 -0400
From: David Bremner <[hidden email]>
To: David Edmondson <[hidden email]>, [hidden email]
Subject: Re: Xapian exception leading to database corruption
----------------------------------------------------------


David Bremner

Re: how to search for hyphenated words? (was: how to search for Morse code?)

Gregor Zattler <[hidden email]> writes:


> From: [hidden email] (Cron Daemon)
> Subject: Cron <grfz@len> ~/bin/mailwiederdurchschleusen
> To: root@localhost
> Date: Fri, 29 Dec 2017 17:00:09 +0100
>
> Date: Thu, 28 Dec 2017 21:04:52 -0500
> From: Maxim Cournoyer <[hidden email]>
> To: [hidden email]
> Subject: Re: Gnus and emails sent by me
> ----------------------------------------------------------
> Date: Thu, 28 Dec 2017 22:00:56 -0400
> From: David Bremner <[hidden email]>
> To: David Edmondson <[hidden email]>, [hidden email]
> Subject: Re: Xapian exception leading to database corruption
> ----------------------------------------------------------

The line

To: David Edmondson <[hidden email]>, [hidden email]

contains the phrase "org notmuch". You can see this easier by stripping
all the punctuation.
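Stripping the punctuation mechanically makes the match easy to spot. The real addresses are hidden in this archive, so this sketch uses hypothetical `example.org` addresses of the same shape. Its term splitting is a crude approximation (lowercase, then split on every non-alphanumeric run), not Xapian's actual tokenizer:

```shell
#!/bin/sh
# Crude approximation of Xapian term splitting (NOT the real TermGenerator).
approx_terms() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical stand-in for the hidden header line.
line='To: David Edmondson <someone@example.org>, notmuch@example.org'
printf '%s\n' "$(approx_terms "$line")"
```

In the output, "org notmuch" appears right where the first address ends and the second begins.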
Carl Worth

Re: how to search for hyphenated words? (was: how to search for Morse code?)

In reply to this post by Gregor Zattler
On Tue, Mar 12 2019, Gregor Zattler wrote:

> what I do not understand is that it doesn't matter if I search for
>
> org-notmuch
>
> or
>
> "org-notmuch"
>
> '"org-notmuch"'
>
> or even
>
> org ADJ/1 notmuch

Correct. All four of those forms give you phrase searches (that is, a
term "org" followed immediately by a term "notmuch").

> a typical example of a matched message is the attached one.
> Somehow the search matches the address of this very mailing list
> in the body of the email (I assume).

No, I don't think you are seeing a match on the mailing-list address
itself, (which has "notmuch" two terms before "org").

> Therefore I wonder why notmuch matches 581 messages, not 16795
> messages or 77 messages.

David showed you one example from the message you copied:

> To: David Edmondson <[hidden email]>, [hidden email]

And I showed one earlier in the thread.

In each case, the message includes "org" followed (after some amount of
punctuation and whitespace, perhaps including newlines) by "notmuch".
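Here is a small sketch of that "term followed by term" test. It uses a crude approximation of term splitting, which is an assumption and not Xapian's real tokenizer. The helper names are hypothetical, and the addresses are invented `example.org` ones because the archive hides the real ones. The sketch finds a match across a line break. It finds no match for a bare address like `notmuch@example.org`, where "notmuch" comes two terms before "org":

```shell
#!/bin/sh
# Crude approximation of Xapian term splitting (NOT the real TermGenerator).
approx_terms() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]' ' ' | sed 's/^ //; s/ $//'
}
# Succeed if the terms of phrase $2 appear adjacent and in order in text $1.
phrase_match() {
  case " $(approx_terms "$1") " in
    *" $(approx_terms "$2") "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical header fragment: "org" ends one line, "notmuch" starts the next.
wrapped='Cc: someone@example.org,
 notmuch@example.org'
list_only='notmuch@example.org'   # "notmuch" comes two terms before "org"

for text in "$wrapped" "$list_only"; do
  if phrase_match "$text" 'org-notmuch'; then echo "match"; else echo "no match"; fi
done
```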

-Carl

Gregor Zattler

Re: how to search for hyphenated words? (was: how to search for Morse code?)

In reply to this post by David Bremner-2
Hi David,
* David Bremner <[hidden email]> [2019-03-12; 07:41]:

> Gregor Zattler <[hidden email]> writes:
>
>
>> From: [hidden email] (Cron Daemon)
>> Subject: Cron <grfz@len> ~/bin/mailwiederdurchschleusen
>> To: root@localhost
>> Date: Fri, 29 Dec 2017 17:00:09 +0100
>>
>> Date: Thu, 28 Dec 2017 21:04:52 -0500
>> From: Maxim Cournoyer <[hidden email]>
>> To: [hidden email]
>> Subject: Re: Gnus and emails sent by me
>> ----------------------------------------------------------
>> Date: Thu, 28 Dec 2017 22:00:56 -0400
>> From: David Bremner <[hidden email]>
>> To: David Edmondson <[hidden email]>, [hidden email]
>> Subject: Re: Xapian exception leading to database corruption
>> ----------------------------------------------------------
>
> The line
>
> To: David Edmondson <[hidden email]>, [hidden email]
>
> contains the phrase "org notmuch". You can see this easier by stripping
> all the punctuation.


Thanks, now I see (the light :-)

Ciao; Gregor
--
 -... --- .-. . -.. ..--.. ...-.-

Matt Armstrong

Re: how to search for hyphenated words? (was: how to search for Morse code?)

In reply to this post by David Bremner-2
David Bremner <[hidden email]> writes:

> Matt Armstrong <[hidden email]> writes:
>
>> Carl Worth <[hidden email]> writes:
>>
>>> Hi Gregor,
>>>
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>>   "A phrase surrounded with double quotes ("") matches documents
>>   containing that exact phrase. Hyphenated words are also treated as
>>   phrases, as are cases such as filenames and email addresses
>>   (e.g. /etc/passwd or [hidden email])."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not the
> (necessarily) the retrieved text.

Ah, so it boils down to the Xapian definition of "exact phrase."
Notably, "exact phrase" is not "identical sequence of characters" as
some people might expect.

Quick tests with various search engines reveal their phrase search as
operating the same way.  E.g. searching for "org notmuch" finds all
sorts of results:

  org-notmuch.el
  notmuchmail.org/notmuch-emacs/
  to:[hidden email] notmuch tag +inbox +unread -new
  (require 'org-notmuch nil t)
  https://notmuchmail.org/notmuch-emacs/. *
  imaps://mail.example.org/Notmuch/search
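The visible examples above can be run through a crude approximation of term splitting: lowercase the text, then split on every non-alphanumeric run. This is an assumption, not Xapian's actual tokenizer, and the helper names are made up. Each example turns out to contain the adjacent terms "org" "notmuch". The `to:` example is left out because its address is hidden in this archive.

```shell
#!/bin/sh
# Crude approximation of Xapian term splitting (NOT the real TermGenerator).
approx_terms() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]' ' ' | sed 's/^ //; s/ $//'
}
# Succeed if the terms of phrase $2 appear adjacent and in order in text $1.
phrase_match() {
  case " $(approx_terms "$1") " in
    *" $(approx_terms "$2") "*) return 0 ;;
    *) return 1 ;;
  esac
}

for s in 'org-notmuch.el' \
         'notmuchmail.org/notmuch-emacs/' \
         "(require 'org-notmuch nil t)" \
         'https://notmuchmail.org/notmuch-emacs/. *' \
         'imaps://mail.example.org/Notmuch/search'; do
  if phrase_match "$s" 'org notmuch'; then echo "match:    $s"; else echo "no match: $s"; fi
done
```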

For what it is worth, one thing I've taken to doing is using period
separators in the notmuch phrase searches I use in scripts and even
interactively.  Periods sidestep the confusing issues that come with
quoting double-quoted strings, and the query always remains a single
shell "word."  They are also, most often, clearly not the exact content
I'm searching for, so they make it clear that the match algorithm is
inexact.  E.g.

  subject:notmuch.is.wonderful

instead of:

  subject:"notmuch is wonderful"
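When scripting this, a helper along these lines might do. `dotted_phrase` is a hypothetical name. The final check uses a crude approximation of term splitting, not Xapian's real tokenizer, to show that both spellings yield the same terms.

```shell
#!/bin/sh
# Crude approximation of Xapian term splitting (NOT the real TermGenerator).
approx_terms() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]' ' ' | sed 's/^ //; s/ $//'
}
# Hypothetical helper: join words with periods so the phrase is one shell word.
dotted_phrase() {
  printf '%s\n' "$1" | tr -s ' ' '.'
}

q="subject:$(dotted_phrase 'notmuch is wonderful')"
echo "$q"    # subject:notmuch.is.wonderful
[ "$(approx_terms 'notmuch.is.wonderful')" = "$(approx_terms 'notmuch is wonderful')" ] &&
  echo "same terms either way"
```

The result can then be passed on as a single argument, e.g. `notmuch search "$q"`.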