'notmuch search thread:<>' lists multiple threads

Naveen N. Rao

Greetings--
If I search for threads matching a specific thread-id, I am seeing
multiple results:

$ notmuch search --output=threads thread:00000000000c4d20
thread:00000000000c4d1e
thread:00000000000c4d20

If I list the messages from both those threads, they do belong to the
same original mailing list thread. It isn't clear why notmuch is
assigning different thread IDs. Is that to be expected under some
scenarios?

Also, it is a bit weird to see multiple threads being listed when
searching for a specific thread ID. Again, is this something to be
expected?


- Naveen


_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Naveen N. Rao

Re: 'notmuch search thread:<>' lists multiple threads

Naveen N. Rao wrote:
> Greetings--
> If I search for threads matching a specific thread-id, I am seeing
> multiple results:
>
> $ notmuch search --output=threads thread:00000000000c4d20
> thread:00000000000c4d1e
> thread:00000000000c4d20

Expanding on this:

[04/06 15:37:59 ~]$ notmuch search --output=messages thread:00000000000c4d1e
id:[hidden email]
[04/06 15:49:34 ~]$
[04/06 15:38:01 ~]$ notmuch search --output=messages thread:00000000000c4d20
id:[hidden email]
id:[hidden email]
id:[hidden email]
id:[hidden email]
id:[hidden email]
id:[hidden email]
[04/06 15:49:34 ~]$
[04/06 15:49:26 ~]$ notmuch show --format=raw id:[hidden email] | grep -e "In-Reply-To" -e "References" -A2
In-Reply-To:
 <[hidden email]>
References: <[hidden email]>
 <[hidden email]>
 <[hidden email]>
[04/06 15:50:01 ~]$
[04/06 15:50:02 ~]$ notmuch show --format=raw id:[hidden email] | grep -e "In-Reply-To" -e "References" -A1
In-Reply-To: <[hidden email]>
References: <[hidden email]>
 <[hidden email]>
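
The grep -A1/-A2 calls above depend on guessing how many continuation
lines each folded header has. As a minimal sketch of an alternative,
assuming a POSIX awk is available, the helper below (thread_headers is
a hypothetical name, not part of notmuch) unfolds continuation lines
and prints only the In-Reply-To and References headers of a raw message
read from stdin.

```shell
# thread_headers: print the unfolded In-Reply-To and References headers
# of a raw message on stdin. Hypothetical helper, not a notmuch command.
thread_headers() {
    awk '
        /^$/ { exit }                    # a blank line ends the header block
        /^[ \t]/ {                       # continuation of the previous header
            if (keep) { sub(/^[ \t]+/, ""); line = line " " $0 }
            next
        }
        {
            if (keep) print line         # previous wanted header is complete
            keep = (tolower($0) ~ /^(in-reply-to|references):/)
            line = $0
        }
        END { if (keep) print line }     # flush the last wanted header
    '
}
```

It can be fed the same way as the grep pipeline, for example
notmuch show --format=raw id:<message-id> | thread_headers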


- Naveen


David Bremner

Re: 'notmuch search thread:<>' lists multiple threads

In reply to this post by Naveen N. Rao
"Naveen N. Rao" <[hidden email]> writes:

> Greetings--
> If I search for threads matching a specific thread-id, I am seeing
> multiple results:
>
> $ notmuch search --output=threads thread:00000000000c4d20
> thread:00000000000c4d1e
> thread:00000000000c4d20

This looks like a bug to me. I was able to replicate it in my own mail
store with the script at the end of the message. I haven't completely
analyzed the situation yet, but one thing I noticed is that in all
"bad threads", there are files with duplicate message-ids. Typical
output looks like

╭─ zancas:software/upstream/notmuch/test
╰─ (git)-[master]-% notmuch search thread:000000000001760a
thread:00000000000175e5  November 03 [1/2(3)] [hidden email]; Bug#846042: VTK 8 (unread)
thread:000000000001760a   2016-11-27 [1/2(3)] [hidden email]; Bug#846042: virtual/meta package for python-vtk (unread)

At least some of this mail data is public, but I'm not sure if the bad
threading is reproducible or not; I want to run a complete census
overnight before I reindex.

Even if the bug is non-deterministic, it probably lives in lib/add-message.cc

----------------------------------------------------------------------

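count=0
success=0
# Search for every thread by its own ID; exactly one result is expected.
for id in $(notmuch search --output=threads '*'); do
    count=$((count + 1))
    # The arithmetic expansion strips any padding that wc -l may emit.
    matches=$(($(notmuch search --output=threads "$id" | wc -l)))
    if [ "$matches" -eq 1 ]; then
        success=$((success + 1))
    else
        echo "bad thread: $id"
    fi
    # Report progress every 1000 threads.
    if [ $((count % 1000)) -eq 0 ]; then
        echo "$count"
    fi
done

echo "count=$count success=$success"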

David Bremner

Re: 'notmuch search thread:<>' lists multiple threads

David Bremner <[hidden email]> writes:

> At least some of this mail data is public, but I'm not sure if the bad
> threading is reproducible or not; I want to run a complete census
> overnight before I reindex.
>
> Even if the bug is non-deterministic, it probably lives in lib/add-message.cc

I have a reproducible test for this bug now

  http://pivot.cs.unb.ca/git?p=notmuch.git;a=shortlog;h=refs/heads/fix/thread-search

I still need to analyze the mails a bit more, but it looks like at least
one of the strange results is caused by multiple mail files sharing the
same message-id, but with different References headers (and no
In-Reply-To headers).
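
One way to check for this situation is to compare a header across all
files that share a message-id. The following is only a sketch, assuming
a POSIX awk; header_value, header_by_file and distinct_values are
hypothetical helper names, not notmuch commands. A file that lacks the
header counts as having an empty value.

```shell
# header_value NAME: print the unfolded value of header NAME from a raw
# message on stdin (first occurrence only; no output if it is absent).
header_value() {
    awk -v name="$1:" '
        BEGIN    { name = tolower(name) }
        /^$/     { exit }                              # end of the header block
        /^[ \t]/ { if (keep) val = val " " $0; next }  # folded continuation line
        keep     { exit }                              # wanted header is complete
        tolower(substr($0, 1, length(name))) == name {
            keep = 1
            val = substr($0, length(name) + 1)
        }
        END {
            if (keep) {
                gsub(/[ \t]+/, " ", val)               # normalise whitespace
                sub(/^ /, "", val); sub(/ $/, "", val)
                print val
            }
        }
    '
}

# header_by_file NAME FILE...: print "value<TAB>file" for every FILE.
header_by_file() {
    name=$1; shift
    for f in "$@"; do
        printf '%s\t%s\n' "$(header_value "$name" < "$f")" "$f"
    done
}

# distinct_values NAME FILE...: count the distinct values of header NAME.
distinct_values() {
    header_by_file "$@" | cut -f1 | sort -u | wc -l | tr -d ' '
}
```

The file list can come from notmuch search --output=files id:<message-id>.
A result above 1 from distinct_values references FILE... means the
copies disagree. Paths containing whitespace need more care than a
plain $(...) expansion gives them.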

d

David Bremner

[PATCH] devel: add new tool to draw thread structure

This is useful for understanding the case where different
message-files with the same message-id have distinct reference
headers.
---
 devel/draw-thread | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100755 devel/draw-thread

diff --git a/devel/draw-thread b/devel/draw-thread
new file mode 100755
index 00000000..628dcff4
--- /dev/null
+++ b/devel/draw-thread
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+# This script can be used like
+# NOTMUCH_CONFIG=test/tmp.T580-thread-search/notmuch-config \
+#    devel/draw-thread thread:0000000000000002 | dot -Tpdf > thread2.pdf
+
+# In addition to notmuch, you will need the following tools installed
+# - graphviz
+# - formail (part of procmail)
+
+threadid=$1
+
+declare -a edges
+
+declare -a dest
+echo "digraph \"$threadid\" {"
+for messageid in $(notmuch search --output=messages $threadid); do
+    echo "subgraph \"cluster_$messageid\" {"
+    printf "\"%s\" [shape=folder];\n" ${messageid#id:}
+    for file in $(notmuch search --output=files $messageid); do
+        node=$(basename $file)
+        printf "\"%s\" [shape=note];\n" $node
+
+        mapfile -t dest < <(formail -x references < $file | tr '<>,' '"" ')
+        edge="\"$node\" -> { ${dest[*]} }"
+        edges+=("$edge")
+    done
+    echo "}"
+done
+
+for edge in "${edges[@]}"; do
+    echo "$edge"
+done
+
+echo "}"
--
2.16.3

Naveen N. Rao

Re: 'notmuch search thread:<>' lists multiple threads

In reply to this post by David Bremner-2
David Bremner wrote:

> David Bremner <[hidden email]> writes:
>
>> At least some of this mail data is public, but I'm not sure if the bad
>> threading is reproducible or not; I want to run a complete census
>> overnight before I reindex.
>>
>> Even if the bug is non-deterministic, it probably lives in lib/add-message.cc
>
> I have a reproducible test for this bug now
>
>   http://pivot.cs.unb.ca/git?p=notmuch.git;a=shortlog;h=refs/heads/fix/thread-search

Thanks for looking into this.

>
> I still need to analyze the mails a bit more, but it looks like at least
> one of the strange results is caused by multiple mail files sharing the
> same message-id, but with different References headers (and no
> In-Reply-To headers).

In my case, the In-Reply-To headers are present. I end up with
two files per message: one from my inbox and one from the gmane archive
that I pull in. All the messages from the gmane archive seem to have a
re-written 'In-Reply-To' header, but 'Message-Id' and 'References' are
the same.

In the problematic email thread, all other files/messages get allotted a
single thread except for one of the messages. The offending message has
3 references compared to 1 or 2 references for the rest, but I don't
know if that's relevant here.

- Naveen


David Bremner

Re: 'notmuch search thread:<>' lists multiple threads

"Naveen N. Rao" <[hidden email]> writes:

> In my case, the In-Reply-To headers are present. I end up with
> two files per message: one from my inbox and one from the gmane archive
> that I pull in. All the messages from the gmane archive seem to have a
> re-written 'In-Reply-To' header, but 'Message-Id' and 'References' are
> the same.

That sounds like essentially the same issue, due to the fact that
notmuch prefers In-Reply-To when choosing a parent for a message.

Currently the database is correct (at least by one not-crazy definition
of correct): all of the References and In-Reply-To terms are attached to
the message document in the database. On the other hand, the in-memory
data structures currently assume that In-Reply-To is a unique value
(with ties broken at indexing time).

It might be that the solution is to read a list of in-reply-to values
and use all of them in threading. At a quick glance, that looks doable;
I'm just not sure about unintended consequences.
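
To illustrate why a single parent per message can split a thread, here
is a toy model only, assuming a POSIX awk. It is not notmuch's
threading code, which also takes References into account. The function
count_threads and the one-letter message IDs are hypothetical. Input
lines are "message parent" pairs, one per In-Reply-To value found in a
file. Mode "first" keeps only the first parent seen for each message,
and mode "all" uses every parent. The output is the number of connected
groups of messages.

```shell
# count_threads first|all: read "message parent" pairs on stdin and
# print how many connected groups (threads) they form. Toy model only.
count_threads() {
    awk -v mode="$1" '
        function find(x) { while (parent[x] != x) x = parent[x]; return x }
        function add(x)  { if (!(x in parent)) parent[x] = x }
        NF == 0 { next }                 # skip empty lines
        { add($1) }                      # every message is at least its own group
        NF < 2  { next }                 # message without a parent
        {
            add($2)
            # In "first" mode a message keeps only the first parent seen.
            if (mode == "first" && ($1 in seen)) next
            seen[$1] = 1
            parent[find($1)] = find($2)  # merge the two groups
        }
        END {
            n = 0
            for (x in parent) if (parent[x] == x) n++   # count group roots
            print n
        }
    '
}
```

For the pairs "B A", "C X", "C B" (two files for message C with
different In-Reply-To values), mode "first" yields two groups and mode
"all" yields one.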

d

Naveen N. Rao

Re: 'notmuch search thread:<>' lists multiple threads

Hi David,

David Bremner wrote:

> "Naveen N. Rao" <[hidden email]> writes:
>
>> In my case, the In-Reply-To headers are present. I end up with
>> two files per message: one from my inbox and one from the gmane archive
>> that I pull in. All the messages from the gmane archive seem to have a
>> re-written 'In-Reply-To' header, but 'Message-Id' and 'References' are
>> the same.
>
> That sounds like essentially the same issue, due to the fact that
> notmuch prefers In-Reply-To when choosing a parent for a message.
>
> Currently the database is correct (at least by one not-crazy definition
> of correct): all of the References and In-Reply-To terms are attached to
> the message document in the database. On the other hand, the in-memory
> data structures currently assume that In-Reply-To is a unique value
> (with ties broken at indexing time).
>
> It might be that the solution is to read a list of in-reply-to values
> and use all of them in threading. At a quick glance, that looks doable;
> I'm just not sure about unintended consequences.

Were you able to look into this again?
Using a list of in-reply-to values sounds like a good option, though I
clearly have no idea about other consequences from that. If you have a
patch, I can help test that.

Thanks,
Naveen


David Bremner

Re: 'notmuch search thread:<>' lists multiple threads

"Naveen N. Rao" <[hidden email]> writes:

>
> Were you able to look into this again?
> Using a list of in-reply-to values sounds like a good option, though I
> clearly have no idea about other consequences from that. If you have a
> patch, I can help test that.
>

Sorry I haven't made any progress on this. Thanks for the reminder.

d