Duplicate message ids

classic Classic list List threaded Threaded
9 messages Options
Mark Walters Mark Walters
Reply | Threaded
Open this post in threaded view
|

Duplicate message ids


Hi

A concern for notmuch is some form of attack via someone sending a
message with a duplicate message id. I think I have seen it asserted
that it is not so much of a problem as the first message received by
notmuch will have priority.

However, I believe that this is not the case. The script below gives a
demonstration (on my system at least). I have written it as a "test" so
(I hope) it doesn't mess up the system. It should work if you put it in
the test directory and execute it.

It adds a message, runs notmuch new, adds a message with the same id,
runs notmuch new, and then runs notmuch search <id> and notmuch show
<id>. The former shows the subject of the first message, and the latter
the subject of the second message.

Best wishes

Mark


_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

T710-duplicate-id.sh (624 bytes) Download Attachment
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

[PATCH] test: check for subject consistency between search and show

In [1] Mark showed that the the current code (d7a49e81) is not
consistent in it's handling of subjects of messages with duplicate
message-ids (or in notmuch-speak, of messages with multiple files).
notmuch-search uses indexing order and explicitedly preserves the
first. notmuch-show (apparently) uses alphabetical (or at least xapian
term order) of filenames. In a perfect world we would probably report
all subjects in the json output; at the very least we should be
consistent.

[1]: id:[hidden email]
---
 test/T670-duplicate-mid.sh | 62 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 56 insertions(+), 6 deletions(-)

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index decbc0a4..55ebde38 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -12,6 +12,29 @@ EOF
 notmuch search id:duplicate | notmuch_search_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest 'First subject preserved in notmuch-show (json)'
+output=$(notmuch show --body=false --format=json id:duplicate | notmuch_json_show_sanitize)
+expected='[[[{
+    "id": "XXXXX",
+    "match": true,
+    "excluded": false,
+    "filename": [
+        "'"${MAIL_DIR}"/copy1'",
+        "'"${MAIL_DIR}"/copy2'"
+    ],
+    "timestamp": 42,
+    "date_relative": "2001-01-05",
+    "tags": ["inbox","unread"],
+    "headers": {
+        "Subject": "message 1",
+        "From": "Notmuch Test Suite <[hidden email]>",
+        "To": "Notmuch Test Suite <[hidden email]>",
+        "Date": "GENERATED_DATE"
+    }
+ },
+[]]]]'
+test_expect_equal_json "$output" "$expected"
+
 test_begin_subtest 'Search for second subject'
 cat <<EOF >EXPECTED
 MAIL_DIR/copy1
@@ -37,25 +60,52 @@ notmuch reindex '*'
 notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
-rm ${MAIL_DIR}/copy1
+add_message '[id]="duplicate"' '[subject]="reorder file names"' '[filename]=00-copy4'
+test_begin_subtest 'First subject preserved in notmuch-show (json), file order'
+test_subtest_known_broken
+output=$(notmuch show --body=false --format=json id:duplicate | notmuch_json_show_sanitize)
+expected='[[[{
+    "id": "XXXXX",
+    "match": true,
+    "excluded": false,
+    "filename": [
+        "'"${MAIL_DIR}"/00-copy4'",
+        "'"${MAIL_DIR}"/copy1'",
+        "'"${MAIL_DIR}"/copy2'"
+    ],
+    "timestamp": 42,
+    "date_relative": "2001-01-05",
+    "tags": ["inbox","unread"],
+    "headers": {
+        "Subject": "message 1",
+        "From": "Notmuch Test Suite <[hidden email]>",
+        "To": "Notmuch Test Suite <[hidden email]>",
+        "Date": "GENERATED_DATE"
+    }
+ },
+[]]]]'
+test_expect_equal_json "$output" "$expected"
+
+rm ${MAIL_DIR}/00-copy4
 test_begin_subtest 'Deleted first duplicate file does not stop notmuch show from working'
-output=$(notmuch show --body=false --format=json id:duplicate)
+output=$(notmuch show --body=false --format=json id:duplicate | notmuch_json_show_sanitize)
 expected='[[[{
-    "id": "'duplicate'",
+    "id": "XXXXX",
     "match": true,
     "excluded": false,
     "filename": [
+        "'"${MAIL_DIR}"/00-copy4'",
         "'"${MAIL_DIR}"/copy1'",
         "'"${MAIL_DIR}"/copy2'"
     ],
-    "timestamp": 978709435,
+    "timestamp": 42,
     "date_relative": "2001-01-05",
     "tags": ["inbox","unread"],
     "headers": {
-        "Subject": "message 2",
+        "Subject": "message 1",
         "From": "Notmuch Test Suite <[hidden email]>",
         "To": "Notmuch Test Suite <[hidden email]>",
-        "Date": "Fri, 05 Jan 2001 15:43:55 +0000"
+        "Date": "GENERATED_DATE"
     }
  },
 []]]]'
--
2.14.1

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
Mark Walters Mark Walters
Reply | Threaded
Open this post in threaded view
|

[PATCH] test: duplicate mids: add an extra broken search test

---

Hi

Many thanks to bremner for the parent patch. I thought it might be
worth adding a search test after the broken show test demonstrating
the distinction between show and search. When I added it I found it
actually gives the other file as the answer!

This applies on top of the parent patch.

Best wishes

Mark


test/T670-duplicate-mid.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index 55ebde3..0de2bc0 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -86,6 +86,14 @@ expected='[[[{
 []]]]'
 test_expect_equal_json "$output" "$expected"
 
+test_begin_subtest 'First subject preserved in notmuch-search file order'
+test_subtest_known_broken
+notmuch search id:duplicate | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1(3)] Notmuch Test Suite; message 1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
 rm ${MAIL_DIR}/00-copy4
 test_begin_subtest 'Deleted first duplicate file does not stop notmuch show from working'
 output=$(notmuch show --body=false --format=json id:duplicate | notmuch_json_show_sanitize)
--
2.1.4

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/5] test: make fallback to duplicate test more robust.

The original intent of this test was to verify that notmuch show was
not crashing when the first file (where headers are being read from)
was deleted. Run the output through some sanitization so that as we
add and delete copies we don't have to update this test.
---
 test/T670-duplicate-mid.sh | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index decbc0a4..3495e63d 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -39,23 +39,24 @@ test_expect_equal_file EXPECTED OUTPUT
 
 rm ${MAIL_DIR}/copy1
 test_begin_subtest 'Deleted first duplicate file does not stop notmuch show from working'
-output=$(notmuch show --body=false --format=json id:duplicate)
+output=$(notmuch show --body=false --format=json id:duplicate |
+     notmuch_json_show_sanitize | sed 's/message [0-9]/A_SUBJECT/')
 expected='[[[{
-    "id": "'duplicate'",
+    "id": "XXXXX",
     "match": true,
     "excluded": false,
     "filename": [
         "'"${MAIL_DIR}"/copy1'",
         "'"${MAIL_DIR}"/copy2'"
     ],
-    "timestamp": 978709435,
+    "timestamp": 42,
     "date_relative": "2001-01-05",
     "tags": ["inbox","unread"],
     "headers": {
-        "Subject": "message 2",
+        "Subject": "A_SUBJECT",
         "From": "Notmuch Test Suite <[hidden email]>",
         "To": "Notmuch Test Suite <[hidden email]>",
-        "Date": "Fri, 05 Jan 2001 15:43:55 +0000"
+        "Date": "GENERATED_DATE"
     }
  },
 []]]]'
--
2.13.2

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/5] test/duplicate-mid: clarify index order vs filename order

The existing test for notmuch search had the first in filename order
the same as the first indexed, which made it harder to understand what
the underlying behaviour is. Add a file with a lexicographically
smaller name, but later index time to clarify this.
---
 test/T670-duplicate-mid.sh | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index 3495e63d..d2f89432 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -5,15 +5,17 @@ test_description="duplicate message ids"
 add_message '[id]="duplicate"' '[subject]="message 1" [filename]=copy1'
 add_message '[id]="duplicate"' '[subject]="message 2" [filename]=copy2'
 
-test_begin_subtest 'First subject preserved'
+add_message '[id]="duplicate"' '[subject]="message 0" [filename]=copy0'
+test_begin_subtest 'search: first indexed subject preserved'
 cat <<EOF > EXPECTED
-thread:XXX   2001-01-05 [1/1(2)] Notmuch Test Suite; message 1 (inbox unread)
+thread:XXX   2001-01-05 [1/1(3)] Notmuch Test Suite; message 1 (inbox unread)
 EOF
 notmuch search id:duplicate | notmuch_search_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
 test_begin_subtest 'Search for second subject'
 cat <<EOF >EXPECTED
+MAIL_DIR/copy0
 MAIL_DIR/copy1
 MAIL_DIR/copy2
 EOF
@@ -23,6 +25,7 @@ test_expect_equal_file EXPECTED OUTPUT
 add_message '[id]="duplicate"' '[body]="sekrit" [filename]=copy3'
 test_begin_subtest 'search for body in duplicate file'
 cat <<EOF >EXPECTED
+MAIL_DIR/copy0
 MAIL_DIR/copy1
 MAIL_DIR/copy2
 MAIL_DIR/copy3
@@ -37,7 +40,7 @@ notmuch reindex '*'
 notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
-rm ${MAIL_DIR}/copy1
+rm ${MAIL_DIR}/copy0
 test_begin_subtest 'Deleted first duplicate file does not stop notmuch show from working'
 output=$(notmuch show --body=false --format=json id:duplicate |
      notmuch_json_show_sanitize | sed 's/message [0-9]/A_SUBJECT/')
@@ -46,6 +49,7 @@ expected='[[[{
     "match": true,
     "excluded": false,
     "filename": [
+        "'"${MAIL_DIR}"/copy0'",
         "'"${MAIL_DIR}"/copy1'",
         "'"${MAIL_DIR}"/copy2'"
     ],
--
2.13.2

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

[PATCH 3/5] test: known broken test for subject after reindexing

In reply to this post by David Bremner-2
In [1], Mark gave a test that was behaving strangly. This turns out to
be specific to reindexing.  I suppose one could argue that picking the
lexicographically last file name is a defensible choice, but it's
almost as easy to take the first, which seems more intuitive. So mark
the current situation as broken.

[1]: id:[hidden email]
---
 test/T670-duplicate-mid.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index d2f89432..d4c1d1c2 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -40,6 +40,14 @@ notmuch reindex '*'
 notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest 'reindex choses subject from first filename'
+test_subtest_known_broken
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1(3)] Notmuch Test Suite; message 0 (inbox unread)
+EOF
+notmuch search id:duplicate | notmuch_search_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 rm ${MAIL_DIR}/copy0
 test_begin_subtest 'Deleted first duplicate file does not stop notmuch show from working'
 output=$(notmuch show --body=false --format=json id:duplicate |
--
2.13.2

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

[PATCH 4/5] lib: enforce that n_message_reindex takes headers from first file

In reply to this post by David Bremner-2
This is still a bit stopgap to be only choosing one set of headers,
but this seems like a more defensible set of headers to choose.
---
 lib/message.cc             | 4 +++-
 test/T670-duplicate-mid.sh | 1 -
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 98a88d3a..051362d4 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -2012,7 +2012,9 @@ notmuch_message_reindex (notmuch_message_t *message,
     thread_id = orig_thread_id;
 
  _notmuch_message_add_term (message, "thread", thread_id);
- _notmuch_message_set_header_values (message, date, from, subject);
+ /* Take header values only from first filename */
+ if (found == 0)
+    _notmuch_message_set_header_values (message, date, from, subject);
 
  ret = _notmuch_message_index_file (message, message_file);
 
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index d4c1d1c2..ce010cf7 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -41,7 +41,6 @@ notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
 test_begin_subtest 'reindex choses subject from first filename'
-test_subtest_known_broken
 cat <<EOF > EXPECTED
 thread:XXX   2001-01-05 [1/1(3)] Notmuch Test Suite; message 0 (inbox unread)
 EOF
--
2.13.2

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

[PATCH 5/5] test/duplicate-mid: check for subject with notmuch-show

In reply to this post by David Bremner-2
In [1] Mark showed that the the current code (d7a49e81) is not
consistent in it's handling of subjects of messages with duplicate
message-ids (or in notmuch-speak, of messages with multiple files).
notmuch-search uses indexing order and explicitedly preserves the
first. notmuch-show (apparently) uses alphabetical (or at least xapian
term order) of filenames. In a perfect world we would probably report
all subjects in the json output; at the very least we should be
consistent.

[1]: id:[hidden email]
---
 test/T670-duplicate-mid.sh | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index ce010cf7..21a9689a 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -13,6 +13,31 @@ EOF
 notmuch search id:duplicate | notmuch_search_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest 'First subject preserved in notmuch-show (json)'
+test_subtest_known_broken
+output=$(notmuch show --body=false --format=json id:duplicate | notmuch_json_show_sanitize)
+expected='[[[{
+    "id": "XXXXX",
+    "match": true,
+    "excluded": false,
+    "filename": [
+        "'"${MAIL_DIR}"/copy0'",
+        "'"${MAIL_DIR}"/copy1'",
+        "'"${MAIL_DIR}"/copy2'"
+    ],
+    "timestamp": 42,
+    "date_relative": "2001-01-05",
+    "tags": ["inbox","unread"],
+    "headers": {
+        "Subject": "message 1",
+        "From": "Notmuch Test Suite <[hidden email]>",
+        "To": "Notmuch Test Suite <[hidden email]>",
+        "Date": "GENERATED_DATE"
+    }
+ },
+[]]]]'
+test_expect_equal_json "$output" "$expected"
+
 test_begin_subtest 'Search for second subject'
 cat <<EOF >EXPECTED
 MAIL_DIR/copy0
--
2.13.2

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] test: make fallback to duplicate test more robust.

In reply to this post by David Bremner-2
David Bremner <[hidden email]> writes:

> The original intent of this test was to verify that notmuch show was
> not crashing when the first file (where headers are being read from)
> was deleted. Run the output through some sanitization so that as we
> add and delete copies we don't have to update this test.

Series pushed.

d
_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch