second round of indexing all files

David Bremner-2

second round of indexing all files

This adds in a "notmuch reindex" command so that deleting the terms
from deleted files can be accomplished.  There are still several UI
issues to deal with (e.g. we return an arbitrary file, not necessarily
the one matched).

The reindex command is a simplified version of the one that dkg
originally wrote for his series on indexing encrypted messages. I've
ripped out all the encryption related stuff here.

I've also postulated (but not yet written) a more generic way of
handling index options, roughly modeled on our command-line-options
code. I hope that this will allow fewer functions, and a more static
API at the library level; at this point it's just a sketch of an idea.
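
To make the idea a bit more concrete, here is a minimal sketch of how a
statically initialized option table might look from the caller's side.
The type definitions are copied from patch 1/5; the `EXAMPLE_KEY_*`
keys, the `find_param` helper and the local `notmuch_bool_t` typedef are
hypothetical and exist only so the sketch compiles on its own. Nothing
like them is in the series yet.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the typedef in lib/notmuch.h, so the sketch compiles alone. */
typedef int notmuch_bool_t;

enum notmuch_param_type {
    NOTMUCH_PARAM_END = 0,
    NOTMUCH_PARAM_BOOLEAN,
    NOTMUCH_PARAM_INT,
    NOTMUCH_PARAM_STRING
};

typedef struct notmuch_param_desc {
    enum notmuch_param_type param_type;
    int key;
    union {
        notmuch_bool_t bool_val;
        int int_val;
        const char *string_val;
    };
} notmuch_param_t;

/* Hypothetical keys; the series does not define any yet. */
enum example_key {
    EXAMPLE_KEY_TRY_DECRYPT = 1,
    EXAMPLE_KEY_MAX_DEPTH,
    EXAMPLE_KEY_FILTER
};

/* Static initialization, terminated by a NOTMUCH_PARAM_END entry. */
static const notmuch_param_t example_indexopts[] = {
    { .param_type = NOTMUCH_PARAM_BOOLEAN, .key = EXAMPLE_KEY_TRY_DECRYPT,
      .bool_val = 1 },
    { .param_type = NOTMUCH_PARAM_INT, .key = EXAMPLE_KEY_MAX_DEPTH,
      .int_val = 3 },
    { .param_type = NOTMUCH_PARAM_STRING, .key = EXAMPLE_KEY_FILTER,
      .string_val = "text/plain" },
    { .param_type = NOTMUCH_PARAM_END }
};

/* Hypothetical helper: return the entry for 'key', or NULL if absent. */
static const notmuch_param_t *
find_param (const notmuch_param_t *opts, int key)
{
    assert (opts != NULL);
    for (; opts->param_type != NOTMUCH_PARAM_END; opts++) {
        if (opts->key == key)
            return opts;
    }
    return NULL;
}
```

A library function taking such a table would walk it once with something
like `find_param`, which is what should let one entry point replace a
family of per-option setters.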


_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch
David Bremner-2

[rfc patch v2 1/5] lib: add definitions for notmuch_param_t

This is not an opaque struct because we envision using static
initialization much like the command-line-options.h structures.
---
 lib/notmuch.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index d374dc96..fc00f96d 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -219,6 +219,23 @@ typedef struct _notmuch_filenames notmuch_filenames_t;
 typedef struct _notmuch_config_list notmuch_config_list_t;
 #endif /* __DOXYGEN__ */
 
+enum notmuch_param_type {
+    NOTMUCH_PARAM_END = 0,
+    NOTMUCH_PARAM_BOOLEAN,
+    NOTMUCH_PARAM_INT,
+    NOTMUCH_PARAM_STRING
+};
+
+typedef struct notmuch_param_desc {
+    enum notmuch_param_type param_type;
+    int key;
+    union {
+ notmuch_bool_t bool_val;
+ int int_val;
+ const char *string_val;
+    };
+} notmuch_param_t;
+
 /**
  * Create a new, empty notmuch database located at 'path'.
  *
--
2.11.0

David Bremner-2

[rfc patch v2 2/5] added notmuch_message_reindex

In reply to this post by David Bremner-2
From: Daniel Kahn Gillmor <[hidden email]>

This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).

Since we have no way of distinguishing terms added (without prefix)
from the headers and terms added from the body, we just save the tags
and properties, remove the message from the database entirely, and add
it back into the database in full, re-adding tags and properties as
needed.
---
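
The one subtle step in the save/remove/re-add cycle is deciding which
saved tags to put back: tags that indexing regenerates by itself are
skipped. A minimal, library-free sketch of that filter follows. The list
of automatic tags is the one used in the patch below; `is_autotag` and
`tags_to_restore` are hypothetical names used only in this note.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tags that indexing regenerates by itself (same list as in the patch). */
static const char *const autotags[] = { "attachment", "encrypted", "signed" };

/* Hypothetical helper: nonzero if 'tag' is one of the automatic tags. */
static int
is_autotag (const char *tag)
{
    assert (tag != NULL);
    for (size_t i = 0; i < sizeof (autotags) / sizeof (autotags[0]); i++) {
        if (strcmp (tag, autotags[i]) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: copy the tags that must be re-added by hand into
 * 'out' (which needs room for 'n' entries) and return how many were
 * copied. */
static size_t
tags_to_restore (const char *const *saved, size_t n, const char **out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (! is_autotag (saved[i]))
            out[kept++] = saved[i];
    }
    return kept;
}
```

Skipping the automatic tags matters when a file has changed or gone
away: a stale "attachment" tag, for example, should not survive a
reindex just because it was cached beforehand.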
 lib/message.cc | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/notmuch.h  |  14 ++++++++
 2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/lib/message.cc b/lib/message.cc
index f8215a49..d68e4c66 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -579,7 +579,9 @@ void
 _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
 {
     Xapian::TermIterator i;
-    size_t prefix_len = strlen (prefix);
+    size_t prefix_len = 0;
+
+    prefix_len = strlen (prefix);
 
     while (1) {
  i = message->doc.termlist_begin ();
@@ -1872,3 +1874,101 @@ _notmuch_message_frozen (notmuch_message_t *message)
 {
     return message->frozen;
 }
+
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+ notmuch_param_t unused (*indexopts))
+{
+    notmuch_database_t *notmuch = NULL;
+    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status;
+    notmuch_tags_t *tags = NULL;
+    notmuch_message_properties_t *properties = NULL;
+    notmuch_filenames_t *filenames, *orig_filenames = NULL;
+    const char *filename = NULL, *tag = NULL, *propkey = NULL;
+    notmuch_message_t *newmsg = NULL;
+    notmuch_bool_t readded = FALSE, skip;
+    const char *autotags[] = {
+    "attachment",
+    "encrypted",
+    "signed" };
+
+    if (message == NULL)
+ return NOTMUCH_STATUS_NULL_POINTER;
+
+    notmuch = _notmuch_message_database (message);
+
+    /* cache tags, properties, and filenames */
+    tags = notmuch_message_get_tags (message);
+    properties = notmuch_message_get_properties (message, "", FALSE);
+    filenames = notmuch_message_get_filenames (message);
+    orig_filenames = notmuch_message_get_filenames (message);
+
+    /* walk through filenames, removing them until the message is gone */
+    for ( ; notmuch_filenames_valid (filenames);
+  notmuch_filenames_move_to_next (filenames)) {
+ filename = notmuch_filenames_get (filenames);
+
+ ret = notmuch_database_remove_message (notmuch, filename);
+ if (ret != NOTMUCH_STATUS_SUCCESS &&
+    ret != NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID)
+    return ret;
+    }
+    if (ret != NOTMUCH_STATUS_SUCCESS)
+ return ret;
+
+    /* re-add the filenames with the associated indexopts */
+    for (; notmuch_filenames_valid (orig_filenames);
+ notmuch_filenames_move_to_next (orig_filenames)) {
+ filename = notmuch_filenames_get (orig_filenames);
+
+ status = notmuch_database_add_message(notmuch,
+      filename,
+      readded ? NULL : &newmsg);
+ if (status == NOTMUCH_STATUS_SUCCESS ||
+    status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) {
+    if (!readded) {
+ /* re-add tags */
+ for (; notmuch_tags_valid (tags);
+     notmuch_tags_move_to_next (tags)) {
+    tag = notmuch_tags_get (tags);
+    skip = FALSE;
+
+    for (size_t i = 0; i < ARRAY_SIZE (autotags); i++)
+ if (strcmp (tag, autotags[i]) == 0)
+    skip = TRUE;
+
+    if (!skip) {
+ status = notmuch_message_add_tag (newmsg, tag);
+ if (status != NOTMUCH_STATUS_SUCCESS)
+    ret = status;
+    }
+ }
+ /* re-add properties */
+ for (; notmuch_message_properties_valid (properties);
+     notmuch_message_properties_move_to_next (properties)) {
+    propkey = notmuch_message_properties_key (properties);
+    skip = FALSE;
+
+    if (!skip) {
+ status = notmuch_message_add_property (newmsg, propkey,
+       notmuch_message_properties_value (properties));
+ if (status != NOTMUCH_STATUS_SUCCESS)
+    ret = status;
+    }
+ }
+ readded = TRUE;
+    }
+ } else {
+    /* if we failed to add this filename, go ahead and try the
+     * next one as though it were first, but report the
+     * error... */
+    ret = status;
+ }
+    }
+    if (newmsg)
+ notmuch_message_destroy (newmsg);
+
+    /* should we also destroy the incoming message object?  at the
+     * moment, we leave that to the caller */
+    return ret;
+}
diff --git a/lib/notmuch.h b/lib/notmuch.h
index fc00f96d..1f31efed 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1389,6 +1389,20 @@ notmuch_filenames_t *
 notmuch_message_get_filenames (notmuch_message_t *message);
 
 /**
+ * Re-index the e-mail corresponding to 'message' using the supplied index options
+ *
+ * Returns the status of the re-index operation.  (see the return
+ * codes documented in notmuch_database_add_message)
+ *
+ * After reindexing, the user should discard the message object passed
+ * in here by calling notmuch_message_destroy, since it refers to the
+ * original message, not to the reindexed message.
+ */
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+ notmuch_param_t *indexopts);
+
+/**
  * Message flags.
  */
 typedef enum _notmuch_message_flag {
--
2.11.0

David Bremner-2

[rfc patch v2 3/5] add "notmuch reindex" subcommand

In reply to this post by David Bremner-2
From: Daniel Kahn Gillmor <[hidden email]>

This new subcommand takes a set of search terms, and re-indexes the
list of matching messages.
---
 Makefile.local                    |   1 +
 doc/conf.py                       |   4 ++
 doc/index.rst                     |   1 +
 doc/man1/notmuch-reindex.rst      |  29 +++++++++
 doc/man1/notmuch.rst              |   4 +-
 doc/man7/notmuch-search-terms.rst |   7 +-
 notmuch-client.h                  |   3 +
 notmuch-reindex.c                 | 132 ++++++++++++++++++++++++++++++++++++++
 notmuch.c                         |   2 +
 test/T700-reindex.sh              |  21 ++++++
 10 files changed, 200 insertions(+), 4 deletions(-)
 create mode 100644 doc/man1/notmuch-reindex.rst
 create mode 100644 notmuch-reindex.c
 create mode 100755 test/T700-reindex.sh

diff --git a/Makefile.local b/Makefile.local
index 03eafaaa..c6e272bc 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -222,6 +222,7 @@ notmuch_client_srcs = \
  notmuch-dump.c \
  notmuch-insert.c \
  notmuch-new.c \
+ notmuch-reindex.c       \
  notmuch-reply.c \
  notmuch-restore.c \
  notmuch-search.c \
diff --git a/doc/conf.py b/doc/conf.py
index a3d82696..aa864b3c 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -95,6 +95,10 @@ man_pages = [
      u'incorporate new mail into the notmuch database',
      [notmuch_authors], 1),
 
+    ('man1/notmuch-reindex', 'notmuch-reindex',
+     u're-index matching messages',
+     [notmuch_authors], 1),
+
     ('man1/notmuch-reply', 'notmuch-reply',
      u'constructs a reply template for a set of messages',
      [notmuch_authors], 1),
diff --git a/doc/index.rst b/doc/index.rst
index 344606d9..aa6c9f40 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -18,6 +18,7 @@ Contents:
    man5/notmuch-hooks
    man1/notmuch-insert
    man1/notmuch-new
+   man1/notmuch-reindex
    man1/notmuch-reply
    man1/notmuch-restore
    man1/notmuch-search
diff --git a/doc/man1/notmuch-reindex.rst b/doc/man1/notmuch-reindex.rst
new file mode 100644
index 00000000..6c786b85
--- /dev/null
+++ b/doc/man1/notmuch-reindex.rst
@@ -0,0 +1,29 @@
+===============
+notmuch-reindex
+===============
+
+SYNOPSIS
+========
+
+**notmuch** **reindex** [*option* ...] <*search-term*> ...
+
+DESCRIPTION
+===========
+
+Re-index all messages matching the search terms.
+
+See **notmuch-search-terms(7)** for details of the supported syntax for
+<*search-term*\ >.
+
+The **reindex** command searches for all messages matching the
+supplied search terms, and re-creates the full-text index on these
+messages using the supplied options.
+
+SEE ALSO
+========
+
+**notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
+**notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
+**notmuch-new(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
diff --git a/doc/man1/notmuch.rst b/doc/man1/notmuch.rst
index fbd7f381..b2a8376e 100644
--- a/doc/man1/notmuch.rst
+++ b/doc/man1/notmuch.rst
@@ -149,8 +149,8 @@ SEE ALSO
 
 **notmuch-address(1)**, **notmuch-compact(1)**, **notmuch-config(1)**,
 **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**,
-**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reply(1)**,
-**notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reindex(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
 **notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
 
 The notmuch website: **https://notmuchmail.org**
diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 47cab48d..dd76972e 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -9,6 +9,8 @@ SYNOPSIS
 
 **notmuch** **dump** [--format=(batch-tag|sup)] [--] [--output=<*file*>] [--] [<*search-term*> ...]
 
+**notmuch** **reindex** [option ...] <*search-term*> ...
+
 **notmuch** **search** [option ...] <*search-term*> ...
 
 **notmuch** **show** [option ...] <*search-term*> ...
@@ -421,5 +423,6 @@ SEE ALSO
 
 **notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
 **notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
-**notmuch-new(1)**, **notmuch-reply(1)**, **notmuch-restore(1)**,
-**notmuch-search(1)**, **notmuch-show(1)**, **notmuch-tag(1)**
+**notmuch-new(1)**, **notmuch-reindex(1)**, **notmuch-reply(1)**,
+**notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-show(1)**,
+**notmuch-tag(1)**
diff --git a/notmuch-client.h b/notmuch-client.h
index a6f70eae..ab7138c6 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -196,6 +196,9 @@ int
 notmuch_insert_command (notmuch_config_t *config, int argc, char *argv[]);
 
 int
+notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[]);
+
+int
 notmuch_reply_command (notmuch_config_t *config, int argc, char *argv[]);
 
 int
diff --git a/notmuch-reindex.c b/notmuch-reindex.c
new file mode 100644
index 00000000..836a90a1
--- /dev/null
+++ b/notmuch-reindex.c
@@ -0,0 +1,132 @@
+/* notmuch - Not much of an email program, (just index and search)
+ *
+ * Copyright © 2016 Daniel Kahn Gillmor
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: Daniel Kahn Gillmor <[hidden email]>
+ */
+
+#include "notmuch-client.h"
+#include "string-util.h"
+
+static volatile sig_atomic_t interrupted;
+
+static void
+handle_sigint (unused (int sig))
+{
+    static char msg[] = "Stopping...         \n";
+
+    /* This write is "opportunistic", so it's okay to ignore the
+     * result.  It is not required for correctness, and if it does
+     * fail or produce a short write, we want to get out of the signal
+     * handler as quickly as possible, not retry it. */
+    IGNORE_RESULT (write (2, msg, sizeof (msg) - 1));
+    interrupted = 1;
+}
+
+/* reindex all messages matching 'query_string' using the passed-in indexopts
+ */
+static int
+reindex_query (notmuch_database_t *notmuch, const char *query_string,
+       notmuch_param_t *indexopts)
+{
+    notmuch_query_t *query;
+    notmuch_messages_t *messages;
+    notmuch_message_t *message;
+    notmuch_status_t status;
+
+    int ret = NOTMUCH_STATUS_SUCCESS;
+
+    query = notmuch_query_create (notmuch, query_string);
+    if (query == NULL) {
+ fprintf (stderr, "Out of memory.\n");
+ return 1;
+    }
+
+    /* reindexing is not interested in any special sort order */
+    notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED);
+
+    status = notmuch_query_search_messages (query, &messages);
+    if (print_status_query ("notmuch reindex", query, status))
+ return status;
+
+    for (;
+ notmuch_messages_valid (messages) && ! interrupted;
+ notmuch_messages_move_to_next (messages)) {
+ message = notmuch_messages_get (messages);
+
+ ret = notmuch_message_reindex (message, indexopts);
+ notmuch_message_destroy (message);
+ if (ret != NOTMUCH_STATUS_SUCCESS)
+    break;
+    }
+
+    notmuch_query_destroy (query);
+
+    return ret || interrupted;
+}
+
+int
+notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[])
+{
+    char *query_string = NULL;
+    notmuch_database_t *notmuch;
+    struct sigaction action;
+    notmuch_bool_t try_decrypt = FALSE;
+    int opt_index;
+    int ret;
+    notmuch_param_t *indexopts = NULL;
+
+    /* Set up our handler for SIGINT */
+    memset (&action, 0, sizeof (struct sigaction));
+    action.sa_handler = handle_sigint;
+    sigemptyset (&action.sa_mask);
+    action.sa_flags = SA_RESTART;
+    sigaction (SIGINT, &action, NULL);
+
+    notmuch_opt_desc_t options[] = {
+ { NOTMUCH_OPT_INHERIT, (void *) &notmuch_shared_options, NULL, 0, 0 },
+ { 0, 0, 0, 0, 0 }
+    };
+
+    opt_index = parse_arguments (argc, argv, options, 1);
+    if (opt_index < 0)
+ return EXIT_FAILURE;
+
+    notmuch_process_shared_options (argv[0]);
+
+    if (notmuch_database_open (notmuch_config_get_database_path (config),
+       NOTMUCH_DATABASE_MODE_READ_WRITE, &notmuch))
+ return EXIT_FAILURE;
+
+    notmuch_exit_if_unmatched_db_uuid (notmuch);
+
+    query_string = query_string_from_args (config, argc-opt_index, argv+opt_index);
+    if (query_string == NULL) {
+ fprintf (stderr, "Out of memory\n");
+ return EXIT_FAILURE;
+    }
+
+    if (*query_string == '\0') {
+ fprintf (stderr, "Error: notmuch reindex requires at least one search term.\n");
+ return EXIT_FAILURE;
+    }
+    
+    ret = reindex_query (notmuch, query_string, indexopts);
+
+    notmuch_database_destroy (notmuch);
+
+    return ret || interrupted ? EXIT_FAILURE : EXIT_SUCCESS;
+}
diff --git a/notmuch.c b/notmuch.c
index 8e332ce6..201c7454 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -123,6 +123,8 @@ static command_t commands[] = {
       "Restore the tags from the given dump file (see 'dump')." },
     { "compact", notmuch_compact_command, NOTMUCH_CONFIG_OPEN,
       "Compact the notmuch database." },
+    { "reindex", notmuch_reindex_command, NOTMUCH_CONFIG_OPEN,
+      "Re-index all messages matching the search terms." },
     { "config", notmuch_config_command, NOTMUCH_CONFIG_OPEN,
       "Get or set settings in the notmuch configuration file." },
     { "help", notmuch_help_command, NOTMUCH_CONFIG_CREATE, /* create but don't save config */
diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
new file mode 100755
index 00000000..32385a72
--- /dev/null
+++ b/test/T700-reindex.sh
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+test_description='reindexing messages'
+. ./test-lib.sh || exit 1
+
+add_email_corpus
+
+test_begin_subtest 'reindex preserves message-ids'
+notmuch search --output=messages '*' > EXPECTED
+# remove duplicate file
+rm $MAIL_DIR/bar/18:2,
+notmuch reindex '*'
+notmuch search --output=messages '*' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'reindex preserves tags'
+notmuch dump > EXPECTED
+notmuch reindex '*'
+notmuch dump > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
--
2.11.0

David Bremner-2

[rfc patch v2 4/5] test: add known broken test for duplicate message id

In reply to this post by David Bremner-2
There are many other problems that could be tested, but this one we
have some hope of fixing because it doesn't require UI changes, just
indexing changes.
---
 test/T670-duplicate-mid.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100755 test/T670-duplicate-mid.sh

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
new file mode 100755
index 00000000..88bd12cb
--- /dev/null
+++ b/test/T670-duplicate-mid.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+test_description="duplicate message ids"
+. ./test-lib.sh || exit 1
+
+add_message '[id]="id:duplicate"' '[subject]="message 1"'
+add_message '[id]="id:duplicate"' '[subject]="message 2"'
+
+test_begin_subtest 'Search for second subject'
+test_subtest_known_broken
+cat <<EOF >EXPECTED
+MAIL_DIR/msg-001
+MAIL_DIR/msg-002
+EOF
+notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
--
2.11.0

David Bremner-2

[rfc patch v2 5/5] lib: index message files with duplicate message-ids

In reply to this post by David Bremner-2
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything.
---
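
As a rough mental model of what this change does (a toy sketch, not
Xapian or notmuch code): one message-id maps to one document, and every
file carrying that id contributes its terms to the same document. The
`toy_doc` type and its helpers are hypothetical. The sketch also shows
why a search that matches text from only one file still reports every
file of the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

/* Toy stand-in for a Xapian document: a bag of terms plus the files
 * that share one message-id. */
typedef struct {
    const char *terms[TOY_MAX];
    size_t n_terms;
    const char *files[TOY_MAX];
    size_t n_files;
} toy_doc;

static int
toy_has_term (const toy_doc *doc, const char *term)
{
    for (size_t i = 0; i < doc->n_terms; i++) {
        if (strcmp (doc->terms[i], term) == 0)
            return 1;
    }
    return 0;
}

/* Index one more file into the same document: record the filename and
 * add any terms not already present. */
static void
toy_index_file (toy_doc *doc, const char *file,
                const char *const *terms, size_t n)
{
    assert (doc->n_files < TOY_MAX);
    doc->files[doc->n_files++] = file;

    for (size_t i = 0; i < n; i++) {
        if (! toy_has_term (doc, terms[i])) {
            assert (doc->n_terms < TOY_MAX);
            doc->terms[doc->n_terms++] = terms[i];
        }
    }
}

/* A match on any term reports every file of the document; returns the
 * number of files that would be listed. */
static size_t
toy_search_files (const toy_doc *doc, const char *term)
{
    return toy_has_term (doc, term) ? doc->n_files : 0;
}
```

Because terms are only ever added in this model, deleting a file cannot
take its terms back out, which is the gap the reindex command is meant
to close.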
 lib/database.cc            |  3 +++
 test/T670-duplicate-mid.sh | 22 +++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 5bc131a3..3b9f7828 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
     if (ret)
  goto DONE;
  } else {
+    ret = _notmuch_message_index_file (message, message_file);
+    if (ret)
+ goto DONE;
     ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
  }
 
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index 88bd12cb..2c77e11e 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -2,11 +2,10 @@
 test_description="duplicate message ids"
 . ./test-lib.sh || exit 1
 
-add_message '[id]="id:duplicate"' '[subject]="message 1"'
-add_message '[id]="id:duplicate"' '[subject]="message 2"'
+add_message '[id]="duplicate"' '[subject]="message 1"'
+add_message '[id]="duplicate"' '[subject]="message 2"'
 
 test_begin_subtest 'Search for second subject'
-test_subtest_known_broken
 cat <<EOF >EXPECTED
 MAIL_DIR/msg-001
 MAIL_DIR/msg-002
@@ -14,4 +13,21 @@ EOF
 notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+add_message '[id]="duplicate"' '[body]="sekrit"'
+test_begin_subtest 'search for body in duplicate file'
+cat <<EOF >EXPECTED
+MAIL_DIR/msg-001
+MAIL_DIR/msg-002
+MAIL_DIR/msg-003
+EOF
+notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'reindex removes terms from duplicate file'
+rm $MAIL_DIR/msg-003
+notmuch reindex id:duplicate
+cp /dev/null EXPECTED
+notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
--
2.11.0

David Bremner-2

third round of indexing all files

In reply to this post by David Bremner-2
Adapting the approach in [1] to delete only the terms we are going to
re-add via indexing seems noticeably faster (on the order of 30-50%),
and the code is quite a bit simpler.
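
The rule for which terms get deleted can be sketched without Xapian.
This is only an illustration: the prefix strings "K" (tag) and
"XPROPERTY" (property) are assumptions about the prefix table, and
`has_prefix` and `is_indexed_term` are hypothetical names; the real
logic is in patch 2/6.

```c
#include <assert.h>
#include <string.h>

/* Assumed term prefixes; in the library they come from _find_prefix (). */
#define TAG_PREFIX "K"
#define PROPERTY_PREFIX "XPROPERTY"

static int
has_prefix (const char *term, const char *prefix)
{
    return strncmp (term, prefix, strlen (prefix)) == 0;
}

/* Hypothetical helper: nonzero if 'term' was produced by indexing and
 * should be removed before the message files are indexed again. */
static int
is_indexed_term (const char *term)
{
    assert (term != NULL);

    /* Properties are user data: keep. */
    if (has_prefix (term, PROPERTY_PREFIX))
        return 0;

    if (has_prefix (term, TAG_PREFIX)) {
        const char *tag = term + strlen (TAG_PREFIX);

        /* Automatic tags are regenerated by indexing: remove.
         * (Exact match here; the patch compares leading characters.) */
        if (strcmp (tag, "encrypted") == 0 ||
            strcmp (tag, "signed") == 0 ||
            strcmp (tag, "attachment") == 0)
            return 1;

        /* Any other tag was set by the user: keep. */
        return 0;
    }

    /* Everything else (body text, headers, ...) is re-added by indexing. */
    return 1;
}
```

Since user tags and properties never leave the document, there is
nothing to cache and restore afterwards.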

This obsoletes the previous series at [2]. It still has all of the
issues mentioned there UI-wise, and the question of the index options
design probably needs more thought.

This is new in this round:

     [rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms

This has been pretty drastically rewritten compared to Daniel's version [3]:

     [rfc patch v3 3/6] added notmuch_message_reindex

This is the same, except I added simple performance tests:

     [rfc patch v3 4/6] add "notmuch reindex" subcommand


[1]: id:[hidden email]
[2]: id:[hidden email]
[3]: id:[hidden email]
David Bremner-2

[rfc patch v3 1/6] lib: add definitions for notmuch_param_t

This is not an opaque struct because we envision using static
initialization much like the command-line-options.h structures.
---
 lib/notmuch.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index d374dc96..fc00f96d 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -219,6 +219,23 @@ typedef struct _notmuch_filenames notmuch_filenames_t;
 typedef struct _notmuch_config_list notmuch_config_list_t;
 #endif /* __DOXYGEN__ */
 
+enum notmuch_param_type {
+    NOTMUCH_PARAM_END = 0,
+    NOTMUCH_PARAM_BOOLEAN,
+    NOTMUCH_PARAM_INT,
+    NOTMUCH_PARAM_STRING
+};
+
+typedef struct notmuch_param_desc {
+    enum notmuch_param_type param_type;
+    int key;
+    union {
+ notmuch_bool_t bool_val;
+ int int_val;
+ const char *string_val;
+    };
+} notmuch_param_t;
+
 /**
  * Create a new, empty notmuch database located at 'path'.
  *
--
2.11.0

David Bremner-2

[rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms

In reply to this post by David Bremner-2
Testing will be provided via use in notmuch_message_reindex
---
 lib/message.cc        | 44 ++++++++++++++++++++++++++++++++++++++++++++
 lib/notmuch-private.h |  2 ++
 lib/notmuch.h         |  4 ++++
 3 files changed, 50 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index f8215a49..a7bd38ac 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -599,6 +599,50 @@ _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
     }
 }
 
+
+/* Remove all terms generated by indexing, i.e. not tags or
+ * properties, along with any automatic tags*/
+notmuch_private_status_t
+_notmuch_message_remove_indexed_terms (notmuch_message_t *message)
+{
+    Xapian::TermIterator i;
+
+    const std::string tag_prefix = _find_prefix ("tag");
+    const std::string property_prefix = _find_prefix ("property");
+
+    for (i = message->doc.termlist_begin ();
+ i != message->doc.termlist_end (); i++) {
+
+ const std::string term = *i;
+
+ if (term.compare (0, property_prefix.size (), property_prefix) == 0)
+    continue;
+
+ if (term.compare (0, tag_prefix.size (), tag_prefix) == 0 &&
+    term.compare (1, strlen("encrypted"), "encrypted") != 0 &&
+    term.compare (1, strlen("signed"), "signed") != 0 &&
+    term.compare (1, strlen("attachment"), "attachment") != 0)
+    continue;
+
+ try {
+    message->doc.remove_term ((*i));
+    message->modified = TRUE;
+ } catch (const Xapian::InvalidArgumentError) {
+    /* Ignore failure to remove non-existent term. */
+ } catch (const Xapian::Error &error) {
+    notmuch_database_t *notmuch = message->notmuch;
+
+    if (!notmuch->exception_reported) {
+ _notmuch_database_log(_notmuch_message_database (message), "A Xapian exception occurred creating message: %s\n",
+      error.get_msg().c_str());
+ notmuch->exception_reported = TRUE;
+    }
+    return NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+ }
+    }
+    return NOTMUCH_PRIVATE_STATUS_SUCCESS;
+}
+
 /* Return true if p points at "new" or "cur". */
 static bool is_maildir (const char *p)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 8587e86c..1198d932 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -509,6 +509,8 @@ _notmuch_message_add_reply (notmuch_message_t *message,
 notmuch_database_t *
 _notmuch_message_database (notmuch_message_t *message);
 
+notmuch_private_status_t
+_notmuch_message_remove_indexed_terms (notmuch_message_t *message);
 /* sha1.c */
 
 char *
diff --git a/lib/notmuch.h b/lib/notmuch.h
index fc00f96d..33e9fd24 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1685,6 +1685,10 @@ notmuch_message_thaw (notmuch_message_t *message);
 void
 notmuch_message_destroy (notmuch_message_t *message);
 
+/* for testing */
+
+void
+notmuch_test_clear_terms(notmuch_message_t *message);
 /**
  * @name Message Properties
  *
--
2.11.0

David Bremner-2

[rfc patch v3 3/6] added notmuch_message_reindex

In reply to this post by David Bremner-2
From: Daniel Kahn Gillmor <[hidden email]>

This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).
---
 lib/message.cc | 46 +++++++++++++++++++++++++++++++++++++++++++++-
 lib/notmuch.h  | 14 ++++++++++++++
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/lib/message.cc b/lib/message.cc
index a7bd38ac..193eedb2 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -579,7 +579,9 @@ void
 _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
 {
     Xapian::TermIterator i;
-    size_t prefix_len = strlen (prefix);
+    size_t prefix_len = 0;
+
+    prefix_len = strlen (prefix);
 
     while (1) {
  i = message->doc.termlist_begin ();
@@ -1916,3 +1918,45 @@ _notmuch_message_frozen (notmuch_message_t *message)
 {
     return message->frozen;
 }
+
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+ notmuch_param_t unused (*indexopts))
+{
+    notmuch_database_t *notmuch = NULL;
+    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status;
+    notmuch_private_status_t private_status;
+    notmuch_filenames_t *orig_filenames = NULL;
+    const char *filename = NULL;
+
+    if (message == NULL)
+ return NOTMUCH_STATUS_NULL_POINTER;
+
+    notmuch = _notmuch_message_database (message);
+
+    orig_filenames = notmuch_message_get_filenames (message);
+
+    private_status = _notmuch_message_remove_indexed_terms (message);
+    if (private_status)
+ return COERCE_STATUS(private_status, "error removing terms");
+
+    /* re-add the filenames with the associated indexopts */
+    for (; notmuch_filenames_valid (orig_filenames);
+ notmuch_filenames_move_to_next (orig_filenames)) {
+ filename = notmuch_filenames_get (orig_filenames);
+
+ status = notmuch_database_add_message(notmuch,
+      filename,
+      &message);
+ if (status != NOTMUCH_STATUS_SUCCESS &&
+    status != NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) {
+    /* if we failed to add this filename, go ahead and try the
+     * next one as though it were first, but report the
+     * error... */
+    ret = status;
+ }
+    }
+
+    /* XXX TODO destroy orig_filenames? */
+    return ret;
+}
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 33e9fd24..11818018 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1389,6 +1389,20 @@ notmuch_filenames_t *
 notmuch_message_get_filenames (notmuch_message_t *message);
 
 /**
+ * Re-index the e-mail corresponding to 'message' using the supplied index options
+ *
+ * Returns the status of the re-index operation.  (see the return
+ * codes documented in notmuch_database_add_message)
+ *
+ * After reindexing, the user should discard the message object passed
+ * in here by calling notmuch_message_destroy, since it refers to the
+ * original message, not to the reindexed message.
+ */
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+ notmuch_param_t *indexopts);
+
+/**
  * Message flags.
  */
 typedef enum _notmuch_message_flag {
--
2.11.0

David Bremner-2

[rfc patch v3 4/6] add "notmuch reindex" subcommand

In reply to this post by David Bremner-2
From: Daniel Kahn Gillmor <[hidden email]>

This new subcommand takes a set of search terms, and re-indexes the
list of matching messages.
---
 Makefile.local                    |   1 +
 doc/conf.py                       |   4 ++
 doc/index.rst                     |   1 +
 doc/man1/notmuch-reindex.rst      |  29 +++++++++
 doc/man1/notmuch.rst              |   4 +-
 doc/man7/notmuch-search-terms.rst |   7 +-
 notmuch-client.h                  |   3 +
 notmuch-reindex.c                 | 131 ++++++++++++++++++++++++++++++++++++++
 notmuch.c                         |   2 +
 performance-test/M04-reindex.sh   |  11 ++++
 performance-test/T03-reindex.sh   |  13 ++++
 test/T700-reindex.sh              |  21 ++++++
 12 files changed, 223 insertions(+), 4 deletions(-)
 create mode 100644 doc/man1/notmuch-reindex.rst
 create mode 100644 notmuch-reindex.c
 create mode 100755 performance-test/M04-reindex.sh
 create mode 100755 performance-test/T03-reindex.sh
 create mode 100755 test/T700-reindex.sh

diff --git a/Makefile.local b/Makefile.local
index 03eafaaa..c6e272bc 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -222,6 +222,7 @@ notmuch_client_srcs = \
  notmuch-dump.c \
  notmuch-insert.c \
  notmuch-new.c \
+ notmuch-reindex.c       \
  notmuch-reply.c \
  notmuch-restore.c \
  notmuch-search.c \
diff --git a/doc/conf.py b/doc/conf.py
index a3d82696..aa864b3c 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -95,6 +95,10 @@ man_pages = [
      u'incorporate new mail into the notmuch database',
      [notmuch_authors], 1),
 
+    ('man1/notmuch-reindex', 'notmuch-reindex',
+     u're-index matching messages',
+     [notmuch_authors], 1),
+
     ('man1/notmuch-reply', 'notmuch-reply',
      u'constructs a reply template for a set of messages',
      [notmuch_authors], 1),
diff --git a/doc/index.rst b/doc/index.rst
index 344606d9..aa6c9f40 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -18,6 +18,7 @@ Contents:
    man5/notmuch-hooks
    man1/notmuch-insert
    man1/notmuch-new
+   man1/notmuch-reindex
    man1/notmuch-reply
    man1/notmuch-restore
    man1/notmuch-search
diff --git a/doc/man1/notmuch-reindex.rst b/doc/man1/notmuch-reindex.rst
new file mode 100644
index 00000000..6c786b85
--- /dev/null
+++ b/doc/man1/notmuch-reindex.rst
@@ -0,0 +1,29 @@
+===============
+notmuch-reindex
+===============
+
+SYNOPSIS
+========
+
+**notmuch** **reindex** [*option* ...] <*search-term*> ...
+
+DESCRIPTION
+===========
+
+Re-index all messages matching the search terms.
+
+See **notmuch-search-terms(7)** for details of the supported syntax for
+<*search-term*\ >.
+
+The **reindex** command searches for all messages matching the
+supplied search terms, and re-creates the full-text index on these
+messages using the supplied options.
+
+SEE ALSO
+========
+
+**notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
+**notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
+**notmuch-new(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
diff --git a/doc/man1/notmuch.rst b/doc/man1/notmuch.rst
index fbd7f381..b2a8376e 100644
--- a/doc/man1/notmuch.rst
+++ b/doc/man1/notmuch.rst
@@ -149,8 +149,8 @@ SEE ALSO
 
 **notmuch-address(1)**, **notmuch-compact(1)**, **notmuch-config(1)**,
 **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**,
-**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reply(1)**,
-**notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reindex(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
 **notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
 
 The notmuch website: **https://notmuchmail.org**
diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 47cab48d..dd76972e 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -9,6 +9,8 @@ SYNOPSIS
 
 **notmuch** **dump** [--format=(batch-tag|sup)] [--] [--output=<*file*>] [--] [<*search-term*> ...]
 
+**notmuch** **reindex** [option ...] <*search-term*> ...
+
 **notmuch** **search** [option ...] <*search-term*> ...
 
 **notmuch** **show** [option ...] <*search-term*> ...
@@ -421,5 +423,6 @@ SEE ALSO
 
 **notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
 **notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
-**notmuch-new(1)**, **notmuch-reply(1)**, **notmuch-restore(1)**,
-**notmuch-search(1)**, **notmuch-show(1)**, **notmuch-tag(1)**
+**notmuch-new(1)**, **notmuch-reindex(1)**, **notmuch-reply(1)**,
+**notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-show(1)**,
+**notmuch-tag(1)**
diff --git a/notmuch-client.h b/notmuch-client.h
index a6f70eae..ab7138c6 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -196,6 +196,9 @@ int
 notmuch_insert_command (notmuch_config_t *config, int argc, char *argv[]);
 
 int
+notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[]);
+
+int
 notmuch_reply_command (notmuch_config_t *config, int argc, char *argv[]);
 
 int
diff --git a/notmuch-reindex.c b/notmuch-reindex.c
new file mode 100644
index 00000000..8b536375
--- /dev/null
+++ b/notmuch-reindex.c
@@ -0,0 +1,131 @@
+/* notmuch - Not much of an email program, (just index and search)
+ *
+ * Copyright © 2016 Daniel Kahn Gillmor
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: Daniel Kahn Gillmor <[hidden email]>
+ */
+
+#include "notmuch-client.h"
+#include "string-util.h"
+
+static volatile sig_atomic_t interrupted;
+
+static void
+handle_sigint (unused (int sig))
+{
+    static char msg[] = "Stopping...         \n";
+
+    /* This write is "opportunistic", so it's okay to ignore the
+     * result.  It is not required for correctness, and if it does
+     * fail or produce a short write, we want to get out of the signal
+     * handler as quickly as possible, not retry it. */
+    IGNORE_RESULT (write (2, msg, sizeof (msg) - 1));
+    interrupted = 1;
+}
+
+/* reindex all messages matching 'query_string' using the passed-in indexopts
+ */
+static int
+reindex_query (notmuch_database_t *notmuch, const char *query_string,
+       notmuch_param_t *indexopts)
+{
+    notmuch_query_t *query;
+    notmuch_messages_t *messages;
+    notmuch_message_t *message;
+    notmuch_status_t status;
+
+    int ret = NOTMUCH_STATUS_SUCCESS;
+
+    query = notmuch_query_create (notmuch, query_string);
+    if (query == NULL) {
+ fprintf (stderr, "Out of memory.\n");
+ return 1;
+    }
+
+    /* reindexing is not interested in any special sort order */
+    notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED);
+
+    status = notmuch_query_search_messages (query, &messages);
+    if (print_status_query ("notmuch reindex", query, status))
+ return status;
+
+    for (;
+ notmuch_messages_valid (messages) && ! interrupted;
+ notmuch_messages_move_to_next (messages)) {
+ message = notmuch_messages_get (messages);
+
+ ret = notmuch_message_reindex (message, indexopts);
+ notmuch_message_destroy (message);
+ if (ret != NOTMUCH_STATUS_SUCCESS)
+    break;
+    }
+
+    notmuch_query_destroy (query);
+
+    return ret || interrupted;
+}
+
+int
+notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[])
+{
+    char *query_string = NULL;
+    notmuch_database_t *notmuch;
+    struct sigaction action;
+    int opt_index;
+    int ret;
+    notmuch_param_t *indexopts = NULL;
+
+    /* Set up our handler for SIGINT */
+    memset (&action, 0, sizeof (struct sigaction));
+    action.sa_handler = handle_sigint;
+    sigemptyset (&action.sa_mask);
+    action.sa_flags = SA_RESTART;
+    sigaction (SIGINT, &action, NULL);
+
+    notmuch_opt_desc_t options[] = {
+ { NOTMUCH_OPT_INHERIT, (void *) &notmuch_shared_options, NULL, 0, 0 },
+ { 0, 0, 0, 0, 0 }
+    };
+
+    opt_index = parse_arguments (argc, argv, options, 1);
+    if (opt_index < 0)
+ return EXIT_FAILURE;
+
+    notmuch_process_shared_options (argv[0]);
+
+    if (notmuch_database_open (notmuch_config_get_database_path (config),
+       NOTMUCH_DATABASE_MODE_READ_WRITE, &notmuch))
+ return EXIT_FAILURE;
+
+    notmuch_exit_if_unmatched_db_uuid (notmuch);
+
+    query_string = query_string_from_args (config, argc-opt_index, argv+opt_index);
+    if (query_string == NULL) {
+ fprintf (stderr, "Out of memory\n");
+ return EXIT_FAILURE;
+    }
+
+    if (*query_string == '\0') {
+ fprintf (stderr, "Error: notmuch reindex requires at least one search term.\n");
+ return EXIT_FAILURE;
+    }
+    
+    ret = reindex_query (notmuch, query_string, indexopts);
+
+    notmuch_database_destroy (notmuch);
+
+    return ret || interrupted ? EXIT_FAILURE : EXIT_SUCCESS;
+}
diff --git a/notmuch.c b/notmuch.c
index 8e332ce6..201c7454 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -123,6 +123,8 @@ static command_t commands[] = {
       "Restore the tags from the given dump file (see 'dump')." },
     { "compact", notmuch_compact_command, NOTMUCH_CONFIG_OPEN,
       "Compact the notmuch database." },
+    { "reindex", notmuch_reindex_command, NOTMUCH_CONFIG_OPEN,
+      "Re-index all messages matching the search terms." },
     { "config", notmuch_config_command, NOTMUCH_CONFIG_OPEN,
       "Get or set settings in the notmuch configuration file." },
     { "help", notmuch_help_command, NOTMUCH_CONFIG_CREATE, /* create but don't save config */
diff --git a/performance-test/M04-reindex.sh b/performance-test/M04-reindex.sh
new file mode 100755
index 00000000..d36e061b
--- /dev/null
+++ b/performance-test/M04-reindex.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+test_description='reindex'
+
+. ./perf-test-lib.sh || exit 1
+
+memory_start
+
+memory_run 'reindex *' "notmuch reindex '*'"
+
+memory_done
diff --git a/performance-test/T03-reindex.sh b/performance-test/T03-reindex.sh
new file mode 100755
index 00000000..7af2d22d
--- /dev/null
+++ b/performance-test/T03-reindex.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+test_description='reindex'
+
+. ./perf-test-lib.sh || exit 1
+
+time_start
+
+time_run 'reindex *' "notmuch reindex '*'"
+time_run 'reindex *' "notmuch reindex '*'"
+time_run 'reindex *' "notmuch reindex '*'"
+
+time_done
diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
new file mode 100755
index 00000000..32385a72
--- /dev/null
+++ b/test/T700-reindex.sh
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+test_description='reindexing messages'
+. ./test-lib.sh || exit 1
+
+add_email_corpus
+
+test_begin_subtest 'reindex preserves message-ids'
+notmuch search --output=messages '*' > EXPECTED
+# remove duplicate file
+rm $MAIL_DIR/bar/18:2,
+notmuch reindex '*'
+notmuch search --output=messages '*' > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'reindex preserves tags'
+notmuch dump > EXPECTED
+notmuch reindex '*'
+notmuch dump > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
--
2.11.0
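For illustration only (not part of the patch): the loop in reindex_query is meant to stop either at the first failing message or when SIGINT arrives, which only works if the per-message status is stored in the variable the loop tests. A minimal, self-contained sketch of that pattern follows. The names `process_all`, `item_fn`, `struct run_result` and the three sample workers are hypothetical stand-ins; only the handler setup mirrors notmuch-reindex.c.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t interrupted;

/* Only set a flag; the main loop decides when to stop. */
static void
handle_sigint (int sig)
{
    (void) sig;
    interrupted = 1;
}

/* Install the SIGINT handler the same way notmuch-reindex.c does.
 * Returns 0 on success, -1 on failure (as sigaction does). */
int
install_sigint_handler (void)
{
    struct sigaction action;

    memset (&action, 0, sizeof (action));
    action.sa_handler = handle_sigint;
    sigemptyset (&action.sa_mask);
    action.sa_flags = SA_RESTART;
    return sigaction (SIGINT, &action, NULL);
}

/* Hypothetical per-item worker: returns 0 on success. */
typedef int (*item_fn) (int item);

struct run_result {
    int done;   /* items completed successfully */
    int status; /* 0, or the first non-zero worker status */
};

/* Run 'fn' on items 0..count-1, stopping at the first failure or as
 * soon as the interrupted flag is seen. */
struct run_result
process_all (int count, item_fn fn)
{
    struct run_result r = { 0, 0 };

    assert (fn != NULL);
    for (int i = 0; i < count && ! interrupted; i++) {
        r.status = fn (i); /* keep the result so the check below sees it */
        if (r.status != 0)
            break;
        r.done++;
    }
    return r;
}

/* Sample workers used to exercise the loop. */
int always_ok (int item) { (void) item; return 0; }
int fail_on_third (int item) { return item == 2 ? 1 : 0; }
int interrupt_on_second (int item) { if (item == 1) raise (SIGINT); return 0; }
```

Note that the flag is never cleared, so once an interrupt has been seen every later call to `process_all` returns without doing any work; that matches a command that is about to exit anyway.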

David Bremner-2 David Bremner-2

[rfc patch v3 5/6] test: add known broken test for duplicate message id

In reply to this post by David Bremner-2
There are many other problems that could be tested, but this one we
have some hope of fixing because it doesn't require UI changes, just
indexing changes.
---
 test/T670-duplicate-mid.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100755 test/T670-duplicate-mid.sh

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
new file mode 100755
index 00000000..88bd12cb
--- /dev/null
+++ b/test/T670-duplicate-mid.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+test_description="duplicate message ids"
+. ./test-lib.sh || exit 1
+
+add_message '[id]="id:duplicate"' '[subject]="message 1"'
+add_message '[id]="id:duplicate"' '[subject]="message 2"'
+
+test_begin_subtest 'Search for second subject'
+test_subtest_known_broken
+cat <<EOF >EXPECTED
+MAIL_DIR/msg-001
+MAIL_DIR/msg-002
+EOF
+notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
--
2.11.0

David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

[rfc patch v3 6/6] lib: index message files with duplicate message-ids

In reply to this post by David Bremner-2
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything.
---
 lib/database.cc            |  3 +++
 test/T670-duplicate-mid.sh | 22 +++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 5bc131a3..3b9f7828 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
     if (ret)
  goto DONE;
  } else {
+    ret = _notmuch_message_index_file (message, message_file);
+    if (ret)
+ goto DONE;
     ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
  }
 
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index 88bd12cb..2c77e11e 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -2,11 +2,10 @@
 test_description="duplicate message ids"
 . ./test-lib.sh || exit 1
 
-add_message '[id]="id:duplicate"' '[subject]="message 1"'
-add_message '[id]="id:duplicate"' '[subject]="message 2"'
+add_message '[id]="duplicate"' '[subject]="message 1"'
+add_message '[id]="duplicate"' '[subject]="message 2"'
 
 test_begin_subtest 'Search for second subject'
-test_subtest_known_broken
 cat <<EOF >EXPECTED
 MAIL_DIR/msg-001
 MAIL_DIR/msg-002
@@ -14,4 +13,21 @@ EOF
 notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+add_message '[id]="duplicate"' '[body]="sekrit"'
+test_begin_subtest 'search for body in duplicate file'
+cat <<EOF >EXPECTED
+MAIL_DIR/msg-001
+MAIL_DIR/msg-002
+MAIL_DIR/msg-003
+EOF
+notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'reindex removes terms from duplicate file'
+rm $MAIL_DIR/msg-003
+notmuch reindex id:duplicate
+cp /dev/null EXPECTED
+notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
--
2.11.0

David Bremner-2 David Bremner-2
Reply | Threaded
Open this post in threaded view
|

Re: third round of indexing all files

In reply to this post by David Bremner-2
David Bremner <[hidden email]> writes:

> It seems noticeably faster (on the order of 30-50% faster) and the
> code is quite a bit simpler to adapt the approach in [1] to only
> delete the terms we are going to re-add via indexing.
>
> This obsoletes the previous series at [2]. It still has all of the
> issues mentioned there UI-wise, and the question of the index options
> design probably needs more thought.
>

Some belated testing reveals this implementation is pretty broken. It
probably won't eat your database, but that's only because I forgot to
add a call to _notmuch_message_sync. So I'd recommend passing on this
for now. The previous approach is probably OK, although I'm going to
bash at this fancier approach a bit to see if I can make it work.