
RFC: adding larger test corpus, switching to xz

David Bremner

I currently have some WIP code that passes all tests with our default
corpus, but fails with the smallest performance corpus. The simplest
thing to do would be to add a small sample from our performance corpus
as an additional corpus for our standard (correctness) suite. I'm
currently looking at 146 LKML messages. Unpacked, these are about 1.3M;
they bloat the source tarball by about 285K, which is large in relative
terms (about 40%) but small in absolute terms for most modern systems.
If we switch to xz compression, the resulting tarball is only 711K.
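For anyone who wants to reproduce this kind of gzip vs. xz comparison
on their own checkout, here is a minimal sketch using Python's standard
tarfile module. It is not part of the notmuch build; the helper names
(tarball_size, compare) and the example path are hypothetical, and it
measures only the files you pass in, not the full release tarball.

```python
import io
import tarfile


def tarball_size(paths, mode):
    """Size in bytes of a tarball of `paths` written with `mode`
    (e.g. "w:gz" or "w:xz"), built in memory so nothing touches disk."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for p in paths:
            tar.add(p)  # directories are added recursively
    # tarfile leaves a caller-supplied fileobj open, so buf is still readable
    return buf.getbuffer().nbytes


def compare(paths):
    """Return (gzip_size, xz_size) for the same set of files."""
    return tarball_size(paths, "w:gz"), tarball_size(paths, "w:xz")
```

Usage would be something like compare(["path/to/corpus"]), where the
path is a placeholder for whichever corpus directory you are measuring.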

So, comments please:

1) Is a larger test corpus worth the increase in our source tarball
   size?

2) Should we (independently) switch to xz compression for our tarballs?

I'm not very enthusiastic about complicating the test system with yet
another kind of artifact to be downloaded. If we want to minimize the
size of the test corpora, I'd probably just extract the troublesome
thread. That would work for testing my current bug, but is less likely
to catch future bugs.
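As an illustration of what "just extract the troublesome thread" could
look like, here is a minimal sketch in Python using only the standard
email module. It assumes one message per file (as in a maildir) and
follows References/In-Reply-To headers from a root Message-ID; the
function names are hypothetical, and this is not notmuch's own threading
code, which handles more edge cases.

```python
import email
import re

# Message-IDs appear as <...> tokens in Message-ID, References, In-Reply-To.
_MSGID = re.compile(r"<[^<>\s]+>")


def _ids(value):
    """All <...> tokens in a header value (None is treated as empty)."""
    return _MSGID.findall(str(value or ""))


def extract_thread(paths, root_id):
    """Return the files holding root_id and every message that replies to
    it, directly or transitively, in the order given by `paths`."""
    parsed = []
    for p in paths:
        with open(p, "rb") as f:
            msg = email.message_from_binary_file(f)
        own = _ids(msg.get("Message-ID"))
        parents = _ids(msg.get("References")) + _ids(msg.get("In-Reply-To"))
        parsed.append((p, own[0] if own else None, parents))

    thread = {root_id}
    changed = True
    while changed:  # keep growing the set until no new replies are found
        changed = False
        for _, mid, parents in parsed:
            if mid and mid not in thread and any(r in thread for r in parents):
                thread.add(mid)
                changed = True
    return [p for p, mid, _ in parsed if mid in thread]
```

The fixed-point loop matters when replies carry only In-Reply-To and no
full References chain; otherwise a single pass would miss grandchildren.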

d

_______________________________________________
notmuch mailing list
[hidden email]
https://notmuchmail.org/mailman/listinfo/notmuch

David Bremner

Re: RFC: adding larger test corpus, switching to xz

David Bremner <[hidden email]> writes:

> I currently have some WIP code that passes all tests with our default
> corpus, but fails with the smallest performance corpus. The simplest
> thing to do would be to add a small sample from our performance corpus
> as one for our standard (correctness) suite. I'm currently looking at
> 146 LKML messages. Unpacked these are about 1.3M; they bloat the source
> tarball by about 285K, which is large in relative terms (about 40%), but
> small in absolute terms for most modern systems. If we switch to xz
> compression, the resulting tarball is only 711K.
>

In the end I found 210 messages (one thread of 100, one of 48, and
assorted smaller threads) that only bloated the source tarball by 161K,
so I decided to add the corpus. It's not used in the test suite yet, but
it is needed by a series I will post soon.