June 16, 2003
I've put up an archive of the sd-callers email list. It was archived at Yahoo Groups, but, as I've mentioned before, that archive ran out of space in November 2001. Since there's a lot of good stuff discussed on the list, it seemed a shame for it to vanish into the giant bitbucket in the sky. I know that a lot of list subscribers keep their own archive, but many don't.
First, I researched the various (free) archive mechanisms available. I was tempted by The Mail Archive, but they didn't have an easy mechanism to import old messages (and I was looking at 26692 messages just through November 2001). I figured someone must have written an mbox-to-MySQL converter, but it seems that it's a fairly hard problem. I have a predilection for databases, so that was definitely my first thought (after first trying to put it in some other archiving service). I did find mail2mysql, but at the time, I was intimidated by Perl. Now, after working with MHonArc (which is written in Perl), I could probably set this up, and I might in the future.
After looking at various email archives online, it seemed like most were using either Hypermail or MHonArc to "HTML-ize" email. I liked the look of MHonArc, and the support looked good, so I decided to go with it. I also looked at mharc, a collection of Perl scripts that use MHonArc to automate the updates for a collection of mailing lists, but it uses the Namazu search engine, and I couldn't get that to compile under Mac OS X (issues with redefinitions of getopt, in case you're interested).
Since I was only going to do one email list, I figured I could write the automation scripts myself, so I gave up on mharc (although I did crib the MHonArc resource files and CSS file).
The next step was to get all the Yahoo archives into mbox format. Fortunately, I found a script to do this: yahoo2mbox: Archiver for Yahoo! Groups. Since Yahoo has a daily bandwidth limit for each group, it took several days to grab all the email messages into one 72.5 MB file. I used a utility script from mharc to break the mbox file into separate mbox files for each month. I used the same script on my own mbox archives to get the messages sent after November 23, 2001.
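For anyone curious what the month-splitting step amounts to, here is a minimal sketch in Python. It is not the mharc utility script I actually used (that one is Perl); the function names, the `YYYY-MM.mbox` file naming, and the "unknown" bucket for messages with unparseable dates are all my own assumptions for illustration.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime


def month_key(message):
    """Return 'YYYY-MM' from the Date header, or 'unknown' if it can't be parsed."""
    try:
        return parsedate_to_datetime(message["Date"]).strftime("%Y-%m")
    except (TypeError, ValueError):
        return "unknown"


def split_mbox_by_month(source_path, output_dir):
    """Copy each message in source_path into output_dir/YYYY-MM.mbox.

    Returns a dict mapping each month key to the number of messages copied.
    The output file naming is a hypothetical convention, not mharc's.
    """
    counts = {}
    targets = {}
    source = mailbox.mbox(source_path, create=False)
    try:
        for message in source:
            key = month_key(message)
            if key not in targets:
                path = os.path.join(output_dir, key + ".mbox")
                targets[key] = mailbox.mbox(path)
            targets[key].add(message)
            counts[key] = counts.get(key, 0) + 1
    finally:
        # close() flushes pending writes for each monthly mailbox
        for target in targets.values():
            target.close()
        source.close()
    return counts
```

Something like `split_mbox_by_month("sd-callers.mbox", "monthly")` would then leave one mbox file per month in the `monthly` directory (both names are placeholders, and the directory has to exist already).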
I also researched search engines, and decided to go with ht://Dig. Compilation was a breeze; no tweaking at all. Installation wasn't too bad...a little trial and error to get the directories set up appropriately.
After playing with the MHonArc resource files and a lot of trial and error, I got them set up to look and work okay. I wrote a PHP script to do the navigation to the previous and next index pages, and a PHP script to generate the main index page. I decided to add a subject index and an author index, as well as the standard date and thread indices.
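The previous/next navigation boils down to finding the neighbours of the current month in the list of months that actually have an archive. My script is in PHP; the following is just a sketch of the same idea in Python, and the `YYYY-MM` month labels and the function name are assumptions for illustration.

```python
def neighbours(months, current):
    """Return (previous, next) month labels around `current`.

    `months` is any collection of 'YYYY-MM' strings (a hypothetical naming
    scheme); either side is None when `current` is the first or last month.
    """
    ordered = sorted(months)  # 'YYYY-MM' strings sort chronologically
    i = ordered.index(current)
    previous_month = ordered[i - 1] if i > 0 else None
    next_month = ordered[i + 1] if i < len(ordered) - 1 else None
    return previous_month, next_month
```

Working from the list of existing months, rather than doing date arithmetic, means a month with no messages simply gets skipped over.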
Then I wrote a shell script to call MHonArc for each monthly archive, and sat back. After a few (well, a bunch of) minutes, MHonArc had created the HTML files for the approximately 30000 messages and the index files for each month. I tarred and gzipped them up, and transferred them over from my work machine (a Mac G4/DP 450) to my server machine (a Mac G3 300). I installed htdig, tested it, did a little trial and error stuff, and let it rip on the sd-callers files.
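The per-month loop looks roughly like the sketch below. My real version is a shell script; this is a Python stand-in, and the directory layout, the `*.mbox` naming, and the resource file path are assumptions. As far as I know, `-rcfile` and `-outdir` are the MHonArc options for naming the resource file and the output directory, but check the MHonArc documentation before relying on that.

```python
import subprocess
from pathlib import Path


def mhonarc_commands(mbox_dir, html_root, rcfile):
    """Build one mhonarc command line per monthly mbox file.

    Assumes (hypothetically) that monthly files are named like 2001-11.mbox
    and that each month gets its own output directory under html_root.
    """
    commands = []
    for mbox in sorted(Path(mbox_dir).glob("*.mbox")):
        outdir = Path(html_root) / mbox.stem
        commands.append(
            ["mhonarc", "-rcfile", str(rcfile), "-outdir", str(outdir), str(mbox)]
        )
    return commands


def run_all(commands):
    """Create each output directory, then run each command, stopping on the first failure."""
    for command in commands:
        outdir = Path(command[command.index("-outdir") + 1])
        outdir.mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)
```

Building the command lists separately from running them makes it easy to print them out and eyeball them before turning MHonArc loose on 30000 messages.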
Once it looked okay, I announced it on the sd-callers list and started watching my log files. Then I decided that I would install a logfile analysis program. I picked AWStats, because that's the one I use and like for squarez.
After all the other installations I'd done, AWStats was a breeze...copy a few files into the right places, modify a configuration file, and run it.
In the process, I learned to use CPAN a little bit (yahoo2mbox required several modules that weren't installed on my machine), learned to read a little Perl, learned a lot about MHonArc, learned that the shell is really terrible for string manipulation (though bash is a little better than the others), and picked up a lot of other stuff. It's great to be running a system that makes using all the GNU General Public License software pretty easy (as easy as it can be, given that the documentation is usually written by coders for sysops who are assumed to have a lot of background knowledge).
The sd-callers archive is limited to sd-callers subscribers. If you are a subscriber, you've seen the announcement.