Re: So many files, so few files.



Mac wrote:

Indeed, I'm sure you're right.  But then what?  As the months and
years go by, one would want a program to automatically crawl the
site and add updated material.  Note, I say add, since one wouldn't
want to follow the site by deleting material.  But this brings up
the issue of how to maintain the links, etc.


That's a good point. Thinking off the top of my head, the main thing you
want is the PDFs (maybe?). You could search for and delete redundant ones
(automatically, of course). Meanwhile, you could just update the HTML
(using wget) but never delete any PDFs unless they are redundant. If
some of the PDFs get orphaned from their links, that is OK, because you
could just use Google Desktop or a similar desktop search tool to index
your own disk.
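One way to find the "redundant ones" automatically is to hash every PDF and flag byte-identical copies. The sketch below assumes redundant means byte-for-byte identical (a re-rendered PDF with the same text but different bytes will not be caught), and `find_redundant_pdfs` is a hypothetical helper name, not part of any tool. It only lists candidates; deleting them is left to you.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_redundant_pdfs(root: str) -> List[Path]:
    """List PDFs under root whose contents duplicate an earlier PDF.

    The first file (in sorted path order) with a given digest is kept;
    every later byte-identical copy is reported as redundant.
    """
    seen: Dict[str, Path] = {}
    redundant: List[Path] = []
    for path in sorted(Path(root).rglob("*")):
        # Only regular files with a .pdf extension (any case) are considered.
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        digest = file_digest(path)
        if digest in seen:
            redundant.append(path)
        else:
            seen[digest] = path
    return redundant
```

After a wget run you might call `find_redundant_pdfs("mirror")` and review the list before removing anything with `Path.unlink()`; keeping deletion as a separate step makes it harder to lose the only copy of a file.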


Since the HTML files are usually pretty small compared with the PDFs, you could do the Wayback Machine thing: just keep snapshots of the link pages, with appropriately modified PDF file names so that a revision change doesn't overwrite the older copy. That way, you could ask for the latest data, or the latest as of June 2005, or whenever.
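A minimal sketch of that naming scheme, assuming the fetch date is what distinguishes revisions: the date is spliced into the file name before the extension, and a lookup picks the newest snapshot on or before a cutoff. Both function names and the date-stamp format are illustrative assumptions, not an existing convention.

```python
from datetime import date
from pathlib import PurePath
from typing import Dict, Optional


def versioned_name(filename: str, fetched: date) -> str:
    """Insert the fetch date before the extension: foo.pdf -> foo.2005-06-14.pdf."""
    p = PurePath(filename)
    return f"{p.stem}.{fetched.isoformat()}{p.suffix}"


def latest_as_of(snapshots: Dict[date, str], cutoff: Optional[date] = None) -> Optional[str]:
    """Return the newest snapshot fetched on or before cutoff.

    With no cutoff, return the newest snapshot overall; return None if
    nothing qualifies.
    """
    eligible = [d for d in snapshots if cutoff is None or d <= cutoff]
    if not eligible:
        return None
    return snapshots[max(eligible)]
```

With snapshots keyed by fetch date, `latest_as_of(snaps)` answers "the latest" and `latest_as_of(snaps, date(2005, 6, 30))` answers "the latest as of June 2005".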


Cheers,

Phil Hobbs


