Re: Need an equation to work out file cacheing

From: Richard Ulrich (Rich.Ulrich_at_comcast.net)
Date: 08/03/04


Date: Mon, 02 Aug 2004 21:17:52 -0400

On Sun, 1 Aug 2004 19:06:28 +0000 (UTC), Gareth Williams
<gareth@nospam.com> wrote:

> Scenario:
>
> 1. A firm receives 100,000 orders per week as electronic documents.

As a real-world problem -- are you referring to huge .pdf scans
of paper?

If they fill in an electronic 'form,' that should be, at most, 10K
bytes of actual data per order, or 1 GB per week (100,000 x 10K)
to preserve. So you could save 4 years of data (about 200 GB) on a
single, cheap hard drive, which is immediately accessible.
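
The arithmetic behind that estimate can be checked with a minimal sketch. It assumes "10K bytes" means 10,000 bytes and a year of 52 weeks; the function and constant names are made up for illustration.

```python
# Back-of-envelope storage estimate using the rough figures from the post.
ORDERS_PER_WEEK = 100_000
BYTES_PER_ORDER = 10_000   # assumption: "10K bytes" taken as 10,000 bytes
WEEKS = 4 * 52             # assumption: four years of 52 weeks each


def storage_bytes(orders_per_week, bytes_per_order, weeks):
    """Total bytes needed to keep every order for the given number of weeks."""
    return orders_per_week * bytes_per_order * weeks


weekly = storage_bytes(ORDERS_PER_WEEK, BYTES_PER_ORDER, 1)
total = storage_bytes(ORDERS_PER_WEEK, BYTES_PER_ORDER, WEEKS)
print(weekly / 1e9, "GB per week")
print(total / 1e9, "GB over four years")
```

If the documents are really scanned images rather than form data, BYTES_PER_ORDER would be far larger and the conclusion could change.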

Immediate service for orders submitted this way might encourage
clients to switch....

>
> 2. All documents are archived on CDROM.

I'm curious - really, CDROM, for high-capacity backups?

>
> 3. A percentage of orders (average around 10%) will go wrong at some
> stage and access to the original order will be required by Customer
> Services to resolve the problem.
>
> 4. Retrieval from CDROM is time-consuming but a limited amount of space is
> available to cache some of the documents on networked storage for more
> immediate retrieval. There is not enough space to cache all the documents
> so the older cached documents are deleted regularly.

A hard-disk farm is probably cheaper than the overhead of
managing CDROMs, with their small capacity.

>
> 5. It is not possible to determine "up front" which orders will fail -
> whilst the failure rate is fairly constant over time, just about any
> document could end up being requested by Customer Services.
>
See, now that is one of the statements that makes this
sound a lot more like homework than like a real-world problem.
Unless, as the other reply suggested, you mean something
subtle by 'constant', so that an order 3 weeks old is much more
likely to be recalled than one 3 years old.
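
If recall really does depend on age, the question becomes how many recent weeks to keep online. Here is a rough sketch under a purely hypothetical model, in which the chance that a request targets an order halves every half_life_weeks. Neither the model nor the parameter comes from the original problem statement.

```python
import math


def weeks_to_cache(x, half_life_weeks):
    """Weeks of recent orders to keep online so that a fraction x of
    requests hit the cache.

    Hypothetical model: request probability decays exponentially with
    order age, halving every half_life_weeks, so the fraction of
    requests for orders younger than t weeks is 1 - 2**(-t / half_life_weeks).
    """
    if not 0 <= x < 1:
        raise ValueError("x must be in [0, 1)")
    if half_life_weeks <= 0:
        raise ValueError("half_life_weeks must be positive")
    # Solve 1 - 2**(-t / h) = x for t.
    return -half_life_weeks * math.log2(1 - x)


print(weeks_to_cache(0.5, 3))
print(weeks_to_cache(0.75, 3))
```

The half-life would have to be estimated from Customer Services request logs; nothing in the thread supplies it.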

> Problem:
>
> The firm would like to cache "N" documents (from the weekly pool of
> 100,000 orders) such that they can satisfy "X" percent of Customer Service
> requests from the networked storage. Customer Services are prepared to
> put up with [100-X]% of requests that would still need to be retrieved
> from the CDROM store.
>
> In short, how do we calculate "N"?

No cost-benefit? Just algebra? Save half, cache half?

I think you just left out something. This lacks some further
specification before it defines an interesting problem.

If there's no difference in recall rate, it doesn't matter
which ones you store. You might as well slow-file the BIG
ones, since you can keep more of the short ones online.
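
Taking the problem literally, with every document equally likely to be requested, both points reduce to a few lines. This is only a sketch under that uniform assumption; the function names and the byte budget are invented for illustration.

```python
import math


def docs_needed(x_percent, pool=100_000):
    """Documents to cache so x_percent of requests hit the cache,
    assuming every document in the pool is equally likely to be requested."""
    return math.ceil(pool * x_percent / 100)


def hit_rate_smallest_first(sizes, budget_bytes):
    """Fraction of documents cached when the smallest ones are cached first
    until the (hypothetical) byte budget runs out. Under uniform recall
    this fraction is also the expected hit rate."""
    if not sizes:
        return 0.0
    used = 0
    cached = 0
    for size in sorted(sizes):
        if used + size > budget_bytes:
            break
        used += size
        cached += 1
    return cached / len(sizes)


print(docs_needed(80))
print(hit_rate_smallest_first([10, 10, 10, 70], 30))
```

So with uniform recall, N is just X percent of the pool, and a fixed amount of disk goes furthest when the big documents are the ones left on CDROM.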

-- 
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html

