Re: Not sure how to approach this. (did stats 15 years ago)
From: Richard Ulrich (Rich.Ulrich_at_comcast.net)
Date: 09/25/04
- Next message: Richard Ulrich: "Re: Confidence Intervals"
- Previous message: Richard Ulrich: "Re: When is autocovariance small indicating independent values?"
- In reply to: Rithban: "Re: Not sure how to approach this. (did stats 15 years ago)"
- Messages sorted by: [ date ] [ thread ]
Date: Fri, 24 Sep 2004 22:45:55 -0400
On Thu, 23 Sep 2004 11:06:10 -0600, Rithban <rithban@yahoo.com> wrote:
RU> > What is that, 44 different test problems to benchmark?
Ri>
> Yes. Each unique. Testing a large framework. Two object categories, seven
> configurations which include three important and independent functions, and
> one constant parameter (data set). So five items in 44 combinations.
You might be able to describe, in the end, "five items" and their
main effects. Would all the effects be additive? Would there
be rather few statistical 'interactions'? - For timings, I do expect
the differences to be proportional, which suggests using logs.
However, if the basic units being tested are aggregates of 100
or 180 data points, there might be far less problem with the scaling.
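A minimal sketch of the point about logs, using made-up numbers rather than anything from the actual benchmark: if one configuration is proportionally slower (here, 3x) in every category, the raw differences vary with the baseline, while the differences of the logs are constant, i.e. additive. The category and configuration names are hypothetical.

```python
import math

# Hypothetical timings in seconds; configuration "B" is 3x slower than "A"
# in both categories. These values are illustrative, not measured.
timings = {
    ("cat1", "A"): 2.0, ("cat1", "B"): 6.0,
    ("cat2", "A"): 10.0, ("cat2", "B"): 30.0,
}

def config_effect(scale):
    """Difference B - A within each category, after applying `scale`."""
    return {
        cat: scale(timings[(cat, "B")]) - scale(timings[(cat, "A")])
        for cat in ("cat1", "cat2")
    }

raw_effect = config_effect(lambda t: t)   # differs by category: not additive
log_effect = config_effect(math.log)      # same in both categories: additive
```

On the raw scale the B-minus-A effect depends on the category, which would show up as an interaction; on the log scale it is the same constant, log(3), in both.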
Ri>
> Fortunately as I look at things deeper, about 1/4 of the combinations have
> no meaningful data because operating system effects overshadow the
> algorithms. For the most part about another 1/3 show no significant
> variance. Well, this is what I was testing for, but it's good to know.
RU>
> > 18,000? Your program is writing out results for
> > every pass through some loop? ...cases where...I/O took more CPU time
Ri>
> Your objections are very, very reasoned. I need that many data points to
> squish a degree of intercontinental office politicking, where irrationality
> overrides reason.
>
> No, I'm handling 18k separate examples of real-world data we swiped from the
> lab -- with permission. :-) Technically I only need a subset of the 18k. The
> first part of the data has a linear drift in four of the seven configurations.
If you read me again -- I was not objecting to 18k examples.
I object to 18k measurements, either as a collection strategy,
or as an analysis strategy.
If you did only one of the 44 benchmarks in one pass through
the data, it would be simple to write out the timings after
every K=100 or more of the lines.
If you are performing the 44 benchmarks after each item-read,
then perhaps you do need to collect the information after
each of the 44. However, you still might decide to collect
100 or more lines before writing out each report of cumulative
time-so-far.
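One way the "report cumulative time every K lines" idea might look, as a sketch only. The names `run_benchmark`, `process`, and `block_times` are hypothetical, and `process` stands in for whatever one benchmark does with one line of data.

```python
import time

def run_benchmark(items, process, k=100):
    """Process every item, recording cumulative elapsed time after each k items."""
    checkpoints = []  # list of (items_done, cumulative_seconds)
    start = time.perf_counter()
    done = 0
    for item in items:
        process(item)
        done += 1
        if done % k == 0:
            checkpoints.append((done, time.perf_counter() - start))
    if done % k != 0:  # record the final partial block, if there is one
        checkpoints.append((done, time.perf_counter() - start))
    return checkpoints

def block_times(checkpoints):
    """Convert cumulative checkpoints into the time spent in each block."""
    previous = 0.0
    per_block = []
    for _, cumulative in checkpoints:
        per_block.append(cumulative - previous)
        previous = cumulative
    return per_block
```

With 18k lines and k=100 this yields 180 timings per benchmark instead of 18,000, and the reporting itself happens far less often.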
Or, after you have the 18k timings, you can aggregate them
into 100 or 180 sums. I've got in mind things like (a) scaling,
(b) reducing the need for transformation, and (c) creating a small
enough set of numbers to make close examination more feasible.
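If the 18k timings already exist, the aggregation could be done after the fact. A sketch, assuming the timings for one benchmark sit in a plain list in file order; `block_sums` is a hypothetical helper name.

```python
def block_sums(timings, block_size=100):
    """Sum consecutive blocks of `block_size` timings, preserving file order.

    The last block is shorter if len(timings) is not a multiple of block_size.
    """
    return [
        sum(timings[i:i + block_size])
        for i in range(0, len(timings), block_size)
    ]
```

A block size of 100 turns 18,000 timings into 180 sums; a block size of 180 gives 100 sums.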
Ri>
> However, there are distinct behaviours that happen over time (varying by
> volume of data). We *do* have to dump in around 10k experiments (each
> generating 44 data points) before the timings converge on an upper limit.
>
This suggests to me that you might extract more information
from your experiment if you duplicated some of the early data.
That is, you have evidence that performance worsens over
time; that evidence, right now, is based on a different set of
cases for the Early and Late. It would be stronger evidence if
you could say, "These 1000 cases took X seconds in the first
quarter of the file, and worsened to 3*X seconds for the same
cases when they were placed at the end."
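A sketch of how the repeated-cases comparison might be set up, assuming one timing per case in run order. The function names and the default of 1000 repeated cases are illustrative only.

```python
def with_replay(cases, n_repeat=1000):
    """Run order: every case once, then the first n_repeat cases again at the end."""
    cases = list(cases)
    return cases + cases[:n_repeat]

def slowdown(timings, n_cases, n_repeat=1000):
    """Late-to-early ratio of total time for the same n_repeat cases.

    `timings` holds one value per entry of the run order built by with_replay;
    `n_cases` is the number of original (non-repeated) cases.
    """
    early = sum(timings[:n_repeat])
    late = sum(timings[n_cases:n_cases + n_repeat])
    return late / early
```

A ratio near 3 would correspond to the "X seconds early, 3*X seconds late" statement, now made about identical cases rather than two different sets.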
[snip, mostly incidental commentary on my previous Reply]
-- Rich Ulrich, wpilib@pitt.edu http://www.pitt.edu/~wpilib/index.html