compare 2 samples



Hello,

It's been decades since I looked at any stats, but presumably this is a very simple problem. I have 2 software algorithms I wish to compare. They are functionally identical, but differ in performance characteristics. I've set a test machine up so that it switches between algorithm implementations and keeps simple timing records for each alg. The results are:

Algorithm P
Samples: 2120, Timings: mean=0.005, stdDev=0.020,
Total Elapsed: 9.994 sec.

Algorithm Q
Samples: 2120, Timings: mean=0.009, stdDev=0.020,
Total Elapsed: 18.435 sec.

I would like to compare the 2 results and state with some confidence that there really is a difference.

When I first looked at the 2 results, I assumed that it was a simple one sided hypothesis test. Dusting off the old statistical text book it didn't look hard.

Assuming that the probability distribution for execution times on a highly concurrent server is normal, we would formulate a null hypothesis that the execution time for P is greater than or equal to the execution time of Q. The alternative hypothesis being that the execution time of P is less than the execution time for Q.

Assume alpha = 0.025 level of significance. Let mu = the true mean execution time for P. The mean execution time for Q is 0.009 seconds, denoted as mu0.

H0: mu >= mu0 = 0.009
H1: mu < 0.009

We have observed a sample mean Xbar = 0.005 for sample size n = 2120.
Using the standardized Z test statistic

Z = (Xbar - mu0) / (sig / n^1/2)
Z = (0.005 – 0.009) / (0.020 / 46.043)
Z = -9.209

For the alpha = 0.025 level of significance the rejection region is
R: Z <= -1.96
So for this case, we reject H0 and conclude that P is faster than Q

However, it seems that this reasoning is invalid because if I make the opposite assumption and reformulate as:


H0: mu >= mu0 = 0.005
H1: mu < 0.005

Z = (Xbar - mu0) / (sig / n^1/2)
Z = (0.009 – 0.005) / (0.020 / 46.043)
Z = 9.209

we have the same rejection region, but now we accept H0 and say that they are equivalent.

I'm guessing that the problem here is that I am not comparing to some 'objective' truth (the way that hypothesis tests are layed out in intro stats books). I assume that I am just using the wrong technique here, so what would the proper way be for comparing the 2 samples?

Thanks

Shane
.



Relevant Pages

  • Re: Aspiring highest-order programmer
    ... An NP algorithm, as Ian reminded us, is ... one whose execution time formula is not a polynomial. ... >>his own goddamn programming and is man enough to admit error) it was ... > reason to suspect that no such algorithm exists. ...
    (comp.programming)
  • Re: Parallel application ran more slowly than the sequential one?
    ... > Could you tell me what are the reasons to make an parallel application ... Your parallel algorithm may be less efficient than your serial algorithm. ... loading/unloading MPI data structures, sharing files over NFS, process ... Compare the runtime of your parallel program to ...
    (comp.parallel)
  • Re: count of each word occurred
    ... > take each word in order and compare it with the rest of the list ... Another version of this algorithm takes advantage of the fact that it does ... loop over all words in the list ... a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq ...
    (alt.comp.lang.learn.c-cpp)
  • Re: Reading and writing a big file in Ada (GNAT) on Windows XP
    ... common cases (no mapping, single character patterns, and so on). ... block move and compare instructions because they fouled up the pipeline and ... So in this case a better algorithm is probably the way to go (and I ... the source length is>M and the pattern length is>N, ...
    (comp.lang.ada)
  • Re: ArrayList BinarySearch vs Contains
    ... whether a required sorting operation will actually reduce the overall ... performance of an algorithm so that it becomes worse than a linear ... A Linear search does a Fast Compare and a fast iteration to the next ...
    (microsoft.public.dotnet.languages.csharp)

Quantcast