Re: Median of multi-dimensioned data



If your graphs are better thought of as functions than as vectors,
then google for "functional data analysis".

Ross Clement (Email address invalid - do not use) wrote:
Hi everyone.

I have some multidimensioned data. Each piece of data is the result of
running an autocorrelation on an audio signal for a number of different
time shifts. The signal is a sung vowel sound at (approximately) a
requested frequency.

At a number of intervals on this signal, I calculate autocorrelation
for a number of time shifts for a fixed window. This gives me a large
number of graphs (as I think of them, highly multidimensional data with
one dimension for each time shift).
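
For anyone following along, here is a rough sketch of the windowing step just described, in Python with NumPy. The function name, the hop size between windows, and the normalisation by the lag-0 value are all assumptions made for illustration; the post does not say how any of these are actually done.

```python
import numpy as np

def windowed_autocorrelation(signal, window_size, hop, max_lag):
    """Return one autocorrelation "graph" per window.

    The result has shape (n_windows, max_lag + 1): one row per window
    position, one column (dimension) per time shift.
    """
    signal = np.asarray(signal, dtype=float)
    rows = []
    # Stop early enough that the shifted window never runs off the end.
    for start in range(0, len(signal) - window_size - max_lag + 1, hop):
        frame = signal[start:start + window_size]
        row = np.empty(max_lag + 1)
        for lag in range(max_lag + 1):
            shifted = signal[start + lag:start + lag + window_size]
            row[lag] = np.dot(frame, shifted)
        # Assumption: scale each graph so that the lag-0 value is 1.
        if row[0] > 0:
            row = row / row[0]
        rows.append(row)
    return np.array(rows)
```

With a pure tone whose period is 50 samples, each row should peak again near lag 50, which is the kind of peak the next step looks for.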

I then check each graph to ensure that there is a sufficiently high
peak near to the requested frequency. If absent, the graph is
discarded. If present the frequency is estimated by fitting a parabola
to the peak and solving for the maximum. The graph is resampled to
correct for any frequency deviation from the requested frequency.
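
A minimal sketch of the peak check, parabola fit, and resampling, again in Python with NumPy. It assumes the requested frequency has already been converted to an expected lag in samples (sample rate divided by frequency). The names `estimate_peak_lag`, `resample_to_expected`, `search_radius` and `min_height` are hypothetical, and linear interpolation is only one possible way to resample.

```python
import numpy as np

def estimate_peak_lag(graph, expected_lag, search_radius, min_height):
    """Return the interpolated lag of the peak near expected_lag.

    Returns None when no sufficiently high peak is found, which is the
    case where the graph would be discarded.
    """
    graph = np.asarray(graph, dtype=float)
    lo = max(1, expected_lag - search_radius)
    hi = min(len(graph) - 2, expected_lag + search_radius)
    if lo > hi:
        return None
    k = lo + int(np.argmax(graph[lo:hi + 1]))
    y0, y1, y2 = graph[k - 1], graph[k], graph[k + 1]
    # Reject peaks that are too low or that are not local maxima
    # (the true peak then lies outside the search range).
    if y1 < min_height or y1 < y0 or y1 < y2:
        return None
    denom = y0 - 2.0 * y1 + y2
    if denom >= 0:
        return None  # flat top, no parabola with a maximum
    # Vertex of the parabola through the three points around k.
    return k + 0.5 * (y0 - y2) / denom

def resample_to_expected(graph, peak_lag, expected_lag):
    """Stretch the lag axis so the peak lands on expected_lag."""
    graph = np.asarray(graph, dtype=float)
    lags = np.arange(len(graph))
    # Output lag i reads the input at lag i * peak_lag / expected_lag.
    source_positions = lags * (peak_lag / expected_lag)
    # np.interp holds the end value for positions past the last lag.
    return np.interp(source_positions, lags, graph)
```

The estimated frequency would then be the sample rate divided by the returned lag.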

OK, at this point I have (I hope) a number of graphs remaining. I want
to create a single "typical" graph that represents the typical
autocorrelation for that vowel at that frequency, and also gives
standard deviations for each value of the graph (each dimension). The
problem is that I *expect* there will be outliers. I'm not yet sure
whether removing these outliers will "improve performance", but
I obviously need to remove them to try that experiment.

I can easily remove outliers by seat-of-the-pants methods. E.g. I could
calculate the average graph, then discard the graphs that are
uncharacteristically distant from the average. I then calculate the
mean values and deviations from the remaining graphs to arrive at my
model.
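
The seat-of-the-pants approach above could look something like the sketch below. The post does not define "uncharacteristically distant", so the Euclidean distance and the cutoff of mean distance plus `n_sigmas` standard deviations are assumptions chosen purely for illustration.

```python
import numpy as np

def trimmed_model(graphs, n_sigmas=2.0):
    """Drop graphs far from the average, then summarise the rest.

    graphs: array of shape (n_graphs, n_lags).
    Returns (mean_graph, std_graph, keep_mask).
    """
    graphs = np.asarray(graphs, dtype=float)
    mean_graph = graphs.mean(axis=0)
    # Euclidean distance of each graph from the average graph.
    dist = np.linalg.norm(graphs - mean_graph, axis=1)
    # Assumed cutoff: mean distance plus n_sigmas standard deviations.
    threshold = dist.mean() + n_sigmas * dist.std()
    keep = dist <= threshold
    kept = graphs[keep]
    # Per-dimension mean and sample standard deviation of what is left
    # (needs at least two surviving graphs).
    return kept.mean(axis=0), kept.std(axis=0, ddof=1), keep
```

One known weakness of this scheme is that the outliers themselves pull the average and inflate the cutoff, so a single pass may miss some of them.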

But I was wondering if there are more sophisticated techniques that I
should be looking at. Any suggestions?

Cheers,

Ross-c
