Re: std deviation and median



Thanks.
My data is not normally distributed. What should I do?
Regards.

<joeu2004@xxxxxxxxxxx> wrote in message news:
1121830013.625135.191450@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> "health inc." <ah26111...@xxxxxxxx> wrote:
>> When SPSS computes the standard deviation, is it the standard
>> deviation about the mean or about the median?
>
> I am not familiar with SPSS per se, but usually the unqualified
> term "standard deviation" refers to the sample standard deviation
> of the data from (or about or around) the mean.
>
>> I have a median = 3000, mean = 6000 and std. dev. = 12000.
>> I want to delete some extreme values (+/- 2 std. deviations).
>> Does that mean I should delete all values above 24000 (12000 x 2),
>> or above 27000 measured from the median (3000 + 24000), or above
>> 30000 measured from the mean (6000 + 24000)? Which should I use?
>
> Your question is a little like asking "have you stopped beating
> your wife?". Neither "yes" nor "no" is adequate because the
> question makes presumptions that might be incorrect.
>
> First, your goal is to __identify__ outliers, not necessarily to
> delete them.
>
> Second, there are many methods for identifying outliers. Since
> you do not seem to have a normal distribution (since the median
> is very different from the mean; perhaps it is "normal" with a
> large right skew), the standard deviation might not be the correct
> criterion to use in your case. Instead, you might want to consider
> the IQR. (More about that below.)
>
> Third, if you choose to use the standard deviation, there is not
> much agreement about how many standard deviations constitute an
> outlier. If you had a two-tailed normal distribution, we would
> expect 4.5% of the data beyond +/-2 sd of the mean. That is not
> my idea of an outlier. I prefer 2.7 sd. I will make the reason
> clear below. But I will use 3 sd when I am computationally lazy.
> As I said, reasonable people disagree reasonably about that. But
> I often see 3 sd used for identifying outliers.
>
> Finally, if you choose to identify data beyond +/-2 sd as
> outliers, that would be data > 30000 (6000 + 24000) and
> data < -18000 (6000 - 24000); in other words, the mean +/-2 sd.
> From the phrasing of your question, I suspect your data is clipped
> at x = 0 -- or for some other reason you only consider mean +2 sd.
>
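
The arithmetic in the paragraph above can be sketched in a few lines of Python (not SPSS). The function name sd_bounds is made up for illustration; the inputs are the mean and standard deviation quoted in the question.

```python
def sd_bounds(mean, sd, k=2):
    """Return the (lower, upper) cutoffs at mean - k*sd and mean + k*sd."""
    return mean - k * sd, mean + k * sd


# Values from the question: mean = 6000, std. dev. = 12000, k = 2.
lower, upper = sd_bounds(6000, 12000, k=2)
print(lower, upper)  # -18000 30000
```

The cutoffs are centered on the mean, not on the median or on zero, which is why 24000 and 27000 do not appear.
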
> However, as I mentioned, perhaps the IQR would be a better choice
> for determining outliers in your case, assuming that SPSS gives
> you the Q1 and Q3 values. It seems quite common to define "mild"
> outliers as data < Q1 - 1.5*IQR and data > Q3 + 1.5*IQR, where
> IQR = Q3 - Q1 and Q1 and Q3 are the 25th and 75th percentiles
> (1st and 3rd quartiles).
>
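
A minimal sketch of the 1.5*IQR rule described above, in Python rather than SPSS. The names iqr_fences and flag_outliers and the sample values are hypothetical. Quartile definitions differ between packages, so the Q1 and Q3 computed here (statistics.quantiles with method="inclusive", Python 3.8+) may not match what SPSS reports.

```python
import statistics


def iqr_fences(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(data, k=1.5):
    """Return the values lying outside the fences, without deleting anything."""
    low, high = iqr_fences(data, k)
    return [x for x in data if x < low or x > high]


# Hypothetical right-skewed sample, invented for illustration only.
sample = [500, 1200, 2000, 2800, 3000, 3200, 4500, 7000, 60000]
print(iqr_fences(sample))     # (-1750.0, 8250.0)
print(flag_outliers(sample))  # [60000]
```

The function only flags values; whether to drop, correct or keep them is a separate decision.
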
> Q1 - 1.5*IQR and Q3 + 1.5*IQR correspond to the mean +/-2.7 sd in
> a normal distribution. If we have a two-tailed normal distribution,
> we expect 0.7% of the data beyond +/-2.7 sd of the mean. And the
> mean +/-3 sd corresponds to Q1 - 1.7*IQR and Q3 + 1.7*IQR, which
> seems "close enough". In a two-tailed normal distribution, we
> expect 0.3% of the data beyond +/-3 sd of the mean.
>
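
The correspondences quoted above can be checked numerically. This is a small sketch using Python's statistics.NormalDist (Python 3.8+); the helper names are made up for illustration, and everything is expressed in units of sd for a standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()        # mean 0, sd 1
q3 = std_normal.inv_cdf(0.75)    # Q3 is about 0.6745 sd above the mean
iqr = 2 * q3                     # by symmetry, Q1 = -Q3


def fence_in_sd(k):
    """Distance of the fence Q3 + k*IQR from the mean, in sd units."""
    return q3 + k * iqr


def tail_fraction(z):
    """Fraction of a normal distribution beyond +/- z sd (both tails)."""
    return 2 * (1 - std_normal.cdf(z))


def iqr_multiplier_for(z):
    """The k for which Q3 + k*IQR sits z sd above the mean."""
    return (z - q3) / iqr


print(round(fence_in_sd(1.5), 2))               # 2.7
print(round(100 * tail_fraction(2.0), 2))       # 4.55 (percent)
print(round(100 * tail_fraction(2.7), 2))       # percent beyond +/-2.7 sd
print(round(100 * tail_fraction(3.0), 2))       # 0.27 (percent)
print(round(iqr_multiplier_for(3.0), 2))        # 1.72
```

These figures hold only for normally distributed data; for skewed data the fraction beyond a given fence can be quite different.
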
> I hope this helps.
>

