Re: std deviation and median
- From: "health inc." <ah26111972@xxxxxxxx>
- Date: Wed, 20 Jul 2005 00:19:16 -0400
thanks,
My data isn't normally distributed. What should I do?
regards.
<joeu2004@xxxxxxxxxxx> wrote in message news:
1121830013.625135.191450@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> "health inc." <ah26111...@xxxxxxxx> wrote:
>> When SPSS computes the standard deviation, is it the standard
>> deviation about the mean or about the median?
>
> I am not familiar with SPSS per se, but usually the unqualified
> term "standard deviation" refers to the sample standard deviation
> of the data from (or about or around) the mean.
>
>> I have a median = 3000, mean = 6000, and std. dev. = 12000.
>> I want to delete some extreme values (+/- 2 std. deviations).
>> Does that mean I should delete all values > 24000 (12000 x 2),
>> or > 27000 using the median (3000 + 24000), or > 30000 using
>> the mean (6000 + 24000)? Which should I use?
>
> Your question is a little like asking "have you stopped beating
> your wife?". Neither "yes" nor "no" is adequate because the
> question makes presumptions that might be incorrect.
>
> First, your goal is to __identify__ outliers, not necessarily to
> delete them.
>
> Second, there are many methods for identifying outliers. You do
> not seem to have a normal distribution: the median is very
> different from the mean, which suggests a large right skew. So
> the standard deviation might not be the correct criterion to use
> in your case. Instead, you might want to consider the IQR
> (interquartile range). (More about that below.)
>
> Third, if you choose to use the standard deviation, there is not
> much agreement about how many standard deviations constitute an
> outlier. If you had a normal distribution, we would expect about
> 4.6% of the data (both tails combined) beyond +/-2 sd of the
> mean. That is not
> my idea of an outlier. I prefer 2.7 sd. I will make the reason
> clear below. But I will use 3 sd when I am computationally lazy.
> As I said, reasonable people disagree reasonably about that. But
> I often see 3 sd used for identifying outliers.
>
> Finally, if you choose to identify data beyond +/-2 sd as
> outliers, that would be data > 30000 (6000 + 24000) and
> data < -18000 (6000 - 24000); in other words, the mean +/-2 sd.
> From the phrasing of your question, I suspect your data is clipped
> at x = 0 -- or for some other reason you only consider mean +2 sd.
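The arithmetic above can be checked with a short sketch. It uses only the summary figures from the question (mean = 6000, sd = 12000); the function name sd_limits is hypothetical, introduced here just for illustration.

```python
def sd_limits(mean, sd, k=2):
    """Return (lower, upper) cutoffs at mean - k*sd and mean + k*sd."""
    return mean - k * sd, mean + k * sd


# Summary figures quoted in the question.
lower, upper = sd_limits(6000, 12000, k=2)
print(lower, upper)  # -18000 30000
```

If the data cannot go below 0, the lower cutoff never flags anything, which would explain why only the upper one matters here.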
>
> However, as I mentioned, perhaps the IQR would be a better choice
> for determining outliers in your case, assuming that SPSS gives
> you the Q1 and Q3 values. It seems quite common to define "mild"
> outliers as data < Q1 - 1.5*IQR and data > Q3 + 1.5*IQR, where
> IQR = Q3 - Q1 and Q1 and Q3 are the 25th and 75th percentiles
> (1st and 3rd quartiles).
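As a rough sketch of the IQR rule just described, using Python's standard statistics module. The sample below is made up for illustration and is not the poster's data; the function names are hypothetical. Quartile conventions differ between packages, so SPSS may report slightly different Q1 and Q3 values than the "inclusive" method assumed here.

```python
import statistics


def iqr_fences(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mild_outliers(data, k=1.5):
    """Return the values lying outside the fences (identified, not deleted)."""
    lower, upper = iqr_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# Made-up right-skewed sample, for illustration only.
sample = [1000, 2000, 2500, 3000, 3000, 3500, 4000, 6000, 50000]
print(iqr_fences(sample))
print(mild_outliers(sample))  # [50000]
```

The function only lists the flagged values; whether to drop them is a separate decision.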
>
> Q1 - 1.5*IQR and Q3 + 1.5*IQR correspond to the mean +/-2.7 sd in
> a normal distribution, where we expect about 0.7% of the data
> (both tails combined) beyond +/-2.7 sd of the mean. And the
> mean +/-3 sd corresponds to Q1 - 1.7*IQR and Q3 + 1.7*IQR, which
> seems "close enough". In a normal distribution, we expect about
> 0.3% of the data beyond +/-3 sd of the mean.
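The correspondences quoted above can be reproduced for a standard normal distribution with statistics.NormalDist (Python 3.8 or later). This is only a check of the quoted figures under the normality assumption, which the data in question apparently does not satisfy; the helper names are hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, sd 1

q3 = z.inv_cdf(0.75)  # Q3 in sd units, about 0.6745
iqr = 2 * q3          # Q1 = -Q3 by symmetry, so IQR is about 1.349 sd


def fence_in_sd(k):
    """Distance of Q3 + k*IQR from the mean, in sd units."""
    return q3 + k * iqr


def two_tail_fraction(n_sd):
    """Fraction of a normal population beyond +/- n_sd of the mean."""
    return 2 * (1 - z.cdf(n_sd))


def iqr_multiplier(n_sd):
    """The k for which Q3 + k*IQR sits n_sd above the mean."""
    return (n_sd - q3) / iqr


print(round(fence_in_sd(1.5), 2))               # 2.7
print(round(100 * two_tail_fraction(2), 2))     # about 4.55 (percent)
print(round(100 * two_tail_fraction(2.7), 2))   # about 0.69 (percent)
print(round(100 * two_tail_fraction(3), 2))     # about 0.27 (percent)
print(round(iqr_multiplier(3), 2))              # about 1.72
```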
>
> I hope this helps.
>