Re: How to identify flat (even) distributions?



On 11 Dec, 08:46, illywhacker <illywac...@xxxxxxxxx> wrote:
On Dec 10, 6:05 pm, Steve555 <foursh...@xxxxxxxxxxxxxx> wrote:



On 10 Dec, 16:50, illywhacker <illywac...@xxxxxxxxx> wrote:

On Dec 10, 5:27 pm, Steve555 <foursh...@xxxxxxxxxxxxxx> wrote:

Hi

If I have 1000 people and their opinion ratings for, say, 100 songs
each (on a scale of 1-10), how do I test for those users who have rated
10 1s, 10 2s, 10 3s, etc., i.e. a flat distribution?
I know I can use standard deviation to spot those who tend to give
the same rating, or polar extremes, but there's nothing uniquely
identifiable about the SD for these 'flat' users.

Except that they have standard deviation zero, if I have understood
your notation 10 1s, 10 2s, etc., which you did not explain.

illywhacker;

They have given 100 scores: 10 of each possible score from 1 to 10
(10 x 10 = 100).
sum = 550, mean = 5.5, SD = 2.87 (population SD)
The problem is that any number of distributions could have that SD; it
doesn't uniquely identify a flat distribution.
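
To make the point concrete, here is a minimal Python sketch comparing the flat profile with a second, clearly non-flat one. The "lumpy" counts are a hypothetical profile made up for illustration (not from any real user), and the population SD (statistics.pstdev) is an assumption about which SD is meant.

```python
import statistics

def expand(counts):
    """Turn counts[i] = number of ratings equal to i+1 into a flat list of ratings."""
    return [score for score, n in enumerate(counts, start=1) for _ in range(n)]

flat = [10] * 10                               # ten of each score 1..10
lumpy = [15, 0, 15, 5, 15, 15, 5, 15, 0, 15]   # hypothetical non-flat profile, also 100 ratings

for name, counts in (("flat", flat), ("lumpy", lumpy)):
    ratings = expand(counts)
    # Population SD; the sample SD (statistics.stdev) would differ slightly.
    print(name, statistics.mean(ratings), statistics.pstdev(ratings))
```

Both profiles come out with mean 5.5 and the same SD, even though the second one never uses the scores 2 and 9.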

Sorry for lack of clarity; when naming the subject of this question I
was trying to think of synonyms for flat/even/level... is there an
accepted term for this that statisticians recognize?

You can also try the entropy, which is the classic measure of
uniformity. If the numbers of scores of 1, 2,...i,...10 for an
individual are denoted n_{i}, then define the proportions p_{i} to be
n_{i}/100. Then the entropy is

H(p) = - \sum_{i} p_{i} log(p_{i}) .

If the logarithm is taken to base 2, H tells you the minimum number of
yes/no questions you would need to ask on average to find out the
score, assuming you know the p_{i}. The maximum this can reach is
log_2(10), about 3.32 bits, when all the p_{i} are the same, i.e. 1/10.
The minimum is 0, when one of the p_{i} = 1.

It does not take the ordering/neighbourhood relations (i.e. that 4
occurs next to 3 and 5, etc.) into account though.
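
A minimal Python sketch of the entropy formula above, taking the per-score counts n_{i} as input. The function name entropy_bits and the two example profiles are only illustrative; scores that were never used are skipped, on the usual convention that p log(p) is taken as 0 when p = 0.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of per-score counts n_i, with p_i = n_i / total."""
    total = sum(counts)
    # -p*log2(p) is written as p*log2(1/p); zero counts contribute nothing.
    return sum((n / total) * math.log2(total / n) for n in counts if n > 0)

flat = [10] * 10            # ten of each score: the most uniform case
single = [100] + [0] * 9    # every rating is a 1: the least uniform case

print(entropy_bits(flat), math.log2(10))   # the two values should agree
print(entropy_bits(single))
```

Dividing by log2(10) would rescale the result to the range 0 to 1, which may be easier to compare across rating scales of different sizes.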

illywhacker;

Thanks for all the methods suggested. They all measure the quality I'm
looking for. The Simpson's one has the most intuitive scale (i.e. 10 is
most uniform, 1 is least), same for the entropy (but on a log scale), but
maybe the chi-squared one will be the fastest to compute.
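
For comparison, a rough Python sketch of the other two measures. The exact forms suggested earlier in the thread are not quoted here, so these are assumptions: the inverse Simpson index 1 / sum(p_{i}^2), which matches the "10 is most uniform, 1 is least" scale, and the Pearson chi-squared statistic against equal expected counts.

```python
def inverse_simpson(counts):
    """Assumed form: 1 / sum(p_i^2). Ranges from 1 (one score only) to the number of categories."""
    total = sum(counts)
    return 1.0 / sum((n / total) ** 2 for n in counts)

def chi_squared_vs_uniform(counts):
    """Assumed form: Pearson statistic sum((n_i - E)^2 / E) with E = total / number of categories."""
    expected = sum(counts) / len(counts)
    return sum((n - expected) ** 2 / expected for n in counts)

flat = [10] * 10            # ten of each score
single = [100] + [0] * 9    # every rating is a 1

for name, counts in (("flat", flat), ("single", single)):
    print(name, inverse_simpson(counts), chi_squared_vs_uniform(counts))
```

Note that the chi-squared statistic runs the other way: 0 means perfectly flat, and larger values mean less uniform.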


- Then, the second component will be a contrast of some
scores in the "middle" versus both ends. Someone with
an equal number (per category) in the middle (3-8) as at the
ends (1,2, 9,10) is generally Uniform. Someone with most
in the middle is Normal. Someone with most at the extremes
is Extremist. (I'm curious if there are many there.)

Dave, if I've understood you right, the last paragraph suggests a way
I could maybe quickly sift for likely candidates (very uniform/
extreme)
and then do one of the slower calculations to sort them accurately.
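
A rough sketch of such a quick sift in Python, reading the contrast as "mean count per category at the ends (1, 2, 9, 10) minus mean count per category in the middle (3-8)". This is a simple stand-in for the component Dave describes, not his actual method; the function names, the labels, and the tolerance of 2.0 are all hypothetical.

```python
def ends_vs_middle(counts):
    """Mean count per category at the ends (scores 1, 2, 9, 10) minus the mean in the middle (3-8)."""
    ends = counts[0:2] + counts[8:10]
    middle = counts[2:8]
    return sum(ends) / len(ends) - sum(middle) / len(middle)

def rough_label(counts, tolerance=2.0):
    """Cheap first-pass label; tolerance is a made-up cut-off that would need tuning."""
    contrast = ends_vs_middle(counts)
    if contrast > tolerance:
        return "extremist"
    if contrast < -tolerance:
        return "normal-like"
    return "roughly uniform"

print(rough_label([10] * 10))                              # flat profile
print(rough_label([50, 0, 0, 0, 0, 0, 0, 0, 0, 50]))       # all 1s and 10s
print(rough_label([0, 2, 8, 15, 25, 25, 15, 8, 2, 0]))     # bell-shaped
```

A contrast near zero only says the ends and the middle are balanced on average, so candidates that pass would still need one of the slower measures to confirm they are really flat.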

"Every shade of gray" is a lot like "using extremes" -- it
doesn't reflect the "normal, centered near or a bit above
the middle" that I would expect from ratings of entertainment.
In fact, it seems strange that someone would end up that
way... perhaps they did it by intention, and perhaps that is
because they misunderstood the instructions?

I agree it's not the norm but they do exist, e.g. many teenagers are
very polarized in their views; if it's an artist or genre they like
then everything is 'awesome', otherwise it's '****' ;-) In a big
enough sample, yes, you might have people who are confused or are
being plain awkward, and then you might have those who judge on some
really bizarre criteria such as whether a song contains profanity, or
have an axe to grind such as judging all Christian rock as 10, and
everything else as 0.
Interestingly, in the freely downloadable Netflix database (100
million ratings from 0.5 million users across 18000 movies), there are
plenty of these bi-polars.

Either way, my hunch is that it would be useful, when devising a music
recommendation system, to eliminate - or give a low weighting to - the
scores of these people.
Going back to the original question, I'm also interested to see how
useful those with a uniform distribution are, compared to those with a
normal distribution. (Useful meaning their predictive accuracy for new
data)

Cheers

Steve
