Re: Sampling Threshold for Distributions



Hosley wrote:
I am charting distributions of one variable across another, where the
dependent variable has been averaged at discrete independent values.
There are fewer samples at larger numbers, and at certain higher
values there may only be one or two samples that went into the
averaged values. These extreme values are not very representative of
their population since they have low sample size, so I would like to
pre-determine a threshold of how many (or what percentage) of samples
are sufficient to be included in the analysis. Is there any commonly
used rule in statistics for deciding where to draw the line between
values whose sample size is large enough to be included in a
distribution chart and those whose sample size is not?

Here is an example, just in case the above was painful and confusing
(don't make yourself read this if you already understood): Say that for
100 tree branches, I have measured their length and their weight, both
rounded to the nearest whole number (inches and lbs., respectively).
Now I want to create a chart that shows how branch length varies with
branch weight. I average the length of all of the branches of the same
weight, and plot this on a chart where branch weight is my x axis, and
the averaged branch length is my y axis. I would assume that such a
chart would have a positive slope, but there may be certain weights,
particularly the smallest and largest ones, that only 1 or 2
branches fell into. Thus the average length values at these given
weights will only have 1 or 2 samples. If there is a lot of natural
variability in my population then there is a decent chance that these
low sample values will not be representative of the true population.
Moreover, if I include them in my distribution chart without including
sample size at each point (which may be infeasible), then the viewer
cannot tell which values are more representative than others.
Therefore, I need to determine at what point I am justified in cutting
off values from my population. Note that this is not some scalar
statistical analysis, but instead meant to provide a visual
distribution of how the two variables (length and weight) correlate
with respect to each other, and thus it is the overall characteristic
that is important here.

Sorry for the long post. Thanks!
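
For concreteness, the averaging described in the question can be sketched
as below. This is only an illustration: the function name
summarize_by_weight and the sample numbers are made up, not taken from the
post. It reports the sample size and the standard error of the mean next
to each averaged length, so that thinly sampled weights are easy to spot
before deciding whether to drop them.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def summarize_by_weight(weights, lengths):
    """Return {weight: (n, mean_length, standard_error_or_None)}.

    Hypothetical helper: groups lengths by their (rounded) weight.
    """
    groups = defaultdict(list)
    for w, y in zip(weights, lengths):
        groups[w].append(y)

    summary = {}
    for w in sorted(groups):
        ys = groups[w]
        n = len(ys)
        # The standard error is undefined for a single observation.
        sem = stdev(ys) / sqrt(n) if n > 1 else None
        summary[w] = (n, mean(ys), sem)
    return summary


# Made-up illustrative data, not measurements from the post.
weights = [1, 2, 2, 2, 3, 3, 9]
lengths = [10, 14, 15, 16, 20, 22, 40]

for w, (n, avg, sem) in summarize_by_weight(weights, lengths).items():
    print(f"weight={w} n={n} mean_length={avg} sem={sem}")
```

A weight with n = 1 has no standard error at all, which is one way of
seeing why such points say little about the population.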

Why not give an ordinary x-y scatterplot, with y-jittering as needed,
with a superimposed regression line and its confidence region?
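
A minimal sketch of that suggestion, under stated assumptions: an
ordinary least-squares line through the raw (weight, length) points, a
pointwise confidence band for the fitted mean, and uniform jitter for
plotting the rounded values. The function names (fit_line,
confidence_band, jitter) are illustrative, and the band uses a normal
quantile as a large-sample stand-in for the Student t quantile; with few
points a t quantile (for example scipy.stats.t.ppf) would give a wider,
more appropriate band.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least-squares fit; returns what the band calculation needs."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    resid_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(resid_ss / (n - 2))  # residual standard deviation
    return slope, intercept, s, xbar, sxx


def confidence_band(x0, fit, n, level=0.95):
    """Pointwise (lower, fitted, upper) for the mean response at x0.

    Assumption: normal quantile instead of the t quantile with n - 2 d.f.
    """
    slope, intercept, s, xbar, sxx = fit
    z = NormalDist().inv_cdf(0.5 + level / 2)
    yhat = intercept + slope * x0
    half = z * s * sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    return yhat - half, yhat, yhat + half


def jitter(values, width=0.2, seed=0):
    """Add small uniform noise so overlapping rounded points stay visible."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]


# Made-up illustrative data, not measurements from the post.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_line(x, y)
for x0 in x:
    print(x0, confidence_band(x0, fit, len(x)))
```

The band is narrowest at the mean weight and widens toward the extremes,
so the sparse ends of the range are shown as uncertain rather than being
cut off at an arbitrary threshold.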
