Re: Need Help Determining the "True" Mean of a Sample

From: Richard Ulrich (Rich.Ulrich_at_comcast.net)
Date: 07/27/04


Date: Tue, 27 Jul 2004 12:02:03 -0400

On 26 Jul 2004 10:54:06 -0700, mafro@excite.com (Mafro) wrote:

> Richard,
>
> Thanks very much for your detailed and insightful response!
>
> To answer your questions, the model represents a user visiting a given
> web page that sells products and the values represent the gross
> revenue that each user session generates. The model attempts to
> determine the "true" value of user sessions on a given web page. As to
> the mechanisms: when a user visits a page, one of three specific events
> takes place:
>
> 1) The user simply leaves (about 75% of the time, which generates
> $0.00).
> 2) The user leaves by clicking an advertisement (about 20% of the
> time, which usually generates between $0.05 and $0.50, with a mean of
> $0.25).
> 3) The user purchases a product (about 5% of the time, which usually
> generates between $8.00 and $12.00, with a mean of $10.00, depending
> on the value of the product).
>
> All three of these mechanisms are variable. If a page has a
> particularly good deal, the percentage of people who purchase
> the product might be higher than 5%. At the same time, if there are
> few or irrelevant advertisements on a given page, the percentage of
> people leaving that page by clicking an ad might be less than
> 20%. So the exact size and location of the peaks that result from
> these mechanisms differ for each data set, but their
> existence is pretty ubiquitous.

Okay. That makes the problem more intelligible. In a way.
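
Taking the figures quoted above at face value, the expected revenue
per session and its spread can be worked out directly. A rough Python
sketch follows. It assumes, for illustration only, that every ad click
pays exactly $0.25 and every purchase exactly $10.00, so it understates
the real spread; the names OUTCOMES and session_moments are made up here.

```python
import math

# Hypothetical outcome table built from the figures quoted above:
# (probability, revenue in dollars). Payoffs are fixed at the stated
# means, which ignores the spread within each outcome.
OUTCOMES = [
    (0.75, 0.00),   # user simply leaves
    (0.20, 0.25),   # user leaves via an ad click
    (0.05, 10.00),  # user buys a product
]

def session_moments(outcomes):
    """Return (mean, standard deviation) of revenue for one session."""
    total_p = sum(p for p, _ in outcomes)
    if not math.isclose(total_p, 1.0):
        raise ValueError("probabilities must sum to 1")
    mean = sum(p * x for p, x in outcomes)
    second_moment = sum(p * x * x for p, x in outcomes)
    return mean, math.sqrt(second_moment - mean ** 2)

mean, sd = session_moments(OUTCOMES)
print(f"mean ${mean:.2f} per session, sd ${sd:.2f}")
# mean $0.55 per session, sd $2.17
```

With a per-session SD about four times the mean, a page with 100
sessions has a standard error of roughly 2.17/sqrt(100), about $0.22,
so a normal-theory 95% interval would run about $0.43 either side of
$0.55. That is the "extremely broad" interval you expected.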

Now it looks as if there is probably a correlation between
the 5% and the 20%: pages with the best response may be
higher on both. You might plot the two percentages against
each other to check that.
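
To put a number alongside that plot, here is a minimal Python sketch
that computes the Pearson correlation between click rate and buy rate
across pages. It assumes each page is summarized as a hypothetical
(sessions, clicks, buys) tuple; the function name and the cutoff of
200 sessions are illustrative choices, not anything from your database.

```python
import math

def rate_correlation(pages, min_sessions=200):
    """Pearson correlation between click rate and buy rate across pages.

    `pages` is a list of (sessions, clicks, buys) tuples, one per page.
    Pages with fewer than `min_sessions` sessions are skipped because
    their rates are too noisy; the cutoff is an arbitrary illustration.
    """
    pairs = [(c / n, b / n) for n, c, b in pages if n >= min_sessions]
    if len(pairs) < 3:
        raise ValueError("need at least 3 pages with enough sessions")
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("rates do not vary across pages")
    return sxy / math.sqrt(sxx * syy)

# Made-up counts, for illustration only: (sessions, clicks, buys)
example = [(1000, 180, 40), (2500, 520, 130), (800, 190, 55), (4000, 760, 170)]
print(round(rate_correlation(example), 2))
```

A single coefficient can hide curvature or a couple of odd pages, so
the scatter plot is still worth looking at.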

But it is unclear what you are trying to predict or estimate.
Is it the future income from a page, based on its
history of clicks and buys, just by fitting the past?

Are the numbers divided by time, to look at trends?

If clicks and buys are correlated, then the number of
clicks might help predict the buys, and evidence from
other sites might be useful. Is that what you are after?
(But it seems to me that there are big potential differences
in site design, not to mention the appeal of their ads.)
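
If you do want to borrow strength from other sites, one simple (and
admittedly crude) option is a straight-line fit of buy rate on click
rate across pages, weighting each page by its number of sessions, since
rates from busier pages are roughly more precise. This is a sketch, not
a recommendation: it ignores the site-design and ad-appeal differences
just mentioned, and the (sessions, clicks, buys) layout and function
name are hypothetical.

```python
def fit_buy_rate_on_click_rate(pages):
    """Session-weighted least-squares line: buy_rate = a + b * click_rate.

    `pages` is a list of (sessions, clicks, buys) tuples. Returns (a, b).
    """
    w = [n for n, _, _ in pages]
    xs = [c / n for n, c, _ in pages]
    ys = [b / n for n, _, b in pages]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    if sxx == 0:
        raise ValueError("click rates do not vary across pages")
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up counts, for illustration only: (sessions, clicks, buys)
pages = [(1000, 180, 40), (2500, 520, 130), (800, 190, 55), (4000, 760, 170)]
a, b = fit_buy_rate_on_click_rate(pages)
thin_sessions, thin_clicks = 40, 9  # a hypothetical page with little history
print(a + b * thin_clicks / thin_sessions)  # predicted buy rate for that page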

What is it that you are hoping to learn? Are you trying to
fit a model across the set of sites?

>
> In short, it is easy to determine the mean revenue per user session
> that each web page has generated in the past. However, I'd like to be
> able to determine the confidence interval for each page, based on the
> number of data points, so I can accurately give an upper and lower
> estimate of the "true" value of user sessions on that page with a 95%
> level of confidence. This range will then be used as a statistical
> basis for estimating the value of future user sessions to that page.
>
> Now, some of these web pages have had very few users visit them -
> between 10 and 100 user sessions. Intuitively I know that with such a
> small data set the confidence interval is going to be extremely broad
> and any estimates of future activity very inaccurate.
>
> Other pages have had thousands - perhaps even tens of thousands - of
> user sessions. Again, intuitively I know that with these larger data
> sets the confidence interval is going to be smaller and therefore I
> can offer a base prediction of the value of future user sessions that
> will be much more accurate.
>
> My hope is to find a formula (or formulae) that can be applied to
> thousands of web pages and their varying data sets that I have sitting
> in a database. Thanks again for any further insights you can offer....
[snip, previous message/reply]
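
For the question as posed (a 95% CI on mean revenue per session), the
textbook large-sample interval is the mean plus or minus 1.96 standard
errors. Session revenue here is very skewed (mostly zeros, a few $10
purchases), so for pages with 10 to 100 sessions that approximation can
be poor; a percentile bootstrap is shown alongside as one alternative.
This Python sketch uses only the standard library; the function names,
the simulated data, and the fixed payoffs of $0.25 and $10.00 are
assumptions for illustration.

```python
import math
import random
import statistics

def mean_ci_normal(values, z=1.96):
    """Large-sample CI for the mean: mean +/- z * s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 sessions")
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return m - z * se, m + z * se

def mean_ci_bootstrap(values, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample sessions with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(statistics.fmean(rng.choices(values, k=n))
                   for _ in range(reps))
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = int((1 + level) / 2 * reps) - 1
    return means[lo_idx], means[hi_idx]

def simulate_sessions(n, seed=1):
    """Hypothetical sessions from the 75/20/5 split, payoffs fixed at the means."""
    rng = random.Random(seed)
    return rng.choices([0.00, 0.25, 10.00], weights=[75, 20, 5], k=n)

# Interval width shrinks roughly like 1/sqrt(n).
for n in (50, 5000):
    data = simulate_sessions(n)
    print(n, mean_ci_normal(data), mean_ci_bootstrap(data))
```

Neither interval can rescue a page whose sample happens to contain no
purchases: both will then report a range near zero, which is misleading
when purchases drive most of the revenue.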

Maybe this is much more trivial than I expected.
It is easy to look at one site at a time.

You can apply the binomial distribution to get a CI on the number
of clicks, and estimate the revenue as the average payoff times the
count. Separately, you can apply the binomial to the buys in the
same way.

For small Ns, the CI is going to be pretty wide, and it would be
even wider if you accounted for the variation in the pay-off for a
click or a buy. I think it is probably sufficient if you are very
explicit about the basis of the estimates ("number of clicks and
purchases") and clearly state that "the actual revenue may be
slightly higher or lower depending on the exact items chosen."
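
The per-component approach can be sketched in Python as below. "Apply
the binomial" is implemented here with the Wilson score interval, which
is one common approximation to a binomial CI (an exact Clopper-Pearson
interval would be another choice). The default payoffs of $0.25 per
click and $10.00 per purchase are the means quoted earlier; the
function names are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% at z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def revenue_per_session_by_component(sessions, clicks, buys,
                                     avg_click=0.25, avg_buy=10.00):
    """Scale separate CIs for click and buy rates by the average payoffs.

    Returns {"clicks": (point, low, high), "buys": (point, low, high)},
    all in dollars per session.
    """
    out = {}
    for name, count, avg in (("clicks", clicks, avg_click),
                             ("buys", buys, avg_buy)):
        lo, hi = wilson_interval(count, sessions)
        out[name] = (avg * count / sessions, avg * lo, avg * hi)
    return out

# Hypothetical page with 60 sessions, 12 ad clicks and 3 purchases.
print(revenue_per_session_by_component(60, 12, 3))
```

Adding the two lower bounds (or the two upper bounds) gives a rough
range for total revenue per session, but that sum is not itself a 95%
interval, and it still leaves out the variation in payoffs.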

-- 
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
