Re: Identifying the distribution of a data set



ali wrote:
Dear all

I am writing software that reads TCP packets from a link. I have the following information available:

size of the packet: 1 4 7 9..........
frequency of the packet: 12 6 9 1..........

This is just an example. In reality I have thousands of these values.

Now I want to check which distribution fits the packet sizes best, e.g. whether the distribution is Poisson, hyperexponential, Pareto, Gamma, etc.

One way, I guess, is to plot a histogram and study its shape. But I want the task to be fully automated and performed by the tool that I am developing.

I have been looking in books and searching the internet as well, but I have not come across any specific values that could be compared to identify the distribution, for example (just an assumption on my part):

if the coefficient of variation of the packet sizes is less than 1, then it is Poisson, etc.

Do you know of any algorithms for this? I have come across goodness-of-fit tests, e.g. the chi-square test and the Kolmogorov test, but I don't know exactly what they do, and I think they need some sort of reference data that fits a distribution in order to calculate the difference from that distribution. Am I on the right path, or do I need to look into something else?

My deadline is the 23rd of August, so I am looking for a shortcut. It would be very kind of you to help me.

I have searched this forum quite extensively and did come across some similar posts, but I could not understand most of the answers, most probably because I am not a statistician but a computer programmer.

I hope you understand my problem. Any help would be appreciated, and please, please keep it simple.

Thanks in advance to all of you.

ali

If you have access to Mathcad, there is a useful thread with some code
for comparing fits to different distributions on the Mathcad
collaboratory http://collab.mathsoft.com/%7Emathcad2000 . You may have
to register.

It's in the "Probability & Statistics" section, entitled "Fitting
Statistical Distributions", started 2 July 2005. The last code is in
message 47, Sept 22 2005, from Paul W, "distribution ranking d.mcd".

No guarantees, but Paul W seems to know what he's talking about.

If you don't have Mathcad, maybe you could contact him via the
collaboratory for help/suggestions.
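
For anyone without Mathcad, the general idea of ranking candidate fits can be sketched in plain Python. This is only a rough sketch under my own assumptions, not the code from that thread: the function names (`weighted_stats`, `fit_candidates`, `ks_distance`, `rank_fits`) are made up for illustration, and the three candidates (exponential, Pareto, log-normal) were picked only because their parameter estimates have closed forms. It takes the size/frequency table as given in the question, fits each candidate, and ranks them by the Kolmogorov distance between the fitted CDF and the empirical CDF. Packet sizes are discrete and the parameters are estimated from the same data, so the distances are useful for ranking only, not for computing p-values.

```python
import math


def weighted_stats(sizes, counts):
    """Total count, mean and variance of a size/frequency table."""
    n = sum(counts)
    mean = sum(s * c for s, c in zip(sizes, counts)) / n
    var = sum(c * (s - mean) ** 2 for s, c in zip(sizes, counts)) / n
    return n, mean, var


def fit_candidates(sizes, counts):
    """Fit a few candidates and return {name: cdf}.

    Assumes all sizes are > 0 and at least two distinct sizes occur.
    """
    n, mean, _ = weighted_stats(sizes, counts)

    # Exponential: maximum-likelihood rate is 1 / mean.
    rate = 1.0 / mean

    # Pareto: maximum-likelihood scale is the smallest size,
    # shape is n / sum(log(x / xmin)).
    xmin = min(sizes)
    alpha = n / sum(c * math.log(s / xmin) for s, c in zip(sizes, counts))

    # Log-normal: maximum-likelihood parameters are the mean and
    # standard deviation of log(x).
    mu = sum(c * math.log(s) for s, c in zip(sizes, counts)) / n
    sigma = math.sqrt(
        sum(c * (math.log(s) - mu) ** 2 for s, c in zip(sizes, counts)) / n
    )

    return {
        "exponential": lambda x: 1.0 - math.exp(-rate * x),
        "pareto": lambda x: 1.0 - (xmin / x) ** alpha if x >= xmin else 0.0,
        "lognormal": lambda x: 0.5
        * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0)))),
    }


def ks_distance(sizes, counts, cdf):
    """Largest gap between the empirical step CDF and a fitted CDF.

    Assumes each size appears only once in the table.
    """
    n = sum(counts)
    seen = 0
    worst = 0.0
    for s, c in sorted(zip(sizes, counts)):
        fitted = cdf(s)
        below = seen / n   # empirical CDF just before the step at s
        seen += c
        at = seen / n      # empirical CDF at s
        worst = max(worst, abs(fitted - below), abs(fitted - at))
    return worst


def rank_fits(sizes, counts):
    """Return [(name, distance), ...] with the closest fit first."""
    cdfs = fit_candidates(sizes, counts)
    scores = {name: ks_distance(sizes, counts, cdf) for name, cdf in cdfs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # The toy table from the question; real data would have thousands of rows.
    for name, distance in rank_fits([1, 4, 7, 9], [12, 6, 9, 1]):
        print(f"{name:12s} {distance:.3f}")
```

Other candidates (Gamma, hyperexponential, Poisson) could be added to the dictionary in `fit_candidates` in the same way, given a fitted CDF for each. The smallest distance only says which of the candidates tried is closest, not that the data actually follow that distribution.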

HTH

Jon
