Re: Finding Percentiles WITHOUT Sorting
From: Anonymous (anon_at_inter.net)
Date: 08/21/04
- Next message: Herman Rubin: "Re: joint distribution of correlated uniform"
- Previous message: Osher Doctorow: "Re: Frechet Extremal Family for a > = 1/2"
- In reply to: Thad Smith: "Re: Finding Percentiles WITHOUT Sorting"
- Next in thread: Thad Smith: "Re: Finding Percentiles WITHOUT Sorting"
- Reply: Thad Smith: "Re: Finding Percentiles WITHOUT Sorting"
- Messages sorted by: [ date ] [ thread ]
Date: Sat, 21 Aug 2004 15:42:04 +0200
Creating an index will not solve the problem: the index will of course be of
type Integer (4 byte), which is of the same order of that of original data
(type Single, 4 bytes, or Double, 8 bytes). I would need not 32 MB (smaller
by an order of magnitude) but another 400 or 200 MB, depending on whether
that data is Single or Double, respectively. If I had that kind of memory
available I would be having no problem in the first place -- or rather I
would have used it to contain more data, so I'd be in exactly the same
predicament.
"Thad Smith" <thad@ionsky.com> wrote in message
news:4126c5e2$1_4@omega.dimensional.com...
> Anonymous wrote:
> > I have a very large array that occupies most of the memory (several
hundred
> > megabytes in size). I want to calculate its percentiles, but the
"obvious"
> > way is to sort it first -- and I do not want to sort the original array
> > in-place, and of course I do not have enough memory to create a copy be
> > sorted.
>
> The most efficient way to do this exactly would be to sort. If you
> sorted in place, you would probably need an index for each value which
> carries its original location. In addition, you can use another data
> field for the rank, from which you can determine percentile. For 400 MB
> of data, you can use a 32-bit index. If there are 4 million items, you
> would need an additional 32 MB for the added data items. If the data
> structure already includes enough information to relate it to the
> original order or end use, you wouldn't need to add an index.
>
> Add index, sort on value, assign rank, sort on original index. Compute
> percentile from rank.
>
> Thad
>
- Next message: Herman Rubin: "Re: joint distribution of correlated uniform"
- Previous message: Osher Doctorow: "Re: Frechet Extremal Family for a > = 1/2"
- In reply to: Thad Smith: "Re: Finding Percentiles WITHOUT Sorting"
- Next in thread: Thad Smith: "Re: Finding Percentiles WITHOUT Sorting"
- Reply: Thad Smith: "Re: Finding Percentiles WITHOUT Sorting"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|