Re: Can any one help me calculate a statistical probability
- From: James Burns <burns.87@xxxxxxx>
- Date: Tue, 18 Mar 2008 16:31:18 -0400
flame.dawn@xxxxxxxxx wrote:
Here is the question. This concerns a claim of plagiarism. There are
two indexes of a similar text numbering about 750,000 words. The first
index has 27,740 terms in it, while the second index has 3,500 terms
in it. The authors of the first index claim that the authors of the
second plagiarized their index, but it turns out the indexes are mostly
different, and only a few terms are similar. Can anyone calculate what
the random similarity would be, i.e., if we assume that there was no
plagiarism and that index 1 (27,740 terms) and index 2 (3,500 terms) were
independently derived, what would be the probability that some of the
terms would still be identical if the text to which the indexes refer
is 80%-90% similar?
As I understand it, you wish to show a judge (or jury)
that plagiarism has taken place. I assume that you do not
much care whether the technique you finally choose
is the one you started with.
Maybe you should:
(1) Select indexes of similar character from third
(fourth, fifth, ... ) parties.
(2) Make up some sort of reasonable-looking measurement of
similarity, something like the percentage of random n-word
phrases selected from one index and found in the other.
(3) Find the mean and standard deviation of the percentages from
step 2 between the suing author's work and earlier works.
These can form an estimate of how much overlap there /should/
be (since, presumably, the suing author did not plagiarize).
(4) One would hope (in order for this case to proceed) that
one would find the overlap between authors of indexes 1 and 2
to be several standard deviations from the mean percentage
found in step 3.
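Steps (2) through (4) could be sketched roughly as follows. This is only an illustration under assumptions of my own: each index is treated as a plain list of term strings, the similarity measure samples single terms rather than n-word phrases, and the function names overlap_percentage and z_score are made up for this sketch.

```python
import random
import statistics

def overlap_percentage(index_a, index_b, sample_size=200, seed=0):
    """Step 2 (assumed measure): percentage of terms sampled at random
    from index_a that also appear verbatim in index_b."""
    terms_a = sorted(set(index_a))
    if not terms_a:
        raise ValueError("index_a has no terms")
    terms_b = set(index_b)
    rng = random.Random(seed)  # fixed seed so the measurement is repeatable
    sample = rng.sample(terms_a, min(sample_size, len(terms_a)))
    hits = sum(1 for term in sample if term in terms_b)
    return 100.0 * hits / len(sample)

def z_score(observed, baseline):
    """Steps 3 and 4: how many sample standard deviations the observed
    percentage lies from the mean of the baseline percentages."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # needs at least two baseline values
    return (observed - mean) / sd
```

Here baseline would hold the percentages measured between the suing author's index and the third-party indexes, and observed the percentage measured between indexes 1 and 2.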
You can easily turn some number like "4.35 standard
deviations from the mean" into a probability that such
a number would occur without plagiarism -- /assuming/
that the distribution of percentages was Gaussian.
However, it would be hard to support that assumption
in a courtroom, I think.
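For what it is worth, the Gaussian conversion is a one-liner. The sketch below assumes a one-sided question (overlap at least this far above the mean) and uses the complementary error function from Python's standard library; gaussian_upper_tail is a name made up for this example.

```python
import math

def gaussian_upper_tail(z):
    """Probability that a standard normal variable is at least z.
    Valid only if the percentages really are Gaussian."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# One-sided probability for 4.35 standard deviations above the mean.
print(gaussian_upper_tail(4.35))
```

Doubling the result gives the two-sided probability, if deviations in either direction are meant to count.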
The exact number is not important, though, /if/ you
have the data on your side. Maybe you could calculate
a variety of probabilities under the assumption of
different common types of distributions. Maybe you could
just graph a histogram of all the other data you took
with (presumably) the defendant's number off to
one side, clearly away from the rest.
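Both ideas can be tried without committing to a particular distribution. The sketch below is my own illustration, not a prescribed method: cantelli_bound gives the one-sided Chebyshev (Cantelli) upper bound, which holds for any distribution with finite variance when the true mean and standard deviation are known, and text_histogram draws a crude plain-text histogram with the disputed pair marked. Both function names and the bin_width parameter are made up for this example.

```python
import math
from collections import Counter

def cantelli_bound(k):
    """Upper bound on P(X - mean >= k * sd) for any finite-variance
    distribution (Cantelli's inequality), for k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / (1.0 + k * k)

def text_histogram(baseline, observed, bin_width=1.0):
    """Return histogram lines for the baseline percentages, marking
    the bin that contains the observed (disputed) percentage."""
    values = list(baseline) + [observed]
    first = math.floor(min(values) / bin_width)
    last = math.floor(max(values) / bin_width)
    counts = Counter(math.floor(v / bin_width) for v in baseline)
    observed_bin = math.floor(observed / bin_width)
    lines = []
    for b in range(first, last + 1):
        marker = " <-- disputed pair" if b == observed_bin else ""
        lines.append(f"{b * bin_width:6.1f} | {'#' * counts[b]}{marker}")
    return lines

# Made-up numbers, purely to show the shape of the output.
for line in text_histogram([1.2, 1.7, 2.3, 2.8, 3.1], 9.4):
    print(line)
```

The Cantelli bound is far weaker than the Gaussian figure, but it does not depend on the shape of the distribution, which may make it easier to defend.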
The more third-party indexes the better. Measurements
between third-party authors would add to the appearance of
impartiality, I think.
Jim Burns