Re: Can any one help me calculate a statistical probability



flame.dawn@xxxxxxxxx wrote:
Here is the question. This concerns a claim of plagiarism. There are
two indexes of a similar text numbering about 750,000 words. The first
index has 27,740 terms in it, while the second index has 3,500 terms
in it. The authors of the first index claim that the authors of the
second plagiarized their index, but it turns out the indexes are mostly
different, and only a few terms are similar. Can anyone calculate what
the random similarity would be, i.e., if we assume that there was no
plagiarism and that index 1 (27,740 terms) and index 2 (3,500 terms) were
independently derived, what would be the probability that some of the
terms would still be identical if the text to which the indexes refer
is 80%-90% similar?

As I understand it, you wish to show a judge (or jury)
that plagiarism has taken place. I assume that you do not
much care whether the technique you finally choose
is the one you started with.

Maybe you should:
(1) Select indexes of similar character from third
(fourth, fifth, ... ) parties.
(2) Make up some sort of reasonable-looking measurement of
similarity, something like the percentage of random n-word
phrases selected from one index and found in the other.
(3) Find the mean and standard deviation of the percentages from
step 2 between the suing author's work and earlier works.
These can form an estimate of how much overlap there /should/
be (since, presumably, the suing author did not plagiarize).
(4) One would hope (in order for this case to proceed) that
one would find the overlap between authors of indexes 1 and 2
to be several standard deviations from the mean percentage
found in step 3.
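
A rough sketch of steps (2) through (4) in Python might look like the
following. Everything here is an assumption made for illustration: each
index is treated as a plain list of term strings, the similarity measure
is simply the percentage of randomly sampled terms from one index that
also appear in the other, and the function names are made up. It is not
a recommendation of any particular measure.

```python
import random
import statistics


def overlap_percentage(index_a, index_b, sample_size, rng):
    """Percentage of terms sampled from index_a that also occur in index_b.

    Assumption: an index is just a list of term strings, and two terms
    match only if the strings are identical.
    """
    terms_a = sorted(set(index_a))
    terms_b = set(index_b)
    sample = rng.sample(terms_a, min(sample_size, len(terms_a)))
    hits = sum(1 for term in sample if term in terms_b)
    return 100.0 * hits / len(sample)


def deviations_from_baseline(baseline_percentages, disputed_percentage):
    """How many sample standard deviations the disputed overlap lies
    from the mean of the baseline (third-party) overlaps."""
    mean = statistics.mean(baseline_percentages)
    sd = statistics.stdev(baseline_percentages)
    return (disputed_percentage - mean) / sd


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampling is repeatable
    # Toy data only; real indexes would have thousands of terms.
    index_1 = ["grace", "law", "faith", "covenant", "temple", "exile"]
    index_2 = ["grace", "law", "prophet"]
    disputed = overlap_percentage(index_2, index_1, 3, rng)
    baseline = [10.0, 12.0, 14.0]  # hypothetical third-party overlaps
    print(disputed, deviations_from_baseline(baseline, disputed))
```

With only a handful of third-party indexes the standard deviation itself
is poorly estimated, which is one more reason to collect as many as
possible.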

You can easily turn some number like "4.35 standard
deviations from the mean" into a probability that such
a number would occur without plagiarism -- /assuming/
that the distribution of percentages was Gaussian.
However, it would be hard to support that assumption
in a courtroom, I think.
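
For what it is worth, the Gaussian conversion is a one-liner. The sketch
below assumes a one-sided question (how often would the overlap be at
least this far above the mean?) and uses only the standard library; the
function name is made up for illustration.

```python
import math


def gaussian_upper_tail(z):
    """P(Z > z) for a standard normal variable Z (one-sided tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # 4.35 is the example figure from the text above.
    print(f"{gaussian_upper_tail(4.35):.2e}")
```

For 4.35 standard deviations this comes out at a few in a million, but
only under the Gaussian assumption.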

The exact number is not important, though, /if/ you
have the data on your side. Maybe you could calculate
a variety of probabilities under the assumption of
different common types of distributions. Maybe you could
just graph a histogram of all the other data you took
with (presumably) the defendant's number off to
one side, clearly away from the rest.
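
One way to see how much the choice of distribution matters is to compute
the same one-sided tail under a few shapes, each scaled to unit variance,
next to Cantelli's inequality, which holds for any distribution with a
finite variance. The particular set of distributions below is my own
choice for illustration, not a claim about what the overlap percentages
actually follow.

```python
import math


def upper_tail_probabilities(z):
    """P(X - mean > z * sd) under several unit-variance shapes, z > 0.

    The last entry is not a distribution but Cantelli's one-sided bound,
    valid for any distribution with finite variance.
    """
    return {
        # Standard normal tail.
        "gaussian": 0.5 * math.erfc(z / math.sqrt(2.0)),
        # Logistic with scale sqrt(3)/pi, which gives variance 1.
        "logistic": 1.0 / (1.0 + math.exp(z * math.pi / math.sqrt(3.0))),
        # Laplace with scale 1/sqrt(2), which gives variance 1.
        "laplace": 0.5 * math.exp(-math.sqrt(2.0) * z),
        # Cantelli: P(X - mean >= z * sd) <= 1 / (1 + z^2).
        "cantelli_bound": 1.0 / (1.0 + z * z),
    }


if __name__ == "__main__":
    for name, p in upper_tail_probabilities(4.35).items():
        print(f"{name:>15}: {p:.2e}")
```

The heavier-tailed shapes give much larger probabilities than the
Gaussian, and the distribution-free bound is larger still, so the
headline number depends heavily on an assumption that is hard to verify.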

The more third-party indexes the better. Measurements
between third-party authors would add to the appearance of
impartiality, I think.

Jim Burns