Re: Computerised authorship attribution
- From: Matt B <mattb333@xxxxxxxxxxxxx>
- Date: Fri, 27 Jan 2006 18:40:48 GMT
On 26 Jan 2006 11:16:52 -0800, "Ross Clement (Email address invalid -
do not use)" <clemenr@xxxxxxxxxx> wrote:
<snip>
>I tried using a heuristic based on information theory to select words
>for use as dimensions. It didn't improve attribution accuracy. That
>doesn't mean that your experiment won't give better results than
>choosing the most frequent words.
Well, I'll give it a try and see what happens.
I think, as John commented, the choice of variables may be the most
important factor here. Reducing the words in this way should help to
distinguish authors more soundly than the standard K-Nearest Neighbour
technique.
I think this will aid correct classification - what do you think?
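
To make the idea of choosing words by an information-theoretic score concrete, here is a minimal sketch using information gain. The thread does not say which heuristic Ross used, so information gain is an assumption, as are the function names and the choice to treat each text as a set of the words it contains.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of author labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(docs, labels, word):
    """Gain from splitting the texts on 'does this word occur in the text?'.

    docs is a list of sets of words, one set per text (an assumption made
    for this sketch); labels holds the known author of each text.
    """
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    total = len(labels)
    remainder = 0.0
    for subset in (with_word, without_word):
        if subset:  # an empty side of the split contributes nothing
            remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def top_words_by_gain(docs, labels, n):
    """Return the n words whose presence best separates the authors."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    return ranked[:n]
```

A word that every author uses in every text scores zero here, while a word used by only one author scores highest. Whether that beats simply taking the most frequent words is exactly what would need testing.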
>
>If you're doing experiments like this please make sure that you
>understand what the key words "overfitting" and "cross-validation"
>mean.
Sure will. Like I say, I'm not a statistician in any way; I'm just a
Software Engineering student, so all of these techniques are foreign
waters for me. But I will endeavour to grasp as much as I can.
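
For reference, a minimal sketch of leave-one-out cross-validation, one common form of the technique Ross mentions. The names are hypothetical, and `classify` stands for any function that takes training vectors, training labels and a query vector and returns a guessed author. The vectors and labels are assumed to be plain Python lists.

```python
from collections import Counter

def leave_one_out_accuracy(vectors, labels, classify):
    """Hold out each text in turn, train on the rest, count correct guesses."""
    if not vectors:
        raise ValueError("need at least one labelled text")
    correct = 0
    for i in range(len(vectors)):
        # Everything except text i is the training set for this round.
        train_vectors = vectors[:i] + vectors[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if classify(train_vectors, train_labels, vectors[i]) == labels[i]:
            correct += 1
    return correct / len(vectors)

def majority_baseline(train_vectors, train_labels, query):
    """Ignore the query and always guess the most common training author."""
    return Counter(train_labels).most_common(1)[0][0]
```

The point of holding texts out is that accuracy measured on the same texts used for training says little about unseen texts, which is where overfitting shows up. If the words used as dimensions are chosen by looking at the author labels, that selection should be redone inside each round as well, otherwise the held-out text has already influenced the result. The baseline gives a floor that any real classifier ought to beat.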
The calculations I included in my previous post - did they look like a
correct use of K-Nearest Neighbour? I just need to check before I
start to code the thing.
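
As a point of comparison, here is a minimal sketch of K-Nearest Neighbour over word-frequency vectors. It is not a check of the calculations in the earlier post, which are not reproduced here. Euclidean distance, relative frequencies as the dimensions, k=3 and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def relative_frequencies(tokens, vocabulary):
    """Turn a list of word tokens into one frequency per vocabulary word."""
    total = len(tokens)
    if total == 0:
        return [0.0] * len(vocabulary)
    counts = Counter(tokens)
    return [counts[word] / total for word in vocabulary]

def knn_classify(train_vectors, train_labels, query, k=3):
    """Guess the author of query by majority vote among the k nearest texts."""
    # Pair each training text's distance to the query with its author label.
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Note that `math.dist` needs Python 3.8 or later. An odd k avoids tied votes when there are only two candidate authors.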
Matt