Re: Minimum levenshtein distance for a set of words
- From: "weinzier" <l.weinzierl@xxxxxxxxx>
- Date: 31 May 2005 02:08:33 -0700
Keith Ramsay wrote:
> weinzier wrote:
> |What I am looking for is an algorithm which takes
> | - set of words
> | - costs for insert, delete, replace
> |and gives
> | - minimum levenshtein distance
> | - tells me how many words have this distance (one, many, infinite)
> | - set of all words which have the minimum distance if finite
>
> I'm guessing that by "minimum distance" you mean the
> minimum of the maximum distance from s to one of the
> given strings.
What I meant was the word which minimizes the sum of distances from s
to any of the words.
> |I am sure there is literature about this out there.
> |Again any hints welcome...
>
> The most famous reference, perhaps, is _Time Warps, String Edits,
> and Macromolecules: The Theory and Practice of Sequence Comparison_,
> David Sankoff and Joseph Kruskal eds.
Sounds promising, it is in my local bib, can't get hold of it before
weekend...
> The algorithm in there that perhaps comes closest to doing
> the job is one that would give you the string having the
> smallest *total* distance to the given strings.
Seems that this is *exactly* what I was looking for.
> It doesn't
> take much of a modification to get the string having the
> smallest weighted sum of distances to the given strings.
> There's a tweak for getting all the possible strings achieving
> the minimum, just as it's possible to generate all the
> sequences of insertions and deletions that achieve the minimum--
> although the number can of course increase quickly.
The weighted sum stuff I do not really need.
Does it work with arbitrary cost values for the levenshtein distance?
>
> The algorithm solves the problem for each set of strings
> that you can get by taking an initial segment of each of
> the original strings respectively. Consequently the time
> required grows quickly with the number of strings. I don't
> suppose that anybody knows of any terribly efficient
> algorithm known for the problems, although this is better
> than a more naive brute-force approach.
What do you mean by brute force?
Since the words I am looking for need not be in the inital set the only
brute force method I can think of is to check *any* word (up to a
certain length).
What would be much of interest is a way of knowing how many solutions
are there without doing the whole lot of computations.
Some rules (with proofs) like for a linear system of equations.
For example for a set A with two elements and cost values all the same
the solution is always A.
I suppose that under certain circumstances the solution is the longest
common subsequence.
.
- References:
- Minimum levenshtein distance for a set of words
- From: weinzier
- Re: Minimum levenshtein distance for a set of words
- From: David C . Ullrich
- Re: Minimum levenshtein distance for a set of words
- From: weinzier
- Re: Minimum levenshtein distance for a set of words
- From: Keith Ramsay
- Minimum levenshtein distance for a set of words
- Prev by Date: Re: What Went Wrong with New Math?
- Next by Date: limsup problem
- Previous by thread: Re: Minimum levenshtein distance for a set of words
- Next by thread: Re: I am a newly converted anti-Cantor kook!
- Index(es):
Relevant Pages
|