Re: Algorithm to convert unique sequences of letters,integers, and hyphens into integers under 2147483646



> In the event of a collision, a data record would be replaced
> by a lookup record and the item codes sharing the same hash
> key would be stored as data records with item keys of
> 2^8 * C + 1, 2^8 * C + 2, .. and the item codes stored
> in the lookup record at corresponding positions.

To diferentiate a collision from the case of receiving the same string
as before you would need to keep all strings encoded so far. Once you
go to that trouble (of having to keep all previously encoded strings
in order to guarantee that the same string will always yield the same
hash value), the simplest hash function is just a dictionary index of
the string. Your collision proposal could still be used as a part of
hashed search for this dictionary. Or one could use any other search
method (such as maintaining the strings in a tree and doing binary
search).

The above discussion would correspond to the problem in which the
strings are _not known in advance_. If the strings are known in advance
and additionally if one doesn't want to store the full list with the
encoder & decoder (otherwise one could simply use dictionary index as
the hash value), then any "perfect hash" algorithm (numerous sources
for that exist on the web) can be used to produce unique hash codes for
this known set of strings.

The original poster unfortunately didn't provide enough detail on the
problem requirements to obtain a definite answer.

.



Relevant Pages

  • Re: Maximum String size in Java?
    ... The hash function will *NOT* have the minimal collision ... > for long strings, so on average, SFH bakes it in the performance ...
    (comp.programming)
  • The certification password of Internet Explorer 7 and operation of auto complete
    ... About the certification password of Internet Explorer and operation ... By remembering the strings that are input in the following text ... In this registry, there are values whose name is a string of 42 bytes ... We cannot guess the original strings from the hash value, ...
    (Bugtraq)
  • Re: Maximum String size in Java?
    ... > for long strings, so on average, SFH bakes it in the performance ... >> distribution over the hash table size. ... > you need to be concerned about Unicode strings. ... construct a hash function that does appreciably better than the one ...
    (comp.programming)
  • Re: sort unique
    ... given that a hash table is not ... IMO if the vendor's algorithm does something "obvious", ... function to eliminate keys that hash to the same bucket per some ... strings of random lengths, and two strings are ...
    (comp.lang.lisp)
  • Re: Performance of hash_set vs. Java
    ... > I would like to convert the vector to store pointers to strings, ... >> reading code and hash table, so we can't really help there. ... and involve many object copies an memory allocations. ... This is the C++ input code I would start with. ...
    (comp.lang.cpp)