2022年字符串哈希算法可用 .pdf-得力文库

资源描述

《2022年字符串哈希算法可用 .pdf》由会员分享，可在线阅读，更多相关《2022年字符串哈希算法可用 .pdf（7页珍藏版）》请在得力文库 - 分享文档赚钱的网站上搜索。

1、经典字符串 Hash函数工作中经常需要用大hash这个强有力的工具，hash 表最核心的部分则在于怎么设计一个好的hash函数，以使数据更均匀地分布在若干个桶上。下面来介绍一下我现在用到的一个 hash 函数，我们来看代码：unsigned chostcachehash:get_host_key(const string& host) int result = 1; unsigned i = 0; for (i = 0; i host.size(); i+) result = 31 * result + hosti; return abs(result); inline unsigned

2、getkey(unsigned key) return (key % m_capacity); m_capacity为 hash 桶的个数对于一个字符串，我们首先调用get_host_key() 来得到一个key ，然后再用这个key 调用getkey 来得到他在hash 桶里的位置。这个是我们在工作一直使用的hash 函数，效果也还可以。今天忽然心血来潮，在网上搜了一下看还有没有更好的hash 函数，被我发现了这篇文章，于是转过来看看。php 中出现的字符串 hash 函数static unsigned long hashpjw(char *arkey， unsigned intn

3、keylength) unsigned long h = 0， g; char *arend=arkey+nkeylength; while (arkeyarend) h = (h 24); h = h g; return h; openssl 中出现的字符串 hash 函数unsigned long lh_strhash(char *str) int i，l; unsigned long ret=0; unsigned short *s; 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - -

4、- 第 1 页，共 7 页 - - - - - - - - - if (str = null) return(0); l=(strlen(str)+1)/2; s=(unsigned short *)str; for (i=0; i %有点看不懂ret=(si(i&0 x0f); return(ret); */ /* the following hash seems to work very well on normal text strings * no collisions on /usr/dict/words and it distributes on %2n quite * well

5、， not as good as md5， but still good. */ unsigned long lh_strhash(const char *c) unsigned long ret=0; long n; unsigned long v; int r; if (c = null) | (*c = post.abstract) return(ret); /* unsigned char b16; md5(c ， strlen(c) ，b); return(b0|(b18)|(b216)|(b32)v)&0 x0f; ret=(ret(32-r); ret&=0 xffffffffl

6、; ret=v*v; c+; return(ret16)ret); mysql 中出现的字符串 hash 函数#ifndefnew_hash_function 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 2 页，共 7 页 - - - - - - - - - /* calchashvalue for a key */ static uintcalc_hashnr(const byte *key， uint length) register uint nr=1， nr2=4; w

7、hile (length-) nr= (nr & 63)+nr2)*(uint) (uchar) *key+)+ (nr 8); nr2+=3; return(uint) nr); /* calchashvalue for a key， case indepenently */ static uintcalc_hashnr_caseup(const byte *key，uint length) register uint nr=1， nr2=4; while (length-) nr= (nr & 63)+nr2)*(uint) (uchar) toupper(*key+)+ (nr 8);

8、nr2+=3; return(uint) nr); #else /* * fowler/noll/vo hash * * the basis of the hash algorithm was taken from an idea sent by email to the * ieeeposix p1003.2 mailing list from phongvo () and * glenn fowler (). landon curt noll () * later improved on their algorithm. * * the magic is in the interestin

9、g relationship between the special prime * 16777619 (224 + 403) and 232 and 28. * * this hash produces the fewest collisions of any function that weve seen so * far ， and works well on both numbers and strings. */ uintcalc_hashnr(const byte *key， uintlen) 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - -

10、 - - - - - - 名师精心整理 - - - - - - - 第 3 页，共 7 页 - - - - - - - - - const byte *end=key+len; uint hash; for (hash = 0; key end; key+) hash *= 16777619; hash = (uint) *(uchar*) key; return (hash); uintcalc_hashnr_caseup(const byte *key，uintlen) const byte *end=key+len; uint hash; for (hash = 0; key end;

11、key+) hash *= 16777619; hash = (uint) (uchar) toupper(*key); return (hash); #endif 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 4 页，共 7 页 - - - - - - - - - 从上表可以看出，这些经典软件虽然构造字符串Hash 函数的方法不同，但是它们的效率都是不错的，相互之间差距很小，读者可以参考实际情况从其中借鉴使用。暴雪公司有个经典的字符串的hash 公式先提一个简单的问题，假如

12、有一个庞大的字符串数组，然后给你一个单独的字符串，让你从这个数组中查找是否有这个字符串并找到它，你会怎么做？有一个方法最简单，老老实实从头查到尾，一个一个比较，直到找到为止，我想只要学过程序设计的人都能把这样一个程序作出来，但要是有程序员把这样的程序交给用户，我只能用无语来评价，或许它真的能工作，但.也只能如此了。最合适的算法自然是使用HashTable（哈希表），先介绍介绍其中的基本知识，所谓Hash，一般是一个整数，通过某种算法，可以把一个字符串压缩成一个整数，这个数称为Hash，当然，无论如何，一个32 位整数是无法对应回一个字符串的，但在程序中，两个字符串计算出的 Hash值相

13、等的可能非常小，下面看看在MPQ 中的 Hash 算法unsigned long HashString(char *lpszFileName, unsigned long dwHashType) unsigned char *key = (unsigned char *)lpszFileName; unsigned long seed1 = 0 x7FED7FED, seed2 = 0 xEEEEEEEE; intch; while(*key != 0) ch = toupper(*key ); seed1 = cryptTable(dwHashType 8) ch (seed1 seed2)

14、; seed2 = ch seed1 seed2 (seed2 5) 3; return seed1; Blizzard 的这个算法是非常高效的，被称为 One-Way Hash ，举个例子，字符串unitneutralacritter.grp通过这个算法得到的结果是0 xA26067F3。是不是把第一个算法改进一下，改成逐个比较字符串的Hash 值就可以了呢，答案是，远远不够，要想得到最快的算法，就不能进行逐个的比较，通常是构造一个哈希表(Hash Table)来解决问题，哈希表是一个大数组，这个数组的容量根据程序的要求来定义，例如1024，每一个

15、 Hash 值通过取模运算(mod)对应到数组中的一个位置，这样，只要比较这个字符串的哈希值对应的位置又没有被占用，就可以得到最后的结果了，想想这是什么速度？是的，是最快的O(1)，现在仔细看看这个算法吧intGetHashTablePos(char *lpszString, SOMESTRUCTURE *lpTable, intnTableSize) 名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 5 页，共 7 页 - - - - - - - - - intnHash = Has

16、hString(lpszString), nHashPos = nHash % nTableSize; if (lpTablenHashPos.bExists& !strcmp(lpTablenHashPos.pString, lpszString) returnnHashPos; else return -1; /Error value 看到此，我想大家都在想一个很严重的问题：假如两个字符串在哈希表中对应的位置相同怎么办？ ,究竟一个数组容量是有限的，这种可能性很大。解决该问题的方法很多，我首先想到的就是用链表 ,感谢大学里学的数据结构教会了这个百试百灵的法宝，我碰到的很多算法都可以转化成

17、链表来解决，只要在哈希表的每个入口挂一个链表，保存所有对应的字符串就 OK了。事情到此似乎有了完美的结局，假如是把问题独自交给我解决，此时我可能就要开始定义数据结构然后写代码了。然而 Blizzard 的程序员使用的方法则是更精妙的方法。基本原理就是：他们在哈希表中不是用一个哈希值而是用三个哈希值来校验字符串。中国有句古话再一再二不能再三再四，看来 Blizzard 也深得此话的精髓，假如说两个不同的字符串经过一个哈希算法得到的入口点一致有可能，但用三个不同的哈希算法算出的入口点都一致，那几乎可以肯定是不可能的事了，这个几率是1:18889465931478580854784 ，大概是 1

18、0 的 22.3 次方分之一，对一个游戏程序来说足够安全了。现在再回到数据结构上，Blizzard 使用的哈希表没有使用链表，而采用顺延的方式来解决问题，看看这个算法：intGetHashTablePos(char *lpszString, MPQHASHTABLE *lpTable, intnTableSize) constint HASH_OFFSET = 0, HASH_A = 1, HASH_B = 2; intnHash = HashString(lpszString, HASH_OFFSET); intnHashA = HashString(lpszString, HASH_A)

19、; intnHashB = HashString(lpszString, HASH_B); intnHashStart = nHash % nTableSize, nHashPos = nHashStart; while (lpTablenHashPos.bExists) if (lpTablenHashPos.nHashA = nHashA&lpTablenHashPos.nHashB = nHashB) returnnHashPos; else nHashPos = (nHashPos 1) % nTableSize; if (nHashPos = nHashStart) break; 名

20、师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 6 页，共 7 页 - - - - - - - - - return -1; /Error value 1. 计算出字符串的三个哈希值（一个用来确定位置，另外两个用来校验) 2. 察看哈希表中的这个位置3. 哈希表中这个位置为空吗？假如为空，则肯定该字符串不存在，返回4. 假如存在，则检查其他两个哈希值是否也匹配，假如匹配，则表示找到了该字符串，返回5. 移到下一个位置，假如已经越界，则表示没有找到，返回6. 看看是不是又回到了原来的位置，假如是，则返回没找到7. 回到 3 怎么样，很简单的算法吧，但确实是天才的idea, 其实最优秀的算法往往是简单有效的算法。名师资料总结 - - -精品资料欢迎下载 - - - - - - - - - - - - - - - - - - 名师精心整理 - - - - - - - 第 7 页，共 7 页 - - - - - - - - -

展开阅读全文