外文翻译hash算法大学论文.doc-得力文库

资源描述

《外文翻译hash算法大学论文.doc》由会员分享，可在线阅读，更多相关《外文翻译hash算法大学论文.doc（13页珍藏版）》请在得力文库 - 分享文档赚钱的网站上搜索。

1、广东工业大学本科毕业设计（论文）外文参考文献译文及原文系部计算机与艺术设计学部专业信息工程年级 2006级班级名称 06本信息工程3班学号学生姓名指导教师 2010 年 4 月 25 日目录HASH算法1通用Hash算法2Hash算法难度3HASH FUNCTIONS6Commonly Used Hash Functions7Difficulties with Hash Functions9Hash算法哈希算法（Hash functions）是密码学主要的一部分。这是我们加密人员与泛滥的破解技术抗争的主力，我们知道他们最不喜欢就是密码图形。一个hash算法提供了可变长度

2、的输入字符串和固定长度的结果。输入的很简便就是“hash”的意思，这个词不是人名的缩写。你可以用hash来输入数据，固定长度的字符串允许我们使用hash值来引用实际字符串本身。因为hash算法使用长的字符串，再变成一个短的。不可避免有2个字符串通过hash算法会得出一样的结果，这个在密码学中叫“碰撞”。举个你可以明白hash值的例子，假如Jon Callas和Jane Cannoy他们名字的hash值都是JC。碰撞是了解hash算法很重要的部分，我们将会在比特（bit）的单位上有更多的介绍。尽管缩写是一个很简单描述原文的方式，缩写造成了密码学目的的hash算法的错误。密码学的hash算法。有很

3、多用在加密技术中的属性。很难逆向运算hash算法。据hash知识，没有一个好的办法找到hash值对应的那个字符串。我们已经知道了hash算法会丢失数据，创造了一个简单的相对性。这个相同的性质也是名字缩写的：除了JC没其它的信息，不能找出我的名字，是JonCallas？是Jane Cannoy？还是？一个hash值，它应该很难确定一个本来的字符串。这个性质是缩写遗漏（initials lack）。看缩写的时候如果知道名字的匹配是很简单的。在密码学中，我们想找出源信息和这个结果之间的联系，他们之间的关系是尽可能不透明的。确定一个源字符串，我们根据这个字符串的hash值很难找出第二个字符串。很难有效

4、的改变字符串获得一个碰撞。也很难改变“我同意支付100美元”到“我同意支付500美元”而获得碰撞。注意这2个字符串之间只有1位不同。也很难找出碰撞的2个字符串的hash值。这个算法在很多不同的事情上给了我们灵活的想法，这里有一些例子：当你在PGP软件中输入密码的时候，我们使用hash算法来生成一个密钥。中间的过程就是hash算法，通常一遍遍的使用来降低破解者的暴力破解的风险。PGP软件的随机数生成器在传入数据后，会根据你键盘和鼠标的移动时时更新。这样使得观察者不确定这个值，也没有不变的随机数字。我们使用hash算法消除观察者的数据中的不均匀性。随机数生成器使用hash算法产生输出值。这个过程P

5、GP软件也做了。文件完整性算法，使用hash算法可以很快的检查文件。比如：你可以保留文件的hash列表在你的电脑上。hash数据库中的值也变了，你就看到计算机内的文件变化了。软件分布系统站点通常有分布的复杂密码系统使用hash算法创建数据完整性作为它的一个系统组件，我们稍后会了解这个。注意几乎所有算法现在都在被广泛使用，这有一个假设它们不会发生碰撞。如果2个密钥发生了hash碰撞，任何一个密钥都可以解密文件。如果2个软件包有相同的hash值时，一个肯定被误认为是另外一个。通用Hash算法表格1列出了一些hash算法的共同点，特别是PGP使用的。表格1：通用Hash算法名称大小（Bits）描述M

6、D5128MD5是hash系列算法中的最低标准，PGP软件在PGP5.0版本以前使用。MD5的脆弱性在1996年第一次出现。MD5是MD4的改进，PGP软件不再使用它的原因是它是第一个被破解的通用hash算法SHA-1160SHA-1是MD5的改进，由NIST设计，解决MD5的问题后被广泛使用。RIPE-MD/160160RIPE-MD/160是一个和SHA-1差不多的Hash算法。设计RIPE-MD/160为了改善超过MD5。它被Reseaux IPEuropens(RIPE)组织设计，而不是美国NIST我们认为它的安全性和SHA-1差不多。SHA-256256SHA-256是美国NIST最

7、新设计的新Hash算法。也属于“SH-2”的类型，它有和其它不同的内部结构，但和其它hash算法的基本结构都是一样的。SHA-512512512这是“SH-2”算法的一种，和SHA-256差不多。SHA-384384SHA-384有比SHA-512更小的输出。一般不常用SHA-384是因为除了大小以外没有任何优势，如果我们需要比SHA-256强度高的算法，我们会直接选SHA-512。同样SHA-224是SHA-256缩小版。Hash算法难度目前(2006年中期)，我们知道了hash算法系列在使用上并不是很完美，他们中的一些确实不完善。这个问题到2004年的夏天变的明朗了，中国山东大学王小云教授

8、宣布她和她的团队在一些hash算法中发现碰撞。这时RSA名字中的“S”的这个人Adi Shamir说，“上星期，我还认为hash算法是我们认为最好的部件。现在则认为它是我们的部件中是最差的。”在2005年初，王小云的攻击延伸到了第一次幸免的SHA-1。我们仍然在应对这个问题。他们中的所有都绕着hash碰撞，2个字符串生成了一样的hash值。一个数学的分支：组合数学（combinatorics）中的一个叫归档原理（Pigeonhole Principled）的公理。最简单的归档原理的解释是：如果你有13个鸽子而只有12个笼子，至少有一个里面装有2个鸽子。很显然的，不是吗？那就是为什么这是公理的原

9、因！如果你应用这个公理到hash算法，考虑16-bit的hash计算。再考虑整个16-bit的字符。依照归档原理，至少有2个字符串会有一样的hash。事实上，还有一大堆一样的例子。这个碰撞和鸽子汇集问题是一样的。如果这个碰撞是均匀分布的(这对hash算法来说也正确)，一个hash值那会有256个碰撞，然后根据归档原理，至少1个hash有至少有256个碰撞。找出一个碰撞应该和猜一样容易，但是有多难呢？回答这个问题又引发另外一个有趣的数学问题叫生日问题。在谈论区块大小的时候我们谈到过的。和Alice有相同生日的人的概率是1/365。但是如果你有一房子满满的人，和另外一个人生日发生碰撞的概率有多大？

10、特别的，有多少人机率均等也就是房间中有2个人的生日是一样的呢？这个问题的一般回答和找出hash碰撞的是一样的。我们认为生日是比另外一个名字缩写问题更好的hash问题，但远远不完美。尽管如此，生日是一个公平的任意分布的b。对于生日来说，原来生日c碰撞的机率是大约23个人中的偶数。通常，机率是偶数的大约是选项数字的平方根。我确定你注意到我使用了一个很含糊的词“大约”。这是因为答案不是准确，只是接近平方根。大概的说，碰撞的机率是:其中，Prob是机率的意思，只表示函数名，pigeons是鸽子，holes是笼子洞。Pigeons和holes都是输入变量的名字。省下你的数学运算。如果你解决了鸽子数目的问

11、题，结果的机率是一个洞的2个鸽子里面每一个都有的概率。可以算出约为，对于我们使用的那个问题来说，我们也可以认为等于。特别是当我们去处理一个非常大的数字的时候，这样去推测很方便，这个方法也是理论数学中被惯用的手法。所以，如果我们有一个n-bit的hash算法，如果我们有个字符串的碰撞的机率相等。也就是说，160-bit的hash运算只有80-bit的安全性。是很大的一个数字。大约2倍的阿伏伽德罗常数。阿伏伽德罗常数d是摩尔体积的分子数，或者用一个方便的东西表示，就是一汤勺水中水分子的数目。那是个很大的数。王小云带着报告参加了2004年密码界峰会，她震撼了密码界。她没有用一张纸来展示如何碰撞，

12、她仅仅只用了他们中的一部分。就像你看到的，因为碰撞很难发现。仅仅有128-bit的hash算法中的一部分中有碰撞，也就意味着碰撞已经出现了。对于密码分析学家的主要问题是“她知道我们不能够做什么？”6和月以后，她的技术扩展到攻击质数的160-bit的hash算法。这就是我们在最后2年所总结的：王小云是最优秀的密码分析专家。她有着其它数学家没有的基础数学洞察力；她非常迅速的成为世界上为数不多的、最优秀的hash算法密码分析专家。一些其它的理论工作不是去进行应用实际，而是更多的思考。有很多议案关于如何修改剩下的算法来抵抗王小云的攻击。他们都非常棒，但是一个明显的问题是，“明年有什么攻击，这个修正可以

13、解决吗？”当然，这个问题是不能回答的。我们不可能反对未知的攻击来保护我们的算法。无论如何，其中的很多议案确实是解决的好办法。一个简单的技术诸如当进行hash运算时使用每双字节(用AABBCC来代替ABC)，或者插入0比特在每4个字节后面，或者添加随机数据在准备hash运算的数据之前，用这些办法解决了已知的问题。我们开始考虑一个如何设计一个好的hash算法的想法。在2005年10月，NIST主持了一个关于hash算法的工作组。密码专家开始考虑想出一个如何设计一个好的hash算法的想法。第2个工作组在2006年8月开始计划。同样也有像AES相似的竞争方式来产生一个新的hash算法。工程师的观点中也

14、有一些好的想法。在PGP团队中，我们已经发扬了首创精神。在PGP团队，我们开始转移MD5到1997年的水平。PGP5.0开始从MD5向SHA-1发展，保持MD5的唯一目的是为了向后兼容性。PGP8.0.3介绍了这个技术支持，也可以在阅读中找到，但是没有SHA-256、SHA-384和SHA-512的算法。PGP9.0开始从SHA-1向SHA-256发展。Hash Functions Hash functions are an important part of cryptography. They are the workhorses that we cryptographers use an

15、d abuse for all sorts of things, and yet we understand them least of all the cryptographic primitives. A hash function takes a variable-length input string and creates a fixed-length output. That “hash” of the input is a shortcut, not unlike a persons initials. You can refer to the input string by i

16、ts hash. The fact that it is a fixed-length string allows us to easily use the hash value as a referrer to the actual string itself. Because a hash function takes a long string and reduces it to a short one, it is inevitable that there will be two strings that hash to the same value, or collide in c

17、ryptographer-speak. For example, the names Jon Callas and Jane Cannoy collide with initials to the hash of JC. Collisions are important in the understanding of hash functions, and well talk more about them in a bit. Although initials are an easy way to describe the basic concept, initials make a bad

18、 hash function for cryptographic purposes. A cryptographic hash function has a number of other properties that make it useful cryptographically. It should be hard to reverse a hash function. Knowing the hash, there should be no good way to find the input string that generated it. Given that (typical

19、ly) hash functions lose data, this is a relatively easy property to create. The same property is also true for initials: knowing JC and nothing else, there is no good way to get to my name. Given a hash value, it should be hard to identify a possible source string. This property is one that initials

20、 lack. It is very easy to look at a set of initials and know if a name matches it. With a cryptographic hash function, however, we want the relationship between a source and a result to be as opaque as possible. Given one source string, it should be hard to find a second string that collides with it

21、s hash. It should be especially hard to change a string usefully and get a collision. In an extreme case, it should be hard to change “I agree to pay $100” to “I agree to pay $500” and have that collide. Note that the difference between the two strings is only a single bit. It should also be hard to

22、 find two strings that collide in their hash values. These requirements give us very flexible functions that are used for lots of different things. Here are some examples: Random number generators themselves often use hash functions to produce their output. The one in PGP software does. File integri

23、ty systems use hash functions as quick checks on the files. For example, you can keep a list of the hashes of the files on your computer, and you can see if that file has changed by comparing the hash of the file on disk to the one in the database. Software distribution sites also often list the has

24、h value of the distributed file so that people who want to see if they have the right file can compute and compare hashes. Complex cryptographic systems that create data integrity use hash functions as a component. Well talk more about them later. Note that for almost all of these uses, theres an as

25、sumption that there wont be collisions. If two passphrases collide in their hash, either can decrypt a file. If two software packages hash to the same value, then one can be mistaken for the other. Commonly Used Hash Functions Table 1 lists some commonly used hash functions, especially the ones we p

26、resently use in PGP software. NameSize DescriptionMD5128bitsMD5 was the sole hash function that PGP software used prior to PGP 5.0. Weaknesses in MD5 rst showed up in 1996. MD5 is itself an improvement on MD4, which was never used in PGP software and was the rst common hash function to be fully brok

27、en.SHA-1160bitsSHA-1 appeared in PGP 5.0, and also in OpenPGP. SHA-1 is an improvement on MD5 that was created by NIST to be wider and also to correct problems in MD5.RIPE-MD/160160bitsRIPE-MD/160 is a hash function similar to SHA-1. RIPE-MD/160 was created to be an improvement over MD5. However, it

28、 was created by the European Rseaux IP Europens (RIPE) organization rather than the US NIST. We expect it has similar security characteristics to SHA-1.SHA-256256bitsSHA-256 is one of a new family of hashes created by the US NIST that are collectively called the “SHA-2” family. It has dierent intern

29、al structure, but comes from the same basic construction as the other hash functions in this table.SHA-512512bitsThis is another member of the “SHA-2” family, along with SHA-256SHA-384384bitsSHA-384 is a variant of SHA-512 that has a smaller output. In general, SHA-384 is not used, because it has no

30、 advantages over SHA-512 except for the hash size. It runs at the same speed as SHA-512, so usually if we need something stronger than SHA-256, we go directly to SHA-512. There is also a SHA-224 which is a similar truncation of SHA-256.Table1 Commonly Used Hash FunctionsDifficulties with Hash Functi

31、ons Presently (mid-2006), we know the suite of hash functions we have been using is not perfect, and some of them are quite imperfect. These problems came to light in the summer of 2004 when Xiaoyung Wang announced that she and her team produced collisions in a number of hash functions WANG04. Adi S

32、hamir, the “S” in the RSA algorithm, said at the time, “Last week, I thought that hash functions were the component we understood best. Now I see that they are the component we understand least.” In early 2005, Xiaoyungs attacks were extended to SHA-1, which had survived her first work WANG05. We ar

33、e still coping with these problems, all of which revolve around hash function collisions, two strings producing the same hash value. One of the axioms of the branch of mathematics called combinatorics is called the Pigeonhole Principle. At its simplest, the Pigeonhole Principle states that if you ha

34、ve thirteen pigeons and only twelve pigeonholes, then at least one hole must contain at least two pigeons. Pretty obvious, isnt it? Thats why its an axiom. If you apply this principle to hash functions, consider sixteen-byte hashes. Also consider the entire set of seventeen-byte strings. According t

35、o the Pigeonhole Principle, there are going to be at least two original strings that produce the same sixteen-byte hash. As a matter of fact, there have to be a whole lot of them. The collisions are the equivalent of pigeons lumping themselves together. If the collisions are evenly distributed (and

36、thus the hash function perfect), then there will be 256 collisions per hash value, and according to the Pigeonhole Principle, there has to be at least one hash with at least 256 collisions. Finding a collision ought to be no better than guessing, but how hard is that? Answering that question raises

37、another interesting mathematical problem called The Birthday Problem, which we first saw when talking about block sizes. The probability that a given person has the same birthday as Alice is about 1/365 11. But if you have a room full of people, what is the chance that there will be a collision on t

38、heir birthday? Specifically, with how many people are there even odds that there will be two people in the room who share the same birthday? The general case answer to this question is the same as finding collisions in a hash function. We can think of a birthday as yet another hash function with per

39、haps better properties than initials, but still nowhere near perfect. Nonetheless, birthdays are fairly randomly distributed12. For birthdays, it turns out that the odds of a birthday collision are even at about 23 people. In the general case, the odds are even at about the square root of the number

40、 of options. Im sure you noticed my use of the weasel-word “about.” This is because the answer isnt exactly the square root, but close to it. In the general case, the chance of collisions isSo, if we have an n-bit hash function, there are even odds of a collision when we have strings weve hashed. Th

41、us, we say that a 160-bit hash function should have 80 bits of security. 280 is a very large number. Its about twice Avogadros Number, which is the number of molecules in a mole, or to put it in convenient terms, the number of molecules in a rounded tablespoon of water. Its a big number. When Wang c

42、ame to Crypto 2004 with WANG04 in hand, it shook cryptographers up. She didnt have a paper that showed how to create collisions, she merely had a lot of them. As you can see, because collisions are supposed to be hard to find, merely possessing a handful of collisions on each of a handful of 128-bit

43、 hash functions means that something is up. For cryptographers, the main question was, “What does she know that we dont?” Six months later, her techniques were extended to attack the prime 160-bit hash function. Here is a summary of what weve learned in the last two years: Wang is an excellent crypt

44、analyst. She doesnt have any fundamental mathematical insights that other mathematicians dont have; shes merely the worlds best hash function cryptanalyst by leaps and bounds. Some other theoretical work that wasnt particularly practical is getting a lot of thought. For example, a few months before

45、WANG04, John Kelsey and Bruce Schneier showed in KSHASH04 that when looking for a SHA-1 collision of a given string, you could do it in 2106work instead of 2160 but you need to have messages 260 long to be able to do so. Before Wang showed flaws in how we were doing things, this was interesting but

46、not practical. Now some of us wonder if this impractical flaw is an indication of a structural problem. We dont know, yet. There are a number of proposals on how to modify existing functions to withstand Wangs attacks. Theyre all very good, but the obvious follow-on question is, “What new attack wil

47、l happen next year, that this fix doesnt account for?” Of course, this question is unanswerable. We just cant protect against unknown attacks. However, a number of these proposed solutions are practical to implement. Simple techniques like doubling every byte as you hash (instead of hashing ABC, hash AABBCC) or inserting a zero byte after every four bytes SY05 or adding in some random data at the front of the data to be hashed protect against these known problems. Were starting to get an idea of how to design hash functions better. In October 2005, NIST hosted a workshop on hash funct

展开阅读全文