zipf’s law and heaps’ law

Heaps’ Law

Heaps’ law can also be applied in characterizing natural language processing, according to which the vocabulary size grows in a sublinear function with document size, say with , where t denotes the total number of words and N(t) is the number of distinct words.

Heap法则描述的是词的个数是整个文章长度的函数。即多次出现的词仅算一次的次数是所有单词(文档长度)的函数。

Zip’s Law

Ranking all the words in descending order of occurrence frequency and denoting by z(r) the frequency of the word with rank r, the Zipf’s law reads , where is the maximal frequency and is the so-called Zipf’s exponent.

Zip法则指出一个词出现的频率是它频率降序排名的倒数。 也代表频率,x代表排名。