Computing Information Entropy in Python

2016-09-23 董付国 Python小屋

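Information entropy measures the uncertainty of the messages emitted by a given source: the more disordered and irregular the messages, the larger the entropy. If a source always emits exactly the same message, its entropy is 0, meaning the message is completely predictable.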
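For reference, the quantity described above is the Shannon entropy H = -Σ p_i · log2(p_i), measured in bits, where p_i is the probability of the i-th distinct message. The following is only a minimal sketch of that definition; the helper name `entropy_from_probs` is hypothetical and does not appear in the original script.

```python
from math import log2

def entropy_from_probs(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Terms with p == 0 are skipped: p * log2(p) tends to 0 as p tends to 0.
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy_from_probs([0.5, 0.5]))   # fair coin: 1.0
print(entropy_from_probs([1.0]))        # always the same message: 0.0
print(entropy_from_probs([0.25] * 4))   # four equally likely messages: 2.0
```

The second call illustrates the statement above: a source with a single possible message has zero entropy.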

The main purpose of this article is to demonstrate the use of Python dictionaries and built-in functions.
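As a warm-up, here is a small sketch of the two idioms the script relies on: counting with `dict.get`, and combining the built-ins `map` and `sum`. The sample list and the variable names `counts`, `total`, and `freqs` are made up for illustration.

```python
# Count occurrences with dict.get: a missing key falls back to the default 0.
counts = dict()
for item in ['a', 'b', 'a', 'c', 'a']:
    counts[item] = counts.get(item, 0) + 1
print(counts)  # {'a': 3, 'b': 1, 'c': 1}

# The built-ins sum and map turn the counts into relative frequencies.
total = sum(counts.values())
freqs = list(map(lambda c: c / total, counts.values()))
print(freqs)   # [0.6, 0.2, 0.2]
```

Using `get(item, 0)` avoids a separate check for whether the key already exists in the dictionary.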


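from math import log
from random import randint

def informationEntropy(lst):
    # Total number of data items
    num = len(lst)
    # Number of occurrences of each distinct value
    numberofNoRepeat = dict()
    for data in lst:
        numberofNoRepeat[data] = numberofNoRepeat.get(data, 0) + 1
    # Print the counts so the result can be checked by hand
    print(numberofNoRepeat)
    # Return the entropy in bits; x/num is the relative frequency of each value.
    # Every term x/num * log(x/num, 2) is <= 0, so abs() flips the sign of the sum.
    return abs(sum(map(lambda x: x/num * log(x/num, 2), numberofNoRepeat.values())))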


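# Functional test: ten random lists of 5 to 30 integers drawn from 1..5
for _ in range(10):
    lst = [randint(1, 5) for _ in range(randint(5, 30))]
    print('Entropy:', informationEntropy(lst))
    print('='*20)

# A source that always emits the same value has entropy 0
print('Entropy:', informationEntropy([1,1,1,1,1,1]))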


The output of one run was:

{1: 4, 2: 3, 3: 9, 4: 3, 5: 8}
Entropy: 2.1608467607817
====================
{1: 3, 2: 1, 3: 5, 4: 2, 5: 7}
Entropy: 2.057924310831006
====================
{1: 5, 2: 3, 3: 2, 4: 1, 5: 2}
Entropy: 2.1339375660949167
====================
{1: 1, 3: 3, 4: 3, 5: 1}
Entropy: 1.8112781244591327
====================
{1: 3, 2: 4, 3: 1, 4: 3, 5: 2}
Entropy: 2.199687794731328
====================
{1: 1, 2: 2, 3: 5, 4: 3, 5: 3}
Entropy: 2.155968102145908
====================
{1: 1, 3: 2, 4: 2, 5: 1}
Entropy: 1.9182958340544893
====================
{1: 1, 2: 2, 4: 2, 5: 1}
Entropy: 1.9182958340544893
====================
{1: 8, 2: 4, 3: 6, 4: 5, 5: 6}
Entropy: 2.284560633641686
====================
{2: 3, 3: 1, 4: 2, 5: 2}
Entropy: 1.9056390622295662
====================
{1: 6}
Entropy: 0.0
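
One way to sanity-check these numbers is to recompute an entropy directly from a printed dictionary of counts. The sketch below does that for the first result and also prints log2(5), roughly 2.32, which is the largest entropy possible when only five distinct values can occur. The helper name `entropy_from_counts` is hypothetical, and it uses `math.log2` rather than the `log(x, 2)` call of the original script.

```python
from math import log2

def entropy_from_counts(counts):
    """Entropy in bits from a {value: occurrences} dictionary."""
    total = sum(counts.values())
    return sum(-c / total * log2(c / total) for c in counts.values())

first = entropy_from_counts({1: 4, 2: 3, 3: 9, 4: 3, 5: 8})
print(round(first, 6))   # agrees with the first result above to six decimals
print(log2(5))           # upper bound for five distinct values
```

Every entropy in the run above stays below that bound, and the lists whose counts are closest to uniform come closest to it.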