Essentials of Python's statistics module

2016-07-19 董付国 Python小屋

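1. mean()

Computes the arithmetic mean of the data. Outputs throughout are from the author's Python 3.5 session; newer releases may print some integer-valued results without the trailing .0 (5 rather than 5.0).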

>>> import statistics

>>> statistics.mean([1, 2, 3, 4, 5, 6, 7, 8, 9])

5.0

>>> statistics.mean(range(1,10))

5.0

>>> import fractions

>>> x = [(3, 7), (1, 21), (5, 3), (1, 3)]

>>> y = [fractions.Fraction(*item) for item in x]

>>> y

[Fraction(3, 7), Fraction(1, 21), Fraction(5, 3), Fraction(1, 3)]

>>> statistics.mean(y)

Fraction(13, 21)

>>> import decimal

>>> x = ('0.5', '0.75', '0.625', '0.375')

>>> y = map(decimal.Decimal, x)

>>> y

<map object at 0x00000000033465C0>

>>> list(y)

[Decimal('0.5'), Decimal('0.75'), Decimal('0.625'), Decimal('0.375')]

>>> statistics.mean(y)  # y is a one-shot iterator already consumed by list(y) above, so it is now empty

Traceback (most recent call last):

  File "<pyshell#411>", line 1, in <module>

    statistics.mean(y)

  File "C:\Python 3.5\lib\statistics.py", line 292, in mean

    raise StatisticsError('mean requires at least one data point')

statistics.StatisticsError: mean requires at least one data point

>>> list(y)

[]

>>> y = map(decimal.Decimal, x)  # rebuild the iterator before using it again

>>> statistics.mean(y)

Decimal('0.5625')
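
The StatisticsError above arises because map() returns a one-shot iterator: once list(y) has consumed it, nothing is left for mean(). A minimal sketch of one way around this, assuming the data fits in memory, is to build a list once and reuse it (the name values is only illustrative):

```python
import decimal
import statistics

x = ('0.5', '0.75', '0.625', '0.375')

# Materialize the converted values once; unlike a map object,
# a list can be iterated any number of times.
values = [decimal.Decimal(item) for item in x]

first = statistics.mean(values)
second = statistics.mean(values)   # still works: the list was not consumed
print(first, second)
```

Both calls give the same Decimal result, because the list is never exhausted.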

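2. median(), median_low(), median_high(), median_grouped()

Various kinds of median. When the number of data points is even, median() averages the two middle values, while median_low() and median_high() return the lower and the higher of the two. median_grouped() treats the data as grouped into class intervals of width interval (default 1) and interpolates within the median interval.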

>>> statistics.median([1, 3, 5, 7])

4.0

>>> statistics.median_low([1, 3, 5, 7])

3

>>> statistics.median_high([1, 3, 5, 7])

5

>>> statistics.median([1, 3, 7])

3

>>> statistics.median([5, 3, 7])

5

>>> statistics.median(range(1,10))

5

>>> statistics.median_low([5, 3, 7])

5

>>> statistics.median_high([5, 3, 7])

5
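
The three functions used so far differ only in which element of the sorted data they pick. The following is a rough sketch of that idea, not the library's source; the *_sketch names are hypothetical helpers introduced here for illustration.

```python
import statistics

def median_low_sketch(data):
    """Lower of the two middle values (or the middle value if n is odd)."""
    values = sorted(data)
    if not values:
        raise ValueError("no median for empty data")
    return values[(len(values) - 1) // 2]

def median_high_sketch(data):
    """Higher of the two middle values (or the middle value if n is odd)."""
    values = sorted(data)
    if not values:
        raise ValueError("no median for empty data")
    return values[len(values) // 2]

def median_sketch(data):
    """Middle value, or the average of the two middle values if n is even."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    if n % 2 == 1:
        return values[n // 2]
    return (values[n // 2 - 1] + values[n // 2]) / 2

for sample in ([1, 3, 5, 7], [5, 3, 7]):
    print(median_low_sketch(sample), median_sketch(sample), median_high_sketch(sample))
```

For odd-length data all three pick the same element, which is why the [5, 3, 7] examples above all return 5.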

>>> statistics.median_grouped([5, 3, 7])

5.0

>>> statistics.median_grouped([5, 3, 7, 1])

4.5

>>> statistics.median_grouped([52, 52, 53, 54])

52.5

>>> statistics.median_low([52, 52, 53, 54])

52

>>> statistics.median_high([52, 52, 53, 54])

53

>>> statistics.median_high([1, 3, 3, 5, 7])

3

>>> statistics.median_low([1, 3, 3, 5, 7])

3

>>> statistics.median_grouped([1, 3, 3, 5, 7])

3.25

>>> statistics.median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])

3.7

>>> statistics.median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5], interval=2)

3.4
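
The median_grouped() results can be reproduced with the usual grouped-data interpolation formula, L + interval * (n/2 - cf) / f, where L is the lower limit of the median interval, cf the number of values below it, and f the number of values inside it. The function below is a sketch under that assumption (grouped_median_sketch is a hypothetical name, not part of the module):

```python
import math
import statistics

def grouped_median_sketch(data, interval=1):
    """Interpolated median, treating each value as the midpoint of a class interval."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("no median for empty data")
    x = values[n // 2]                     # midpoint of the median interval
    lower = x - interval / 2               # L: lower limit of that interval
    cf = sum(1 for v in values if v < x)   # cumulative frequency below it
    f = values.count(x)                    # frequency inside it
    return lower + interval * (n / 2 - cf) / f

sample = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
print(grouped_median_sketch(sample), statistics.median_grouped(sample))
print(grouped_median_sketch(sample, 2), statistics.median_grouped(sample, interval=2))
```

With this sample, x is 4, cf is 4 and f is 5, so the default interval gives 3.5 + (5 - 4) / 5, matching the 3.7 shown above.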

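3. mode()

Returns the most common data point, that is, the value that occurs most often. The tracebacks below are from Python 3.5; since Python 3.8, mode() returns the first mode encountered instead of raising StatisticsError when several values are equally common.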

>>> statistics.mode([1, 3, 5, 7])

Traceback (most recent call last):

  File "<pyshell#435>", line 1, in <module>

    statistics.mode([1, 3, 5, 7])

  File "C:\Python 3.5\lib\statistics.py", line 434, in mode

    'no unique mode; found %d equally common values' % len(table)

statistics.StatisticsError: no unique mode; found 4 equally common values

>>> statistics.mode([1, 3, 5, 7, 3])

3

>>> statistics.mode([1, 3, 5, 7, 3, 5])

Traceback (most recent call last):

  File "<pyshell#437>", line 1, in <module>

    statistics.mode([1, 3, 5, 7, 3, 5])

  File "C:\Python 3.5\lib\statistics.py", line 434, in mode

    'no unique mode; found %d equally common values' % len(table)

statistics.StatisticsError: no unique mode; found 2 equally common values

>>> statistics.mode([1, 3, 5, 7, 3, 5, 5])

5

>>> statistics.mode(["red", "blue", "blue", "red", "green", "red", "red"])

'red'

>>> statistics.mode(list(range(5)) + [3])

3
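
When ties are expected, one version-independent option is to count the values yourself and return every value that reaches the highest count. This is only a sketch built on collections.Counter; all_modes is a hypothetical helper, not a function of the statistics module.

```python
from collections import Counter

def all_modes(data):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(all_modes([1, 3, 5, 7, 3, 5]))      # two values tie for most common
print(all_modes([1, 3, 5, 7, 3, 5, 5]))   # a single mode
```

Returning a list keeps the tied case explicit instead of relying on an exception or on whichever value happens to come first.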

4. pstdev()

Returns the population standard deviation, that is, the square root of the population variance.

>>> statistics.pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])

0.986893273527251

>>> statistics.pstdev(range(20))

5.766281297335398

>>> statistics.pstdev([1, 2, 3, 4, 5, 10, 9, 8, 7, 6])

2.8722813232690143
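
For reference, the population standard deviation can be written out directly from its definition: the square root of the mean squared deviation from the mean. The sketch below (pstdev_sketch is a hypothetical name) uses plain float arithmetic, so it may differ from the library in the last digits on ill-conditioned data.

```python
import math
import statistics

def pstdev_sketch(data):
    """Population standard deviation computed directly from the definition."""
    values = list(data)
    n = len(values)
    if n == 0:
        raise ValueError("at least one data point is required")
    mu = sum(values) / n
    # Divide by n (not n - 1): the data is treated as the whole population.
    return math.sqrt(sum((v - mu) ** 2 for v in values) / n)

sample = [1, 2, 3, 4, 5, 10, 9, 8, 7, 6]
print(pstdev_sketch(sample), statistics.pstdev(sample))
```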

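5. pvariance()

Returns the population variance, also known as the second moment about the mean. The optional second argument mu, if given, should be the already-computed mean of the data.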

>>> statistics.pvariance([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])

0.9739583333333334

>>> statistics.pvariance([1, 2, 3, 4, 5, 10, 9, 8, 7, 6])

8.25

>>> x = [1, 2, 3, 4, 5, 10, 9, 8, 7, 6]

>>> mu = statistics.mean(x)

>>> mu

5.5

>>> statistics.pvariance([1, 2, 3, 4, 5, 10, 9, 8, 7, 6], mu)

8.25

>>> statistics.pvariance(range(20))

33.25

>>> import random

>>> statistics.pvariance((random.randint(1,10000) for i in range(30)))  # random data, so the result differs on every run

10903549.933333334
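
Like mean(), pvariance() also accepts Fraction inputs, which makes it easy to check the definition exactly, without floating-point rounding. A small sketch, with data chosen purely for illustration:

```python
from fractions import Fraction
import statistics

data = [Fraction(1, 4), Fraction(5, 4), Fraction(1, 2)]

mu = statistics.mean(data)                      # Fraction(2, 3)
# Definition: the average squared deviation from the mean.
by_hand = sum((v - mu) ** 2 for v in data) / len(data)
from_library = statistics.pvariance(data, mu)   # reuse the precomputed mean

print(by_hand, from_library)
```

Passing mu avoids computing the mean a second time, which is the same pattern as the pvariance(x, mu) call above.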

6. variance(), stdev()

Compute the sample variance and the sample standard deviation (the square root of the sample variance, known in Chinese as 均方差).

>>> statistics.variance((random.randint(1,10000) for i in range(30)))

10229013.655172413

>>> statistics.stdev((random.randint(1,10000) for i in range(30)))

3106.2902337180203

>>> _ * _  # Note: this does not match the variance above, because the two calls drew different random samples

9649039.016091954

>>> statistics.variance(range(20))

35.0

>>> statistics.stdev(range(20))

5.916079783099616

>>> _ * _

35.0

>>> statistics.variance([1, 2, 3, 4, 5, 10, 9, 8, 7, 6])

9.166666666666666

>>> statistics.stdev([1, 2, 3, 4, 5, 10, 9, 8, 7, 6])

3.0276503540974917

>>> statistics.variance([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])

1.16875

>>> statistics.stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])

1.0810874155219827

>>> _ * _

1.1687500000000002

>>> statistics.variance([3, 3, 3, 3, 3, 3])

0.0

>>> statistics.stdev([3, 3, 3, 3, 3, 3])

0.0
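
The sample and population versions differ only in the divisor: variance() divides the sum of squared deviations by n - 1, pvariance() by n. The sketch below checks that relationship on range(20), using only functions already shown in this article.

```python
import math
import statistics

data = list(range(20))
n = len(data)

population = statistics.pvariance(data)   # divides by n
sample = statistics.variance(data)        # divides by n - 1

# Rescaling the population variance by n / (n - 1) gives the sample variance.
rescaled = population * n / (n - 1)
print(sample, rescaled)

# stdev() is the square root of variance().
print(statistics.stdev(data), math.sqrt(sample))
```

Because of the n - 1 divisor, variance() and stdev() need at least two data points.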