Merging Means and Variances


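Given the number of elements, the mean and the variance of each of two groups of data, find the mean and variance of the combined data.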

Mean

$$
\mathrm{mean}=\sum_{i=1}^{n}\frac{x_i}{n}=\frac{x_1+x_2+\ldots+x_n}{n}=\frac{\mathrm{sum_1}}{n}
$$

where $\mathrm{sum_1}=x_1+x_2+\ldots+x_n$.
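As a quick check of the definition, the sketch below computes $\mathrm{sum_1}$ and divides by $n$. The sample array is made up for illustration and is not from the original post.

```python
import numpy as np

# Illustrative data (assumption: any 1-D numeric array works the same way)
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

n = x.size
sum_1 = x.sum()      # sum_1 = x_1 + x_2 + ... + x_n
mean = sum_1 / n     # mean = sum_1 / n

print(mean)          # 5.0
print(np.mean(x))    # 5.0, same as NumPy's built-in mean
```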

Variance

$$
\begin{aligned}
\mathrm{Var} &= \frac{\sum_{i=1}^{n}(x_i-\mathrm{mean})^2}{n} \\
&= \frac{\sum_{i=1}^{n}(x_i^2-2\,\mathrm{mean}\,x_i+\mathrm{mean}^2)}{n} \\
&= \frac{\mathrm{sum_2}}{n}-2\,\mathrm{mean}\sum_{i=1}^{n}\frac{x_i}{n}+\mathrm{mean}^2 \\
&= \frac{\mathrm{sum_2}}{n}-\mathrm{mean}^2
\end{aligned}
$$

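where $\mathrm{sum_2}=x_1^2+x_2^2+\ldots+x_n^2$. This is the population variance (dividing by $n$ rather than $n-1$), which is also what NumPy's `np.var` computes by default (`ddof=0`).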
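Because the last line of the derivation needs only $n$, $\mathrm{sum_1}$ and $\mathrm{sum_2}$, the mean and variance can be computed in a single pass over the data. The helper name `one_pass_mean_var` is hypothetical, and the sketch assumes a non-empty input. Note that the subtraction $\frac{\mathrm{sum_2}}{n}-\mathrm{mean}^2$ can lose floating-point precision when the mean is large compared with the spread of the data.

```python
import numpy as np


def one_pass_mean_var(values):
    """Hypothetical helper: mean and population variance from running sums.

    Assumes `values` contains at least one number.
    """
    n = 0
    sum_1 = 0.0  # running x_1 + ... + x_n
    sum_2 = 0.0  # running x_1^2 + ... + x_n^2
    for x in values:
        n += 1
        sum_1 += x
        sum_2 += x * x
    mean = sum_1 / n
    var = sum_2 / n - mean ** 2  # Var = sum_2 / n - mean^2
    return mean, var


data = [1.0, 3.0, 5.0, 7.0, 9.0]
mean, var = one_pass_mean_var(data)
print(mean, var)     # 5.0 8.0
print(np.var(data))  # 8.0, NumPy's population variance agrees
```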

Merging the mean and variance

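Let $A=(x_1,x_2,\ldots,x_m)$ and $B=(y_1,y_2,\ldots,y_n)$ be two arrays. A has $m$ elements with mean $\mathrm{mean_A}$ and variance $\mathrm{Var_A}$; B has $n$ elements with mean $\mathrm{mean_B}$ and variance $\mathrm{Var_B}$.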

Then the mean of the merged array (A and B combined) is
$$
\begin{aligned}
\mathrm{mean_{merge}} &= \frac{\sum_{i=1}^{m}x_i+\sum_{j=1}^{n}y_j}{m+n} \\
&= \frac{m\,\mathrm{mean_A}+n\,\mathrm{mean_B}}{m+n}
\end{aligned}
$$
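A small numeric check of the merged-mean formula, using two made-up arrays of different sizes. It also shows why simply averaging the two means is wrong when $m \ne n$: each mean has to be weighted by its array's size.

```python
import numpy as np

# Illustrative arrays (not from the original post)
A = np.array([1.0, 2.0, 3.0])   # m = 3 elements, mean_A = 2.0
B = np.array([10.0, 20.0])      # n = 2 elements, mean_B = 15.0
m, n = A.size, B.size
mean_A, mean_B = A.mean(), B.mean()

# mean_merge = (m * mean_A + n * mean_B) / (m + n)
mean_merge = (m * mean_A + n * mean_B) / (m + n)

print(mean_merge)                     # 7.2
print(np.concatenate([A, B]).mean())  # 7.2, mean of the actual merged data
print((mean_A + mean_B) / 2)          # 8.5, the unweighted average is not the merged mean
```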
The variance of the merged array is
$$
\begin{aligned}
\mathrm{Var_{merge}} &= \frac{\sum_{i=1}^{m}(x_i-\mathrm{mean_{merge}})^2+\sum_{j=1}^{n}(y_j-\mathrm{mean_{merge}})^2}{m+n} \\
&= \frac{\sum_{i=1}^{m}(x_i^2-2\,\mathrm{mean_{merge}}\,x_i+\mathrm{mean_{merge}}^2)+\sum_{j=1}^{n}(y_j^2-2\,\mathrm{mean_{merge}}\,y_j+\mathrm{mean_{merge}}^2)}{m+n} \\
&= \frac{\mathrm{sum_2}}{m+n}-\mathrm{mean_{merge}}^2 \\
&= \frac{(\mathrm{Var_A}+\mathrm{mean_A}^2)\,m+(\mathrm{Var_B}+\mathrm{mean_B}^2)\,n}{m+n}-\mathrm{mean_{merge}}^2
\end{aligned}
$$
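where $\mathrm{sum_2}=x_1^2+x_2^2+\ldots+x_m^2+y_1^2+y_2^2+\ldots+y_n^2$. Write $\mathrm{sum_A}=x_1^2+x_2^2+\ldots+x_m^2$ and $\mathrm{sum_B}=y_1^2+y_2^2+\ldots+y_n^2$, so that $\mathrm{sum_2}=\mathrm{sum_A}+\mathrm{sum_B}$. Rearranging the single-array identity $\mathrm{Var}=\frac{1}{n}\sum_{i=1}^{n}x_i^2-\mathrm{mean}^2$ for each array gives $\mathrm{sum_A}=(\mathrm{Var_A}+\mathrm{mean_A}^2)\,m$ and $\mathrm{sum_B}=(\mathrm{Var_B}+\mathrm{mean_B}^2)\,n$, which yields the last line: the merged variance depends only on each array's size, mean and variance.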
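The sketch below implements the last line of the derivation and compares it with the variance of the concatenated data. `merge_var` is a hypothetical name, and all variances are population variances (`ddof=0`), which is what `np.ndarray.var` returns by default.

```python
import numpy as np


def merge_var(m, mean_A, var_A, n, mean_B, var_B):
    """Hypothetical helper: merged mean and population variance of A and B."""
    mean_merge = (m * mean_A + n * mean_B) / (m + n)
    sum_A = (var_A + mean_A ** 2) * m  # x_1^2 + ... + x_m^2
    sum_B = (var_B + mean_B ** 2) * n  # y_1^2 + ... + y_n^2
    var_merge = (sum_A + sum_B) / (m + n) - mean_merge ** 2
    return mean_merge, var_merge


# Illustrative arrays (not from the original post)
A = np.array([1.0, 2.0, 3.0])
B = np.array([10.0, 20.0])
mean_merge, var_merge = merge_var(A.size, A.mean(), A.var(), B.size, B.mean(), B.var())

AB = np.concatenate([A, B])
print(mean_merge, var_merge)  # from the per-array statistics only
print(AB.mean(), AB.var())    # from the merged data, should agree up to rounding
```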

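```python
# -*- coding: utf-8 -*-
import numpy as np

__author__ = 'Heroinlj <heroinlj@gmail.com>'


def merge_mean_var(n, mean1, stddev1, m, mean2, stddev2):
    """
    Given the size, mean and standard deviation of two groups of data,
    compute the size, mean and standard deviation of all the data combined.

    Args:
        n: number of elements in the first group
        mean1: mean of the first group
        stddev1: population standard deviation of the first group
        m: number of elements in the second group
        mean2: mean of the second group
        stddev2: population standard deviation of the second group

    Returns:
        (total count, mean, standard deviation) of all the data
    """
    mean = (n * mean1 + m * mean2) / (m + n)
    # Each group's mean of squares is Var + mean^2; weight by size, then subtract mean^2
    var = (n * (stddev1 ** 2 + mean1 ** 2) + m * (stddev2 ** 2 + mean2 ** 2)) / (m + n) - mean ** 2
    return m + n, mean, np.sqrt(var)


def get_mean_var(array):
    mean = np.mean(array)
    var = np.var(array)  # population variance (ddof=0)
    return mean, var


def merge_test():
    array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    array1 = np.array([1, 3, 5, 7, 9])
    array2 = np.array([2, 4, 6, 8])
    mean, var = get_mean_var(array)
    mean1, var1 = get_mean_var(array1)
    mean2, var2 = get_mean_var(array2)
    print(mean, var)
    print(mean1, var1)
    print(mean2, var2)
    # merge_mean_var takes and returns standard deviations, so convert variances first
    total, merged_mean, merged_std = merge_mean_var(array1.size, mean1, np.sqrt(var1),
                                                    array2.size, mean2, np.sqrt(var2))
    # Should match the mean and variance of the full array printed above
    print(total, merged_mean, merged_std ** 2)


if __name__ == '__main__':
    mean1 = np.array([0.3877486679383603, 0.3927668330259108, 0.37973901869221405])
    stddev1 = np.array([0.25303787237568615, 0.2553699318189926, 0.2548485603364443])
    mean2 = np.array([0.42458302171523427, 0.42458302171523427, 0.42458302171523427])
    stddev2 = np.array([0.2318353974716489, 0.2318353974716489, 0.2318353974716489])
    # mean: (0.40397939, 0.40678635, 0.39949912)
    # stddev: (0.24460695 0.24578618 0.24598417)
    print(merge_mean_var(2304, mean1, stddev1, 1815, mean2, stddev2))
```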
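Because merging two groups produces another (count, mean, variance) triple, the same formula can be applied repeatedly to combine any number of groups, for example per-batch or per-file statistics. A minimal sketch, assuming population variances (`ddof=0`) instead of the standard deviations that `merge_mean_var` uses; `merge_stats` is a hypothetical name and the chunks are made-up data.

```python
from functools import reduce

import numpy as np


def merge_stats(a, b):
    """Hypothetical helper: merge two (count, mean, population variance) triples."""
    m, mean_a, var_a = a
    n, mean_b, var_b = b
    total = m + n
    mean = (m * mean_a + n * mean_b) / total
    # Same formula as above: size-weighted (Var + mean^2), minus the merged mean squared
    var = (m * (var_a + mean_a ** 2) + n * (var_b + mean_b ** 2)) / total - mean ** 2
    return total, mean, var


# Illustrative chunks of data (not from the original post)
chunks = [np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0]), np.array([4.0, 5.0, 6.0, 7.0])]
stats = [(c.size, c.mean(), c.var()) for c in chunks]
count, mean, var = reduce(merge_stats, stats)  # fold the pairwise merge over all chunks

everything = np.concatenate(chunks)
print(count, mean, var)
print(everything.size, everything.mean(), everything.var())  # should agree up to rounding
```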