# Merging Means and Variances

## Mean

$$
\mathrm{mean}=\sum_{i=1}^{n}\frac{x_i}{n}=\frac{x_1+x_2+\dots+x_n}{n}=\frac{\mathrm{sum_1}}{n}
$$

where $\mathrm{sum_1}=x_1+x_2+\dots+x_n$.

## Variance

$$
\begin{aligned}
\mathrm{Var} &= \frac{\sum_{i=1}^{n}(x_i-\mathrm{mean})^2}{n} \\
&= \frac{\sum_{i=1}^{n}\left(x_i^2-2\,\mathrm{mean}\,x_i+\mathrm{mean}^2\right)}{n} \\
&= \frac{\mathrm{sum_2}}{n}-2\,\mathrm{mean}\sum_{i=1}^{n}\frac{x_i}{n}+\mathrm{mean}^2 \\
&= \frac{\mathrm{sum_2}}{n}-\mathrm{mean}^2
\end{aligned}
$$

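where $\mathrm{sum_2}=x_1^2+x_2^2+\dots+x_n^2$. The last step uses $\sum_{i=1}^{n}x_i/n=\mathrm{mean}$, so $-2\,\mathrm{mean}^2+\mathrm{mean}^2=-\mathrm{mean}^2$.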
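Because $\mathrm{Var}=\mathrm{sum_2}/n-\mathrm{mean}^2$, both statistics can be computed in a single pass by accumulating $\mathrm{sum_1}$ and $\mathrm{sum_2}$. Below is a minimal sketch that assumes a non-empty sequence of numbers and the population variance (dividing by $n$, which matches NumPy's default `ddof=0`). The function name `mean_var_from_sums` is hypothetical:

```python
import numpy as np


def mean_var_from_sums(values):
    """Compute the mean and population variance from sum_1 and sum_2 in one pass.

    Assumes `values` is a non-empty iterable of numbers.
    """
    n = 0
    sum1 = 0.0  # sum_1 = x_1 + x_2 + ... + x_n
    sum2 = 0.0  # sum_2 = x_1^2 + x_2^2 + ... + x_n^2
    for x in values:
        n += 1
        sum1 += x
        sum2 += x * x
    mean = sum1 / n
    var = sum2 / n - mean ** 2  # Var = sum_2 / n - mean^2
    return mean, var


if __name__ == "__main__":
    data = [1, 3, 5, 7, 9]
    print(mean_var_from_sums(data))
    # Reference values from NumPy (np.var defaults to ddof=0, the population variance)
    print(np.mean(data), np.var(data))
```

For `[1, 3, 5, 7, 9]` this gives $\mathrm{sum_1}=25$ and $\mathrm{sum_2}=165$, so the mean is $5$ and the variance is $165/5-25=8$, which agrees with `np.var`.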

## Merging the Mean and Variance

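Let $A=(x_1,x_2,\dots,x_m)$ and $B=(y_1,y_2,\dots,y_n)$ be two arrays. A has $m$ elements with mean $\mathrm{mean_A}$ (mean1) and variance $\mathrm{Var_A}$ (Var1); B has $n$ elements with mean $\mathrm{mean_B}$ (mean2) and variance $\mathrm{Var_B}$ (Var2).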

Then the mean of the merged array is
$$
\begin{aligned}
\mathrm{mean_{merge}} &= \frac{\sum_{i=1}^{m}x_i+\sum_{j=1}^{n}y_j}{m+n} \\
&= \frac{m\,\mathrm{mean_A}+n\,\mathrm{mean_B}}{m+n}
\end{aligned}
$$
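As a quick numerical check, take the arrays used in the Python example at the end of this post: $A=(1,3,5,7,9)$, so $m=5$ and $\mathrm{mean_A}=5$, and $B=(2,4,6,8)$, so $n=4$ and $\mathrm{mean_B}=5$:

$$
\mathrm{mean_{merge}}=\frac{5\times 5+4\times 5}{5+4}=\frac{45}{9}=5
$$

This agrees with averaging $1,2,\dots,9$ directly ($45/9=5$).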
The variance of the merged array is
$$
\begin{aligned}
\mathrm{Var_{merge}} &= \frac{\sum_{i=1}^{m}(x_i-\mathrm{mean_{merge}})^2+\sum_{j=1}^{n}(y_j-\mathrm{mean_{merge}})^2}{m+n} \\
&= \frac{\sum_{i=1}^{m}\left(x_i^2-2\,\mathrm{mean_{merge}}\,x_i+\mathrm{mean_{merge}}^2\right)+\sum_{j=1}^{n}\left(y_j^2-2\,\mathrm{mean_{merge}}\,y_j+\mathrm{mean_{merge}}^2\right)}{m+n} \\
&= \frac{\mathrm{sum_2}}{m+n}-2\,\mathrm{mean_{merge}}\,\frac{\sum_{i=1}^{m}x_i+\sum_{j=1}^{n}y_j}{m+n}+\mathrm{mean_{merge}}^2 \\
&= \frac{\mathrm{sum_2}}{m+n}-\mathrm{mean_{merge}}^2 \\
&= \frac{(\mathrm{Var_A}+\mathrm{mean_A}^2)\,m+(\mathrm{Var_B}+\mathrm{mean_B}^2)\,n}{m+n}-\mathrm{mean_{merge}}^2
\end{aligned}
$$
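Here $\mathrm{sum_2}=x_1^2+x_2^2+\dots+x_m^2+y_1^2+y_2^2+\dots+y_n^2$ is the sum of squares over both arrays. Rearranging the single-array result $\mathrm{Var}=\mathrm{sum_2}/n-\mathrm{mean}^2$ recovers each array's sum of squares from its own statistics: $\mathrm{sum_A}=x_1^2+x_2^2+\dots+x_m^2=(\mathrm{Var_A}+\mathrm{mean_A}^2)\,m$ and $\mathrm{sum_B}=y_1^2+y_2^2+\dots+y_n^2=(\mathrm{Var_B}+\mathrm{mean_B}^2)\,n$. Therefore $\mathrm{sum_2}=\mathrm{sum_A}+\mathrm{sum_B}$, which gives the last line.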
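The count, $\mathrm{sum_1}$, and $\mathrm{sum_2}$ are all additive. The derivation above can therefore also be read as three steps: convert each group's (count, mean, variance) into (count, $\mathrm{sum_1}$, $\mathrm{sum_2}$), add the triples element-wise, and convert back. The sketch below assumes population variances. The helper names `to_stats`, `from_stats`, and `add_stats` are hypothetical. Folding with `functools.reduce` applies the same idea to any number of groups:

```python
from functools import reduce

import numpy as np


def to_stats(n, mean, var):
    """Convert (count, mean, population variance) to (count, sum_1, sum_2)."""
    # sum_1 = n * mean and sum_2 = n * (Var + mean^2)
    return n, n * mean, n * (var + mean ** 2)


def from_stats(n, sum1, sum2):
    """Convert (count, sum_1, sum_2) back to (count, mean, population variance)."""
    mean = sum1 / n
    return n, mean, sum2 / n - mean ** 2


def add_stats(a, b):
    """Counts, sum_1 and sum_2 are additive, so merging is element-wise addition."""
    return tuple(x + y for x, y in zip(a, b))


if __name__ == "__main__":
    groups = [np.array([1, 3, 5]), np.array([7, 9]), np.array([2, 4, 6, 8])]
    stats = [to_stats(g.size, float(g.mean()), float(g.var())) for g in groups]
    n, mean, var = from_stats(*reduce(add_stats, stats))
    print(n, mean, var)
    # Reference: statistics of all the data concatenated together
    everything = np.concatenate(groups)
    print(everything.size, float(everything.mean()), float(everything.var()))
```

The merged values should match the concatenated reference up to floating-point rounding.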

## Python Code

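```python
import numpy as np


def merge_mean_var(n, mean1, var1, m, mean2, var2):
    """
    Given the count, mean and (population) variance of two groups of data,
    compute the count, mean and variance of all the data combined.

    Args:
        n: number of elements in the first group
        mean1: mean of the first group
        var1: variance of the first group
        m: number of elements in the second group
        mean2: mean of the second group
        var2: variance of the second group

    Returns:
        (count, mean, variance) of all the data
    """
    # Count-weighted average of the two group means
    mean = (n * mean1 + m * mean2) / (m + n)
    # n * (var1 + mean1**2) is the first group's sum of squares (likewise for
    # the second group), so this is sum_2 / (m + n) - mean**2
    var = (n * (var1 + mean1**2) + m * (var2 + mean2**2)) / (m + n) - mean**2
    return m + n, mean, var


def get_mean_var(array):
    # np.var uses ddof=0 by default, i.e. the population variance used above.
    # float() turns NumPy scalars into plain floats so the printed tuple
    # looks the same across NumPy versions.
    mean = float(np.mean(array))
    var = float(np.var(array))
    return mean, var


if __name__ == '__main__':
    array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
    array1 = np.array([1, 3, 5, 7, 9])
    array2 = np.array([2, 4, 6, 8])
    mean, var = get_mean_var(array)
    mean1, var1 = get_mean_var(array1)
    mean2, var2 = get_mean_var(array2)
    print(mean, var)
    print(mean1, var1)
    print(mean2, var2)
    print(merge_mean_var(array1.size, mean1, var1, array2.size, mean2, var2))
```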

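Output:

```
5.0 6.666666666666667
5.0 8.0
5.0 5.0
(9, 5.0, 6.666666666666668)
```

The merged mean and variance match those computed directly on the full array. The last digit of the variance differs only because of floating-point rounding.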
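One caveat about this formula: it subtracts $\mathrm{mean_{merge}}^2$ from a sum-of-squares term. When the data's mean is large compared with its spread, the two terms nearly cancel, and floating-point precision can suffer. A commonly used alternative is the pairwise combination formula usually attributed to Chan, Golub and LeVeque. It is algebraically equivalent to the result above but never forms $\mathrm{sum_2}$:
$\mathrm{Var_{merge}}=\frac{m\,\mathrm{Var_A}+n\,\mathrm{Var_B}}{m+n}+\frac{mn\,(\mathrm{mean_B}-\mathrm{mean_A})^2}{(m+n)^2}$.
Below is a sketch under the same population-variance assumption. The function name `merge_mean_var_stable` is hypothetical:

```python
import numpy as np


def merge_mean_var_stable(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Merge (count, mean, population variance) of two groups without forming sum_2.

    The merged variance is the count-weighted average of the group variances
    plus a term for the gap between the two group means.
    """
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    var = (n_a * var_a + n_b * var_b) / n + delta ** 2 * n_a * n_b / n ** 2
    return n, mean, var


if __name__ == "__main__":
    offset = 1e9  # a large common offset makes sum_2 / n - mean^2 prone to cancellation
    a = offset + np.array([1.0, 3.0, 5.0, 7.0, 9.0])
    b = offset + np.array([2.0, 4.0, 6.0, 8.0])
    print(merge_mean_var_stable(a.size, float(a.mean()), float(a.var()),
                                b.size, float(b.mean()), float(b.var())))
    # Reference: NumPy's variance of all the data concatenated together
    print(float(np.concatenate([a, b]).var()))
```

With the offset of $10^9$, the variances of the shifted data are unchanged. The stable merge still returns a variance close to $20/3$.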