Statistics
Mean, Median, Mode
- Mean ($\bar{x}$) - Average of all data
- Median - Middle number in the sorted data
- Mode - Most frequent number
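As a quick check of the three definitions above, here is a minimal sketch using Python's standard `statistics` module. The `data` list is made up purely for illustration.

```python
import statistics

# Hypothetical data, chosen only for illustration
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values / number of values
median = statistics.median(data)  # middle of the sorted data (average of the
                                  # two middle values when the count is even)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

With an even number of values there is no single middle element, so the median here is the average of 3 and 5.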
Variance ($\sigma^2$ or $s^2$)
Description
Measures how spread out ('scattered') the data is around the mean.
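Formula
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$
Explanation
- Variance is $\sigma^2$, the square of the standard deviation $\sigma$
- Loop through all elements and take the squared difference from the mean, $(x_i - \mu)^2$, then add them up
- Divide by the number of elements, $N$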
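The steps in the explanation can be sketched directly in Python. The function name `population_variance` and the `data` values are my own, picked for illustration; the result is compared against `statistics.pvariance` from the standard library.

```python
import statistics

def population_variance(values):
    """Population variance: average squared distance from the mean."""
    n = len(values)
    mu = sum(values) / n                             # mean of all elements
    squared_diffs = [(x - mu) ** 2 for x in values]  # (element[i] - mean)^2
    return sum(squared_diffs) / n                    # divide by number of elements

# Hypothetical data, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))   # 4.0
print(statistics.pvariance(data))  # same value from the standard library
```

The differences are squared so that values below the mean do not cancel out values above it.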
Population variance ($\sigma^2$) vs Sample variance ($s^2$)
There are 2 types of variance - Population and sample
- Population: Use all data points to calculate it
- Sample: Pick a random (unbiased) subset of the data points and calculate the variance from that
So population variance is the exact value, as it uses all the data points, while sample variance is only an estimate of it.
Why use sample then?
Might not have all data points so use sample instead.
For example, take testing the battery of a new phone model: it's impossible to test every phone (as millions exist).
So test 50 randomly selected phones instead. As the sample size increases, the estimate tends to get closer to the true value.
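The phone example can be simulated as a rough sketch. Everything here is an assumption made for illustration: the population is 10,000 simulated battery lives (not millions), drawn from a normal distribution with mean 20 hours and standard deviation 2 hours, and `estimate_variance` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: battery life in hours for 10,000 phones
population = [rng.gauss(20, 2) for _ in range(10_000)]
true_variance = statistics.pvariance(population)  # uses every data point

def estimate_variance(sample_size):
    """Estimate the variance from a random sample of the population."""
    sample = rng.sample(population, sample_size)
    return statistics.variance(sample)  # sample variance (divides by n - 1)

print(f"population variance: {true_variance:.3f}")
for size in (50, 500, 5_000):
    print(f"sample of {size:>5}: {estimate_variance(size):.3f}")
```

A single small sample can land noticeably above or below the population value; larger samples tend to land closer, though not on every individual run.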
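Formula for sample variance
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$$
Difference:
- For the symbol, use $s^2$ instead of $\sigma^2$
- Use $\bar{x}$ (sample mean) instead of $\mu$ (population mean)
- Divide by $n-1$ instead of $N$. See Wikipedia, Bessel's correction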
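To make the change of divisor concrete, here is a minimal sketch of the sample version. The function name `sample_variance` and the `data` values are assumptions for illustration; `statistics.variance` is the standard library's sample variance and `statistics.pvariance` is its population variance.

```python
import statistics

def sample_variance(values):
    """Sample variance: squared distances from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n  # sample mean
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Hypothetical sample, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data))       # divides by n - 1 = 7
print(statistics.variance(data))   # standard library, also divides by n - 1
print(statistics.pvariance(data))  # divides by n = 8, so it is smaller
```

Dividing by a smaller number makes the sample variance slightly larger, which compensates for the fact that a sample tends to understate the spread of the full population.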
Standard Deviation
Standard deviation ($\sigma$) is the square root of the variance, $\sigma = \sqrt{\sigma^2}$.
Uses
Finding probability given mean and standard deviation
NOTE: These percentages only hold for a normal distribution (bell curve)
Common values, from the video (Timestamp: 0:33)
--- Math terms
- $P(\bar{x}-\sigma < \text{value}<\bar{x}+\sigma)$ = 68%
- $P(\bar{x}-2\sigma < \text{value}<\bar{x}+2\sigma)$ = 95%
- $P(\bar{x}-3\sigma < \text{value}<\bar{x}+3\sigma)$ = 99.7%
--- Simple terms
Probability that a value lies within $\pm c\times \sigma$ of the mean.
When $c$ = ...
- $c=1$, $P=68\%$
- $c=2$, $P=95\%$
- $c=3$, $P=99.7\%$
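These percentages can be reproduced with `statistics.NormalDist` from the standard library (Python 3.8+). This is a sketch assuming the data really is normally distributed; the mean of 20 and standard deviation of 2 are hypothetical values, and `prob_within` is a helper name of my own.

```python
from statistics import NormalDist

def prob_within(mean, sigma, c):
    """P(mean - c*sigma < value < mean + c*sigma) for a normal distribution."""
    dist = NormalDist(mu=mean, sigma=sigma)
    return dist.cdf(mean + c * sigma) - dist.cdf(mean - c * sigma)

# Hypothetical mean and standard deviation, chosen only for illustration
for c in (1, 2, 3):
    print(f"c={c}: P={prob_within(20, 2, c):.2%}")
# c=1: P=68.27%
# c=2: P=95.45%
# c=3: P=99.73%
```

The result depends only on $c$, not on the particular mean or standard deviation, which is why the same three percentages apply to any normal distribution.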