Difference between revisions of "Statistical formulas documentation"

From mtab wikisupport
 
(34 intermediate revisions by the same user not shown)
Line 11: Line 11:
  
  
{|  
+
<nomathjax>{|
 
|-
 
|-
| width="225"|Approximate Annual HH Income || colspan="2"|Weighted<br/>Response <span style="color:red">'''(1) Accumulated Response'''</span> ||
+
|width="225"|Approximate Annual HH Income || colspan="2"|Weighted<br/>Response <span style="color:red">'''(1) Accumulated Response'''</span> ||
 
|-
 
|-
| Less than $15,000 || width="150"|11,714 || 11,714
+
|Less than &#36;15,000 || width="150"|11,714 || 11,714
 
|-
 
|-
| $15,000 - $24,999 || 46,054 || 57, 768
+
|&#36;15,000 - &#36;24,999 || 46,054 || 57, 768
 
|-
 
|-
| $25,000 - $34,999 || 83,965 || 141,733
+
|&#36;25,000 - &#36;34,999 || 83,965 || 141,733
 
|-
 
|-
| $35,000 - $44,999 || 102,093 || 243,826
+
|&#36;35,000 - &#36;44,999 || 102,093 || 243,826
 
|-
 
|-
| $45,000 - $59,999 || 155,721 || '''399,546'''
+
|&#36;45,000 - &#36;59,999 || 155,721 || '''399,546'''
 
|-
 
|-
| $60,000 - $74,999 || 161,435 || <span style="color:green">'''560,981'''</span> <--''Median will fall here'' <span style="color:red">(3)</span>
+
|&#36;60,000 - &#36;74,999 || 161,435 ||<span style="color:green">'''560,981'''</span> <--''Median will fall here'' <span style="color:red">(3)</span>
 
|-
 
|-
| $75,000 - $99,999 || 193,540 || 754,521  
+
|&#36;75,000 - &#36;99,999 || 193,540 || 754,521  
 
|-
 
|-
| $100,000 - $124,999 || 134,706 || 889,227
+
|&#36;100,000 - &#36;124,999 || 134,706 || 889,227
 
|-
 
|-
| $125,000 - $149,999 || 59,748 || 948,975
+
|&#36;125,000 - &#36;149,999 || 59,748 || 948,975
 
|-
 
|-
| $150,000 - $199,999 || 41,971 || 990,946
+
|&#36;150,000 - &#36;199,999 || 41,971 || 990,946
 
|-
 
|-
| $200,000 - $249,999 || 16,391 || 1,007,337
+
|&#36;200,000 - &#36;249,999 || 16,391 || 1,007,337
 
|-
 
|-
| $250,000 or More || 32,409 || <span style="color:blue">'''1,039,746'''</span>
+
|&#36;250,000 or More || 32,409 || <span style="color:blue">'''1,039,746'''</span>
 
|-
 
|-
| Weighted Subset Total Count || <span style="color:blue">'''1,039,746'''</span>
+
|Weighted Subset Total Count || <span style="color:blue">'''1,039,746'''</span>
 
|-
 
|-
| Weighted Sample Total Count || 1,255,411
+
|Weighted Sample Total Count || 1,255,411
|}
+
|}</nomathjax>
  
  
{|
+
<nomathjax>{|
 
|-
 
|-
 
| <span style="color:red">(1)</span> || width="550"|<span style="color:red">'''Calculated Accumulated Weighted Response'''</span>
 
| <span style="color:red">(1)</span> || width="550"|<span style="color:red">'''Calculated Accumulated Weighted Response'''</span>
Line 53: Line 53:
 
| <span style="color:red">(3)</span> || Find first value in Accumulated Response column that is greater than step 2 value  
 
| <span style="color:red">(3)</span> || Find first value in Accumulated Response column that is greater than step 2 value  
 
|-
 
|-
| || The ''median will fall between the $60,000-$74,999'' bracket  
+
| || The ''median will fall between the &#36;60,000-&#36;74,999'' bracket  
 
|-
 
|-
 
| <span style="color:red">(4)</span> || Step 2 amount (<span style="color:red">519,873</span>) MINUS preceding break accumulated response '''399,546''' = || 120,327
 
| <span style="color:red">(4)</span> || Step 2 amount (<span style="color:red">519,873</span>) MINUS preceding break accumulated response '''399,546''' = || 120,327
 
|-
 
|-
| <span style="color:red">(5)</span> || Acc. Response where Meidan will fall <span style="color:green">560,981</span> MINUS preceding break '''399,546''' = || 161,435
+
| <span style="color:red">(5)</span> || Acc. Response where Median will fall <span style="color:green">560,981</span> MINUS preceding break '''399,546''' = || 161,435
 
|-
 
|-
 
| <span style="color:red">(6)</span> || Step 4 Divided by Step 5 || 0.74536
 
| <span style="color:red">(6)</span> || Step 4 Divided by Step 5 || 0.74536
 
|-
 
|-
| <span style="color:red">(7)</span> || Multiply Step 6 by the range 14,999 ($60,000-$75,999) || 11180
+
| <span style="color:red">(7)</span> || Multiply Step 6 by the range 14,999 (&#36;60,000-&#36;75,999) || 11180
 
|-
 
|-
| <span style="color:red">(8)</span> || Add Step 7 to bottom of range $60,000 || '''71,180'''
+
| <span style="color:red">(8)</span> || Add Step 7 to bottom of range &#36;60,000 || '''71,180'''
|}
+
|}</nomathjax>
  
  
Line 79: Line 79:
  
  
{|  
+
<nomathjax>{|  
 
|-
 
|-
 
| || <span style="color:blue">(A)</span> || || || <span style="color:blue">(B)</span> || <span style="color:blue">(C)</span>
 
| || <span style="color:blue">(A)</span> || || || <span style="color:blue">(B)</span> || <span style="color:blue">(C)</span>
Line 85: Line 85:
 
| width="225"|Approximate Annual HH Income || || STAT1 || STAT2 || Midpoint
 
| width="225"|Approximate Annual HH Income || || STAT1 || STAT2 || Midpoint
 
|-
 
|-
| Less than $15,000 || width="150"|11,714 || width="150"|1 || width="150"|14,999 || width="150"|7,500 || 87,857,249
+
| Less than &#36;15,000 || width="150"|11,714 || width="150"|1 || width="150"|14,999 || width="150"|7,500 || 87,857,249
 
|-
 
|-
| $15,000 - $24,999 || 46,054 || 15,000 || 24,999 || 20,000 || 921,059,004
+
| &#36;15,000 - &#36;24,999 || 46,054 || 15,000 || 24,999 || 20,000 || 921,059,004
 
|-
 
|-
| $25,000 - $34,999 || 83,965 || 25,000 || 34,999 || 30,000 || 2,518,899,346
+
| &#36;25,000 - &#36;34,999 || 83,965 || 25,000 || 34,999 || 30,000 || 2,518,899,346
 
|-
 
|-
| $35,000 - $44,999 || 102,093 || 35,000 || 44,999 || 40,000 || 4,083,654,266
+
| &#36;35,000 - &#36;44,999 || 102,093 || 35,000 || 44,999 || 40,000 || 4,083,654,266
 
|-
 
|-
| $45,000 - $59,999 || 155,721 || 45,000 || 59,999 || 52,500 || 8,175,254,132
+
| &#36;45,000 - &#36;59,999 || 155,721 || 45,000 || 59,999 || 52,500 || 8,175,254,132
 
|-
 
|-
| $60,000 - $74,999 || 161,435 || 60,000 || 74,999 || 67,500 || 10,896,752,251
+
| &#36;60,000 - &#36;74,999 || 161,435 || 60,000 || 74,999 || 67,500 || 10,896,752,251
 
|-
 
|-
| $75,000 - $99,999 || 193,540 || 75,000 || 99,999 || 87,500 || 16,934,669,636
+
| &#36;75,000 - &#36;99,999 || 193,540 || 75,000 || 99,999 || 87,500 || 16,934,669,636
 
|-
 
|-
| $100,000 - $124,999 || 134,706 || 100,000 || 124,999 || 112,500 || 15,154,345,342
+
| &#36;100,000 - &#36;124,999 || 134,706 || 100,000 || 124,999 || 112,500 || 15,154,345,342
 
|-
 
|-
| $125,000 - $149,999 || 59,748 || 125,000 || 149,999 || 137,500 || 8,125,258,359
+
| &#36;125,000 - &#36;149,999 || 59,748 || 125,000 || 149,999 || 137,500 || 8,125,258,359
 
|-
 
|-
| $150,000 - $199,999 || 41,971 || 150,000 || 199,999 || 175,000 || 7,344,910,850
+
| &#36;150,000 - &#36;199,999 || 41,971 || 150,000 || 199,999 || 175,000 || 7,344,910,850
 
|-
 
|-
| $200,000 - $249,999 || 16,391 || 200,000 || 249,000 || 225,000 || 3,688,025,252
+
| &#36;200,000 - &#36;249,999 || 16,391 || 200,000 || 249,000 || 225,000 || 3,688,025,252
 
|-
 
|-
| $250,000 or More || 32,409 || 250,000 || 300,000 || 275,000 || 8,912,452,979
+
| &#36;250,000 or More || 32,409 || 250,000 || 300,000 || 275,000 || 8,912,452,979
 
|-
 
|-
 
| Weighted Subset Total Count || '''1,039,746''' || || || || '''86,933,138,666'''
 
| Weighted Subset Total Count || '''1,039,746''' || || || || '''86,933,138,666'''
 
|-
 
|-
 
| Weighted Sample Total Count || 1,255,411
 
| Weighted Sample Total Count || 1,255,411
|}
+
|}</nomathjax>
  
  
Line 126: Line 126:
 
For categorized questions, each response is assigned 1 or 2 stat weights. If a single weight is assigned, then this is the value used to calculate the standard deviation. If 2 weights are provided, the midpoint is used.
 
For categorized questions, each response is assigned 1 or 2 stat weights. If a single weight is assigned, then this is the value used to calculate the standard deviation. If 2 weights are provided, the midpoint is used.
  
 +
D = Question Mean - Stat Value as described above<br/>
 +
SS = Sum of Squares, D*D*Weighted Response Count, for all table responses<br/>
 +
Sample = Sum of all Weighted Response Counts for all table responses
  
D = Question Mean - Stat Value as described above
+
Standard Deviation = SQRT(SS/Sample-1));
  
SS = Sum of Squares, D*D*Weighted Response Count, for all table responses
 
  
Sample = Sum of all Weighted Response Counts for all table responses
+
The calculation is the same for continuous variables except the actual data values are used instead of stat weight.
  
 +
D = Question Mean - Response Value<br/>
 +
SS = Sum of Squares, D*D*Respondent Weight Count for each response<br/>
 +
Sample = Sum of all Respondent Weights for each response
  
 
Standard Deviation = SQRT(SS/Sample-1));
 
Standard Deviation = SQRT(SS/Sample-1));
  
  
The calculation is the same for continuous variables except the actual data values are used instead of stat weight.
+
'''Comparison of two population means using T-Statistic
  
 +
<u>'''When the two populations have equal variances'''</u>
  
D = Question Mean - Response Value
+
<math>t = \dfrac{m_1-m_2}{\sqrt{\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}(\dfrac{1}{n_1}+\dfrac{1}{n_2})}}</math>
  
SS = Sum of Squares, D*D*Respondent Weight Count for each response
+
Where<br/>
 +
m1 = Mean of the 1st sample<br/>
 +
s1 = Standard Deviation of the 1st sample<br/>
 +
n1 = Un-weighted Sample of the 1st sample
  
Sample = Sum of all Respondent Weights for each response
+
m2 = Mean of the 2nd sample<br/>
 +
s2 = Standard Deviation of the 2nd sample<br/>
 +
n2 = Un-weighted Sample of the 2nd sample
  
 +
Decision rules:<br/>
 +
If |t|<1.65 then the two populations are NOT significantly different at 90%<br/>
 +
If |t|&ge;1.65 then the two populations ARE significantly different at 90%<br/>
 +
If |t|<1.95 then the two populations are NOT significantly different at 95%<br/>
 +
If |t|&ge;1.95 then the two populations ARE significantly different at 95%
  
Standard Deviation = SQRT(SS/Sample-1));
 
  
 +
<u>'''When the two populations have UNEQUAL variances'''</u>
  
'''Comparison of two population means using T-Statistic
+
<math>t = \dfrac{m_1-m_2}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}</math>
  
<u>'''When the two populations have equal variances'''</u>
+
Where<br/>
 +
m1 = Mean of the 1st sample<br/>
 +
s1 = Standard Deviation of the 1st sample<br/>
 +
n1 = Un-weighted Sample of the 1st sample
 +
 
 +
m2 = Mean of the 2nd sample<br/>
 +
s2 = Standard Deviation of the 2nd sample<br/>
 +
n2 = Un-weighted Sample of the 2nd sample
  
<math>t=(m_1-m_2) \over \sqrt{(n_1-1)</math>
+
Decision rules: <br/>
 +
If |t|<1.65 then the two populations are NOT significantly different at 90%<br/>
 +
If |t|&ge;1.65 then the two populations ARE significantly different at 90%<br/>
 +
If |t|<1.95 then the two populations are NOT significantly different at 95%<br/>
 +
If |t|&ge;1.95 then the two populations ARE significantly different at 95%

Latest revision as of 13:27, 26 August 2013

mTAB Median Calculation from Income Brackets

Approximate Annual HH Income
Median 71,180
Unweighted Sample Total Count 10,811


Approximate Annual HH Income   Weighted Response   (1) Accumulated Response
Less than $15,000                         11,714                     11,714
$15,000 - $24,999                         46,054                     57,768
$25,000 - $34,999                         83,965                    141,733
$35,000 - $44,999                        102,093                    243,826
$45,000 - $59,999                        155,721                    399,546
$60,000 - $74,999                        161,435                    560,981  <-- Median will fall here (3)
$75,000 - $99,999                        193,540                    754,521
$100,000 - $124,999                      134,706                    889,227
$125,000 - $149,999                       59,748                    948,975
$150,000 - $199,999                       41,971                    990,946
$200,000 - $249,999                       16,391                  1,007,337
$250,000 or More                          32,409                  1,039,746
Weighted Subset Total Count            1,039,746
Weighted Sample Total Count            1,255,411


(1) Calculate the accumulated weighted response (the running total of the Weighted Response column)
(2) Divide the weighted subset total (1,039,746) by 2 = 519,873
(3) Find the first value in the Accumulated Response column that is greater than the Step 2 value: 560,981
    The median falls within the $60,000-$74,999 bracket
(4) Step 2 amount (519,873) MINUS the preceding bracket's accumulated response (399,546) = 120,327
(5) Accumulated response where the median falls (560,981) MINUS the preceding bracket's accumulated response (399,546) = 161,435
(6) Step 4 divided by Step 5 = 0.74536
(7) Multiply Step 6 by the range 14,999 ($60,000-$74,999) = 11,180
(8) Add Step 7 to the bottom of the range ($60,000) = 71,180
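
The eight steps above can be sketched in Python as a linear interpolation inside the bracket that contains the half-way point. This is only an illustration, not mTAB's actual code: the function name `grouped_median` and the `BRACKETS` list are hypothetical, and the bounds 1 and 300,000 for the open-ended first and last brackets are assumptions borrowed from the STAT1/STAT2 columns of the mean table. Because the counts shown are rounded, intermediate values can differ from the table by about 1.

```python
# (lower bound, upper bound, weighted response) for each income bracket.
# Bounds of the first and last brackets are assumptions (see lead-in).
BRACKETS = [
    (1, 14_999, 11_714),
    (15_000, 24_999, 46_054),
    (25_000, 34_999, 83_965),
    (35_000, 44_999, 102_093),
    (45_000, 59_999, 155_721),
    (60_000, 74_999, 161_435),
    (75_000, 99_999, 193_540),
    (100_000, 124_999, 134_706),
    (125_000, 149_999, 59_748),
    (150_000, 199_999, 41_971),
    (200_000, 249_999, 16_391),
    (250_000, 300_000, 32_409),
]


def grouped_median(brackets):
    """Interpolate the median from ascending (lower, upper, count) brackets."""
    total = sum(count for _, _, count in brackets)
    half = total / 2                                   # step 2
    accumulated = 0                                    # step 1, built row by row
    for lower, upper, count in brackets:
        if accumulated + count > half:                 # step 3
            fraction = (half - accumulated) / count    # steps 4 to 6
            return lower + fraction * (upper - lower)  # steps 7 and 8
        accumulated += count
    raise ValueError("no bracket contains the median")


print(round(grouped_median(BRACKETS)))  # 71180
```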


mTAB Mean/Weighted Average Calculation from Income Brackets

Approximate Annual HH Income
Mean/Weighted Average 83,610
Unweighted Sample Total Count 10,811


Approximate Annual HH Income   (A) Weighted Response     STAT1     STAT2   (B) Midpoint   (C) = (A) x (B)
Less than $15,000                             11,714         1    14,999          7,500        87,857,249
$15,000 - $24,999                             46,054    15,000    24,999         20,000       921,059,004
$25,000 - $34,999                             83,965    25,000    34,999         30,000     2,518,899,346
$35,000 - $44,999                            102,093    35,000    44,999         40,000     4,083,654,266
$45,000 - $59,999                            155,721    45,000    59,999         52,500     8,175,254,132
$60,000 - $74,999                            161,435    60,000    74,999         67,500    10,896,752,251
$75,000 - $99,999                            193,540    75,000    99,999         87,500    16,934,669,636
$100,000 - $124,999                          134,706   100,000   124,999        112,500    15,154,345,342
$125,000 - $149,999                           59,748   125,000   149,999        137,500     8,215,258,359
$150,000 - $199,999                           41,971   150,000   199,999        175,000     7,344,910,850
$200,000 - $249,999                           16,391   200,000   249,999        225,000     3,688,025,252
$250,000 or More                              32,409   250,000   300,000        275,000     8,912,452,979
Weighted Subset Total Count                1,039,746                                       86,933,138,666
Weighted Sample Total Count                1,255,411


(1) Find the midpoint of each data range (STAT1 to STAT2): column (B)
(2) Multiply the weighted counts (A) by the midpoints (B) to generate column (C)
(3) Divide the sum of column (C) by the total weighted response at the bottom of column (A):
    86,933,138,666 divided by 1,039,746 = 83,610
The calculated average matches the average produced by mTAB.
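
As a rough check of the three steps above, the following Python sketch recomputes the weighted average from the bracket midpoints. The names `ROWS` and `bracket_mean` are hypothetical, and the counts are the rounded values shown in the table, so the result agrees with the mTAB figure only after rounding.

```python
# (weighted response A, STAT1, STAT2) for each income bracket, as tabulated.
ROWS = [
    (11_714, 1, 14_999),
    (46_054, 15_000, 24_999),
    (83_965, 25_000, 34_999),
    (102_093, 35_000, 44_999),
    (155_721, 45_000, 59_999),
    (161_435, 60_000, 74_999),
    (193_540, 75_000, 99_999),
    (134_706, 100_000, 124_999),
    (59_748, 125_000, 149_999),
    (41_971, 150_000, 199_999),
    (16_391, 200_000, 249_999),
    (32_409, 250_000, 300_000),
]


def bracket_mean(rows):
    """Weighted average of bracket midpoints."""
    total_weight = sum(count for count, _, _ in rows)            # bottom of column (A)
    weighted_sum = sum(count * (stat1 + stat2) / 2               # column (C)
                       for count, stat1, stat2 in rows)
    return weighted_sum / total_weight                           # step 3


print(round(bracket_mean(ROWS)))  # approximately 83,610
```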


Standard Deviation Calculation

For categorized questions, each response is assigned 1 or 2 stat weights. If a single weight is assigned, then this is the value used to calculate the standard deviation. If 2 weights are provided, the midpoint is used.

D = Question Mean - Stat Value (as described above)
SS = Sum of Squares: the sum of D*D*Weighted Response Count over all table responses
Sample = Sum of the Weighted Response Counts over all table responses

Standard Deviation = SQRT(SS/(Sample-1))


The calculation is the same for continuous variables except the actual data values are used instead of stat weight.

D = Question Mean - Response Value
SS = Sum of Squares: the sum of D*D*Respondent Weight over all responses
Sample = Sum of the Respondent Weights over all responses

Standard Deviation = SQRT(SS/(Sample-1))
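
The same formula covers both cases, so a single Python sketch may help. It assumes the standard deviation is the square root of SS divided by (Sample - 1), and that the question mean is the weighted mean of the values. The function name `weighted_std` and the sample numbers are hypothetical. For a categorized question, `values` would hold the stat values (or midpoints) and `weights` the weighted response counts. For a continuous variable, `values` would hold the actual responses and `weights` the respondent weights.

```python
import math


def weighted_std(values, weights):
    """Weighted standard deviation: SQRT(SS / (Sample - 1))."""
    sample = sum(weights)                                    # Sample
    mean = sum(v * w for v, w in zip(values, weights)) / sample
    ss = sum(w * (mean - v) ** 2                             # D*D*weight, summed
             for v, w in zip(values, weights))
    return math.sqrt(ss / (sample - 1))


# Hypothetical example: three responses with equal weight.
print(weighted_std([1, 2, 3], [1, 1, 1]))  # 1.0
```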


Comparison of two population means using T-Statistic

When the two populations have equal variances

\(t = \dfrac{m_1-m_2}{\sqrt{\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}(\dfrac{1}{n_1}+\dfrac{1}{n_2})}}\)

Where
m1 = Mean of the 1st sample
s1 = Standard deviation of the 1st sample
n1 = Unweighted sample size of the 1st sample

m2 = Mean of the 2nd sample
s2 = Standard deviation of the 2nd sample
n2 = Unweighted sample size of the 2nd sample

Decision rules:
If |t|<1.65 then the two populations are NOT significantly different at 90%
If |t|≥1.65 then the two populations ARE significantly different at 90%
If |t|<1.95 then the two populations are NOT significantly different at 95%
If |t|≥1.95 then the two populations ARE significantly different at 95%
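
A minimal Python sketch of the equal-variance formula and its decision rules follows. The function names `pooled_t` and `is_significant` and the sample inputs are hypothetical. The thresholds 1.65 and 1.95 are taken directly from the rules above.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """t-statistic for two means, assuming equal population variances."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_variance * (1 / n1 + 1 / n2))


def is_significant(t, threshold):
    """Decision rule: significant when |t| is at or above the threshold."""
    return abs(t) >= threshold


# Hypothetical samples: means 10 and 9, standard deviation 2, n = 50 each.
t = pooled_t(10.0, 2.0, 50, 9.0, 2.0, 50)
print(t)                        # about 2.5
print(is_significant(t, 1.65))  # 90% rule
print(is_significant(t, 1.95))  # 95% rule
```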


When the two populations have UNEQUAL variances

\(t = \dfrac{m_1-m_2}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}\)

Where
m1 = Mean of the 1st sample
s1 = Standard deviation of the 1st sample
n1 = Unweighted sample size of the 1st sample

m2 = Mean of the 2nd sample
s2 = Standard deviation of the 2nd sample
n2 = Unweighted sample size of the 2nd sample

Decision rules:
If |t|<1.65 then the two populations are NOT significantly different at 90%
If |t|≥1.65 then the two populations ARE significantly different at 90%
If |t|<1.95 then the two populations are NOT significantly different at 95%
If |t|≥1.95 then the two populations ARE significantly different at 95%
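
The unequal-variance case differs only in the denominator, as this Python sketch suggests. The function name `unequal_variance_t` and the sample inputs are hypothetical. The resulting value would be compared against the same 1.65 and 1.95 thresholds listed above.

```python
import math


def unequal_variance_t(m1, s1, n1, m2, s2, n2):
    """t-statistic for two means without assuming equal variances."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / standard_error


# Hypothetical samples: s1^2/n1 = 9/9 = 1 and s2^2/n2 = 16/16 = 1,
# so the standard error is sqrt(2).
t = unequal_variance_t(10.0, 3.0, 9, 8.0, 4.0, 16)
print(t)               # about 1.414
print(abs(t) >= 1.65)  # 90% rule
```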