Note: 0330 Count data models

M Wu
May 24, 2021

If x_k is not in logarithms, the parameter β_k is a semi-elasticity.
If x_k is in logarithms, the parameter β_k is an elasticity.

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   4.8639     0.7091   6.860  6.9e-12 ***
lnN           1.4242     0.3725   3.823 0.000132 ***
private       0.6841     0.3896   1.756 0.079110 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

lnN is in logarithms, so its coefficient is an elasticity: when N increases by 1%, the expected number of errors increases by about 1.4%. (Unlike in probit and logit models, the effect here is in percent, not percentage points.)

private is a dummy variable, so its coefficient is a semi-elasticity.
Read approximately, private hospitals have about 68% more errors than public ones.

When |β| > 0.5, we should apply the exact interpretation of the semi-elasticity:
[exp(0.68) - 1] * 100% ≈ 97%
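
The gap between the approximate and the exact reading of a semi-elasticity can be checked numerically. The sketch below uses only Python's standard library; the function name `exact_percent_change` is made up for illustration, and 0.68 is the rounded coefficient on `private` used above.

```python
import math

def exact_percent_change(beta: float) -> float:
    """Exact percentage change in the expected count when a regressor
    that is not in logarithms increases by one unit."""
    return (math.exp(beta) - 1.0) * 100.0

beta_private = 0.68  # rounded coefficient on the dummy `private`

approx = beta_private * 100.0               # approximate reading: beta * 100%
exact = exact_percent_change(beta_private)  # exact reading: [exp(beta) - 1] * 100%

print(f"approximate: {approx:.0f}%, exact: {exact:.0f}%")
# prints "approximate: 68%, exact: 97%"
```

For small coefficients the two readings almost coincide (for β = 0.05 the exact value is about 5.1%), which is why the exact formula only matters once |β| gets large.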

COUNT DATA MODELS

Poisson distribution
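
For reference, a discrete random variable Y has a Poisson distribution with parameter λ > 0 if (standard textbook form, written here in LaTeX):

$$
P(Y = y) = \frac{e^{-\lambda}\,\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \dots
$$

Its mean and variance are both equal to λ.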

Poisson regression model

E(y|x) = Var(y|x) = λ
The model can be used if the condition above (mean equal to variance) is met.
If not, we should use the negative binomial regression model.

Parameters’ interpretation
μ(x) = exp(xβ)
If x_k is not in logarithms, the parameter β_k is a semi-elasticity.
If x_k is in logarithms, the parameter β_k is an elasticity.
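
To see why both readings follow from μ(x) = exp(xβ), one can compare the expected count before and after changing a single regressor. This is a minimal sketch. It reuses the estimates from the output at the top of these notes and assumes they come from a model of this exponential form. The helper `expected_count` and the value N = 200 are hypothetical and serve only as an illustration; the percentage changes do not depend on the chosen N.

```python
import math

# Estimates copied from the regression output at the top of the notes.
beta = {"(Intercept)": 4.8639, "lnN": 1.4242, "private": 0.6841}

def expected_count(n: float, private: int) -> float:
    """mu(x) = exp(x * beta) for given N and dummy value (hypothetical helper)."""
    xb = beta["(Intercept)"] + beta["lnN"] * math.log(n) + beta["private"] * private
    return math.exp(xb)

n = 200.0  # hypothetical value of N

# Regressor in logarithms: raising N by 1% changes mu by roughly beta_lnN percent.
elasticity = (expected_count(n * 1.01, 0) / expected_count(n, 0) - 1.0) * 100.0

# Regressor not in logarithms: switching the dummy on multiplies mu by exp(beta_private).
semi = (expected_count(n, 1) / expected_count(n, 0) - 1.0) * 100.0

print(f"1% more N   -> {elasticity:.2f}% more errors")
print(f"private = 1 -> {semi:.1f}% more errors")
```

The first number stays close to the coefficient on lnN, while the second is the exact semi-elasticity [exp(β) - 1] * 100% rather than β * 100%.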

Negative Binomial distribution

A discrete random variable Y is said to have negative binomial distribution with parameters (v, p), v ∈ N, 0 < p < 1 if:
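
One common way to write this probability mass function, assuming Y counts the failures observed before the v-th success and p is the success probability (this parametrization is an assumption; other textbooks swap the roles of p and 1 - p), is:

$$
P(Y = k) = \binom{v + k - 1}{k}\, p^{v} (1 - p)^{k}, \qquad k = 0, 1, 2, \dots
$$

Under this form E(Y) = v(1 - p)/p and Var(Y) = v(1 - p)/p^2, so the variance always exceeds the mean.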

H0: Poisson regression is as good as NB regression. → There is no point in estimating the more advanced model; we should estimate the simpler one (Poisson regression).

H1: NB regression is better than Poisson regression.

Negative Binomial Regression model:

H0: α = 0 (Poisson)
H1: α ≠ 0, i.e. α > 0 (NB)
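
One common way to test these hypotheses is a likelihood-ratio test that compares the log-likelihoods of the two fitted models. The notes do not say which test is used, so the sketch below is an assumption. The function name `lr_test_overdispersion` and the log-likelihood values are hypothetical. Because α = 0 lies on the boundary of the parameter space, the chi-square(1) p-value is conventionally halved.

```python
import math

def lr_test_overdispersion(loglik_poisson: float, loglik_nb: float):
    """Likelihood-ratio test of H0: alpha = 0 (Poisson) vs H1: alpha > 0 (NB).

    Returns (LR statistic, p-value). The chi-square(1) p-value is halved
    because alpha = 0 sits on the boundary of the parameter space.
    """
    lr = max(0.0, 2.0 * (loglik_nb - loglik_poisson))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2 = math.erfc(math.sqrt(lr / 2.0))
    return lr, 0.5 * p_chi2

# Hypothetical log-likelihoods of the two fitted models.
lr, p_value = lr_test_overdispersion(loglik_poisson=-412.3, loglik_nb=-409.1)
print(f"LR = {lr:.2f}, p-value = {p_value:.4f}")

if p_value < 0.05:
    print("Reject H0: prefer the negative binomial model.")
else:
    print("Do not reject H0: the Poisson model is sufficient.")
```

A small p-value speaks for the NB model; otherwise the simpler Poisson regression is kept.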

- The coefficient on ln(population) is negative but statistically insignificant, which means that population does not affect the number of medals won by a country.
- The coefficient on ln(GDP) is positive: whenever GDP increases by 1%, the number of medals won increases by 0.66%.
- Soviet (not in logarithms): with a coefficient of about 1.9 (|β| > 0.5), the exact interpretation applies. Soviet countries won more medals than other countries by [exp(1.9) - 1] * 100% ≈ 569%.
- Host (not in logarithms): host countries won more medals than other countries by [exp(0.5) - 1] * 100% ≈ 65%.

Zero Inflated Poisson Model (ZIP)

The ZIP model makes it possible to model count variables that suffer from overdispersion.
It is assumed that the entities can be divided into two subgroups:
Always-zero-group
Not-always-zero-group — counts generated with a Poisson distribution, can be zero, but not always
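
The two-group idea above can be written as a mixture distribution and checked numerically. In this minimal sketch, `pi` (the share of the always-zero group) and `lam` (the Poisson mean in the other group) are hypothetical values, and the function names are made up for illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) under a zero-inflated Poisson.

    pi  : probability of belonging to the always-zero group
    lam : Poisson mean in the not-always-zero group
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

pi, lam = 0.3, 2.0  # hypothetical values

# Mean and variance computed directly from the pmf (truncated at k = 100).
mean = sum(k * zip_pmf(k, pi, lam) for k in range(101))
var = sum((k - mean) ** 2 * zip_pmf(k, pi, lam) for k in range(101))

print(f"P(Y=0): ZIP {zip_pmf(0, pi, lam):.3f} vs Poisson {poisson_pmf(0, lam):.3f}")
print(f"mean {mean:.2f}, variance {var:.2f}")
```

With these values the variance (2.24) exceeds the mean (1.40), and a zero is far more likely than under a plain Poisson with the same `lam`. This is the overdispersion the ZIP model is meant to capture.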

Two-step estimation
(1) Logit (or probit) model to assign the observation to either always-zero or not-always-zero group
(2) Poisson model for not-always-zero group observations

An additional year of schooling decreases the probability of being in the always-zero group.
Men have a different probability of being in the always-zero group than women.

Each additional hospital visit increases the expected number of visits by 23.6%.
Each additional chronic disease increases the expected number of visits by 19%.
Having private insurance increases the expected number of visits by 21%.

Testing zero inflation

Score test for zero inflation

Hurdle models

How do these variables affect being in the non-zero group?

Count data modelling strategies
