Identifying and fitting distributions for discrete variables

In the previous article, we saw how to fit distributions to continuous variables in R. Let's now see how to do the same for discrete variables.

Some of the standard discrete distributions are Binomial, Poisson and Negative Binomial.
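To make the difference between these three families concrete, the sketch below evaluates their probability mass functions with base R's dbinom, dpois and dnbinom and compares mean and variance. The parameter values are illustrative assumptions, chosen only so that all three distributions have mean 6; they are not taken from the data.

```r
# Illustrative parameters (assumptions): every distribution has mean 6
k <- 0:200

pmf_binom <- dbinom(k, size = 20, prob = 0.3)  # mean = 20 * 0.3 = 6
pmf_pois  <- dpois(k, lambda = 6)              # mean = lambda = 6
pmf_nbin  <- dnbinom(k, size = 3, mu = 6)      # mean = mu = 6

# Mean and variance computed numerically from a PMF on the grid k
pmf_moments <- function(p) {
  m <- sum(k * p)
  c(mean = m, variance = sum((k - m)^2 * p))
}

moments <- rbind(
  binomial          = pmf_moments(pmf_binom),
  poisson           = pmf_moments(pmf_pois),
  negative_binomial = pmf_moments(pmf_nbin)
)
print(round(moments, 2))
```

With the same mean, the binomial has a variance below the mean, the Poisson a variance equal to the mean, and the negative binomial a variance above it. Comparing the sample mean with the sample variance is therefore a quick first screen before fitting.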

Identifying and fitting distributions to the given data

Data: The data set covers 100 retailers and contains one variable of interest, the number of complaints (ncomp), alongside a retailer identifier.

1. Importing data

# Read the retailer complaints data; the first row holds the column names
salesdata <- read.csv("salesdata dist fitting discrete.csv", header = TRUE)
head(salesdata)

##   retailer ncomp
## 1        1    10
## 2        2     9
## 3        3     2
## 4        4     9
## 5        5     4
## 6        6     6

2. Checking the symmetry of the variable using a box-and-whisker plot

boxplot(salesdata$ncomp,col="blue",main="ncomp")

The boxplot shows that the number of complaints is slightly skewed.

3. Using the skewness-kurtosis plot to choose candidate distribution(s) for the number of complaints

library(fitdistrplus)
# discrete = TRUE compares the data with discrete candidate distributions
descdist(salesdata$ncomp, discrete = TRUE)

## summary statistics
## ------
## min:  2   max:  12 
## median:  6 
## mean:  6.2 
## estimated sd:  2.566293 
## estimated skewness:  0.3805439 
## estimated kurtosis:  2.460484

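On the skewness-kurtosis plot, the candidate distribution lying closest to the observed data point is the most plausible choice for fitting. Here that is the Poisson distribution, which is also consistent with the summary statistics: the sample variance (2.566293^2, about 6.59) is close to the sample mean (6.2), as a Poisson distribution requires.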
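As a rough numeric complement to the plot, the sketch below compares the summary statistics printed by descdist() with the theoretical moments of a Poisson distribution whose rate equals the sample mean. It assumes the kurtosis is reported on the Pearson scale (normal = 3); the variable names are hypothetical and used only for illustration.

```r
# Summary statistics copied from the descdist() output above
obs_mean <- 6.2
obs_sd   <- 2.566293
obs_skew <- 0.3805439
obs_kurt <- 2.460484

# Theoretical moments of a Poisson with lambda set to the sample mean
lambda    <- obs_mean
pois_var  <- lambda
pois_skew <- 1 / sqrt(lambda)
pois_kurt <- 3 + 1 / lambda   # Pearson kurtosis (assumed scale: normal = 3)

comparison <- data.frame(
  statistic = c("variance", "skewness", "kurtosis"),
  observed  = c(obs_sd^2, obs_skew, obs_kurt),
  poisson   = c(pois_var, pois_skew, pois_kurt)
)
print(comparison, digits = 3)

# Dispersion index: a value near 1 means variance roughly equals the mean
dispersion_index <- obs_sd^2 / obs_mean
print(dispersion_index)
```

The variance and skewness sit close to their Poisson counterparts, while the observed kurtosis is somewhat lower than the Poisson value, so the match is approximate rather than exact.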

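4. Fitting a Poisson distribution to the number of complaints

# Fit a Poisson distribution by maximum likelihood (the fitdist default)
fit1 <- fitdist(salesdata$ncomp, "pois")
# Compare empirical and theoretical frequencies and CDFs
plot(fit1)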
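For a Poisson distribution, the maximum likelihood estimate of the rate is simply the sample mean, so the fit can be cross-checked by hand. The sketch below does this on simulated stand-in data, since the CSV file is not available here; ncomp_sim and the seed are assumptions for illustration, not the article's data.

```r
# Simulated stand-in for salesdata$ncomp (assumption: the CSV is not at hand)
set.seed(123)
ncomp_sim <- rpois(100, lambda = 6.2)

# The maximum likelihood estimate of the Poisson rate is the sample mean
lambda_hat <- mean(ncomp_sim)

# Observed counts versus counts expected under Poisson(lambda_hat)
values   <- 0:max(ncomp_sim)
observed <- as.vector(table(factor(ncomp_sim, levels = values)))
expected <- length(ncomp_sim) * dpois(values, lambda_hat)

freq_table <- data.frame(value = values, observed = observed,
                         expected = round(expected, 1))
print(freq_table)

# Optional cross-check with fitdistrplus, if it is installed
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  fit_sim <- fitdistrplus::fitdist(ncomp_sim, "pois")
  print(fit_sim$estimate)   # should be close to lambda_hat
}
```

On the real data, the same comparison can be made by replacing ncomp_sim with salesdata$ncomp and reading the estimated rate from fit1$estimate.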