Bootstrapping is a method for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. It is especially useful when the sampling distribution of the estimator does not follow a standard, known distribution.
Bootstrapping can be used in the following scenarios:
Let us see how to perform bootstrapping in R.
The objective is to generate the sampling distribution of the sample median. The original sample contains 10 values and is imported into R as a data frame named bootdata, with a single column X:
bootdata
X
1 1.0
2 0.6
3 1.2
4 -0.2
5 1.6
6 1.7
7 0.9
8 1.8
9 0.0
10 2.5
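If you want to follow along without an external file, the same 10 values can be typed in directly. This is a minimal sketch: the column name X matches the printed output above, but the article's actual import step is not shown.

```r
# Recreate the sample shown above as a one-column data frame
bootdata <- data.frame(X = c(1.0, 0.6, 1.2, -0.2, 1.6, 1.7, 0.9, 1.8, 0.0, 2.5))

nrow(bootdata)        # 10 observations
median(bootdata$X)    # 1.1, the median of the original sample
```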
R has a function boot() in the package boot that generates bootstrap replicates of a statistic applied to data. It supports both parametric and nonparametric resampling.
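install.packages("boot")  # one-time install (boot also ships with R as a recommended package)
library(boot)             # load the package in each session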
The function boot() computes a statistic (in our case, the median) on a specified number of resamples (say 1000). The statistic is supplied as a function. The following function f accepts the data and a vector of indices i, which boot() generates by drawing row numbers with replacement, and calculates the median of the resampled data.
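f <- function(data, i) {
  d <- data[i, ]    # rows picked by the resampled indices (a vector, since bootdata has one column)
  med <- median(d)  # median of this resample
  return(med)
}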
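To see what kind of i boot() passes in, you can call f yourself. The sketch below redefines the sample and f so it runs on its own; the variable idx is a hypothetical name used only for illustration, and sample(..., replace = TRUE) mimics the idea of a nonparametric resample rather than reproducing boot()'s internal code.

```r
bootdata <- data.frame(X = c(1.0, 0.6, 1.2, -0.2, 1.6, 1.7, 0.9, 1.8, 0.0, 2.5))

f <- function(data, i) {
  d <- data[i, ]      # rows selected by the index vector i
  median(d)
}

# The identity index vector reproduces the original sample
f(bootdata, 1:10)     # 1.1

# One bootstrap resample: 10 row numbers drawn with replacement
set.seed(42)          # arbitrary seed, for reproducibility only
idx <- sample(nrow(bootdata), replace = TRUE)
idx                   # some rows repeat, others are left out
f(bootdata, idx)      # median of this particular resample
```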
The above function is called 1000 times using the function boot():
bootobject <- boot(data = bootdata, statistic = f, R = 1000)
data: the original sample (a vector, matrix or data frame).
statistic: a function which, when applied to data, returns a vector containing the statistic(s) of interest. For nonparametric resampling it must accept at least two arguments: the data and a vector of indices.
R: the number of bootstrap replicates. Generally, 1000 or more replicates are generated to approximate the sampling distribution well.
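Note that boot() calls f 1000 times, each time passing the original data and a new vector of resampled indices. Because the indices are random, the bias and standard error reported below will vary slightly from run to run unless you fix the seed with set.seed() before calling boot().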
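Each of the 1000 medians is stored in the returned object, so the bootstrap sampling distribution itself can be inspected. A sketch, assuming the boot package is installed; t0 (statistic on the original data) and t (matrix of replicates) are components of the object boot() returns, and the seed value is arbitrary.

```r
library(boot)

bootdata <- data.frame(X = c(1.0, 0.6, 1.2, -0.2, 1.6, 1.7, 0.9, 1.8, 0.0, 2.5))
f <- function(data, i) median(data[i, ])

set.seed(123)   # fix the seed so the replicates are reproducible
bootobject <- boot(data = bootdata, statistic = f, R = 1000)

bootobject$t0            # statistic on the original sample
dim(bootobject$t)        # 1000 x 1 matrix: one row per replicate
summary(bootobject$t[, 1])

# Histogram of the bootstrap distribution of the median
hist(bootobject$t[, 1], main = "Bootstrap distribution of the median",
     xlab = "Median of resample")
```

Calling plot(bootobject) gives a similar view, pairing a histogram of the replicates with a normal Q-Q plot.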
bootobject
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = bootdata, statistic = f, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 1.1 0.05645 0.3337398
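The bias and std. error columns are simple summaries of the replicates: bias is the mean of the bootstrap medians minus the original median, and std. error is their standard deviation. The base-R sketch below reproduces that logic without the boot package; the variable names are illustrative, and because resampling is random its numbers will not match the output above exactly.

```r
x <- c(1.0, 0.6, 1.2, -0.2, 1.6, 1.7, 0.9, 1.8, 0.0, 2.5)

set.seed(1)     # arbitrary seed
B <- 1000       # number of bootstrap replicates

t0 <- median(x)                                            # original statistic
t_star <- replicate(B, median(sample(x, replace = TRUE)))  # bootstrap medians

bias <- mean(t_star) - t0    # average shift of the bootstrap medians
std_error <- sd(t_star)      # spread of the bootstrap distribution

c(original = t0, bias = bias, std.error = std_error)
```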
This bootstrap object can be used for further analysis. For instance, to obtain a 95% percentile confidence interval for the median:
boot.ci(bootobject, type = "perc")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = bootobject, type = "perc")
Intervals :
Level Percentile
95% ( 0.45, 1.70 )
Calculations and Intervals on Original Scale
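The percentile interval is essentially the 2.5th and 97.5th percentiles of the bootstrap medians. A base-R approximation is sketched below; it uses quantile(), whereas boot.ci() applies its own interpolation between order statistics, so the endpoints may differ slightly even for identical replicates.

```r
x <- c(1.0, 0.6, 1.2, -0.2, 1.6, 1.7, 0.9, 1.8, 0.0, 2.5)
set.seed(1)     # arbitrary seed
t_star <- replicate(1000, median(sample(x, replace = TRUE)))

# 95% percentile interval: the middle 95% of the bootstrap distribution
ci_perc <- quantile(t_star, probs = c(0.025, 0.975))
ci_perc
```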
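boot.ci() can generate five types of equi-tailed two-sided nonparametric confidence intervals, chosen through the type argument:
"norm" (first-order normal approximation)
"basic" (basic bootstrap interval)
"stud" (studentized bootstrap interval)
"perc" (bootstrap percentile interval)
"bca" (adjusted bootstrap percentile, or BCa, interval)
The studentized interval also needs a variance estimate for each replicate, which the function f above does not return.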
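Two of these types can be written out by hand from the same replicates, which shows how they differ from the percentile interval. This is a sketch of the textbook formulas, not the boot package's own code: "norm" centers a normal interval on the bias-corrected estimate, and "basic" reflects the bootstrap percentiles around the original estimate. boot.ci() may report slightly different endpoints because of its interpolation rule.

```r
x <- c(1.0, 0.6, 1.2, -0.2, 1.6, 1.7, 0.9, 1.8, 0.0, 2.5)
set.seed(1)     # arbitrary seed
t0 <- median(x)
t_star <- replicate(1000, median(sample(x, replace = TRUE)))

z <- qnorm(0.975)               # about 1.96 for a 95% interval
bias <- mean(t_star) - t0
se <- sd(t_star)

# "norm": normal approximation around the bias-corrected estimate t0 - bias
ci_norm <- c(t0 - bias - z * se, t0 - bias + z * se)

# "basic": upper and lower percentiles of t_star reflected around t0
q <- quantile(t_star, probs = c(0.025, 0.975), names = FALSE)
ci_basic <- c(2 * t0 - q[2], 2 * t0 - q[1])

rbind(norm = ci_norm, basic = ci_basic)
```

With the boot package, several types can be requested in one call, for example boot.ci(bootobject, type = c("norm", "basic", "perc", "bca")).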