Some notes related to the presentation
James Read-Tannock gave a talk covering general background ideas on non-parametric statistics and the more classical tests still taught in Psychology programmes, and then introduced the ideas of the bootstrap, jackknife, and permutation testing. [Slides]
Some thoughts on computationally expensive methods. To get some hands-on experience with the basics, let's look at a simple example (from Chapter 2, Table 2.1 of the book), in which survival data from mice in a treatment/control experiment are presented.
I decided to use julia in this notebook, but it shouldn't be hard to reproduce in R, python, matlab, or your own preferred method of working with and plotting data.
The julia packages required are listed in the Project.toml and Manifest.toml. You should be able to download the folder from the github repo and import Pkg followed by Pkg.activate(".") and Pkg.instantiate().
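Spelled out, the setup from a Julia session started inside the downloaded folder looks like this (assuming the repo's Project.toml and Manifest.toml sit in the current directory):

```julia
import Pkg

Pkg.activate(".")    # use the Project.toml in this folder as the active environment
Pkg.instantiate()    # install the exact package versions pinned in Manifest.toml
```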
using Bootstrap
using Statistics
using Plots
gr(); # activate gr backend
# data from Table 2.1 in Efron & Tibshirani book
treatment = [94,197,16,38,99,141,23];
control = [52,104,146,10,51,30,40,27,46];
plot(treatment, ones(size(treatment)), line=:stem, color=:red, lw=2, label="Treatment")
plot!(control, ones(size(control)), line=:stem,
      color=:blue, lw=2, label="Control",
      size=[600,100], ylim=[0, 1.2],
      legend=:outertopright, yticks=[0,1],
      title="Efron & Tibshirani murine example",
      titlefont=font(10,"Arial"),
      titlefonthaligns=:right,
      legendfont=font(10,"Arial"),
      tickfont=font(10,"Arial"))
xaxis!(label="Survival time (days)")
Now we can create 10000 bootstrap samples of the median from the treatment and control data - making sure to respect the size of each list/vector, and to use sampling with replacement.
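To see what that resampling actually does, here is a minimal hand-rolled sketch using only the Random and Statistics standard libraries; boot_replicates is a made-up helper name for illustration, not part of Bootstrap.jl, which does the equivalent internally:

```julia
using Random, Statistics

# draw n_boot bootstrap replicates of a statistic:
# each replicate resamples n indices with replacement from 1:n,
# respecting the original sample size n
function boot_replicates(stat, data, n_boot; rng = Random.default_rng())
    n = length(data)
    [stat(data[rand(rng, 1:n, n)]) for _ in 1:n_boot]
end

treatment = [94, 197, 16, 38, 99, 141, 23]
meds = boot_replicates(median, treatment, 10_000; rng = MersenneTwister(1))
# each replicate is the median of 7 values drawn with replacement
```

Because the sample size (7) is odd, every bootstrap median is itself one of the original data values, which is why the distribution of replicates is so discrete.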
## basic bootstrap of the "median" treatment data
n_boot = 10000;
@time bs1 = bootstrap(median, treatment, BasicSampling(n_boot))
0.332936 seconds (802.73 k allocations: 40.556 MiB, 99.63% compilation time)
Bootstrap Sampling
Estimates:
Var │ Estimate Bias StdError
│ Float64 Float64 Float64
─────┼──────────────────────────────
1 │ 94.0 -14.0293 37.3796
Sampling: BasicSampling
Samples: 10000
Data: Vector{Int64}: { 7 }
The Bootstrap.jl package (see documentation) also has tools for getting confidence intervals, and for extracting other information like the number of observations, etc.
## calculate 95% confidence intervals
cil = 0.95;

## basic CI
bci1 = confint(bs1, BasicConfInt(cil));

## percentile CI
bci2 = confint(bs1, PercentileConfInt(cil));

## BCa CI
bci3 = confint(bs1, BCaConfInt(cil));

## Normal CI
bci4 = confint(bs1, NormalConfInt(cil));
which lets you compare, e.g., the normal confidence interval (94.0, 34.7667175864441, 181.29188241355592) with the percentile confidence interval (94.0, 23.0, 141.0).
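The constructions behind these two intervals are simple enough to write out by hand. In the sketch below the replicate values are made up for illustration (in practice you would use the real replicates from straps(bs1)[1]):

```julia
using Statistics

# made-up stand-in for the bootstrap replicates of the median
meds = [80.0, 94.0, 94.0, 99.0, 38.0, 141.0, 94.0, 99.0, 94.0, 38.0]
θ = 94.0   # the observed sample median of the treatment data

# normal interval: θ ± 1.96 * se, where se is the std of the replicates
se = std(meds)
normal_ci = (θ - 1.96 * se, θ + 1.96 * se)

# percentile interval: empirical 2.5% and 97.5% quantiles of the replicates
percentile_ci = (quantile(meds, 0.025), quantile(meds, 0.975))
```

The normal interval is symmetric about the estimate by construction, while the percentile interval follows whatever asymmetry the bootstrap distribution has - which is why they disagree here.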
To visualise the distribution of bootstrapped medians, we can get them with straps()
and use the histogram()
function from Plots.jl
. The bootstrap estimate of the standard error of the statistic of interest can be obtained from the standard deviation of the bootstrap replications. (The library will do this under the hood).
# look at the distribution (called "straps")
histogram(straps(bs1)[1], title="Distribution of bootstrapped medians [treatment]",
          label="",
          size=[600,300])
We can also try something a bit more complicated, like bootstrap sampling the treatment and control data and using those samples to look at the distribution of the difference in means.
## basic bootstrap of the "mean" data
bs_t = bootstrap(x->mean(x), treatment, BasicSampling(n_boot))
bs_c = bootstrap(y->mean(y), control, BasicSampling(n_boot))

# the difference estimator
# calculated from the bootstrap replicates
t = straps(bs_t)[1];
c = straps(bs_c)[1];
z = t - c;

nbins = 50;
bins = range(minimum([t;c;z]), maximum([t;c;z]), nbins)

# percentile
pc = quantile(z, [0.025, 0.975])
histogram(t, bins = bins, alpha=0.8, color=:gray, label="treatment",size=[600,400])
histogram!(c, bins = bins, alpha=0.8, color=:white, label="control")
… and from those bootstrapped means, we can calculate the difference and look at the distribution, and percentiles of that:
# look at the distribution of the bootstrapped differences
histogram(z, bins = bins,
          title="Distribution of bootstrapped difference",
          color=:red,
          label="treatment - control",
          size=[600,400])
vline!(pc, lw=2, label="2.5 and 97.5 percentiles")
vline!([mean(z)], lw=2, color=:black, label="")
There is a whole ecosystem for doing hypothesis testing in julia which also includes permutation testing (see documentation for HypothesisTests.jl), or dip into R, which will have tons of options, too.
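For a flavour of permutation testing itself, a two-sample test of the mean difference can be sketched with only the Random and Statistics standard libraries (this is a hand-rolled illustration, not the HypothesisTests.jl API):

```julia
using Random, Statistics

treatment = [94, 197, 16, 38, 99, 141, 23]
control   = [52, 104, 146, 10, 51, 30, 40, 27, 46]

# observed statistic: difference in group means
obs = mean(treatment) - mean(control)

# permutation null: shuffle the pooled group labels and recompute the statistic
pooled = vcat(treatment, control)
nt, nperm = length(treatment), 10_000
rng = MersenneTwister(1)
perm_stats = map(1:nperm) do _
    idx = randperm(rng, length(pooled))
    mean(pooled[idx[1:nt]]) - mean(pooled[idx[nt+1:end]])
end

# two-sided p-value (with the +1 correction to avoid p = 0)
p = (1 + count(s -> abs(s) >= abs(obs), perm_stats)) / (1 + nperm)
```

Unlike the bootstrap, which resamples within each group to estimate variability, the permutation test reshuffles group membership to simulate the null hypothesis of no treatment effect.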
Lots more interesting stuff out there - if you want to add things here, send me a message.
There is a pretty comprehensive article on bootstrap methods by the inventors of the methods - worth a look if this is something you will use in your work: Efron and Tibshirani (1986).
The classic textbook is quite expensive and a bit harder to get, but you can ask around and/or look in the library to check it out. The first couple of chapters are available at the Born lab, Harvard Medical School, as part of a neurobiology class. Check it out: Efron and Tibshirani (1993).