--- title: "Confidence Interval Estimation with Hand-Crafted CDFs" author: "Peter E. Freeman" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Confidence Interval Estimation with Hand-Crafted CDFs} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Introduction Recall that to determine confidence interval bounds, we solve the equation \begin{align*} F_Y(y_{\rm obs} \vert \theta) - q = 0 \end{align*} for $\theta$. Here, $Y$ is our chosen statistic (e.g., $\bar{X}$), $F_Y(\cdot)$ is the cumulative distribution function (or cdf) for $Y$'s sampling distribution, $y_{\rm obs}$ is the observed statistic value, and $q$ is a quantile that we determine given the confidence coefficient $1-\alpha$ and the type of interval we are constructing (one-sided lower bound, one-sided upper bound, or two-sided). In this vignette, we examine the situation in which the sampling distribution cdf *is* analytically derivable, but (a) is not (easily) invertible by hand, and (b) is (to our knowledge!) not coded in an existing `R` package. In this situation, we would develop our own cdf-calculating function, and then pass it to `cdfinv()`. ## Example Let's suppose we sample $n = 16$ data from a $\text{Uniform}(0,\theta)$ distribution and we wish to compute a 95% upper bound for $\theta$. Further, let's suppose that the $10^{\rm th}$ datum is $x_{(10),\rm obs} = 0.45$, and that we will use this datum to determine the confidence interval. (This is obviously an academic exercise: we would normally utilize $x_{(16),\rm obs}$ and solve for the upper bound by hand.) ## Deriving the CDF For the $\text{Uniform}(0,\theta)$ distribution, the cdf within the domain is \begin{align*} F_X(x) = \int_0^x \frac{1}{\theta - 0} dy = \frac{x}{\theta} \,. \end{align*} We utilize a result found in Casella & Berger (2002; page 229): the cdf for the $j^{\rm th}$ order statistic is \begin{align*} F_{(j)}(x) = \sum_{k=j}^n \binom{n}{k} [F_X(x)]^k [1-F_X(x)]^{n-k} \,. \end{align*} (Our notation slightly differs from that of Casella & Berger.) Here, that cdf would be \begin{align*} F_{(j)}(x) = \sum_{k=j}^n \binom{n}{k} \left(\frac{x}{\theta}\right)^k \left[1-\left(\frac{x}{\theta}\right)\right]^{n-k} \,. \end{align*} This is obviously *not* a cdf that we can easily invert by hand when $j < n$. ## Coding the CDF Below we show how one might create one's own code that returns the cdf value as a function of a coordinate $x$ and a parameter $\theta$. Note that the first argument must be `q`...the observed statistic value passed to `cdfinv()` will be matched to this argument name. ```{r} # give the function a name that will not conflict with others in the namespace punif.ord <- function(q,theta,j,n) { sum(choose(n,j:n)*(q/theta)^(j:n)*(1-q/theta)^(n-j:n)) } ``` ## Applying the CDF Below we show how we can pass the function, now defined in `R`'s global environment, to `cdfinv()`. The first three arguments are standard: the distribution "name" is `unif.ord` (`cdfinv()` will tack a `p` on the front), the parameter of interest is `theta`, and the observed statistic value is 0.45. The next argument specifies that we wish to compute a (95% by default) upper bound on $\theta$, instead of the default two-sided interval. As we know that $\theta \geq x_{(10),\rm obs}$, we specify a lower parameter bound (`lpb`) of 0.45. The last two arguments are extra ones specifically required for `punif.ord()`. ```{r} require(cdfinv) cdfinv("unif.ord","theta",0.45,bound="upper",lpb=0.45,j=10,n=16) ``` The 95% upper bound on $\theta$ determined using the $10^{\rm th}$ ordered datum is $\hat{\theta}_U = 1.151$.