--- title: "The K distribution" author: "R. Wayne Oldford" #date: "`r Sys.Date()`" output: rmarkdown::html_vignette: fig_caption: yes vignette: > %\VignetteIndexEntry{The K distribution} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, setup, echo=FALSE} library(qqtest) ``` # Definition Suppose $X \sim \chi^2_m$ is a Chi-squared random variate on $m$ degrees of freedom. Then the distribution of \[Y = \sqrt{\frac{X}{m}}\] is the *Kay* distribution on $m$ degrees of freedom, written as \[Y \sim K_m.\] Its density is \[f(y) = \left\{\begin{array}{lcl} \frac{m^{\frac{m}{2}} y ^{m-1} e^{-\frac{1}{2} m y^2}} {2^{\frac{m}{2}-1} \Gamma(\frac{m}{2})} &~~~& \text{for} ~~ 0 \le y < \infty \\ &&\\ 0 && \text{otherwise}. \end{array} \right. \] The $K_m$ density has some very attractive features over the $\chi^2_m$ density: - $K_m$ has a much more symmetric density than had the $\chi^2_m$, for any $m$; - like a $\chi^2_m$ density a $K_m$ density, as $m \rightarrow \infty$ $K_m$ becomes more symmetric and nearly normally distributed but does both faster than a $\chi^2_m$; - as $m \uparrow$, the $K_m$ density concentrates around the value $y=1$, rather than heading off to $\infty$ like the $\chi^2_m$; ```{r, echo=FALSE, warning=FALSE, message=FALSE, fig.align="center", fig.width=7, fig.height=4, fig.cap="As m increases, Km has better properties"} # compare K density to that of chi as degrees of freedom increase op <-par(mfrow=c(1,2)) p <- seq(0.001, .999, 0.001) # # First get all the chi-square densities and plot them xchi5 <- qchisq(p,5) dchi5 <- dchisq(xchi5,5) xchi10 <- qchisq(p,10) dchi10 <- dchisq(xchi10,10) xchi20 <- qchisq(p,20) dchi20 <- dchisq(xchi20,20) xchi30 <- qchisq(p,30) dchi30 <- dchisq(xchi20,30) xlim <- range(xchi5, xchi10, xchi20, xchi30) ylim <- range(dchi5, dchi10, dchi20, dchi30) plot(xchi5, dchi5, type="l", xlab="x", ylab="density", xlim=xlim, ylim=ylim, main="chi-squared densities", col="steelblue") lines(xchi10, dchi10, lty=2, col="steelblue") lines(xchi20, dchi20, lty=3, col="steelblue") lines(xchi20, dchi30, lty=4, col="steelblue") legend("topright", legend=c("df = 5", "df = 10", "df = 20", "df = 30"), lty=c(1,2,3,4), title="degrees of freedom", cex=0.75, bty="n", col="steelblue") # # Now get all the K densities and plot them xkay5 <- qkay(p,5) dkay5 <- dkay(xkay5,5) xkay10 <- qkay(p,10) dkay10 <- dkay(xkay10,10) xkay20 <- qkay(p,20) dkay20 <- dkay(xkay20,20) xkay30 <- qkay(p,30) dkay30 <- dkay(xkay20,30) xlim <- range(xkay5, xkay10, xkay20, xkay30) ylim <- range(dkay5, dkay10, dkay20, dkay30) plot(xkay5, dkay5, type="l", xlab="y", ylab="density", xlim=xlim, ylim=ylim, main="K densities", col="steelblue") lines(xkay10, dkay10, lty=2, col="steelblue") lines(xkay20, dkay20, lty=3, col="steelblue") lines(xkay20, dkay30, lty=4, col="steelblue") abline(v=1, col="grey", lty=5) legend("topright", legend=c("df = 5", "df = 10", "df = 20", "df = 30"), lty=c(1,2,3,4), title="degrees of freedom", cex=0.75, bty="n", col="steelblue") par(op) ``` These values were calculated using the `dkay(...)` density function. For example, `dkay(1.0, df=10) = ` `r dkay(1.0, df=10)`. ## Normal theory relations Perhaps the most obvious relation between a normal random variate and a $K_m$ is that if $Z \sim N(0,1)$, then $|Z|\sim K_1$, the half-normal. More important in applications is that distribution of the estimator of the sample standard deviation is proportional to a $K_m$. 
More important in applications is that the estimator of the standard deviation of a normal sample, scaled by the true value $\sigma$, follows a $K$ distribution. To be precise, if $Y_1, \ldots, Y_n$ are independent and identically distributed $N(\mu, \sigma^2)$ random variates, with realizations $y_1, \ldots, y_n$ and the usual estimates $\widehat{\mu} = \sum y_i /n$ and $\widehat{\sigma} = \sqrt{\sum (y_i - \widehat{\mu})^2/(n-1)}$, then the corresponding estimators $\widetilde{\mu}$ and $\widetilde{\sigma}$ are distributed as
\[ \widetilde{\mu} \sim N\left(\mu, \frac{\sigma^2}{n}\right)
~~~~~\text{and} ~~~~~
\frac{\widetilde{\sigma}}{\sigma} \sim K_{n-1}. \]

The latter shows that the $K_m$ distribution can be used for inference (e.g. tests and confidence intervals) about $\sigma$. This is handy because the $K_m$ quantiles vary much less than do those of $\chi^2_m$. For example, consider the following table of $K_m$ quantiles.

```{r, echo=FALSE, results='asis'}
df <- c(1:10, seq(15, 40, 5))
p <- c(0.05, 0.5, 0.95)
fun <- function(p) qkay(p, df)
table <- as.data.frame(cbind(df, sapply(p, fun)))
colnames(table) <- c("df", paste0("p=", p))
knitr::kable(table)
```

Unlike those of the $\chi^2_m$ distribution, the quantiles in this table stabilize as the degrees of freedom increase, so that $1 \pm 0.20$ is not a bad rule of thumb for an interval containing the ratio $\widetilde{\sigma}/\sigma$ with $90\%$ probability.

These values were calculated using the `qkay(...)` quantile function. For example, `qkay(0.05, df=5) = ` `r qkay(0.05, df=5)`. These would be used to construct interval estimates for $\sigma$ (a worked sketch appears at the end of this vignette).

To get observed significance levels, the cumulative distribution function `pkay(...)` would be used. For example, `SL = 1 - pkay(1.4, df=10) = 1 - ` `r pkay(1.4, df=10)` ` = ` `r 1 - pkay(1.4, df=10)`.

### The Student t distribution

In standard normal theory, the Student $t_m$ distribution can be defined as follows. If $Z \sim N(0,1)$ and $Y \sim K_m$ is distributed independently of $Z$, then the ratio
\[T = \frac{Z}{Y} = \frac{N(0,1)}{K_m} = t_m,\]
a form which is fairly easy to remember. For the estimators from the above model,
\[\frac{\widetilde{\mu} - \mu}{\widetilde{\sigma} / \sqrt{n}}
= \frac{\frac{\widetilde{\mu}-\mu}{\sigma/\sqrt{n}}}
       {\frac{\widetilde{\sigma}}{\sigma}}
= \frac{N(0,1)}{K_{n-1}}
= t_{n-1}
\]
is used to construct interval estimates and tests for the value of the parameter $\mu$.

# The functions

As with every other distribution in `R`, four functions are provided for the $K_m$ distribution. These are

- `dkay(x, df=m, ...)`, which evaluates the density of $K_m$ at $x$;
- `pkay(x, df=m, ...)`, which evaluates the cumulative distribution function of $K_m$ at $x$;
- `qkay(p, df=m, ...)`, which evaluates the quantile of $K_m$ at the proportion $p$;
- `rkay(n, df=m, ...)`, which generates $n$ pseudo-random realizations from $K_m$.

The parameters in the ellipsis include a non-centrality parameter. All functions rely on the corresponding $\chi^2_m$ functions in base `R`.
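Since $K_m$ is defined through $Y = \sqrt{X/m}$ with $X \sim \chi^2_m$, the cumulative distribution functions are related by $F_{K_m}(y) = F_{\chi^2_m}(m y^2)$ and the quantile functions by $Q_{K_m}(p) = \sqrt{Q_{\chi^2_m}(p)/m}$. The following minimal sketch checks these identities numerically; the degrees of freedom and evaluation points chosen here are arbitrary.

```{r}
m <- 10
y <- seq(0.1, 2, by = 0.1)
p <- seq(0.05, 0.95, by = 0.05)
# cdf identity: F_K(y) = F_chisq(m * y^2)
all.equal(pkay(y, df = m), pchisq(m * y^2, df = m))
# quantile identity: Q_K(p) = sqrt(Q_chisq(p) / m)
all.equal(qkay(p, df = m), sqrt(qchisq(p, df = m) / m))
```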
We briefly illustrate each of the four functions below.

## The density `dkay(x, df, ...)`

```{r, warning=FALSE, message=FALSE, fig.align="center", fig.width=5, fig.height=4}
x <- seq(0, 2, 0.01)
plot(x, dkay(x, df=10), type="l", col="steelblue",
     main="Density", xlab="x", ylab="f(x)")
abline(v=1.0, lty=2, col="grey")
```

## The cumulative distribution function `pkay(x, df, ...)`

```{r, warning=FALSE, message=FALSE, fig.align="center", fig.width=5, fig.height=4}
x <- seq(0, 2, 0.01)
plot(x, pkay(x, df=10), type="l", col="steelblue",
     main="Distribution", xlab="x", ylab="F(x)")
abline(v=1.0, lty=2, col="grey")
```

## The quantile function `qkay(p, df, ...)`

```{r, warning=FALSE, message=FALSE, fig.align="center", fig.width=5, fig.height=4}
x <- seq(0, 2, 0.01)
p <- pkay(x, df=10)
plot(p, qkay(p, df=10), type="l", col="steelblue",
     main="Quantile function", xlab="p", ylab="Q(p)")
abline(h=1.0, lty=2, col="grey")
```

## Pseudo-random realizations `rkay(n, df, ...)`

```{r, warning=FALSE, message=FALSE, fig.align="center", fig.width=5, fig.height=4}
x <- rkay(1000, df=10)
hist(x, col="steelblue",
     main="Pseudo-random numbers", xlab="x")
abline(v=1.0, lty=2, col="grey")
```
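Finally, here is a minimal sketch of how the `qkay(...)` quantiles could be used to construct the $90\%$ interval estimate for $\sigma$ described earlier. The sample below is simulated; the seed, sample size, and parameter values are arbitrary choices for illustration.

```{r}
set.seed(314)                     # arbitrary seed, for reproducibility
n <- 25
y <- rnorm(n, mean = 10, sd = 2)  # simulated sample; true sigma is 2
sigmahat <- sd(y)                 # divisor n - 1, so sigmahat/sigma ~ K on n - 1 df
# P( qkay(0.05, n-1) <= sigmahat/sigma <= qkay(0.95, n-1) ) = 0.90,
# so inverting the inequalities gives a 90% interval estimate for sigma
c(lower = sigmahat / qkay(0.95, df = n - 1),
  upper = sigmahat / qkay(0.05, df = n - 1))
```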