Kernel Density Estimation Algorithm with R

1. Introduction

This very short blog post aims to illustrate the step by step algorithm used to estimate a density.

2. Data

Let’s start by generating 2000 data points from a gamma distribution :

data <- rgamma(2000, 6, 2)

3. The Algorithm

density_perso <- function (data){
    sigma <- sd(data)
    mu <- mean(data)
    n <- length(data)
    h <- (4/5)^(1/5)*sigma*n^(-1/5) # bandwidth optimal en théorie
    data_sorted <- sort(data)
    ku <- vector(length=length(data))
    col_density <- vector(length=length(data))
    col_density_theory <- vector(length=length(data))
    for (i in seq_along(data_sorted)){
    x <- data_sorted[i]
        for (j in seq_along(data_sorted)){
            u <- (x-data_sorted[j]) / h
            ku[j] <- dnorm(u, 0, 1)
            }
        col_density[i] <- mean(ku) / h
        }
    cbind(data_sorted, col_density)
}

The function above takes a random vector as only argument.
The h variable is the optimal smoothing parameter :

$h=\frac{4}{5}^{\frac{1}{5}} * \sigma*n^{\frac{-1}{5}}$

4. Viz

density_result <- density_perso(data)
col_density_theory <- dgamma(density_result[,1], 6, 2)
plot(density(data), col="red")
# title(main="Visualisation des densités estimées")
lines(density_result[,1], density_result[,2], col="green")
lines(density_result[,1], col_density_theory, col="blue")
legend(4.2, 0.35,
       legend=c("Estimation de Densité calculée par R",
                "Estimation de Densité calculée par la fonction density_perso",
                "Densité théorique"),
       col=c("red", "green", "blue"), lty=1, cex=0.5)

png The density natively given by R (red) is close to the density we computed ourselves (green).