1. Introduction
This very short blog post aims to illustrate the step by step algorithm used to estimate a density.
2. Data
Let’s start by generating 2000 data points from a gamma distribution :
data <- rgamma(2000, 6, 2)
3. The Algorithm
density_perso <- function (data){
sigma <- sd(data)
mu <- mean(data)
n <- length(data)
h <- (4/5)^(1/5)*sigma*n^(-1/5) # bandwidth optimal en théorie
data_sorted <- sort(data)
ku <- vector(length=length(data))
col_density <- vector(length=length(data))
col_density_theory <- vector(length=length(data))
for (i in seq_along(data_sorted)){
x <- data_sorted[i]
for (j in seq_along(data_sorted)){
u <- (x-data_sorted[j]) / h
ku[j] <- dnorm(u, 0, 1)
}
col_density[i] <- mean(ku) / h
}
cbind(data_sorted, col_density)
}
The function above takes a random vector as only argument.
The h variable is the optimal smoothing parameter :
$h=\frac{4}{5}^{\frac{1}{5}} * \sigma*n^{\frac{-1}{5}}$
4. Viz
density_result <- density_perso(data)
col_density_theory <- dgamma(density_result[,1], 6, 2)
plot(density(data), col="red")
# title(main="Visualisation des densités estimées")
lines(density_result[,1], density_result[,2], col="green")
lines(density_result[,1], col_density_theory, col="blue")
legend(4.2, 0.35,
legend=c("Estimation de Densité calculée par R",
"Estimation de Densité calculée par la fonction density_perso",
"Densité théorique"),
col=c("red", "green", "blue"), lty=1, cex=0.5)
The density natively given by R (red) is close to the density we computed ourselves (green).