## wtorek, 5 lipca 2011

### Plotting ROC curves in ggplot2

Default ROC curves in R are disgusting. ggplot2 comes to the rescue.

First, let's write some data generating function that will be useful for ROC:
``````getExampleDataForROC <- function(len = 100, distort = 1) {
half <- round(len/2)
### Group ###
group <- vector()
group[1:half] <- "Disease"
group[(half + 1):len] <- "Control"
group <- as.factor(group)
### Score ###
score <- vector()
score[1:half] <- 1
score[(half + 1):len] <- -1
for (i in 1:len)
score[i] <- score[i] + rnorm(1, 0, distort)
order <- order(score)
data.frame(pos = score[order], group = group[order])
}``````

Let's write ROC plotting functions:
``````getPointsForROC <- function(len = 100, distort = 1) {
basal <- getExampleDataForROC(len, distort)
tp <- vector(); tn <-vector(); fp <-vector(); fn <- vector()
tpr <- vector(); fpr <- vector()
acc <- vector(); spc <- vector()
len <- dim(basal)
for(i in 1:len-1) {
tp[i] <- sum(basal[(i+1):len,2] == "Disease")
tn[i] <- sum(basal[1:i,2] == "Control")
fp[i] <- sum(basal[(i+1):len,2] == "Control")
fn[i] <- sum(basal[1:i,2] == "Disease")
tpr[i] <- tp[i] / (tp[i] + fn[i])
fpr[i] <- fp[i] / (fp[i] + tn[i])
acc[i] <- (tp[i] + tn[i]) / ((tp[i] + fn[i]) +  (fp[i] + tn[i]))
spc[i] <- 1 - fpr[i]
}
points <- (cbind(fpr,tpr))[(len-1):1,]
points <- rbind(points, c(1,1))
return(points)
}

plotROC <- function(len = 100) {
points.part1 <- getPointsForROC(len, 1)
desc.part1 <- rep("Distort = 1",len)
points.part2 <- getPointsForROC(len, 1.5)
desc.part2 <- rep("Distort = 1.5",len)
points.part3 <- getPointsForROC(len, 2)
desc.part3 <- rep("Distort = 2",len)
points.part4 <- getPointsForROC(len, 2.5)
desc.part4 <- rep("Distort = 2.5",len)

points <- rbind(points.part1, points.part2, points.part3, points.part4)
desc <- c(desc.part1, desc.part2, desc.part3, desc.part4)
data <- data.frame(TPR = points[,2], FPR = points[,1], GeneSet = desc)
colors = brewer.pal(5, "YlOrRd")[2:5]
qplot(FPR, TPR, data = data, geom = "blank", main = "ROC curve", xlab = "False Positive Rate (1-Specificity)", ylab = "True Positive Rate (Sensitivity)" ) + geom_line(aes(x = FPR, y = TPR, data = data, colour = GeneSet), size = 2, alpha = 0.7) + scale_colour_manual(values=colors)
}``````
Finally, let's generate plot:
``plotROC(1000)``

#### 2 komentarze:

1. Hello, I ran this exact code, and I get the following error :

> plotROC(1000)
Warning: Ignoring unknown aesthetics: data
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (4000): x, y, data, colour

2. 