wtorek, 5 lipca 2011

Plotting ROC curves in ggplot2

Default ROC curves in R are disgusting. ggplot2 comes to the rescue.


First, let's write some data generating function that will be useful for ROC:
getExampleDataForROC <- function(len = 100, distort = 1) {
  half <- round(len/2)
  ### Group ###
  group <- vector()
  group[1:half] <- "Disease"
  group[(half + 1):len] <- "Control"
  group <- as.factor(group)
  ### Score ###
  score <- vector()
  score[1:half] <- 1
  score[(half + 1):len] <- -1
  for (i in 1:len)
    score[i] <- score[i] + rnorm(1, 0, distort)
  order <- order(score)
  data.frame(pos = score[order], group = group[order])
}

Let's write ROC plotting functions:
getPointsForROC <- function(len = 100, distort = 1) {
  basal <- getExampleDataForROC(len, distort)
  tp <- vector(); tn <-vector(); fp <-vector(); fn <- vector()
  tpr <- vector(); fpr <- vector()
  acc <- vector(); spc <- vector()
  len <- dim(basal)[1]
  for(i in 1:len-1) {
    tp[i] <- sum(basal[(i+1):len,2] == "Disease")
    tn[i] <- sum(basal[1:i,2] == "Control")
    fp[i] <- sum(basal[(i+1):len,2] == "Control")
    fn[i] <- sum(basal[1:i,2] == "Disease")
    tpr[i] <- tp[i] / (tp[i] + fn[i])
    fpr[i] <- fp[i] / (fp[i] + tn[i])
    acc[i] <- (tp[i] + tn[i]) / ((tp[i] + fn[i]) +  (fp[i] + tn[i]))
    spc[i] <- 1 - fpr[i]
  }
  points <- (cbind(fpr,tpr))[(len-1):1,]
  points <- rbind(points, c(1,1))
  return(points)
} 


plotROC <- function(len = 100) {
  points.part1 <- getPointsForROC(len, 1)
  desc.part1 <- rep("Distort = 1",len)
  points.part2 <- getPointsForROC(len, 1.5)
  desc.part2 <- rep("Distort = 1.5",len)
  points.part3 <- getPointsForROC(len, 2)
  desc.part3 <- rep("Distort = 2",len)
  points.part4 <- getPointsForROC(len, 2.5)
  desc.part4 <- rep("Distort = 2.5",len)

  points <- rbind(points.part1, points.part2, points.part3, points.part4)
  desc <- c(desc.part1, desc.part2, desc.part3, desc.part4)
  data <- data.frame(TPR = points[,2], FPR = points[,1], GeneSet = desc)
  colors = brewer.pal(5, "YlOrRd")[2:5]
  qplot(FPR, TPR, data = data, geom = "blank", main = "ROC curve", xlab = "False Positive Rate (1-Specificity)", ylab = "True Positive Rate (Sensitivity)" ) + geom_line(aes(x = FPR, y = TPR, data = data, colour = GeneSet), size = 2, alpha = 0.7) + scale_colour_manual(values=colors)
}
Finally, let's generate plot:
plotROC(1000)

3 komentarze:

  1. Hello, I ran this exact code, and I get the following error :

    > plotROC(1000)
    Warning: Ignoring unknown aesthetics: data
    Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
    Error: Aesthetics must be either length 1 or the same as the data (4000): x, y, data, colour

    OdpowiedzUsuń
  2. Way cool! Some very valid points! I appreciate you writing this article and also the rest of the website is extremely good.
    Data Science Course
    Best Data Science Online Training in Hyderabad

    OdpowiedzUsuń