Dendrograms in ggplot2

How to make Dendrograms in ggplot2 with Plotly.


New to Plotly?

Plotly is a free and open-source graphing library for R. We recommend you read our Getting Started guide for the latest installation or upgrade instructions, then move on to our Plotly Fundamentals tutorials or dive straight into some Basic Charts tutorials.

Default dendrogram

The hclust() and dendrogram() functions in R make it easy to plot the results of hierarchical cluster analysis and other dendrograms. However, it is hard to extract the data from this analysis to customize these plots, since the plot() methods for both classes draw directly to the graphics device without the option of returning the plot data. The ggdendro package addresses this: ggdendrogram() draws a dendrogram as a ggplot object, and dendro_data() returns the underlying plot data.

library(plotly)
library(ggplot2)
library(ggdendro)

hc <- hclust(dist(USArrests), "ave")
p <- ggdendrogram(hc, rotate = FALSE, size = 2)

ggplotly(p)

For more control, call dendro_data() to extract the plot data and draw the segments yourself:

library(plotly)
library(ggplot2)
library(ggdendro)

model <- hclust(dist(USArrests), "ave")
dhc <- as.dendrogram(model)

data <- dendro_data(dhc, type = "rectangle")
p <- ggplot(segment(data)) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  coord_flip() + 
  scale_y_reverse(expand = c(0.2, 0))

ggplotly(p)
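To see what dendro_data() actually returns, it can help to inspect the extracted data frames before plotting. The following is a minimal sketch, assuming the same USArrests clustering as above; the variable names hc and ddata are illustrative only.

```r
library(ggdendro)

# Same clustering as in the examples above
hc <- hclust(dist(USArrests), "ave")
ddata <- dendro_data(as.dendrogram(hc), type = "rectangle")

# Line segments that make up the tree (start and end coordinates)
head(segment(ddata))

# Leaf labels and their positions (one row per state)
head(label(ddata))
```

Because these are ordinary data frames, they can be filtered, joined, or annotated before being passed to ggplot().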

Of course, using ggplot2 to create the dendrogram means you have full control over the appearance of the plot. For example, here is the same data, again plotted horizontally but this time with a clean background. In ggplot2 this means passing a number of options to theme(). The ggdendro package exports a function, theme_dendro(), that wraps these options into a convenient function.

library(plotly)
library(ggplot2)
library(ggdendro)

model <- hclust(dist(USArrests), "ave")
dhc <- as.dendrogram(model)

data <- dendro_data(dhc, type = "rectangle")
p <- ggplot(segment(data)) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  coord_flip() + 
  scale_y_reverse(expand = c(0.2, 0))

# coord_flip() is already applied above; adding it again only triggers a
# "Coordinate system already present" message, so just add the theme
p <- p + theme_dendro()

ggplotly(p)
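Since the plot is built from plain data frames, the leaf labels can be drawn as an ordinary text layer. The sketch below assumes the same USArrests clustering; the name p_labelled and the hjust and size values are illustrative choices, not settings prescribed by ggdendro.

```r
library(ggplot2)
library(ggdendro)

hc <- hclust(dist(USArrests), "ave")
ddata <- dendro_data(as.dendrogram(hc), type = "rectangle")

p_labelled <- ggplot(segment(ddata)) +
  # Tree branches
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  # One text label per leaf, taken from the labels data frame
  geom_text(data = label(ddata),
            aes(x = x, y = y, label = label),
            hjust = 0, size = 3) +
  coord_flip() +
  scale_y_reverse(expand = c(0.2, 0)) +
  theme_dendro()
```

Pass p_labelled to ggplotly() in the same way as the other examples to get the interactive version.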

Triangular segments

You can draw dendrograms with triangular line segments (instead of rectangular segments). For example:

library(plotly)
library(ggplot2)
library(ggdendro)

model <- hclust(dist(USArrests), "ave")
dhc <- as.dendrogram(model)

data <- dendro_data(dhc, type = "triangle")
p <- ggplot(segment(data)) + 
      geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
      coord_flip() + 
      scale_y_reverse(expand = c(0.2, 0)) +
      theme_dendro()

ggplotly(p)

Regression tree diagrams

The tree() function in the tree package fits regression trees. To extract the plot data for these diagrams using ggdendro, you use the same idiom as for plotting dendrograms:

library(plotly)
library(ggplot2)
library(tree)
library(ggdendro)

data(cpus, package = "MASS")
model <- tree(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax, 
              data = cpus)
tree_data <- dendro_data(model)
p <- ggplot(segment(tree_data)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend, size = n), 
               colour = "blue", alpha = 0.5) +
  scale_size("n") +
  geom_text(data = label(tree_data), 
            aes(x = x, y = y, label = label), vjust = -0.5, size = 3) +
  geom_text(data = leaf_label(tree_data), 
            aes(x = x, y = y, label = label), vjust = 0.5, size = 2) +
  theme_dendro()

ggplotly(p)

Classification tree diagrams

The rpart() function in the rpart package fits classification trees. Extracting the plot data for these diagrams with ggdendro follows the same basic pattern as for dendrograms:

library(plotly)
library(ggplot2)
library(rpart)
library(ggdendro)

model <- rpart(Kyphosis ~ Age + Number + Start, 
               method = "class", data = kyphosis)
data <- dendro_data(model)
p <- ggplot() + 
      geom_segment(data = data$segments, 
                   aes(x = x, y = y, xend = xend, yend = yend)) + 
      geom_text(data = data$labels, 
                aes(x = x, y = y, label = label), size = 3, vjust = 0) +
      geom_text(data = data$leaf_labels, 
                aes(x = x, y = y, label = label), size = 3, vjust = 1) +
      theme_dendro()

ggplotly(p)
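The plot above draws on three components of the extracted data: segments, labels, and leaf_labels. A minimal sketch for inspecting them follows, assuming the same kyphosis model; the name rp_data is illustrative only.

```r
library(rpart)
library(ggdendro)

model <- rpart(Kyphosis ~ Age + Number + Start,
               method = "class", data = kyphosis)
rp_data <- dendro_data(model)

# Branch segments of the tree
head(segment(rp_data))

# Split labels shown at the internal nodes
label(rp_data)

# Labels shown at the terminal nodes (leaves)
leaf_label(rp_data)
```

The accessor functions segment(), label(), and leaf_label() return the same data frames as rp_data$segments, rp_data$labels, and rp_data$leaf_labels.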

Twins diagrams: agnes and diana

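The cluster package provides agnes() (agglomerative nesting) and diana() (divisive analysis). The result of either function can be converted with as.dendrogram() and drawn with ggdendrogram().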

library(plotly)
library(ggplot2)
library(cluster)
library(ggdendro)

model <- agnes(votes.repub, metric = "manhattan", stand = TRUE)
dg <- as.dendrogram(model)
p <- ggdendrogram(dg)

ggplotly(p)

The same pattern works for diana():

library(plotly)
library(ggplot2)
library(cluster)
library(ggdendro)

model <- diana(votes.repub, metric = "manhattan", stand = TRUE)
dg <- as.dendrogram(model)
p <- ggdendrogram(dg)

ggplotly(p)
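ggdendrogram() also takes arguments that change the layout without building the plot by hand. The sketch below, assuming the same agnes model, uses rotate = TRUE to draw the tree horizontally and theme_dendro = FALSE to keep the default ggplot2 theme with its axes; the name p_rot is illustrative only.

```r
library(ggplot2)
library(cluster)
library(ggdendro)

model <- agnes(votes.repub, metric = "manhattan", stand = TRUE)
dg <- as.dendrogram(model)

# rotate = TRUE flips the tree; theme_dendro = FALSE keeps axes and gridlines
p_rot <- ggdendrogram(dg, rotate = TRUE, theme_dendro = FALSE)
```

As before, ggplotly(p_rot) converts the result to an interactive figure.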

What About Dash?

Dash for R is an open-source framework for building analytical applications, with no JavaScript required, and it is tightly integrated with the Plotly graphing library.

Learn about how to install Dash for R at https://dashr.plot.ly/installation.

The examples on this page store the ggplot object in p and convert it with ggplotly(p). To display the same figure in a Dash for R application, assign the converted figure to fig (for example, fig <- ggplotly(p)) and pass it to the figure argument of the Graph component from the dashCoreComponents package, like this:

library(plotly)

fig <- plot_ly() 
# fig <- fig %>% add_trace( ... )
# fig <- fig %>% layout( ... ) 

library(dash)
library(dashCoreComponents)
library(dashHtmlComponents)

app <- Dash$new()
app$layout(
    htmlDiv(
        list(
            dccGraph(figure=fig) 
        )
     )
)

app$run_server(debug=TRUE, dev_tools_hot_reload=FALSE)