Box Plots in Julia

How to make Box Plots in Julia with Plotly.


A box plot is a statistical representation of numerical data through their quartiles. The ends of the box represent the lower and upper quartiles, while the median (second quartile) is marked by a line inside the box. For other statistical representations of numerical data, see other statistical charts.
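As a rough illustration of the three values a box encodes, the sketch below computes the quartiles of a small made-up sample with Julia's Statistics standard library. The sample values are hypothetical, and `quantile` uses Julia's default interpolation rule, which may not match the quartile method Plotly applies when drawing the box.

```julia
using Statistics  # standard library

# Hypothetical sample, used only for illustration.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 15]

# Lower quartile, median (second quartile), and upper quartile.
# Assumption: Julia's default quantile rule, not necessarily Plotly's.
q1, med, q3 = quantile(sample, [0.25, 0.5, 0.75])

println("Q1 = $q1, median = $med, Q3 = $q3")
```

The box would span from Q1 to Q3, with the median drawn as a line inside it.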

Box Plot

A box plot represents the distribution of the column passed as the y argument.

using PlotlyJS, CSV, DataFrames
df = dataset(DataFrame, "tips")
plot(df, y=:total_bill, kind="box")

If a column name is given as the x argument, a box plot is drawn for each value of x.

using PlotlyJS, CSV, DataFrames
df = dataset(DataFrame, "tips")
plot(df, x=:time, y=:total_bill, kind="box")

Display the underlying data

Use the boxpoints argument to display the underlying data: all points ("all"), outliers only ("outliers", the default), or no points (false).

using PlotlyJS, CSV, DataFrames
df = dataset(DataFrame, "tips")
plot(df, x=:time, y=:total_bill, boxpoints="all", kind="box")

Choosing The Algorithm For Computing Quartiles

By default, quartiles for box plots are computed using the linear method. For more about linear interpolation, see method #10 listed at http://www.amstat.org/publications/jse/v14n3/langford.html; for more details on quartiles in general, see https://en.wikipedia.org/wiki/Quartile.

However, you can also choose to use an exclusive or an inclusive algorithm to compute quartiles.

The exclusive algorithm uses the median to divide the ordered dataset into two halves. If the sample has an odd number of points, the median is not included in either half. Q1 is then the median of the lower half and Q3 is the median of the upper half.

The inclusive algorithm also uses the median to divide the ordered dataset into two halves, but if the sample has an odd number of points, the median is included in both halves. Q1 is then the median of the lower half and Q3 is the median of the upper half.
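The two descriptions above can be sketched in a few lines of Julia. This is a minimal illustration of the rules as stated, not Plotly's own implementation; the helper name `split_quartiles` and its `method` keyword are hypothetical.

```julia
using Statistics  # standard library, provides median

# Hypothetical helper: returns (Q1, Q3) by splitting the sorted data at the median.
# `method` is :exclusive or :inclusive, mirroring the two rules described above.
function split_quartiles(values; method=:exclusive)
    sorted = sort(values)
    n = length(sorted)
    half = n ÷ 2
    if isodd(n) && method == :inclusive
        # Odd sample size, inclusive: the median belongs to both halves.
        lower = sorted[1:(half + 1)]
        upper = sorted[(half + 1):end]
    elseif isodd(n)
        # Odd sample size, exclusive: the median belongs to neither half.
        lower = sorted[1:half]
        upper = sorted[(half + 2):end]
    else
        # Even sample size: both rules split the data into two equal halves.
        lower = sorted[1:half]
        upper = sorted[(half + 1):end]
    end
    return median(lower), median(upper)
end

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
println(split_quartiles(data; method=:exclusive))  # (2.5, 7.5)
println(split_quartiles(data; method=:inclusive))  # (3.0, 7.0)
```

For a sample with an even number of points, the two rules give the same result, so the choice only matters when the sample size is odd.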

using PlotlyJS, CSV, DataFrames
df = dataset(DataFrame, "tips")

plot(
    df,
    x=:day, y=:total_bill, color=:smoker, quartilemethod="exclusive", kind="box",
    Layout(boxmode="group")
)

Difference Between Quartile Algorithms

It can sometimes be difficult to see the difference between the linear, inclusive, and exclusive algorithms for computing quartiles. In the following example, the same dataset is visualized using each of the three different quartile computation algorithms.

using PlotlyJS

data = [1,2,3,4,5,6,7,8,9]
trace1 = box(y=data, boxpoints="all", quartilemethod="linear", name="linear")
trace2 = box(y=data, boxpoints="all", quartilemethod="inclusive", name="inclusive")
trace3 = box(y=data, boxpoints="all", quartilemethod="exclusive", name="exclusive")

plot([trace1, trace2, trace3])
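To put numbers on the difference, the sketch below computes the linear-method quartiles for the same nine-point dataset. It assumes the linear method places the k-th of n sorted values at cumulative probability (k - 0.5)/n and interpolates between neighbors; the helper `linear_quantile` is hypothetical and is not taken from Plotly's source.

```julia
# Hypothetical helper: quantile by linear interpolation, assuming the k-th of n
# sorted values sits at cumulative probability (k - 0.5) / n.
function linear_quantile(values, p)
    sorted = sort(values)
    n = length(sorted)
    pos = clamp(n * p + 0.5, 1, n)           # fractional 1-based position
    lo, hi = floor(Int, pos), ceil(Int, pos)
    frac = pos - lo                          # weight of the upper neighbor
    return (1 - frac) * sorted[lo] + frac * sorted[hi]
end

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1 = linear_quantile(data, 0.25)
q3 = linear_quantile(data, 0.75)
println("linear: Q1 = $q1, Q3 = $q3")  # linear: Q1 = 2.75, Q3 = 7.25
```

Splitting at the median as described earlier gives 2.5 and 7.5 (exclusive) or 3 and 7 (inclusive) for the same data, so under this assumption the linear box edges fall between the other two.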

Styled box plot

For the interpretation of the notches, see https://en.wikipedia.org/wiki/Box_plot#Variations.

using PlotlyJS, CSV, DataFrames
df = dataset(DataFrame, "tips")
plot(
    df, kind="box",
    x=:time, y=:total_bill, color=:smoker, notched=true,
    hovertext=:day,
    Layout(title="Box plot of total bill", boxmode="group")
)
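As a rough guide to what a notch represents, the sketch below computes a notch half-width around the median, assuming the common convention of a constant times IQR / sqrt(n). The constant 1.57 is an assumption here (some references use 1.58), the helper `notch_span` is hypothetical, and the quartiles come from Julia's default `quantile` rule rather than from Plotly.

```julia
using Statistics  # standard library

# Hypothetical helper: half-width of the notch around the median,
# assuming the convention factor * IQR / sqrt(n).
function notch_span(values; factor=1.57)
    q1, q3 = quantile(values, [0.25, 0.75])
    return factor * (q3 - q1) / sqrt(length(values))
end

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
span = notch_span(sample)
println("notch from ", median(sample) - span, " to ", median(sample) + span)
```

Under this convention, notches of two boxes that do not overlap are usually read as a hint that the medians differ.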

Basic Horizontal Box Plot

using PlotlyJS

# Use x instead of y argument for horizontal plot
plot([box(x=rand(50)), box(x=rand(50) .+ 1)])

Box Plot With Precomputed Quartiles

You can specify precomputed quartile attributes rather than using a built-in quartile computation algorithm.

This is useful if you have already computed those values or if you need an algorithm other than the ones provided.

using PlotlyJS

plot(box(
    y=[0:9 0:9 0:9],
    name="Precomputed Quartiles",
    q1=[1, 2, 3],
    median=[4, 5, 6],
    q3=[7, 8, 9],
    lowerfence=[-1, 0, 1],
    upperfence=[5, 6, 7],
    mean=[2.2, 2.8, 3.2],
    sd=[0.2, 0.4, 0.6],
    notchspan=[0.2, 0.4, 0.6]
))
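One way to obtain such values is to compute them per sample with Julia's Statistics standard library, as sketched below. The samples are hypothetical, and `quantile` uses Julia's default rule; any other quartile algorithm could be substituted.

```julia
using Statistics  # standard library

# Hypothetical samples, one per box.
samples = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [2, 4, 6, 8, 10, 12, 14, 16, 18],
    [1, 1, 2, 3, 5, 8, 13, 21, 34],
]

# One value per box, in the same order as `samples`.
q1s     = [quantile(s, 0.25) for s in samples]
medians = [median(s) for s in samples]
q3s     = [quantile(s, 0.75) for s in samples]

println((q1s, medians, q3s))
```

The resulting vectors have one entry per box, which is the shape of the q1, median, and q3 attributes in the example above.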

Colored Box Plot

using PlotlyJS
trace1 = box(y=rand(50), marker_color="indianred", name="Sample A")
trace2 = box(y=rand(50), marker_color="lightseagreen", name="Sample B")
plot([trace1, trace2])

Box Plot Styling Mean & Standard Deviation

using PlotlyJS

trace1 = box(
    y=randn(20),
    name="Only Mean",
    marker_color="darkblue",
    boxmean=true # represent mean
)
trace2 = box(
    y=randn(20),
    name="Mean & SD",
    marker_color="royalblue",
    boxmean="sd" # represent mean and standard deviation
)
plot([trace1, trace2])

Styling Outliers

The example below shows how to use the boxpoints argument. With "outliers", only the sample points lying outside the whiskers are shown. With "suspectedoutliers", the outlier points are shown, and points either less than 4*Q1 - 3*Q3 or greater than 4*Q3 - 3*Q1 are highlighted (using outliercolor). With "all", all sample points are shown. With false, only the boxes are shown, with no sample points.

using PlotlyJS

trace1 = box(
    y=[0.75, 5.25, 5.5, 6, 6.2, 6.6, 6.80, 7.0, 7.2, 7.5, 7.5, 7.75, 8.15,
       8.15, 8.65, 8.93, 9.2, 9.5, 10, 10.25, 11.5, 12, 16, 20.90, 22.3, 23.25],
    name="All Points",
    jitter=0.3,
    pointpos=-1.8,
    boxpoints="all", # represent all points
    marker_color="rgb(7,40,89)",
    line_color="rgb(7,40,89)"
)

trace2 = box(
    y=[0.75, 5.25, 5.5, 6, 6.2, 6.6, 6.80, 7.0, 7.2, 7.5, 7.5, 7.75, 8.15,
        8.15, 8.65, 8.93, 9.2, 9.5, 10, 10.25, 11.5, 12, 16, 20.90, 22.3, 23.25],
    name="Only Whiskers",
    boxpoints=false, # no data points
    marker_color="rgb(9,56,125)",
    line_color="rgb(9,56,125)"
)

trace3 = box(
    y=[0.75, 5.25, 5.5, 6, 6.2, 6.6, 6.80, 7.0, 7.2, 7.5, 7.5, 7.75, 8.15,
        8.15, 8.65, 8.93, 9.2, 9.5, 10, 10.25, 11.5, 12, 16, 20.90, 22.3, 23.25],
    name="Suspected Outliers",
    boxpoints="suspectedoutliers", # only suspected outliers
    marker=attr(
        color="rgb(8,81,156)",
        outliercolor="rgba(219, 64, 82, 0.6)",
        line=attr(
            outliercolor="rgba(219, 64, 82, 0.6)",
            outlierwidth=2)),
    line_color="rgb(8,81,156)"
)

trace4 = box(
    y=[0.75, 5.25, 5.5, 6, 6.2, 6.6, 6.80, 7.0, 7.2, 7.5, 7.5, 7.75, 8.15,
        8.15, 8.65, 8.93, 9.2, 9.5, 10, 10.25, 11.5, 12, 16, 20.90, 22.3, 23.25],
    name="Whiskers and Outliers",
    boxpoints="outliers", # only outliers
    marker_color="rgb(107,174,214)",
    line_color="rgb(107,174,214)"
)

plot([trace1, trace2, trace3, trace4], Layout(title="Box Plot Styling Outliers"))
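The thresholds 4*Q1 - 3*Q3 and 4*Q3 - 3*Q1 described in this section are the same as Q1 - 3*IQR and Q3 + 3*IQR. The sketch below applies them to the data from the example to list the points that would be highlighted. It assumes quartiles from Julia's default `quantile` rule, so the exact thresholds may differ slightly from the ones Plotly computes.

```julia
using Statistics  # standard library

# Same sample as in the example above.
y = [0.75, 5.25, 5.5, 6, 6.2, 6.6, 6.80, 7.0, 7.2, 7.5, 7.5, 7.75, 8.15,
     8.15, 8.65, 8.93, 9.2, 9.5, 10, 10.25, 11.5, 12, 16, 20.90, 22.3, 23.25]

# Assumption: Julia's default quantile rule stands in for Plotly's quartiles.
q1, q3 = quantile(y, [0.25, 0.75])

lower = 4 * q1 - 3 * q3   # equals q1 - 3 * (q3 - q1)
upper = 4 * q3 - 3 * q1   # equals q3 + 3 * (q3 - q1)

# Points beyond these thresholds are the ones marked with outliercolor.
suspected = filter(v -> v < lower || v > upper, y)
println(suspected)  # [20.9, 22.3, 23.25]
```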

Grouped Box Plots

using PlotlyJS

x = ["day 1", "day 1", "day 1", "day 1", "day 1", "day 1",
     "day 2", "day 2", "day 2", "day 2", "day 2", "day 2"]


trace1 = box(
    y=[0.2, 0.2, 0.6, 1.0, 0.5, 0.4, 0.2, 0.7, 0.9, 0.1, 0.5, 0.3],
    x=x,
    name="kale",
    marker_color="#3D9970"
)
trace2 = box(
    y=[0.6, 0.7, 0.3, 0.6, 0.0, 0.5, 0.7, 0.9, 0.5, 0.8, 0.7, 0.2],
    x=x,
    name="radishes",
    marker_color="#FF4136"
)
trace3 = box(
    y=[0.1, 0.3, 0.1, 0.9, 0.6, 0.6, 0.9, 1.0, 0.3, 0.6, 0.8, 0.5],
    x=x,
    name="carrots",
    marker_color="#FF851B"
)

plot([trace1, trace2, trace3], Layout(yaxis_title="normalized moisture", boxmode="group"))

Grouped Horizontal Box Plot

using PlotlyJS

y = ["day 1", "day 1", "day 1", "day 1", "day 1", "day 1",
     "day 2", "day 2", "day 2", "day 2", "day 2", "day 2"]

trace1 = box(
    x=[0.2, 0.2, 0.6, 1.0, 0.5, 0.4, 0.2, 0.7, 0.9, 0.1, 0.5, 0.3],
    y=y,
    name="kale",
    marker_color="#3D9970",
    orientation="h"
)
trace2 = box(
    x=[0.6, 0.7, 0.3, 0.6, 0.0, 0.5, 0.7, 0.9, 0.5, 0.8, 0.7, 0.2],
    y=y,
    name="radishes",
    marker_color="#FF4136",
    orientation="h"
)
trace3 = box(
    x=[0.1, 0.3, 0.1, 0.9, 0.6, 0.6, 0.9, 1.0, 0.3, 0.6, 0.8, 0.5],
    y=y,
    name="carrots",
    marker_color="#FF851B",
    orientation="h"
)

plot([trace1, trace2, trace3], Layout(boxmode="group", xaxis=attr(title="normalized moisture")))

Rainbow Box Plots

using PlotlyJS

N = 30     # Number of boxes

# Generate an array of rainbow colors by fixing the saturation and lightness of the HSL
# representation of the color and stepping around the hue.
# Plotly accepts any CSS color format, see e.g. http://www.w3schools.com/cssref/css_colors_legal.asp.
c = ["hsl($(h), 50%, 50%)" for h in range(0, stop=360, length=N)]

# Each box is a trace that carries its own data and color.
# Use an array comprehension to build N boxes, each with a different color and different data:
traces = [box(
    y=[j * (3.5 * sin(pi * i/N) + i/N + (1.5 + 0.5 * cos(pi*i/N))) for j in 1:10],
    marker_color=c[i]
) for i in 1:N]
layout = Layout(
    paper_bgcolor="rgb(233,233,233)",
    plot_bgcolor="rgb(233,233,233)",
    xaxis=attr(
        showgrid=false,
        zeroline=false,
        showticklabels=false
    ),
    yaxis=attr(
        zeroline=false, gridcolor="white"
    )
)
plot(traces, layout)

Fully Styled Box Plots

using PlotlyJS

x_data = ["Carmelo Anthony", "Dwyane Wade",
          "Deron Williams", "Brook Lopez",
          "Damian Lillard", "David West",]

N = 50

y0 = [(10 * i + 30) for i in rand(N)]
y1 = [(13 * i + 38) for i in rand(N)]
y2 = [(11 * i + 33) for i in rand(N)]
y3 = [(9 * i + 36) for i in rand(N)]
y4 = [(15 * i + 31) for i in rand(N)]
y5 = [(12 * i + 40) for i in rand(N)]

y_data = [y0, y1, y2, y3, y4, y5]

color_vec = ["rgba(93, 164, 214, 0.5)", "rgba(255, 144, 14, 0.5)", "rgba(44, 160, 101, 0.5)",
          "rgba(255, 65, 54, 0.5)", "rgba(207, 114, 255, 0.5)", "rgba(127, 96, 0, 0.5)"]


traces = [
    box(
        y=yd,
        name=xd,
        boxpoints="all",
        jitter=0.5,
        whiskerwidth=0.2,
        fillcolor=cls,
        marker_size=2,
        line_width=1
    )
    for (xd, yd, cls) in zip(x_data, y_data, color_vec)
]

layout = Layout(
    title="Points scored by the Top 9 Scoring NBA Players in 2012",
    yaxis=attr(
        autorange=true,
        showgrid=true,
        zeroline=true,
        dtick=5,
        gridcolor="rgb(255,255,255)",
        zerolinecolor="rgb(255,255,255)",
        zerolinewidth=2
    ),
    margin=attr(
        l=40,
        r=30,
        b=80,
        t=100
    ),
    paper_bgcolor="rgb(243,243,243)",
    plot_bgcolor="rgb(243,243,243)",
    showlegend=false
)

plot(traces, layout)

Reference

See https://plotly.com/julia/reference/box/ for more information and chart attribute options!