# Histograms in Python

How to make Histograms in Python with Plotly.

New to Plotly?

Plotly is a free and open-source graphing library for Python. We recommend you read our Getting Started guide for the latest installation or upgrade instructions, then move on to our Plotly Fundamentals tutorials or dive straight into some Basic Charts tutorials.

In statistics, a histogram is a representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in Plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...). The data to be binned can be numerical, categorical, or date data.

If you're looking instead for bar charts, i.e. representing data with rectangular bars, go to the Bar Chart tutorial.

## Histogram with Plotly Express

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures.

In [1]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()

In [2]:
import plotly.express as px
df = px.data.tips()
# Here we use a column with categorical data
fig = px.histogram(df, x="day")
fig.show()


#### Choosing the number of bins

By default, the number of bins is chosen to be comparable to the typical number of samples in a bin. The number of bins can be customized with nbins, as can the range of values.

In [3]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", nbins=20)
fig.show()


#### Accessing the counts (y-axis) values

JavaScript calculates the y-axis (count) values on the fly in the browser, so they are not accessible in the fig object. You can calculate them manually using np.histogram.

In [4]:
import plotly.express as px
import numpy as np

df = px.data.tips()
# create the bins
counts, bins = np.histogram(df.total_bill, bins=range(0, 60, 5))
bins = 0.5 * (bins[:-1] + bins[1:])

fig = px.bar(x=bins, y=counts, labels={'x':'total_bill', 'y':'count'})
fig.show()


#### Type of normalization

The default mode is to represent the count of samples in each bin. With the histnorm argument, it is also possible to represent the percentage or fraction of samples in each bin (histnorm='percent' or 'probability'), a density histogram in which the sum of all bar areas equals the total number of sample points (histnorm='density'), or a probability density histogram in which the sum of all bar areas equals 1 (histnorm='probability density').

In [5]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", histnorm='probability density')
fig.show()
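
The four normalization modes can be checked by hand. The following is a minimal standard-library sketch on made-up data with hypothetical bin edges; it mirrors the definitions given above rather than Plotly's own binning code.

```python
# Hand-computed histnorm values for a tiny sample (illustrative data, not the tips dataset).
values = [1.0, 2.0, 2.5, 3.0, 7.0, 8.0]
edges = [0.0, 5.0, 10.0]  # two bins of width 5: [0, 5) and [5, 10)

def bin_counts(data, edges):
    """Count samples per half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in data:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

counts = bin_counts(values, edges)
total = sum(counts)
widths = [edges[i + 1] - edges[i] for i in range(len(counts))]

normalized = {
    "count": counts,
    "percent": [100 * c / total for c in counts],
    "probability": [c / total for c in counts],
    # bar areas (height * width) add up to the number of samples
    "density": [c / w for c, w in zip(counts, widths)],
    # bar areas add up to 1
    "probability density": [c / (total * w) for c, w in zip(counts, widths)],
}

for mode, heights in normalized.items():
    print(mode, heights)
```

Only the bar heights change between modes; the assignment of samples to bins stays the same.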


#### Aspect of the histogram plot

In [6]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill",
                   title='Histogram of bills',
                   labels={'total_bill':'total bill'}, # can specify one label per df column
                   opacity=0.8,
                   log_y=True, # represent bars with log scale
                   color_discrete_sequence=['indianred'] # color of histogram bars
                   )
fig.show()


#### Several histograms for the different values of one column

In [7]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex")
fig.show()


#### Using histfunc

For each bin of x, one can compute a function of data using histfunc. The argument of histfunc is the dataframe column given as the y argument. The plot below shows that the average tip increases with the total bill.

In [8]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", histfunc='avg')
fig.show()
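
To make the aggregation concrete, here is a minimal standard-library sketch of a per-bin average. The (total_bill, tip) pairs and the bin width of 10 are made up for illustration; Plotly chooses its own bins unless told otherwise.

```python
# Per-bin average, computed by hand on made-up (total_bill, tip) pairs.
from collections import defaultdict

rows = [(8.0, 1.0), (9.5, 2.0), (12.0, 2.0), (18.0, 3.0), (19.0, 4.0), (27.0, 5.0)]
bin_size = 10.0  # hypothetical bin width

tips_by_bin = defaultdict(list)
for total_bill, tip in rows:
    bin_start = (total_bill // bin_size) * bin_size  # left edge of the bin holding this row
    tips_by_bin[bin_start].append(tip)

# histfunc='avg' corresponds to the mean of y within each x bin
avg_tip = {start: sum(tips) / len(tips) for start, tips in sorted(tips_by_bin.items())}
print(avg_tip)
```

Swapping the mean for `sum`, `min`, or `max` gives the other aggregation functions in the same way.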


#### Visualizing the distribution

With the marginal keyword, a subplot is drawn alongside the histogram, visualizing the distribution. See the distplot page for more examples of combined statistical representations.

In [9]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex", marginal="rug", # can be box, violin
                   hover_data=df.columns)
fig.show()


## Histograms with go.Histogram

If Plotly Express does not provide a good starting point, it is also possible to use the more generic go.Histogram class from plotly.graph_objects. All of the available histogram options are described in the histogram section of the reference page: https://plotly.com/python/reference#histogram.

### Basic Histogram

In [10]:
import plotly.graph_objects as go

import numpy as np
np.random.seed(1)

x = np.random.randn(500)

fig = go.Figure(data=[go.Histogram(x=x)])
fig.show()


### Normalized Histogram

In [11]:
import plotly.graph_objects as go

import numpy as np

x = np.random.randn(500)
fig = go.Figure(data=[go.Histogram(x=x, histnorm='probability')])

fig.show()


### Horizontal Histogram

In [12]:
import plotly.graph_objects as go

import numpy as np

y = np.random.randn(500)
# Use y argument instead of x for horizontal histogram

fig = go.Figure(data=[go.Histogram(y=y)])
fig.show()


### Overlaid Histogram

In [13]:
import plotly.graph_objects as go

import numpy as np

x0 = np.random.randn(500)
# Add 1 to shift the mean of the Gaussian distribution
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# Overlay both histograms
fig.update_layout(barmode='overlay')
# Reduce opacity to see both histograms
fig.update_traces(opacity=0.75)
fig.show()


### Stacked Histograms

In [14]:
import plotly.graph_objects as go

import numpy as np

x0 = np.random.randn(2000)
x1 = np.random.randn(2000) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# The two histograms are drawn on top of one another
fig.update_layout(barmode='stack')
fig.show()


### Styled Histogram

In [15]:
import plotly.graph_objects as go

import numpy as np
x0 = np.random.randn(500)
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(
    x=x0,
    histnorm='percent',
    name='control', # name used in legend and hover labels
    xbins=dict( # bins used for histogram
        start=-4.0,
        end=3.0,
        size=0.5
    ),
    marker_color='#EB89B5',
    opacity=0.75
))
fig.add_trace(go.Histogram(
    x=x1,
    histnorm='percent',
    name='experimental',
    xbins=dict(
        start=-3.0,
        end=4,
        size=0.5
    ),
    marker_color='#330C73',
    opacity=0.75
))

fig.update_layout(
    title_text='Sampled Results', # title of plot
    xaxis_title_text='Value', # xaxis label
    yaxis_title_text='Count', # yaxis label
    bargap=0.2, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)

fig.show()


### Cumulative Histogram

In [16]:
import plotly.graph_objects as go

import numpy as np

x = np.random.randn(500)
fig = go.Figure(data=[go.Histogram(x=x, cumulative_enabled=True)])

fig.show()
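
A cumulative histogram replaces each bar with a running total of the counts up to that bin. The sketch below uses made-up per-bin counts and assumes the running total includes the current bin.

```python
# Running total of per-bin counts, the quantity a cumulative histogram displays.
from itertools import accumulate

counts = [2, 5, 9, 4, 1]  # made-up per-bin counts, left to right
cumulative = list(accumulate(counts))

print(cumulative)  # [2, 7, 16, 20, 21]
```

The last cumulative value always equals the total number of samples.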


### Specify Aggregation Function

In [17]:
import plotly.graph_objects as go

x = ["Apples","Apples","Apples","Oranges", "Bananas"]
y = ["5","10","3","10","5"]

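fig = go.Figure()
# One trace per aggregation function, applied to y within each x category
fig.add_trace(go.Histogram(histfunc="count", y=y, x=x, name="count"))
fig.add_trace(go.Histogram(histfunc="sum", y=y, x=x, name="sum"))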

fig.show()
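
The difference between the count and sum aggregation functions can be checked by hand on the same x and y lists. This is a minimal standard-library sketch, not Plotly's implementation; the explicit `float` conversion is an assumption made here because y holds numeric strings.

```python
# Count vs. sum aggregation per category, computed by hand.
from collections import Counter, defaultdict

x = ["Apples", "Apples", "Apples", "Oranges", "Bananas"]
y = ["5", "10", "3", "10", "5"]

# "count" ignores y and counts how many rows fall in each category
count_by_category = Counter(x)

# "sum" adds up the y values of the rows in each category
sum_by_category = defaultdict(float)
for category, value in zip(x, y):
    sum_by_category[category] += float(value)

print(dict(count_by_category))  # {'Apples': 3, 'Oranges': 1, 'Bananas': 1}
print(dict(sum_by_category))    # {'Apples': 18.0, 'Oranges': 10.0, 'Bananas': 5.0}
```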


### Custom Binning

For custom binning along the x-axis, use the attribute nbinsx. Please note that the autobin algorithm will choose a 'nice' round bin size that may result in somewhat fewer than nbinsx total bins. Alternatively, you can set the exact values for xbins along with autobinx = False.

In [18]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

x = ['1970-01-01', '1970-01-01', '1970-02-01', '1970-04-01', '1970-01-02',
     '1972-01-31', '1970-02-13', '1971-04-19']

fig = make_subplots(rows=3, cols=2)

trace0 = go.Histogram(x=x, nbinsx=4)
trace1 = go.Histogram(x=x, nbinsx=8)
trace2 = go.Histogram(x=x, nbinsx=10)
trace3 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M18'), # M18 stands for 18 months
                      autobinx=False
                      )
trace4 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M4'), # 4 months bin size
                      autobinx=False
                      )
trace5 = go.Histogram(x=x,
                      xbins=dict(
                          start='1969-11-15',
                          end='1972-03-31',
                          size='M2'), # 2 months
                      autobinx=False
                      )

fig.append_trace(trace0, 1, 1)
fig.append_trace(trace1, 1, 2)
fig.append_trace(trace2, 2, 1)
fig.append_trace(trace3, 2, 2)
fig.append_trace(trace4, 3, 1)
fig.append_trace(trace5, 3, 2)

fig.show()
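
To see what month-sized bins mean for the dates above, the following sketch counts them against 4-month bin edges starting at 1969-11-15. It is a hand-rolled illustration using only the standard library; the helper `add_months` is hypothetical and this is not Plotly's internal binning algorithm.

```python
# Counting dates in calendar-month bins (here 4 months wide), by hand.
from datetime import date

def add_months(d, months):
    """Shift a date by whole months, keeping the day of month.

    Safe here because the start day (15) exists in every month.
    """
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, d.day)

x = ['1970-01-01', '1970-01-01', '1970-02-01', '1970-04-01', '1970-01-02',
     '1972-01-31', '1970-02-13', '1971-04-19']
dates = [date.fromisoformat(s) for s in x]

start, end, months_per_bin = date(1969, 11, 15), date(1972, 3, 31), 4

# Build calendar-month edges until the last edge passes `end`
edges = [start]
while edges[-1] <= end:
    edges.append(add_months(edges[-1], months_per_bin))

# Half-open bins [lo, hi)
counts = [sum(lo <= d < hi for d in dates) for lo, hi in zip(edges, edges[1:])]

for lo, c in zip(edges, counts):
    print(lo.isoformat(), c)
```

Because the bins follow calendar months, they are not all the same number of days wide, which is why a fixed millisecond size cannot express them.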


If you want to display information about the individual items within each histogram bar, then create a stacked bar chart with hover information as shown below. Note that this is not technically the histogram chart type, but it will have a similar effect as shown below by comparing the output of px.histogram and px.bar. For more information, see the tutorial on bar charts.

In [19]:
import plotly.express as px
df = px.data.tips()
fig1 = px.bar(df, x='day', y='tip', height=300,
              title='Stacked Bar Chart - Hover on individual items')
fig2 = px.histogram(df, x='day', y='tip', histfunc='sum', height=300,
                    title='Histogram Chart')
fig1.show()
fig2.show()
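
The two charts show the same totals because a stacked bar keeps one segment per row, while a histogram with a sum aggregation keeps only the total per category. A minimal sketch on made-up (day, tip) rows, using only the standard library:

```python
# Stacked segments vs. aggregated totals for the same rows.
from collections import defaultdict

rows = [("Sat", 3.0), ("Sat", 2.0), ("Sun", 4.0), ("Sun", 1.5), ("Sun", 2.5)]  # made-up data

# A stacked bar chart draws one segment per row, so each item stays hoverable
segments = defaultdict(list)
for day, tip in rows:
    segments[day].append(tip)

# A histogram with histfunc='sum' draws a single bar per category
bar_heights = {day: sum(tips) for day, tips in segments.items()}

print(dict(segments))  # {'Sat': [3.0, 2.0], 'Sun': [4.0, 1.5, 2.5]}
print(bar_heights)     # {'Sat': 5.0, 'Sun': 8.0}
```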


### Share bins between histograms

In this example, both histograms have compatible bin settings thanks to the bingroup attribute. Note that traces on the same subplot and with the same barmode ("stack", "relative", "group") are forced into the same bingroup; however, traces with barmode = "overlay" and on different axes (of the same axis type) can have compatible bin settings. Histogram and histogram2d traces can share the same bingroup.

In [20]:
import plotly.graph_objects as go
import numpy as np

fig = go.Figure(go.Histogram(
    x=np.random.randint(7, size=100),
    bingroup=1))

fig.add_trace(go.Histogram(
    x=np.random.randint(7, size=20),
    bingroup=1))

fig.update_layout(
    barmode="overlay",
    bargap=0.1)

fig.show()
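
Sharing bins means both samples are counted against one common set of edges, so their bars line up and can be compared directly. The sketch below illustrates the idea with made-up data and hypothetical edges; the helper `count_in_bins` is not part of Plotly.

```python
# Counting two samples against the same bin edges so their bars align.
from bisect import bisect_right

def count_in_bins(data, edges):
    """Count values per half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in data:
        i = bisect_right(edges, v) - 1  # index of the bin whose left edge is <= v
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

large = [0, 1, 1, 2, 3, 3, 3, 5, 6, 6]
small = [1, 3, 6]
edges = [0, 2, 4, 6, 8]  # one shared set of edges: [0,2), [2,4), [4,6), [6,8)

large_counts = count_in_bins(large, edges)
small_counts = count_in_bins(small, edges)

print(large_counts)  # [3, 4, 1, 2]
print(small_counts)  # [1, 1, 0, 1]
```

If each sample chose its own edges instead, the smaller sample could end up with wider bins, and overlaid bars would no longer cover the same intervals.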


#### Reference

Dash is an open-source framework for building analytical applications, with no JavaScript required, and it is tightly integrated with the Plotly graphing library.

Learn about how to install Dash at https://dash.plot.ly/installation.

Everywhere in this page that you see fig.show(), you can display the same figure in a Dash application by passing it to the figure argument of the Graph component from the built-in dash_core_components package like this:

import plotly.graph_objects as go # or plotly.express as px
fig = go.Figure() # or any Plotly Express function e.g. px.bar(...)