Histograms in Python

How to make Histograms in Python with Plotly.


New to Plotly?

Plotly is a free and open-source graphing library for Python. We recommend you read our Getting Started guide for the latest installation or upgrade instructions, then move on to our Plotly Fundamentals tutorials or dive straight in to some Basic Charts tutorials.

In statistics, a histogram is a representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in Plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count) which can be used to visualize data on categorical and date axes as well as linear axes.

Alternatives to histogram plots for visualizing distributions include violin plots, box plots, ECDF plots and strip charts.

If you're looking instead for bar charts, i.e. representing raw, unaggregated data with rectangular bars, go to the Bar Chart tutorial.

Histograms with Plotly Express

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures.

In [1]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()
[Figure: histogram; x-axis: total_bill, y-axis: count]
In [2]:
import plotly.express as px
df = px.data.tips()
# Here we use a column with categorical data
fig = px.histogram(df, x="day")
fig.show()
[Figure: histogram over the categories Sun, Sat, Thur, Fri; x-axis: day, y-axis: count]

Choosing the number of bins

By default, the number of bins is chosen so that it is comparable to the typical number of samples in a bin. Both the number of bins and the range of values can be customized.

In [3]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", nbins=20)
fig.show()
[Figure: histogram with nbins=20; x-axis: total_bill, y-axis: count]

Histograms on Date Data

Plotly histograms will automatically bin date data in addition to numerical data:

In [4]:
import plotly.express as px

df = px.data.stocks()
fig = px.histogram(df, x="date")
fig.update_layout(bargap=0.2)
fig.show()
[Figure: histogram over dates from Jan 2018 to Oct 2019; x-axis: date, y-axis: count]

Histograms on Categorical Data

Plotly histograms will automatically bin numerical or date data but can also be used on raw categorical data, as in the following example, where the X-axis value is the categorical "day" variable:

In [5]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="day", category_orders=dict(day=["Thur", "Fri", "Sat", "Sun"]))
fig.show()
[Figure: histogram over the categories Thur, Fri, Sat, Sun; x-axis: day, y-axis: count]

Histograms in Dash

Dash is the best way to build analytical apps in Python using Plotly figures. To run the app below, run pip install dash, click "Download" to get the code and run python app.py.

Get started with the official Dash docs and learn how to effortlessly style & deploy apps like this with Dash Enterprise.


Accessing the counts (y-axis) values

Plotly's JavaScript layer calculates the y-axis (count) values on the fly in the browser, so they are not accessible in the fig object. You can calculate them manually using np.histogram.

In [7]:
import plotly.express as px
import numpy as np

df = px.data.tips()
# create the bins
counts, bins = np.histogram(df.total_bill, bins=range(0, 60, 5))
bins = 0.5 * (bins[:-1] + bins[1:])

fig = px.bar(x=bins, y=counts, labels={'x':'total_bill', 'y':'count'})
fig.show()
[Figure: bar chart of the precomputed bin counts; x-axis: total_bill, y-axis: count]

Type of normalization

The default mode is to represent the count of samples in each bin. With the histnorm argument, it is also possible to represent the percentage or fraction of samples in each bin (histnorm='percent' or histnorm='probability'), a density histogram in which the sum of all bar areas equals the total number of sample points (histnorm='density'), or a probability density histogram in which the sum of all bar areas equals 1 (histnorm='probability density').

In [8]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", histnorm='probability density')
fig.show()
[Figure: normalized histogram; x-axis: total_bill, y-axis: probability density]
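
To make the four normalization modes concrete, here is a small NumPy sketch that computes each quantity by hand from raw bin counts. It is an illustration of the definitions given above, not Plotly's own implementation; the random sample and the bin edges are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# 16 equal-width bins of width 0.5 (illustrative choice)
edges = np.linspace(-4, 4, 17)
counts, edges = np.histogram(x, bins=edges)
widths = np.diff(edges)
n = counts.sum()  # number of samples that fell inside the bins

probability = counts / n                     # histnorm='probability'
percent = 100 * probability                  # histnorm='percent'
density = counts / widths                    # histnorm='density'
probability_density = counts / (n * widths)  # histnorm='probability density'

print(probability.sum())                     # bar heights sum to ~1
print((density * widths).sum())              # bar areas sum to ~n
print((probability_density * widths).sum())  # bar areas sum to ~1
```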

Aspect of the histogram plot

In [9]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill",
                   title='Histogram of bills',
                   labels={'total_bill':'total bill'}, # can specify one label per df column
                   opacity=0.8,
                   log_y=True, # represent bars with log scale
                   color_discrete_sequence=['indianred'] # color of histogram bars
                   )
fig.show()
[Figure: "Histogram of bills"; x-axis: total bill, y-axis: count (log scale)]

Several histograms for the different values of one column

In [10]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex")
fig.show()
[Figure: histograms colored by sex (legend: Female, Male); x-axis: total_bill, y-axis: count]

Aggregating with other functions than count

For each bin of x, one can compute a function of the data using histfunc. The function is applied to the dataframe column given as the y argument. The plot below shows that the average tip increases with the total bill.

In [11]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", histfunc='avg')
fig.show()
[Figure: histogram with histfunc='avg'; x-axis: total_bill, y-axis: avg of tip]
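
The aggregation performed by histfunc='avg' can be sketched with pandas: bin the x column, then average the y column within each bin. The tiny data set and the bin edges below are hypothetical, and the bins are assumed to be left-closed; the bins Plotly picks automatically will generally differ.

```python
import pandas as pd

# Tiny hand-made data set (hypothetical values, not the tips data)
df = pd.DataFrame({
    "total_bill": [5, 8, 12, 18, 25, 27],
    "tip": [1.0, 2.0, 2.0, 4.0, 3.0, 5.0],
})

# Left-closed bins [0, 10), [10, 20), [20, 30)
bins = pd.cut(df["total_bill"], bins=[0, 10, 20, 30], right=False)

# Average tip per bin: the quantity a histogram with histfunc='avg' displays
avg_tip = df.groupby(bins, observed=False)["tip"].mean()
print(avg_tip.tolist())  # [1.5, 3.0, 4.0]
```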

The default histfunc is sum if y is given, and works with categorical as well as binned numeric data on the x axis:

In [12]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="day", y="total_bill", category_orders=dict(day=["Thur", "Fri", "Sat", "Sun"]))
fig.show()
[Figure: histogram over the categories Thur, Fri, Sat, Sun; x-axis: day, y-axis: sum of total_bill]

New in v5.0

Histograms support patterns (also known as hatching or texture) in addition to color:

In [13]:
import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x="sex", y="total_bill", color="sex", pattern_shape="smoker")
fig.show()
[Figure: histogram with pattern fills (legend "sex, smoker": Female, No; Female, Yes; Male, No; Male, Yes); x-axis: sex, y-axis: sum of total_bill]

Visualizing the distribution

With the marginal keyword, a marginal plot is drawn alongside the histogram, visualizing the distribution. See the distplot page for more examples of combined statistical representations.

In [14]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", color="sex", marginal="rug", # can be `box`, `violin`
                         hover_data=df.columns)
fig.show()
[Figure: histograms colored by sex (legend: Female, Male) with a marginal rug plot; x-axis: total_bill, y-axis: count]

Adding text labels

New in v5.5

You can add text to histogram bars using the text_auto argument. Setting it to True will display the values on the bars, and setting it to a d3-format formatting string will control the output format.

In [15]:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill", y="tip", histfunc="avg", nbins=8, text_auto=True)
fig.show()
[Figure: histogram with text labels on the bars (1.837647, 2.455231, 3.607692, 4.190909, 4.94, 10); x-axis: total_bill, y-axis: avg of tip]

Histograms with go.Histogram

If Plotly Express does not provide a good starting point, it is also possible to use the more generic go.Histogram class from plotly.graph_objects. All of the available histogram options are described in the histogram section of the reference page: https://plotly.com/python/reference#histogram.

Basic Histogram

In [16]:
import plotly.graph_objects as go

import numpy as np
np.random.seed(1)

x = np.random.randn(500)

fig = go.Figure(data=[go.Histogram(x=x)])
fig.show()

Normalized Histogram

In [17]:
import plotly.graph_objects as go

import numpy as np

x = np.random.randn(500)
fig = go.Figure(data=[go.Histogram(x=x, histnorm='probability')])

fig.show()

Horizontal Histogram

In [18]:
import plotly.graph_objects as go

import numpy as np

y = np.random.randn(500)
# Use `y` argument instead of `x` for horizontal histogram

fig = go.Figure(data=[go.Histogram(y=y)])
fig.show()

Overlaid Histogram

In [19]:
import plotly.graph_objects as go

import numpy as np

x0 = np.random.randn(500)
# Add 1 to shift the mean of the Gaussian distribution
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# Overlay both histograms
fig.update_layout(barmode='overlay')
# Reduce opacity to see both histograms
fig.update_traces(opacity=0.75)
fig.show()
[Figure: two overlaid histograms; legend: trace 0, trace 1]

Stacked Histograms

In [20]:
import plotly.graph_objects as go

import numpy as np

x0 = np.random.randn(2000)
x1 = np.random.randn(2000) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# The two histograms are drawn on top of one another
fig.update_layout(barmode='stack')
fig.show()
[Figure: two stacked histograms; legend: trace 1, trace 0]

Styled Histogram

In [21]:
import plotly.graph_objects as go

import numpy as np
x0 = np.random.randn(500)
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(
    x=x0,
    histnorm='percent',
    name='control', # name used in legend and hover labels
    xbins=dict( # bins used for histogram
        start=-4.0,
        end=3.0,
        size=0.5
    ),
    marker_color='#EB89B5',
    opacity=0.75
))
fig.add_trace(go.Histogram(
    x=x1,
    histnorm='percent',
    name='experimental',
    xbins=dict(
        start=-3.0,
        end=4,
        size=0.5
    ),
    marker_color='#330C73',
    opacity=0.75
))

fig.update_layout(
    title_text='Sampled Results', # title of plot
    xaxis_title_text='Value', # xaxis label
    yaxis_title_text='Count', # yaxis label
    bargap=0.2, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)

fig.show()
[Figure: "Sampled Results"; legend: control, experimental; x-axis: Value, y-axis: Count]

Histogram Bar Text

You can add text to histogram bars using the texttemplate argument. In this example we add the x-axis values as text following the format %{variable}. We also adjust the size of the text using textfont_size.

In [22]:
import plotly.graph_objects as go

numbers = ["5", "10", "3", "10", "5", "8", "5", "5"]

fig = go.Figure()
fig.add_trace(go.Histogram(x=numbers, name="count", texttemplate="%{x}", textfont_size=20))

fig.show()

Cumulative Histogram

In [23]:
import plotly.graph_objects as go

import numpy as np

x = np.random.randn(500)
fig = go.Figure(data=[go.Histogram(x=x, cumulative_enabled=True)])

fig.show()
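
A cumulative histogram shows a running total of the bin counts. The NumPy sketch below illustrates the idea under the assumption of the default cumulative settings (increasing direction, with each bin including its own count); the bin edges NumPy picks here are not the ones Plotly would choose.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)

# Ordinary histogram counts over 20 equal-width bins spanning the data
counts, edges = np.histogram(x, bins=20)

# Running total: each entry is the number of samples up to and including that bin
cumulative = np.cumsum(counts)

print(cumulative[-1])  # equals len(x): every sample has been counted
```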

Specify Aggregation Function

In [24]:
import plotly.graph_objects as go

x = ["Apples","Apples","Apples","Oranges", "Bananas"]
y = ["5","10","3","10","5"]

fig = go.Figure()
fig.add_trace(go.Histogram(histfunc="count", y=y, x=x, name="count"))
fig.add_trace(go.Histogram(histfunc="sum", y=y, x=x, name="sum"))

fig.show()
[Figure: histograms over the categories Apples, Oranges, Bananas; legend: count, sum]

Custom Binning

For custom binning along the x-axis, use the attribute nbinsx. Note that the autobin algorithm will choose a 'nice' round bin size that may result in somewhat fewer than nbinsx total bins. Alternatively, you can set exact values for xbins (start, end and size) along with autobinx=False.

In [25]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

x = ['1970-01-01', '1970-01-01', '1970-02-01', '1970-04-01', '1970-01-02',
     '1972-01-31', '1970-02-13', '1971-04-19']

fig = make_subplots(rows=3, cols=2)

trace0 = go.Histogram(x=x, nbinsx=4)
trace1 = go.Histogram(x=x, nbinsx = 8)
trace2 = go.Histogram(x=x, nbinsx=10)
trace3 = go.Histogram(x=x,
                      xbins=dict(
                      start='1969-11-15',
                      end='1972-03-31',
                      size='M18'), # M18 stands for 18 months
                      autobinx=False
                     )
trace4 = go.Histogram(x=x,
                      xbins=dict(
                      start='1969-11-15',
                      end='1972-03-31',
                      size='M4'), # 4 months bin size
                      autobinx=False
                      )
trace5 = go.Histogram(x=x,
                      xbins=dict(
                      start='1969-11-15',
                      end='1972-03-31',
                      size= 'M2'), # 2 months
                      autobinx = False
                      )

# Place each trace in its own subplot cell
fig.add_trace(trace0, row=1, col=1)
fig.add_trace(trace1, row=1, col=2)
fig.add_trace(trace2, row=2, col=1)
fig.add_trace(trace3, row=2, col=2)
fig.add_trace(trace4, row=3, col=1)
fig.add_trace(trace5, row=3, col=2)

fig.show()
[Figure: 3x2 grid of date histograms with different bin settings; legend: trace 0, trace 1, trace 2, trace 3, trace 4, trace 5]

See also: Bar Charts

If you want to display information about the individual items within each histogram bar, create a stacked bar chart with hover information. Note that this is not technically the histogram chart type, but it has a similar effect, as shown below by comparing the output of px.histogram and px.bar. For more information, see the tutorial on bar charts.

In [26]:
import plotly.express as px
df = px.data.tips()
fig1 = px.bar(df, x='day', y='tip', height=300,
              title='Stacked Bar Chart - Hover on individual items')
fig2 = px.histogram(df, x='day', y='tip', histfunc='sum', height=300,
                    title='Histogram Chart')
fig1.show()
fig2.show()
[Figure: "Stacked Bar Chart - Hover on individual items"; categories Sun, Sat, Thur, Fri; x-axis: day, y-axis: tip]
[Figure: "Histogram Chart"; categories Sun, Sat, Thur, Fri; x-axis: day, y-axis: sum of tip]

Share bins between histograms

In this example, both histograms have compatible bin settings thanks to the bingroup attribute. Note that traces on the same subplot and with the same barmode ("stack", "relative", "group") are forced into the same bingroup; however, traces with barmode = "overlay" and traces on different axes (of the same axis type) can have compatible bin settings. Histogram and histogram2d traces can share the same bingroup.

In [27]:
import plotly.graph_objects as go
import numpy as np

fig = go.Figure(go.Histogram(
    x=np.random.randint(7, size=100),
    bingroup=1))

fig.add_trace(go.Histogram(
    x=np.random.randint(7, size=20),
    bingroup=1))

fig.update_layout(
    barmode="overlay",
    bargap=0.1)

fig.show()
[Figure: two overlaid histograms sharing bins over the values 0 to 6; legend: trace 0, trace 1]

Sort Histogram by Category Order

Histogram bars can also be sorted based on the ordering logic of the categorical values using the categoryorder attribute of the x-axis. Sorting of histogram bars using categoryorder also works with multiple traces on the same x-axis. In the following examples, the histogram bars are sorted based on the total numerical values.

In [28]:
import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x="day").update_xaxes(categoryorder='total ascending')
fig.show()
[Figure: histogram sorted by total ascending (Fri, Thur, Sun, Sat); x-axis: day, y-axis: count]
In [29]:
import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x="day", color="smoker").update_xaxes(categoryorder='total descending')
fig.show()
[Figure: histogram sorted by total descending (Sat, Sun, Thur, Fri), colored by smoker (legend: No, Yes); x-axis: day, y-axis: count]

Reference

See function reference for px.histogram() or https://plotly.com/python/reference/histogram/ for more information and chart attribute options!

What About Dash?

Dash is an open-source framework for building analytical applications, with no JavaScript required, and it is tightly integrated with the Plotly graphing library.

Learn about how to install Dash at https://dash.plot.ly/installation.

Everywhere on this page that you see fig.show(), you can display the same figure in a Dash application by passing it to the figure argument of the Graph component from the dcc (Dash Core Components) module, like this:

import plotly.graph_objects as go # or plotly.express as px
fig = go.Figure() # or any Plotly Express function e.g. px.bar(...)
# fig.add_trace( ... )
# fig.update_layout( ... )

from dash import Dash, dcc, html

app = Dash()
app.layout = html.Div([
    dcc.Graph(figure=fig)
])

app.run(debug=True, use_reloader=False)  # Turn off reloader if inside Jupyter