Plotly Express works with Column-oriented, Matrix or Geographic Data

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures.

Plotly Express provides functions to visualize a variety of types of data. Most functions, such as px.bar or px.scatter, expect to operate on column-oriented data of the type you might store in a DataFrame (in either "long" or "wide" format, see below). These functions use Pandas internally to process the data, but also accept other types of DataFrames as arguments. See the Input Data as Non-Pandas DataFrames section below for more details.

px.imshow operates on matrix-like data you might store in a NumPy or xarray array, and functions like px.choropleth and px.choropleth_mapbox can operate on geographic data of the kind you might store in a GeoPandas GeoDataFrame. This page details how to provide column-oriented data to most Plotly Express functions.

Plotly Express works with Long-, Wide-, and Mixed-Form Data

Until version 4.8, Plotly Express only operated on long-form (previously called "tidy") data, but now accepts wide-form and mixed-form data as well.

There are three common conventions for storing column-oriented data, usually in a data frame with column names:

long-form data has one row per observation, and one column per variable. This is suitable for storing and displaying multivariate data, i.e. with dimension greater than 2. This format is sometimes called "tidy".
wide-form data has one row per value of the first variable, and one column per value of the second variable. This is suitable for storing and displaying 2-dimensional data.
mixed-form data is a hybrid of long-form and wide-form data, with one row per value of one variable, and some columns representing values of another, and some columns representing more variables. See the wide-form documentation for examples of how to use Plotly Express to visualize this kind of data.
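The difference between the first two forms can be sketched with plain pandas, using the medal counts that appear later on this page. melt() turns wide-form data into long-form, and pivot() turns it back. The column names medal and count are choices made for this illustration.

```python
import pandas as pd

# Wide-form: one row per nation, one column per medal type
wide_form = pd.DataFrame({
    "nation": ["South Korea", "China", "Canada"],
    "gold": [24, 10, 9],
    "silver": [13, 15, 12],
    "bronze": [11, 8, 12],
})

# Wide -> long: one row per (nation, medal) observation
long_form = wide_form.melt(id_vars="nation", var_name="medal", value_name="count")

# Long -> wide: spread the medal values back out into columns
round_trip = long_form.pivot(index="nation", columns="medal", values="count")
round_trip.columns.name = None         # drop the "medal" name left on the column index
round_trip = round_trip.reset_index()  # turn "nation" back into an ordinary column
```

The row order of long_form matches the long-form table shown below. Plotly Express performs a similar melt() internally when it receives wide-form input.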

Every Plotly Express function can operate on long-form data (other than px.imshow, which operates only on wide-form input), and in addition, the following 2D-Cartesian functions can operate on wide-form and mixed-form data: px.scatter, px.line, px.area, px.bar, px.histogram, px.violin, px.box, px.strip, px.funnel, px.density_heatmap and px.density_contour.

By way of example, here is the same data, represented in long-form first, and then in wide-form:

In [1]:

import plotly.express as px
long_df = px.data.medals_long()
long_df

Out[1]:

	nation	medal	count
0	South Korea	gold	24
1	China	gold	10
2	Canada	gold	9
3	South Korea	silver	13
4	China	silver	15
5	Canada	silver	12
6	South Korea	bronze	11
7	China	bronze	8
8	Canada	bronze	12

In [2]:

import plotly.express as px
wide_df = px.data.medals_wide()
wide_df

Out[2]:

	nation	gold	silver	bronze
0	South Korea	24	13	11
1	China	10	15	8
2	Canada	9	12	12

Plotly Express can produce the same plot from either form:

In [3]:

import plotly.express as px
long_df = px.data.medals_long()

fig = px.bar(long_df, x="nation", y="count", color="medal", title="Long-Form Input")
fig.show()

In [4]:

import plotly.express as px
wide_df = px.data.medals_wide()

fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input")
fig.show()

You might notice that the y-axis and legend labels are slightly different for the second plot: they are "value" and "variable", respectively, and this is also reflected in the hover label text. This is because Plotly Express performed an internal Pandas melt() operation to convert the wide-form data into long-form for plotting, and used the Pandas convention for assigning column names to the intermediate long-form data. Note that the labels "medal" and "count" do not appear in the wide-form data frame, so in this case you must supply them yourself, or use a data frame with named row and column indexes. You can rename these labels with the labels argument:

In [5]:

import plotly.express as px
wide_df = px.data.medals_wide()

fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input, relabelled",
            labels={"value": "count", "variable": "medal"})
fig.show()

Many more examples of wide-form and messy data input can be found in our detailed wide-form support documentation.

Input Data as Pandas `DataFrame`s

As shown above, px functions natively support pandas DataFrames. Arguments can be passed either as DataFrame columns, or as column names if the data_frame argument is provided.

Passing columns as arguments

In [6]:

import plotly.express as px
df = px.data.iris()
# Pass columns directly as arguments. You can use tab completion for this!
fig = px.scatter(df, x=df.sepal_length, y=df.sepal_width, color=df.species, size=df.petal_length)
fig.show()

Passing name strings as arguments

In [7]:

import plotly.express as px
df = px.data.iris()
# Use column names instead. This is the same chart as above.
fig = px.scatter(df, x='sepal_length', y='sepal_width', color='species', size='petal_length')
fig.show()

Using the index of a DataFrame

In addition to columns, it is also possible to pass the index of a DataFrame as an argument. In the example below, the index is displayed in the hover data.

In [8]:

import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x=df.sepal_length, y=df.sepal_width, size=df.petal_length,
                 hover_data=[df.index])
fig.show()

Columns not in the `data_frame` argument

In addition to columns from the data_frame argument, you can also pass columns from a different DataFrame, as long as all columns have the same length. It is also possible to pass columns without passing the data_frame argument at all.

However, column names are used only if they correspond to columns in the data_frame argument; in other cases, the name of the keyword argument is used instead. As explained below, the labels argument can be used to set names.

In [9]:

import plotly.express as px
import pandas as pd
df1 = pd.DataFrame(dict(time=[10, 20, 30], sales=[10, 8, 30]))
df2 = pd.DataFrame(dict(market=[4, 2, 5]))
fig = px.bar(df1, x=df1.time, y=df2.market, color=df1.sales)
fig.show()

Using labels to pass names

The labels argument can be used to override the names used for axis titles, legend entries and hovers.

In [10]:

import plotly.express as px
import pandas as pd

df = px.data.gapminder()
gdp = df['pop'] * df['gdpPercap']
fig = px.bar(df, x='year', y=gdp, color='continent', labels={'y':'gdp'},
             hover_data=['country'],
             title='Evolution of world GDP')
fig.show()

Input Data as Non-Pandas `DataFrame`s

New in 5.15

In the examples above, we've used Pandas DataFrames. You can also provide another type of DataFrame to the data_frame argument if that DataFrame has a to_pandas method, for example, a Polars DataFrame.

Plotly Express uses Pandas internally to process the data. When you provide a non-Pandas DataFrame to the data_frame argument of a Plotly Express function, the entire DataFrame is converted to a Pandas DataFrame.

In this example, we use a Polars DataFrame. If you are using Polars, you'll need to install pyarrow, which is used by its to_pandas method.

In [11]:

import polars as pl
import plotly.express as px

wide_df = pl.DataFrame(
    {
        "nation": ["South Korea", "China", "Canada"],
        "gold": [24, 10, 9],
        "silver": [13, 15, 12],
        "bronze": [11, 8, 12],
    }
)

fig = px.bar(wide_df, x="nation", y=["gold", "silver", "bronze"], title="Wide-Form Input")
fig.show()

New in 5.16

As of version 5.16, you can also provide another type of DataFrame to the data_frame argument if that DataFrame supports the Python dataframe interchange protocol, or has a toPandas or to_pandas_df method.

To use the Python dataframe interchange protocol, you'll need Pandas version 2.0.3 or later installed, even if your DataFrame supports the protocol. With an earlier version of Pandas, Plotly Express will instead look for a to_pandas, toPandas, or to_pandas_df method and use whichever one is available.
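To make the lookup order concrete, here is a hypothetical helper, to_pandas_like, that mimics the behaviour described above. It illustrates the documented rules and is not Plotly's actual implementation. The FakeSparkFrame class is also invented for the example, as a stand-in for a library that only offers toPandas().

```python
import re
import pandas as pd


def _pandas_at_least(*required):
    # Compare the leading "X.Y.Z" of pandas.__version__, ignoring suffixes like "rc0"
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", pd.__version__)
    return match is not None and tuple(int(p) for p in match.groups()) >= required


def to_pandas_like(obj):
    """Hypothetical helper: convert obj to pandas following the order described above."""
    if isinstance(obj, pd.DataFrame):
        return obj  # already pandas, nothing to convert
    if hasattr(obj, "__dataframe__") and _pandas_at_least(2, 0, 3):
        # Dataframe interchange protocol, consumed by pandas itself
        return pd.api.interchange.from_dataframe(obj)
    for method in ("to_pandas", "toPandas", "to_pandas_df"):
        if hasattr(obj, method):
            return getattr(obj, method)()  # first available method wins
    raise TypeError(f"cannot convert {type(obj).__name__} to a pandas DataFrame")


class FakeSparkFrame:
    """Invented stand-in for a library that only exposes toPandas()."""

    def toPandas(self):
        return pd.DataFrame({"nation": ["Canada"], "gold": [9]})


converted = to_pandas_like(FakeSparkFrame())
```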

Input Data as array-like columns: NumPy arrays, lists...

px arguments can also be array-like objects such as lists or NumPy arrays, in either long form or (for certain functions) wide form.

In [12]:

import plotly.express as px

# List arguments
fig = px.line(x=[1, 2, 3, 4], y=[3, 5, 4, 8])
fig.show()

In [13]:

import numpy as np
import plotly.express as px

t = np.linspace(0, 10, 100)
# NumPy array arguments
fig = px.scatter(x=t, y=np.sin(t), labels={'x':'t', 'y':'sin(t)'}) # override keyword names with labels
fig.show()

List arguments can also be passed in as a list of lists, which triggers wide-form data processing, with the downside that the resulting traces will need to be manually renamed via fig.data[<n>].name = "name".

In [14]:

import plotly.express as px

# List arguments in wide form
series1 = [3, 5, 4, 8]
series2 = [5, 4, 8, 3]
fig = px.line(x=[1, 2, 3, 4], y=[series1, series2])
fig.show()

Passing dictionaries or array-likes as the data_frame argument

The data_frame argument can also be a dict or an array. Using a dictionary is a convenient way to provide the column names used in axis titles, legend entries and hovers without creating a pandas DataFrame.

In [15]:

import plotly.express as px
import numpy as np
N = 10000
np.random.seed(0)
fig = px.density_contour(dict(effect_size=5 + np.random.randn(N),
                              waiting_time=np.random.poisson(size=N)),
                         x="effect_size", y="waiting_time")
fig.show()

Integer column names

When the data_frame argument is a NumPy array, the column names are the integer indices of the array's columns. In this case, the keyword names are used in axis titles, legend entries and hovers. The same applies to a pandas DataFrame with integer column names. Use the labels argument to override these names.

In [16]:

import numpy as np
import plotly.express as px

ar = np.arange(100).reshape((10, 10))
fig = px.scatter(ar, x=2, y=6, size=1, color=5)
fig.show()

Mixing dataframes and other types

It is possible to mix DataFrame columns, NumPy arrays and lists as arguments. Remember that column names are only used for columns of the data_frame argument; use labels to override the names displayed in axis titles, legend entries or hovers.

In [17]:

import plotly.express as px
import numpy as np
import pandas as pd

df = px.data.gapminder()
gdp = np.log((df['pop'] * df['gdpPercap']).to_numpy())  # NumPy array
fig = px.bar(df, x='year', y=gdp, color='continent', labels={'y':'log gdp'},
             hover_data=['country'],
             title='Evolution of world GDP')
fig.show()

What About Dash?

Dash is an open-source framework for building analytical applications, with no JavaScript required, and it is tightly integrated with the Plotly graphing library.

Learn about how to install Dash at https://dash.plot.ly/installation.

Everywhere on this page that you see fig.show(), you can display the same figure in a Dash application by passing it to the figure argument of the Graph component from the dcc module included with Dash (formerly the separate dash_core_components package), like this:

import plotly.graph_objects as go # or plotly.express as px
fig = go.Figure() # or any Plotly Express function e.g. px.bar(...)
# fig.add_trace( ... )
# fig.update_layout( ... )

from dash import Dash, dcc, html

app = Dash()
app.layout = html.Div([
    dcc.Graph(figure=fig)
])

app.run(debug=True, use_reloader=False)  # Turn off reloader if inside Jupyter