Scatter Plots in Python

How to make scatter plots in Python with Plotly.


New to Plotly?

Plotly is a free and open-source graphing library for Python. We recommend you read our Getting Started guide for the latest installation or upgrade instructions, then move on to our Plotly Fundamentals tutorials or dive straight in to some Basic Charts tutorials.

Scatter plots with Plotly Express

Plotly Express is the easy-to-use, high-level interface to Plotly, which operates on a variety of types of data and produces easy-to-style figures.

With px.scatter, each data point is represented as a marker point, whose location is given by the x and y columns.

In [1]:
# x and y given as array_like objects
import plotly.express as px
fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])
fig.show()
00.511.522.533.540246810121416
xy
In [2]:
# x and y given as DataFrame columns
import plotly.express as px
df = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(df, x="sepal_width", y="sepal_length")
fig.show()
22.533.544.54.555.566.577.58
sepal_widthsepal_length

Setting size and color with column names

Scatter plots with variable-sized circular markers are often known as bubble charts. Note that color and size data are added to hover information. You can add other columns to hover data with the hover_data argument of px.scatter.

In [3]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
                 size='petal_length', hover_data=['petal_width'])
fig.show()
22.533.544.54.555.566.577.58
speciessetosaversicolorvirginicasepal_widthsepal_length

Color can be continuous as follows, or discrete/categorical as above.

In [4]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color='petal_length')
fig.show()
22.533.544.54.555.566.577.58
123456petal_lengthsepal_widthsepal_length

The symbol argument can be mapped to a column as well. A wide variety of symbols are available.

In [5]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species", symbol="species")
fig.show()
22.533.544.54.555.566.577.58
speciessetosaversicolorvirginicasepal_widthsepal_length

Scatter plots in Dash

Dash is the best way to build analytical apps in Python using Plotly figures. To run the app below, run pip install dash, click "Download" to get the code and run python app.py.

Get started with the official Dash docs and learn how to effortlessly style & deploy apps like this with Dash Enterprise.

Out[6]:

Sign up for Dash Club → Free cheat sheets plus updates from Chris Parmer and Adam Schroeder delivered to your inbox every two months. Includes tips and tricks, community apps, and deep dives into the Dash architecture. Join now.

Scatter plots and Categorical Axes

Scatter plots can be made using any type of cartesian axis, including linear, logarithmic, categorical or date axes.

Scatter plots where one axis is categorical are often known as dot plots.

In [7]:
import plotly.express as px
df = px.data.medals_long()

fig = px.scatter(df, y="nation", x="count", color="medal", symbol="medal")
fig.update_traces(marker_size=10)
fig.show()
81012141618202224South KoreaChinaCanada
medalgoldsilverbronzecountnation

Grouped Scatter Points

New in 5.12

By default, scatter points at the same location are overlayed. We can see this in the previous example, with the values for Canada for bronze and silver. Set scattermode='group' to plot scatter points next to one another, centered around the shared location.

In [8]:
import plotly.express as px

df = px.data.medals_long()

fig = px.scatter(df, y="count", x="nation", color="medal")
fig.update_traces(marker_size=10)
fig.update_layout(scattermode="group")
fig.show()
South KoreaChinaCanada0510152025
medalgoldsilverbronzenationcount

New in 5.12

You can configure the gap between groups of scatter points using scattergap. Here we set it to 0.75, which brings the points closer together by allocating more space to the gap between groups. If you don't set scattergap, a default value of 0 is used, unless you have bargap set. If you have bargap set, the scattergap defaults to that value.

In [9]:
import plotly.express as px

df = px.data.medals_long()

fig = px.scatter(df, y="count", x="nation", color="medal")
fig.update_traces(marker_size=10)
fig.update_layout(scattermode="group", scattergap=0.75)
fig.show()
South KoreaChinaCanada0510152025
medalgoldsilverbronzenationcount

Error Bars

Scatter plots support error bars.

In [10]:
import plotly.express as px
df = px.data.iris()
df["e"] = df["sepal_width"]/100
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
                 error_x="e", error_y="e")
fig.show()
22.533.544.54.555.566.577.58
speciessetosaversicolorvirginicasepal_widthsepal_length

Marginal Distribution Plots

Scatter plots support marginal distribution plots

In [11]:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x="sepal_length", y="sepal_width", marginal_x="histogram", marginal_y="rug")
fig.show()
44.555.566.577.5822.533.544.5
sepal_lengthsepal_width

Facetting

Scatter plots support faceting.

In [12]:
import plotly.express as px
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", color="smoker", facet_col="sex", facet_row="time")
fig.show()
0102030405024681001020304050246810
smokerNoYestotal_billtotal_billtiptipsex=Femalesex=Maletime=Lunchtime=Dinner

Linear Regression and Other Trendlines

Scatter plots support linear and non-linear trendlines.

In [13]:
import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()
1020304050246810
total_billtip

Line plots with Plotly Express

In [14]:
import plotly.express as px
import numpy as np

t = np.linspace(0, 2*np.pi, 100)

fig = px.line(x=t, y=np.cos(t), labels={'x':'t', 'y':'cos(t)'})
fig.show()
0123456−1−0.500.51
tcos(t)
In [15]:
import plotly.express as px
df = px.data.gapminder().query("continent == 'Oceania'")
fig = px.line(df, x='year', y='lifeExp', color='country')
fig.show()
19601970198019902000707274767880
countryAustraliaNew ZealandyearlifeExp

The markers argument can be set to True to show markers on lines.

In [16]:
import plotly.express as px
df = px.data.gapminder().query("continent == 'Oceania'")
fig = px.line(df, x='year', y='lifeExp', color='country', markers=True)
fig.show()
195019601970198019902000201070727476788082
countryAustraliaNew ZealandyearlifeExp

The symbol argument can be used to map a data field to the marker symbol. A wide variety of symbols are available.

In [17]:
import plotly.express as px
df = px.data.gapminder().query("continent == 'Oceania'")
fig = px.line(df, x='year', y='lifeExp', color='country', symbol="country")
fig.show()
195019601970198019902000201070727476788082
countryAustraliaNew ZealandyearlifeExp

Line plots on Date axes

Line plots can be made on using any type of cartesian axis, including linear, logarithmic, categorical or date axes. Line plots on date axes are often called time-series charts.

Plotly auto-sets the axis type to a date format when the corresponding data are either ISO-formatted date strings or if they're a date pandas column or datetime NumPy array.

In [18]:
import plotly.express as px

df = px.data.stocks()
fig = px.line(df, x='date', y="GOOG")
fig.show()
Jan 2018Apr 2018Jul 2018Oct 2018Jan 2019Apr 2019Jul 2019Oct 20190.90.9511.051.11.151.2
dateGOOG

Data Order in Scatter and Line Charts

Plotly line charts are implemented as connected scatterplots (see below), meaning that the points are plotted and connected with lines in the order they are provided, with no automatic reordering.

This makes it possible to make charts like the one below, but also means that it may be required to explicitly sort data before passing it to Plotly to avoid lines moving "backwards" across the chart.

In [19]:
import plotly.express as px
import pandas as pd

df = pd.DataFrame(dict(
    x = [1, 3, 2, 4],
    y = [1, 2, 3, 4]
))
fig = px.line(df, x="x", y="y", title="Unsorted Input")
fig.show()

df = df.sort_values(by="x")
fig = px.line(df, x="x", y="y", title="Sorted Input")
fig.show()
11.522.533.5411.522.533.54
Unsorted Inputxy
11.522.533.5411.522.533.54
Sorted Inputxy

Connected Scatterplots

In a connected scatterplot, two continuous variables are plotted against each other, with a line connecting them in some meaningful order, usually a time variable. In the plot below, we show the "trajectory" of a pair of countries through a space defined by GDP per Capita and Life Expectancy. Botswana's life expectancy

In [20]:
import plotly.express as px

df = px.data.gapminder().query("country in ['Canada', 'Botswana']")

fig = px.line(df, x="lifeExp", y="gdpPercap", color="country", text="year")
fig.update_traces(textposition="bottom right")
fig.show()
195219571962196719721977198219871992199720022007195219571962196719721977198219871992199720022007455055606570758005k10k15k20k25k30k35k
countryBotswanaCanadalifeExpgdpPercap

Scatter and line plots with go.Scatter

If Plotly Express does not provide a good starting point, it is possible to use the more generic go.Scatter class from plotly.graph_objects. Whereas plotly.express has two functions scatter and line, go.Scatter can be used both for plotting points (makers) or lines, depending on the value of mode. The different options of go.Scatter are documented in its reference page.

Simple Scatter Plot

In [21]:
import plotly.graph_objects as go
import numpy as np

N = 1000
t = np.linspace(0, 10, 100)
y = np.sin(t)

fig = go.Figure(data=go.Scatter(x=t, y=y, mode='markers'))

fig.show()
0246810−1−0.500.51

Line and Scatter Plots

Use mode argument to choose between markers, lines, or a combination of both. For more options about line plots, see also the line charts notebook and the filled area plots notebook.

In [22]:
import plotly.graph_objects as go

# Create random data with numpy
import numpy as np
np.random.seed(1)

N = 100
random_x = np.linspace(0, 1, N)
random_y0 = np.random.randn(N) + 5
random_y1 = np.random.randn(N)
random_y2 = np.random.randn(N) - 5

fig = go.Figure()

# Add traces
fig.add_trace(go.Scatter(x=random_x, y=random_y0,
                    mode='markers',
                    name='markers'))
fig.add_trace(go.Scatter(x=random_x, y=random_y1,
                    mode='lines+markers',
                    name='lines+markers'))
fig.add_trace(go.Scatter(x=random_x, y=random_y2,
                    mode='lines',
                    name='lines'))

fig.show()
00.20.40.60.81−8−6−4−202468
markerslines+markerslines

Bubble Scatter Plots

In bubble charts, a third dimension of the data is shown through the size of markers. For more examples, see the bubble chart notebook

In [23]:
import plotly.graph_objects as go

fig = go.Figure(data=go.Scatter(
    x=[1, 2, 3, 4],
    y=[10, 11, 12, 13],
    mode='markers',
    marker=dict(size=[40, 60, 80, 100],
                color=[0, 1, 2, 3])
))

fig.show()
11.522.533.549.51010.51111.51212.51313.514

Style Scatter Plots

In [24]:
import plotly.graph_objects as go
import numpy as np


t = np.linspace(0, 10, 100)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=t, y=np.sin(t),
    name='sin',
    mode='markers',
    marker_color='rgba(152, 0, 0, .8)'
))

fig.add_trace(go.Scatter(
    x=t, y=np.cos(t),
    name='cos',
    marker_color='rgba(255, 182, 193, .9)'
))

# Set options common to all traces with fig.update_traces
fig.update_traces(mode='markers', marker_line_width=2, marker_size=10)
fig.update_layout(title=dict(text='Styled Scatter'),
                  yaxis_zeroline=False, xaxis_zeroline=False)


fig.show()
0246810−1−0.500.51
sincosStyled Scatter

Data Labels on Hover

In [25]:
import plotly.graph_objects as go
import pandas as pd

data= pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/2014_usa_states.csv")

fig = go.Figure(data=go.Scatter(x=data['Postal'],
                                y=data['Population'],
                                mode='markers',
                                marker_color=data['Population'],
                                text=data['State'])) # hover text goes here

fig.update_layout(title=dict(text='Population of USA States'))
fig.show()
ALAKAZARCACOCTDEDCFLGAHIIDILINIAKSKYLAMEMDMAMIMNMSMOMTNENVNHNJNMNYNCNDOHOKORPAPRRISCSDTNTXUTVTVAWAWVWIWY05M10M15M20M25M30M35M40M
Population of USA States

Scatter with a Color Dimension

In [26]:
import plotly.graph_objects as go
import numpy as np

fig = go.Figure(data=go.Scatter(
    y = np.random.randn(500),
    mode='markers',
    marker=dict(
        size=16,
        color=np.random.randn(500), #set color equal to a variable
        colorscale='Viridis', # one of plotly colorscales
        showscale=True
    )
))

fig.show()
0100200300400500−3−2−101234
−3−2−1012

Trace Zorder

New in 5.21

For many trace types, including go.Scatter, you can define the order traces are drawn in by setting a zorder. Traces with a higher zorder appear at the front, with traces with a lower zorder at the back. In this example, we give our trace for 'France' the highest zorder, meaning it is drawn in front of the other two traces:

In [27]:
import plotly.graph_objects as go
import plotly.data as data

df = data.gapminder()

df_europe = df[df['continent'] == 'Europe']

trace1 = go.Scatter(x=df_europe[df_europe['country'] == 'France']['year'],
                    y=df_europe[df_europe['country'] == 'France']['lifeExp'],
                    mode='lines+markers',
                    zorder=3,
                    name='France',
                    marker=dict(size=15))

trace2 = go.Scatter(x=df_europe[df_europe['country'] == 'Germany']['year'],
                    y=df_europe[df_europe['country'] == 'Germany']['lifeExp'],
                    mode='lines+markers',
                    zorder=1,
                    name='Germany',
                    marker=dict(size=15))

trace3 = go.Scatter(x=df_europe[df_europe['country'] == 'Spain']['year'],
                    y=df_europe[df_europe['country'] == 'Spain']['lifeExp'],
                    mode='lines+markers',
                    zorder=2,
                    name='Spain',
                    marker=dict(size=15))

layout = go.Layout(title=dict(text='Life Expectancy in Europe Over Time'))

fig = go.Figure(data=[trace1, trace2, trace3], layout=layout)

fig.show()
195019601970198019902000201064666870727476788082
FranceGermanySpainLife Expectancy in Europe Over Time

Large Data Sets

Now in Plotly you can implement WebGL with Scattergl() in place of Scatter()
for increased speed, improved interactivity, and the ability to plot even more data!

In [28]:
import plotly.graph_objects as go
import numpy as np

N = 100000
fig = go.Figure(data=go.Scattergl(
    x = np.random.randn(N),
    y = np.random.randn(N),
    mode='markers',
    marker=dict(
        color=np.random.randn(N),
        colorscale='Viridis',
        line_width=1
    )
))

fig.show()
−4−3−2−101234−4−2024
In [29]:
import plotly.graph_objects as go
import numpy as np

N = 100000
r = np.random.uniform(0, 1, N)
theta = np.random.uniform(0, 2*np.pi, N)

fig = go.Figure(data=go.Scattergl(
    x = r * np.cos(theta), # non-uniform distribution
    y = r * np.sin(theta), # zoom to see more points at the center
    mode='markers',
    marker=dict(
        color=np.random.randn(N),
        colorscale='Viridis',
        line_width=1
    )
))

fig.show()
−1−0.500.51−1−0.500.51

What About Dash?

Dash is an open-source framework for building analytical applications, with no Javascript required, and it is tightly integrated with the Plotly graphing library.

Learn about how to install Dash at https://dash.plot.ly/installation.

Everywhere in this page that you see fig.show(), you can display the same figure in a Dash application by passing it to the figure argument of the Graph component from the built-in dash_core_components package like this:

import plotly.graph_objects as go # or plotly.express as px
fig = go.Figure() # or any Plotly Express function e.g. px.bar(...)
# fig.add_trace( ... )
# fig.update_layout( ... )

from dash import Dash, dcc, html

app = Dash()
app.layout = html.Div([
    dcc.Graph(figure=fig)
])

app.run_server(debug=True, use_reloader=False)  # Turn off reloader if inside Jupyter