Plot Data from MySQL in Python/v3

How to graph data from a MySQL database with Python.


Note: this page is part of the documentation for version 3 of Plotly.py, which is not the most recent version.
See our Version 4 Migration Guide for information about how to upgrade.

New to Plotly?

Plotly's Python library is free and open source! Get started by downloading the client and reading the primer.
You can set up Plotly to work in online or offline mode, or in jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

Version Check

Plotly's python package is updated frequently. Run pip install plotly --upgrade to use the latest version.

In [1]:
import plotly
plotly.__version__
Out[1]:
'2.4.1'

Imports

This notebook uses the MySQL world database:http://dev.mysql.com/doc/index-other.html. Instructions for setting up the world database in MySQL are here. This notebook was created for this article in Modern Data

In [3]:
import plotly.plotly as py
import plotly.graph_objs as go

import MySQLdb
import pandas as pd

Connect to MySQL Database

In [5]:
conn = MySQLdb.connect(host="localhost", user="root", passwd="xxxx", db="world")
cursor = conn.cursor()
cursor.execute('select Name, Continent, Population, LifeExpectancy, GNP from Country');

rows = cursor.fetchall()
str(rows)[0:300]
Out[5]:
"(('Aruba', 'North America', 103000L, 78.4, 828.0), ('Afghanistan', 'Asia', 22720000L, 45.9, 5976.0), ('Angola', 'Africa', 12878000L, 38.3, 6648.0), ('Anguilla', 'North America', 8000L, 76.1, 63.2), ('Albania', 'Europe', 3401200L, 71.6, 3205.0), ('Andorra', 'Europe', 78000L, 83.5, 1630.0), ('Netherla"
In [7]:
df = pd.DataFrame( [[ij for ij in i] for i in rows] )
df.rename(columns={0: 'Name', 1: 'Continent', 2: 'Population', 3: 'LifeExpectancy', 4:'GNP'}, inplace=True);
df = df.sort_values(['LifeExpectancy'], ascending=[1]);

Some country names cause serialization errors in early versions of Plotly's Python client. The code block below takes care of this.

In [8]:
country_names = df['Name']
for i in range(len(country_names)):
    try:
        country_names[i] = str(country_names[i]).decode('utf-8')
    except:
        country_names[i] = 'Country name decode error'
C:\Python27\lib\site-packages\ipykernel_launcher.py:4: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [10]:
trace1 = go.Scatter(
    x=df['GNP'],
    y=df['LifeExpectancy'],
    text=country_names,
    mode='markers'
)
layout = go.Layout(
    title='Life expectancy vs GNP from MySQL world database',
    xaxis=dict( type='log', title='GNP' ),
    yaxis=dict( title='Life expectancy' )
)
data = [trace1]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='world GNP vs life expectancy')
Out[10]:
In [11]:
# (!) Set 'size' values to be proportional to rendered area,
#     instead of diameter. This makes the range of bubble sizes smaller
sizemode='area'

# (!) Set a reference for 'size' values (i.e. a population-to-pixel scaling).
#     Here the max bubble area will be on the order of 100 pixels
sizeref=df['Population'].max()/1e2**2

colors = {
    'Asia':"rgb(255,65,54)",
    'Europe':"rgb(133,20,75)",
    'Africa':"rgb(0,116,217)",
    'North America':"rgb(255,133,27)",
    'South America':"rgb(23,190,207)",
    'Antarctica':"rgb(61,153,112)",
    'Oceania':"rgb(255,220,0)",
}

# Define a hover-text generating function (returns a list of strings)
def make_text(X):
    return 'Country: %s\
    <br>Life Expectancy: %s years\
    <br>Population: %s million'\
    % (X['Name'], X['LifeExpectancy'], X['Population']/1e6)

# Define a trace-generating function (returns a Scatter object)
def make_trace(X, continent, sizes, color):
    return go.Scatter(
        x=X['GNP'],  # GDP on the x-xaxis
        y=X['LifeExpectancy'],    # life Exp on th y-axis
        name=continent,    # label continent names on hover
        mode='markers',    # (!) point markers only on this plot
        text=X.apply(make_text, axis=1).tolist(),
        marker= dict(
            color=color,           # marker color
            size=sizes,            # (!) marker sizes (sizes is a list)
            sizeref=sizeref,       # link sizeref
            sizemode=sizemode,     # link sizemode
            opacity=0.6,           # (!) partly transparent markers
            line= dict(width=3,color="white")  # marker borders
        )
    )

# Initialize data object 
data = []

# Group data frame by continent sub-dataframe (named X), 
#   make one trace object per continent and append to data object
for continent, X in df.groupby('Continent'):

    sizes = X['Population']                 # get population array 
    color = colors[continent]               # get bubble color

    data.append(
        make_trace(X, continent, sizes, color)  # append trace to data object
    )

    # Set plot and axis titles
title = "Life expectancy vs GNP from MySQL world database (bubble chart)"
x_title = "Gross National Product"
y_title = "Life Expectancy [in years]"

# Define a dictionary of axis style options
axis_style = dict(
    type='log',
    zeroline=False,       # remove thick zero line
    gridcolor='#FFFFFF',  # white grid lines
    ticks='outside',      # draw ticks outside axes 
    ticklen=8,            # tick length
    tickwidth=1.5         #   and width
)

# Make layout object
layout = go.Layout(
    title=title,             # set plot title
    plot_bgcolor='#EFECEA',  # set plot color to grey
    hovermode="closest",
    xaxis=dict(
        axis_style,      # add axis style dictionary
        title=x_title,   # x-axis title
        range=[2.0,7.2], # log of min and max x limits
    ),
    yaxis=dict(
        axis_style,      # add axis style dictionary
        title=y_title,   # y-axis title
    )
)

# Make Figure object
fig = go.Figure(data=data, layout=layout)

# (@) Send to Plotly and show in notebook
py.iplot(fig, filename='s3_life-gdp')
Out[11]:

References

See https://plotly.com/python/getting-started/ for more information about Plotly's Python Open Source Graphing Library!