Plotly and Datashader in Python

How to use datashader to rasterize large datasets and visualize the resulting raster data with plotly.

datashader creates rasterized representations of large datasets for easier visualization. It uses a pipeline approach made of several steps: projecting the data onto a regular grid, creating a color representation of the grid, etc.
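To make the first two pipeline steps concrete, here is a minimal NumPy-only sketch of them. It is not datashader's implementation. `aggregate` and `shade` are hypothetical helper names, the grid is built with `np.histogram2d`, and the colors come from a simple log-scaled blend between two RGB values. datashader's own `tf.shade` offers several normalizations through its `how` parameter.

```python
import numpy as np


def aggregate(x, y, width, height, x_range, y_range):
    """Step 1 (hypothetical helper): count the points in each cell of a regular grid.

    Returns an array of shape (height, width) whose row 0 holds the lowest
    y values. Points outside x_range / y_range are ignored.
    """
    counts, _, _ = np.histogram2d(
        y, x, bins=(height, width), range=(y_range, x_range)
    )
    return counts


def shade(counts, low=(0, 0, 0), high=(255, 200, 0)):
    """Step 2 (hypothetical helper): turn the count grid into an RGB image.

    Counts are log-scaled to [0, 1], then blended linearly from the `low`
    color (empty cells) to the `high` color (densest cell).
    """
    scaled = np.log1p(counts)
    if scaled.max() > 0:
        scaled = scaled / scaled.max()
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    rgb = low + scaled[..., None] * (high - low)
    return np.round(rgb).astype(np.uint8)


# Synthetic data: 100,000 normally distributed points
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

counts = aggregate(x, y, width=200, height=100, x_range=(-4, 4), y_range=(-4, 4))
img = shade(counts)
print(counts.shape, img.shape)  # (100, 200) (100, 200, 3)
```

The output size depends only on the grid dimensions, not on the number of input points. That property is what makes the rasterized result cheap to hand over to a plotting library.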

Passing datashader rasters as a tile map image layer

Here we visualize the spatial distribution of taxi rides in New York City. A higher density is visible along the major avenues. For more details about tile-based maps, see the tile map layers tutorial.

In [1]:
import pandas as pd
df = pd.read_csv('')
# Keep only the rides inside a bounding box around Manhattan
dff = df.query('40.70 < Lat < 40.82 and -74.02 < Lon < -73.91')

import datashader as ds
cvs = ds.Canvas(plot_width=1000, plot_height=1000)
agg = cvs.points(dff, x='Lon', y='Lat')
# agg is an xarray DataArray; see the xarray documentation for more details
coords_lat, coords_lon = agg.coords['Lat'].values, agg.coords['Lon'].values
# Corners of the image
coordinates = [[coords_lon[0], coords_lat[0]],
               [coords_lon[-1], coords_lat[0]],
               [coords_lon[-1], coords_lat[-1]],
               [coords_lon[0], coords_lat[-1]]]

from colorcet import fire
import datashader.transfer_functions as tf
img = tf.shade(agg, cmap=fire)[::-1].to_pil()

import plotly.express as px
# Trick to quickly create a figure with map axes: scatter a single point
fig = px.scatter_map(dff[:1], lat='Lat', lon='Lon', zoom=12)
# Add the datashader image as a tile map image layer
fig.update_layout(
    map_layers=[{"sourcetype": "image", "source": img, "coordinates": coordinates}]
)
fig.show()
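Plotly's reference for map layers describes `coordinates` as the [longitude, latitude] pairs of the image corners in clockwise order: top left, top right, bottom right, bottom left. The snippet above starts from the south-west corner instead and pairs that ordering with the `[::-1]` flip of the shaded image. As a hedged alternative, assuming a PNG whose top row is the northern edge, a small hypothetical helper can build the corners in the documented order from ascending coordinate arrays such as those of a datashader aggregate:

```python
def image_corners(lons, lats):
    """Hypothetical helper: [lon, lat] corners of an image with north at the top.

    `lons` and `lats` are pixel-center coordinates sorted in ascending order.
    Corners are returned clockwise: top left, top right, bottom right,
    bottom left.
    """
    west, east = lons[0], lons[-1]
    south, north = lats[0], lats[-1]
    return [[west, north], [east, north], [east, south], [west, south]]


# Illustrative coordinate arrays spanning the bounding box used above
corners = image_corners([-74.02, -73.97, -73.91], [40.70, 40.76, 40.82])
print(corners)
```

Because the coordinates are pixel centers, this places the image half a pixel inside the true extent, which is negligible at a 1000 x 1000 resolution.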

Exploring correlations of a large dataset

Here we explore a flight delay dataset. To get a visual impression of the correlation between features, we generate a datashader rasterized array, which we plot using a Heatmap trace. This gives a much clearer visualization than a scatter plot of the data points (even of only a fraction of them), as shown below.

Note that instead of using datashader, it would theoretically be possible to create a 2D histogram with plotly. This is not recommended here: you would need to load the whole dataset (5M rows!) into the browser for plotly.js to compute the heatmap, which is not tractable in practice. Datashader instead reduces the data to one value per grid cell before anything is passed to the browser.
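The gain is easy to estimate, because the browser receives one value per grid cell no matter how many rows were aggregated. Here is a back-of-the-envelope sketch, assuming two float64 columns for the raw points and a float64 grid from a 100 x 100 canvas like the one used below. `payload_bytes` is a hypothetical helper that ignores JSON and encoding overhead.

```python
ROWS = 5_000_000      # order of magnitude of the flight dataset
BYTES_PER_VALUE = 8   # float64


def payload_bytes(n_values, bytes_per_value=BYTES_PER_VALUE):
    """Hypothetical helper: raw size of n_values numbers, without encoding overhead."""
    return n_values * bytes_per_value


raw = payload_bytes(ROWS * 2)     # x and y for every point
grid = payload_bytes(100 * 100)   # one count per cell of a 100 x 100 canvas
print(raw, grid, raw // grid)     # 80000000 80000 1000
```

Roughly 80 MB of raw coordinates shrink to about 80 kB of counts, a factor of 1000 before any compression.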

In [2]:
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import datashader as ds
df = pd.read_parquet('')
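# Plot only one row out of 200 to keep the scatter plot manageable
fig = go.Figure(go.Scattergl(x=df['SCHEDULED_DEPARTURE'][::200],
                             y=df['DEPARTURE_DELAY'][::200],  # assumed delay column
                             mode='markers'))
fig.update_layout(title_text='A busy plot')
fig.show()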
In [3]:
import plotly.express as px
import pandas as pd
import numpy as np
import datashader as ds
df = pd.read_parquet('')

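cvs = ds.Canvas(plot_width=100, plot_height=100)
# Count flights per (departure time, delay) cell; the delay column name is assumed
agg = cvs.points(df, 'SCHEDULED_DEPARTURE', 'DEPARTURE_DELAY')
# Log-scale the counts and turn empty cells into NaN so they are not colored
zero_mask = agg.values == 0
agg.values = np.log10(agg.values, where=np.logical_not(zero_mask))
agg.values[zero_mask] = np.nan
fig = px.imshow(agg, origin='lower', labels={'color':'Log10(count)'})
fig.update_layout(coloraxis_colorbar=dict(title='Count', tickprefix='1.e'))
fig.show()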
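The `zero_mask` lines replace empty cells with NaN, so they show up as gaps rather than in the lowest color, and take the base-10 logarithm of the other counts. The `tickprefix='1.e'` colorbar then reads a tick value of 3 as "1.e3", i.e. 1000 flights. The sketch below applies the same idea to a small hand-made array standing in for `agg.values`. It uses a slight variant that pre-fills the output with NaN through the `out` argument, so the masked entries are never left uninitialized.

```python
import numpy as np

# Stand-in for agg.values: counts per grid cell, including an empty cell
counts = np.array([[0, 1], [10, 1000]])

# Evaluate log10 only where the count is positive; other entries keep
# the NaN they were initialized with.
log_counts = np.log10(counts, out=np.full(counts.shape, np.nan), where=counts > 0)
print(log_counts)
```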

