Peak Finding in Python/v3

Learn how to find peaks and valleys in datasets with Python


Note: this page is part of the documentation for version 3 of Plotly.py, which is not the most recent version.
See our Version 4 Migration Guide for information about how to upgrade.
A version 4 edition of this page is also available.


Imports

The tutorial below imports NumPy, Pandas, SciPy and PeakUtils.

In [1]:
import plotly.plotly as py
import plotly.graph_objs as go
# plotly.tools.FigureFactory is deprecated; the figure_factory module provides create_table
import plotly.figure_factory as FF

import numpy as np
import pandas as pd
import scipy
import peakutils

Import Data

To start detecting peaks, we will import some data on milk production by month:

In [2]:
milk_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/monthly-milk-production-pounds.csv')
time_series = milk_data['Monthly milk production (pounds per cow)']
time_series = time_series.tolist()

df = milk_data[0:15]

table = FF.create_table(df)
py.iplot(table, filename='milk-production-dataframe')
Out[2]:

Original Plot

In [3]:
trace = go.Scatter(
    x=list(range(len(time_series))),
    y=time_series,
    mode='lines'
)

data = [trace]
py.iplot(data, filename='milk-production-plot')
Out[3]:

With Peak Detection

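To mark the peaks on the plot, we first need their positions along the x-axis. `peakutils.indexes` returns the array indices of the local maxima, which we then use to look up the corresponding y-values.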

In [4]:
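cb = np.array(time_series)
# thres is a normalized threshold (0 to 1) relative to the data range;
# min_dist is the minimum number of samples between neighboring peaks (an integer).
# The small values used here keep essentially every local maximum.
indices = peakutils.indexes(cb, thres=0.02/max(cb), min_dist=1)

trace = go.Scatter(
    x=list(range(len(time_series))),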
    y=time_series,
    mode='lines',
    name='Original Plot'
)

trace2 = go.Scatter(
    x=indices,
    y=[time_series[j] for j in indices],
    mode='markers',
    marker=dict(
        size=8,
        color='rgb(255,0,0)',
        symbol='cross'
    ),
    name='Detected Peaks'
)

data = [trace, trace2]
py.iplot(data, filename='milk-production-plot-with-peaks')
Out[4]:
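
To make the role of the two parameters more concrete, here is a minimal NumPy-only sketch of threshold-based peak detection. It is a simplified illustration, not PeakUtils' actual implementation: the function name `find_peak_indices` and the sample `signal` are hypothetical, and flat-topped peaks (plateaus) are ignored. It assumes, as the PeakUtils documentation describes, that `thres` is normalized to the range of the data and that `min_dist` prefers the highest peak when two are too close.

```python
import numpy as np

def find_peak_indices(y, thres=0.3, min_dist=1):
    """Return indices of strict local maxima above a normalized threshold.

    Simplified sketch (hypothetical helper), not PeakUtils' implementation.
    """
    y = np.asarray(y, dtype=float)
    # Convert the normalized threshold (0..1) into an absolute level
    # measured from the minimum of the data.
    level = thres * (y.max() - y.min()) + y.min()

    dy = np.diff(y)
    # A strict local maximum rises into the point and falls after it.
    candidates = np.where((dy[:-1] > 0) & (dy[1:] < 0))[0] + 1
    candidates = candidates[y[candidates] > level]

    # Enforce the minimum spacing, visiting the highest peaks first.
    kept = []
    for idx in candidates[np.argsort(y[candidates])[::-1]]:
        if all(abs(idx - k) >= min_dist for k in kept):
            kept.append(idx)
    return np.array(sorted(kept), dtype=int)

# Made-up sample data, not the milk production series.
signal = np.array([0, 2, 1, 3, 1, 10, 1, 4, 0])
all_peaks = find_peak_indices(signal, thres=0.0)   # indices 1, 3, 5, 7
high_peaks = find_peak_indices(signal, thres=0.5)  # index 5 only
```

With `thres=0.0` every local maximum is returned, while `thres=0.5` keeps only peaks above the midpoint of the data range, which is the effect the next section relies on.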

Only Highest Peaks

Raising the threshold filters out the lower peaks, so we can tune it until only the highest peaks are identified.

In [5]:
cb = np.array(time_series)
# A higher normalized threshold keeps only peaks in the upper part of the data range.
indices = peakutils.indexes(cb, thres=0.678, min_dist=1)

trace = go.Scatter(
    x=list(range(len(time_series))),
    y=time_series,
    mode='lines',
    name='Original Plot'
)

trace2 = go.Scatter(
    x=indices,
    y=[time_series[j] for j in indices],
    mode='markers',
    marker=dict(
        size=8,
        color='rgb(255,0,0)',
        symbol='cross'
    ),
    name='Detected Peaks'
)

data = [trace, trace2]
py.iplot(data, filename='milk-production-plot-with-higher-peaks')
Out[5]:
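
SciPy is imported above but not otherwise used in this tutorial. As an alternative sketch, `scipy.signal.find_peaks` (available in SciPy 1.1.0 and later) can also select the highest peaks, using an absolute `height` rather than a normalized threshold, and can locate valleys by searching the negated signal. The `signal` array below is made-up sample data standing in for the milk production series.

```python
import numpy as np
from scipy.signal import find_peaks

# Made-up sample data (an assumption), not the real milk production series.
signal = np.array([5.0, 8.0, 6.0, 9.0, 4.0, 12.0, 3.0, 7.0, 5.0])

# All local maxima; find_peaks returns (indices, properties).
peaks, _ = find_peaks(signal)

# Only maxima whose value reaches an absolute height of 8.5.
high_peaks, _ = find_peaks(signal, height=8.5)

# Valleys are the peaks of the negated signal.
valleys, _ = find_peaks(-signal)

print(peaks, high_peaks, valleys)
```

The resulting index arrays can be passed to `go.Scatter` as `x`, with `signal[peaks]` or `signal[valleys]` as `y`, in the same way as the `indices` array above.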