Big Data for Python

Dash Enterprise is your front-end for horizontally scalable, big data computation in Python.

From Spark to Snowflake and Dask to Datashader, the Python "big data" tech stack has never been more varied or robust.

Dash Enterprise supports turnkey connections to the most popular "big data" backends for Python, including Vaex, Dask, Datashader, RAPIDS, Databricks (PySpark), Snowflake, Postgres, and Salesforce.

In addition, Dash Enterprise ships with battle-tested, plug-and-play demos for best leveraging Dash with each of these technologies.

Scroll below to demo the latest in Python HPC through Dash user interfaces.



Vaex is a Pandas-like library that can operate on vastly larger datasets through out-of-core memory mapping.

If you’re working with data that is too large to fit in memory, but you don’t want to go through the hassle of setting up Spark or Dask, give Vaex a try.

This Dash app uses Vaex to explore 117 million rows of data (7GB) in real time.
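
To make "out-of-core memory mapping" concrete, here is a minimal sketch of the underlying idea using only Python's standard library. It is not Vaex's API: the file layout (a raw column of float64 values) and the helper names `write_column` and `mapped_sum` are assumptions made for illustration. The point is that the file is mapped rather than loaded, so the operating system pages data in as each chunk is touched.

```python
import mmap
import os
import tempfile
from array import array


def write_column(path, n_rows, batch=100_000):
    """Write the float64 values 0.0, 1.0, ..., n_rows - 1 to a raw binary file."""
    with open(path, "wb") as f:
        for start in range(0, n_rows, batch):
            stop = min(start + batch, n_rows)
            array("d", range(start, stop)).tofile(f)


def mapped_sum(path, chunk_rows=65_536):
    """Sum the column through a read-only memory map, one chunk at a time."""
    total = 0.0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Zero-copy view of the file as float64; the OS pages data in on demand.
            with memoryview(mm) as raw, raw.cast("d") as column:
                for start in range(0, len(column), chunk_rows):
                    with column[start:start + chunk_rows] as chunk:
                        total += sum(chunk)
    return total


if __name__ == "__main__":
    # Hypothetical demo file; a real dataset would already live on disk.
    fd, path = tempfile.mkstemp(suffix=".f64")
    os.close(fd)
    try:
        write_column(path, 1_000_000)
        print(mapped_sum(path))
    finally:
        os.remove(path)
```

Only one chunk of values is materialized as Python objects at a time, so peak memory stays roughly constant however large the file is. A library like Vaex applies the same principle to whole dataframes and adds vectorized computation on top.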


Dask is the de facto parallel computing library for Python, and it is gaining popularity over PySpark because it requires relatively little setup overhead.

If you have a machine with multiple cores and a numerical computing problem that can be parallelized, give Dask a try.

Dash and Dask also work wonderfully with Datashader.

This Dash app uses Dask and Datashader to explore 40 million rows of data in real time.
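
The kind of problem that parallelizes well is one that can be split into independent chunks whose partial results are combined at the end. The sketch below shows that split, map, and reduce shape with the standard library's `concurrent.futures`. It is not Dask's API, and the function names (`chunk_bounds`, `partial_sum_of_squares`, `parallel_sum_of_squares`) are hypothetical; Dask automates this kind of chunking and scheduling for arrays and dataframes.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous (start, stop) pairs."""
    step, extra = divmod(n_items, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum_of_squares(bounds):
    """Work done independently on one chunk."""
    start, stop = bounds
    return sum(x * x for x in range(start, stop))


def parallel_sum_of_squares(n_items, n_chunks=4, executor_cls=ThreadPoolExecutor):
    """Map the per-chunk function across workers, then reduce the partial results."""
    with executor_cls(max_workers=n_chunks) as pool:
        partials = pool.map(partial_sum_of_squares, chunk_bounds(n_items, n_chunks))
        return sum(partials)
```

Threads keep the sketch portable, but they will not speed up CPU-bound pure-Python work. Passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` (under an `if __name__ == "__main__":` guard) spreads the chunks across cores instead.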


Datashader is an open-source Python library for server-side rendering of big data visualizations.

Dash apps integrate closely with Datashader to visualize big data. When zoomed out, Dash uses Datashader to render the entire “big data” visualization server-side. When zoomed in, Dash switches to Plotly graphing for interactive, high resolution data exploration.

Dash + Datashader can be scaled to hundreds of millions of points with Dask and RAPIDS. See the Dask and RAPIDS demos for examples.

This Dash app uses Datashader to visualize 1 million time series points in 300 lines of Python.
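
The zoom-dependent switch described above can be sketched in plain Python. This is an illustration of the idea, not Datashader's or Dash's API: the function names, the tiny default grid size, and the `max_raw_points` threshold are all assumptions. When many points fall inside the viewport, the server sends a fixed-size grid of per-pixel counts; when few do, it sends the raw points so the browser can render them interactively.

```python
def points_in_view(points, x_range, y_range):
    """Keep only the (x, y) points inside the current viewport."""
    (x0, x1), (y0, y1) = x_range, y_range
    return [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]


def rasterize_counts(points, x_range, y_range, width, height):
    """Aggregate points into a height x width grid of per-pixel counts.

    Row 0 holds the lowest y values. Points are assumed to lie in the viewport.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # Clamp to guard against floating-point rounding at the upper edge.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


def render(points, x_range, y_range, width=4, height=4, max_raw_points=5):
    """Return a raster when zoomed out, raw points when zoomed in."""
    visible = points_in_view(points, x_range, y_range)
    if len(visible) > max_raw_points:
        grid = rasterize_counts(visible, x_range, y_range, width, height)
        return {"mode": "raster", "grid": grid}
    return {"mode": "points", "points": visible}
```

The payload sent to the browser is bounded by the grid size in raster mode and by the threshold in points mode, so it does not grow with the size of the dataset.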


cuDF is NVIDIA's Pandas-like library for running dataframe computations in GPU memory. 

If you have access to GPU memory, cuDF is the fastest way to process big data in Python on a single node.

This Dash app uses cuDF to explore 146 million rows in real time.


Databricks is the company and commercial platform behind Spark and PySpark.

Dash apps that use PySpark and are deployed on Dash Enterprise can call out to Databricks Spark clusters through the Dash Enterprise Job Queue and the databricks-connect utility.

Unlike Dask or RAPIDS, Spark does not work with Datashader, so there is no way to build interactive dashboards with PySpark that can visualize ~100 million rows of data and upsample or downsample in real time.

This Dash app uses databricks-connect to explore Yelp reviews on a Databricks Spark cluster.


Snowflake is a cloud-only, distributed commercial data warehouse with drivers for Python and R.

BI tools like Power BI or Tableau are typical front-ends for Snowflake. Use Dash when you need an AI front-end for NLP, computer vision, predictive analytics, or deep learning using data stored in Snowflake.

This Dash app performs NLP on product reviews stored in Snowflake, all in fewer than 1,000 lines of Python.


If you’re managing a terabyte or less of tabular data, you may not need Spark, Dask, or Snowflake. Vaex or the Postgres Python driver will do!

Dash Enterprise ships with onboard Postgres and Redis databases to store and cache data for your Dash apps. Both Postgres and Redis are fast and easy to access from Python, R, and Julia.

This Dash app queries NYC complaints stored in Dash Enterprise’s onboard Postgres database.
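
One common way to combine a database with a cache is the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result. The sketch below is an assumption-laden illustration, not Dash Enterprise code. To stay self-contained, an in-memory `sqlite3` database stands in for Postgres and a plain `dict` stands in for Redis; the `complaints` table, its columns, and the cache key format are hypothetical.

```python
import json
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table standing in for a Postgres table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE complaints (borough TEXT, category TEXT)")
    conn.executemany(
        "INSERT INTO complaints VALUES (?, ?)",
        [("Bronx", "Noise"), ("Bronx", "Heating"), ("Queens", "Noise")],
    )
    conn.commit()
    return conn


def count_by_borough(conn, cache, borough):
    """Return the complaint count for a borough, consulting the cache first."""
    key = f"complaints:count:{borough}"
    if key in cache:
        return json.loads(cache[key])  # cache hit: no database round trip
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM complaints WHERE borough = ?", (borough,)
    ).fetchone()
    cache[key] = json.dumps(count)  # store serialized, as a key-value store would
    return count
```

A cached value can go stale when the underlying table changes, so a real deployment would normally give cache entries an expiry or invalidate them on write.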


Salesforce is a cloud-based customer relationship management system for marketing, sales, commerce, and service.

Dash Enterprise 4.1 ships in Fall 2020 with built-in Salesforce Embedding Middleware integration. You'll be able to use Dash Enterprise for Salesforce data processing and authentication, as well as its Kubernetes infrastructure for effortless simultaneous application viewing.

RAPIDS + Dash:

Visualize insanely big data insanely fast

📈 Interactively visualize 300 Million+ datapoints in a web browser with a single GPU.

🌎 GPU acceleration lets analysts zoom between global, national, and individual level data in real time.

🐼 In this demo, CPU aggregation (Pandas) is >20x slower than GPU aggregation (RAPIDS cuDF).

🎥 Watch an aggregation in this demo that takes Pandas 98 seconds and RAPIDS cuDF 0.59 seconds.

“Modern AI cannot exist without access to high-performance computing.”

Foteini Agrafioti, Chief Science Officer at RBC and head of Borealis AI

We're proud to partner with these best-in-class big data Python solutions.

See Dash in action

Sign up for our next Dash Live Weekly demo session to learn more about our Dash Enterprise offering, including industry applications and all the latest tips and features on how to operationalize your data science models.
