Big Data for Python
Dash Enterprise is your front-end for horizontally scalable big data computation in Python.
From Spark to Snowflake and Dask to Datashader, the Python "big data" tech stack has never been more varied or robust.
Dash Enterprise supports turnkey connections to the most popular "big data" backends for Python, including Vaex, Dask, Datashader, RAPIDS, Databricks (PySpark), Snowflake, and Postgres.
In addition, Dash Enterprise ships with battle-tested, plug-and-play demos showing how best to leverage Dash with each of these technologies.
Scroll down to demo the latest in Python HPC through Dash user interfaces.
Vaex
Vaex is a Pandas-like library that can operate on vastly larger datasets through out-of-core memory mapping.
If you’re working with data that is too large to fit in memory, but you don’t want to go through the hassle of setting up Spark or Dask, give Vaex a try.
Dask
Dask is the de facto parallel computing library for Python, and it is gaining popularity over PySpark because it is far less work to set up.
If you have a machine with multiple cores and a numerical computing problem that can be parallelized, give Dask a try.
Datashader
Dash and Dask also work wonderfully with Datashader.
Datashader is an open-source Python library for server-side rendering of big data visualizations.
Dash apps integrate closely with Datashader to visualize big data. When zoomed out, Dash uses Datashader to render the entire “big data” visualization server-side. When zoomed in, Dash switches to Plotly graphing for interactive, high-resolution data exploration.
Dash + Datashader scales to hundreds of millions of points with Dask and RAPIDS; see the Dask and RAPIDS demos for examples.
NVIDIA RAPIDS cuDF
cuDF is NVIDIA's Pandas-like library for running dataframe computations in GPU memory.
If you have access to GPU memory, cuDF is the fastest way to process big data in Python on a single node.
Databricks (PySpark)
Databricks is the company and commercial platform behind Apache Spark and PySpark.
Dash apps that use PySpark and are deployed on Dash Enterprise can call out to Databricks Spark clusters through the Dash Enterprise Job Queue and the databricks-connect utility.
Unlike Dask or RAPIDS, Spark does not integrate with Datashader, so there is no way to build interactive PySpark dashboards that visualize ~100 million rows of data and upsample or downsample in real time.
Snowflake
Snowflake is a cloud-only, distributed commercial data warehouse with drivers for Python and R.
BI tools such as Power BI or Tableau are the typical front-ends for Snowflake; use Dash when you need an AI front-end for NLP, computer vision, predictive analytics, or deep learning on data stored in Snowflake.
Postgres
If you’re managing a terabyte or less of tabular data, you may not need Spark, Dask, or Snowflake. Vaex or the Postgres Python driver will do!
Dash Enterprise ships with onboard Postgres and Redis databases to store and cache data for your Dash apps. Both Postgres and Redis are fast and easy to access from Python, R, and Julia.
We're proud to partner with these best-in-class big data Python solutions.
See Dash in action
Sign up for our next Dash Live Weekly demo session to learn more about Dash Enterprise, including industry applications and the latest tips and features for operationalizing your data science models.