Andrew Pikul
November 15, 2024
Kaleido: The Next Generation of Static Image Export for Data Visualization
Written by: Andrew Pikul with Greg Wilson
Introduction
One of Plotly’s strengths is that the charts it makes are interactive. Sometimes, though, people need a static image to include in a slideshow, report, academic journal, or thesis. We released an open source tool called Kaleido four years ago to meet this need, but maintaining it proved to be difficult.
Thanks to a lot of hard work from a community member named Andrew Pikul, Kaleido has been completely re-architected so that it is easier to install and much (much) easier to modify and maintain. As part of launching it, we are also releasing a new open source project called Choreographer that allows Python programs to drive web browsers. This post explains where we started, what we’ve done, and how you can help us make it even better.
Our starting point
Plotly uses a JavaScript library called D3 to generate SVG representations of charts, which the user’s browser then displays. SVG is a vector format, which makes diagrams scalable and easy for software to manipulate. However, converting SVG to raster image formats like PNG and JPEG is tricky, and gets harder the closer you look.
The good news is that browsers such as Chrome and Firefox include code to do this conversion so that people can print pages. Rather than reinventing the wheel, Jon Mease (Kaleido’s original author) decided to use the browser’s printing engine to handle image conversion, which would also ensure that exported images looked the same as the interactive charts.
The first version of Kaleido was therefore a custom build of the Chrome browser wrapped up in Python.
A comparison of the new vs. old architecture of Kaleido.
It worked, but over the years we realized that this approach had several limitations. First, Chrome is a very large application: it consists of tens of thousands of C++ source files, which take over a dozen hours to recompile from scratch. Once it is compiled, it’s still very large: when we updated Kaleido to the most recent version of Chrome, the resulting package was actually larger than PyPI allows.
We could live with the long compile times if we had to, and we could have asked PyPI for a size limit exemption, but we would still have had to deal with the other big problem: maintainability. When we started overhauling Kaleido in July, we discovered that it was tightly coupled to some internal Chrome APIs that had actually been deprecated a couple of years ago. Figuring out what to change, and what to change to, was a big job, and we had no guarantee that we wouldn’t have to do it all again in the future. Finally, even with those changes, Kaleido still had stability problems on some versions of Windows.
The Next Generation
Those stability problems led Andrew Pikul to start overhauling Kaleido in July 2024. His users are geophysicists, almost all of whom use Windows, so those problems were blocking a new project he was getting off the ground. He realized that Kaleido could be re-engineered to use a mechanism built into Chrome to support automated browser testing tools like Puppeteer (the Chrome DevTools Protocol), and that doing this would make the package a lot smaller and a lot easier to maintain.
When a process starts running, it typically has file descriptor 0 (FD 0) open for standard input, FD 1 for standard output, and FD 2 for standard error. If you ask nicely, by launching it with the --remote-debugging-pipe flag, Chrome will also read from FD 3 and write to FD 4; the program that launches Chrome supplies those two descriptors as pipes. That program can then send JSON-formatted commands to Chrome through FD 3 and read its replies from FD 4. Those commands can tell Chrome to open tabs and run code. In particular, they can tell Chrome to “print” a Plotly chart as a PNG or JPEG image and send the resulting bytes back to the program that’s driving it.
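To make the message format concrete, here is a minimal sketch of the framing a driver program has to handle. It assumes Chrome’s pipe transport, which separates messages with a NUL byte, and the DevTools Protocol convention that each command carries an integer id that Chrome echoes in its reply (events have a method but no id). The helpers encode_command and decode_messages are hypothetical names for illustration, not Choreographer’s API.

```python
import json


def encode_command(msg_id, method, params=None):
    """Serialize one DevTools Protocol command: JSON followed by a NUL byte."""
    message = {"id": msg_id, "method": method, "params": params or {}}
    return json.dumps(message).encode("utf-8") + b"\0"


def decode_messages(buffer):
    """Split raw bytes read from FD 4 into complete JSON messages.

    Returns (messages, leftover); leftover is a trailing partial message
    that should be prepended to the next chunk read from the pipe.
    """
    *complete, leftover = buffer.split(b"\0")
    return [json.loads(chunk) for chunk in complete if chunk], leftover
```

Because a single read from a pipe can end in the middle of a message, the decoder returns the unfinished tail instead of discarding it.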
The new version of Kaleido implements this design in two parts. The first is a new open source Python package called Choreographer that handles the low-level details of launching a headless browser (i.e., one that doesn’t actually display anything on the screen) and communicating with it. The second part is the re-imagined Kaleido, which gives users the API they’re used to. Some extra tools need to be installed to produce PDF and EPS files, but it’s all a tiny fraction of the size of the original, and more reliable across more platforms.
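One of the low-level details a package like Choreographer has to get right is handing the child process its two extra descriptors. The following is a POSIX-only sketch, not Choreographer’s actual code: the hypothetical launch_with_pipes creates two pipes and, in the child just before exec, moves the right ends onto FD 3 and FD 4. Windows passes handles differently and is not covered here.

```python
import fcntl
import os
import subprocess


def launch_with_pipes(argv):
    """Start argv so that it reads commands on FD 3 and writes replies on FD 4.

    POSIX-only sketch. Returns (process, command_fd, reply_fd) for the parent.
    """
    cmd_r, cmd_w = os.pipe()      # parent writes commands; child reads them on FD 3
    reply_r, reply_w = os.pipe()  # child writes replies on FD 4; parent reads them

    def map_fds():
        # Runs in the child between fork and exec. Copy both ends to descriptors
        # >= 10 first so the dup2 calls below cannot overwrite one another.
        high_r = fcntl.fcntl(cmd_r, fcntl.F_DUPFD, 10)
        high_w = fcntl.fcntl(reply_w, fcntl.F_DUPFD, 10)
        os.dup2(high_r, 3)  # os.dup2 makes the target inheritable by default
        os.dup2(high_w, 4)
        os.close(high_r)
        os.close(high_w)

    try:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            preexec_fn=map_fds,
            # Leave inheritable FDs (3 and 4) open; the os.pipe() descriptors are
            # non-inheritable, so they are still closed when the child execs.
            close_fds=False,
        )
    finally:
        # The parent never uses the child's ends of the two pipes.
        os.close(cmd_r)
        os.close(reply_w)
    return proc, cmd_w, reply_r
```

With a real browser, argv would be something like [chrome_path, "--headless", "--remote-debugging-pipe"], and the parent would then write NUL-terminated JSON commands to command_fd and read replies from reply_fd.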
Of course, there’s no such thing as a free lunch. The new Kaleido requires users to have Chrome installed somewhere on their machine rather than bundling it in the Python package download. We actually think that’s a good thing — system administrators are unlikely to be happy about people installing out-of-date browsers disguised as Python packages — but we do plan to have Choreographer support Firefox and other browsers as well.
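Relying on an installed browser means the driver has to find it first. This is a rough sketch of one approach, searching PATH with the standard library’s shutil.which; the candidate executable names are assumptions, this is not how Choreographer necessarily searches, and Chrome on Windows and macOS is often not on PATH at all.

```python
import shutil

# Assumed executable names; real installs vary by platform and distribution.
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser", "chrome")


def find_browser(candidates=CANDIDATES, path=None):
    """Return the full path of the first candidate found on PATH (or on path), else None."""
    for name in candidates:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None
```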
The bigger issue is that the new Kaleido is noticeably slower than its predecessor because it has to launch a headless browser in the background. We plan to modify Choreographer and Kaleido to amortize this startup cost, but doing that may require some changes to the Kaleido API, and we want to be sure this approach works across all platforms before optimizing it.
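A common way to amortize a startup cost is to launch the expensive resource once, lazily, and reuse it for every subsequent request. The sketch below illustrates that general pattern only; it is not Kaleido’s or Choreographer’s design, and BrowserSession, launch, and shutdown are hypothetical names.

```python
import threading


class BrowserSession:
    """Hypothetical helper: start the browser on first use and reuse it afterwards."""

    def __init__(self, launch, shutdown):
        self._launch = launch      # callable that starts a browser and returns a handle
        self._shutdown = shutdown  # callable that stops a browser given its handle
        self._browser = None
        self._lock = threading.Lock()

    def get(self):
        # Pay the startup cost only on the first call; later calls reuse the handle.
        with self._lock:
            if self._browser is None:
                self._browser = self._launch()
            return self._browser

    def close(self):
        # Stop the browser if one is running; calling close twice is harmless.
        with self._lock:
            if self._browser is not None:
                self._shutdown(self._browser)
                self._browser = None
```

A design like this hints at why the API might have to change: some caller has to own the session object and decide when to close it, whereas a one-shot export call can simply start and stop its own browser.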
Get Involved!
That’s where you come in. We want to test Kaleido on as many different platforms as possible, and contributions to both Choreographer and Kaleido are very welcome. If you’d like to lend a hand, please get in touch — everything will be made available under the MIT License, and all contributions will be acknowledged in the projects’ documentation.