
Sam Hinshaw
October 23, 2025
What I Learned Integrating Designers Wielding LLMs into Our Codebase
Our product designers have started using AI to code design prototypes for UI tweaks and features. As an engineer, I've found it awesome to see designers empowered to bring their own designs to life in code. It also raised a tough question for the Plotly Engineering team: where does this help, and what best practices should we follow?
To figure that out, we ran a series of experiments. So far, we've found that having designers introduce code changes with LLMs works best for clearly scoped tasks, with a defined process and the right guardrails in place.
To test where the approach holds up, we tried a range of scenarios, from quick UI adjustments to larger feature builds, and tracked what worked and what didn’t. The approaches we took fell into the following categories:
- UI improvements: small enhancements or tweaks to the existing designs
- Bug fixes: addressing visual or behavioral issues in the application
- Features: building new functionality in the application
What works (and doesn’t) for LLM-assisted design
For small UI tweaks and modest feature additions, the LLM-assisted workflow has been viable. The most reliable results came from tightly constrained, pre-approved tasks: the designer drafts a plan with the LLM, and a developer edits and approves it before any coding begins. Reviews can then focus on correctness and fit rather than deep domain context, while tooling (linting, type checks, and scripted self-checks) catches issues early.
Bug fixes have not been a good fit, because they typically require understanding the code well enough to identify the cause and implement a fix, an area where LLMs still struggle. In one instance, we had a text box whose border looked too thin on one side. To the designer, it seemed logical to make the border thicker. The root cause, however, was that the border was being partially cut off, and the appropriate fix was to adjust the spacing around the text box so the border was no longer clipped.
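To make that concrete, here is a minimal sketch of the two fixes. It assumes a styled-components setup and a hypothetical `--border-color` design token; our actual code differs.

```tsx
import styled from "styled-components";

// Assumed setup: a text box inside a parent whose `overflow: hidden`
// clips one edge, cutting off part of the 1px border.

// Symptom-level fix (where the prompt steered the LLM): thicken the
// border so the un-clipped remainder looks like a full border.
const TextBoxSymptomFix = styled.div`
  border: 2px solid var(--border-color); /* was 1px; masks the clipping */
`;

// Root-cause fix: add spacing so the box sits clear of the clipping
// parent and the original 1px border is never cut off at all.
const TextBoxRootCauseFix = styled.div`
  border: 1px solid var(--border-color);
  margin: 1px; /* keeps the border inside the parent's visible area */
`;
```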
LLMs often favor symptom-level changes over root-cause remedies, and designers' prompts can unintentionally steer them toward patching the visible behavior instead of investigating the root cause. That combination leads to a very brittle codebase.
We also tried having designers implement larger features with an LLM two different ways: with a developer providing a scaffold for the designer, or with the designer implementing the entire feature themselves.
With the scaffolding approach, we would have a developer write the core logic of a feature (e.g. fetching the necessary data from the database) and then hand it off to a designer for the UI implementation. While we had a decent success rate with this approach, the projects took significantly longer, as the designer was often waiting on the developer. Overall, this approach did not seem to increase productivity: it required onboarding two people to the same project and introduced costly context switching for designer and developer alike.
On the other hand, asking the designer/LLM team to build the entire feature was not the best use of time either. Often the LLM would ignore the established patterns in our codebase, duplicate existing logic, and fail to cleanly separate state. This led either to extra cycles of code review or to returning the ticket to the backlog for a developer to implement.
Engineering guardrails for LLM-assisted code
Figuring out where LLMs fit was only the first step. The harder part was making those contributions repeatable. One-off wins don't scale, and without structure even small design changes can add complexity or drift from established patterns. To keep the codebase healthy, we've put guardrails in place and built a set of guidelines that every contribution follows.
Scope, scope, scope
Scoping is critical for any software development task, but we have found it to be even more so with designer/LLM duos.
Agreeing with the designer on the set of changes to be included in a pull request is necessary to keep the code reviewable. Designers are typically unable to assess the reviewability of the code AI agents write, and AI agents do not take reviewability into account unless prompted to.
This is particularly evident with visual flourishes. What might seem like a simple tweak, such as adding a transition animation, can introduce significant additional complexity to the code. This is a key area to agree on when scoping: prefer to start with functional changes and layer on visual flourishes in follow-ups.
Use AI guidelines
In addition to setting up your designer for success, it is crucial to set up your LLM for success. We have been iterating on our configuration, adapting to the latest changes in the field, but some things have remained constant.
Give your LLM documentation. You can start with human-oriented documentation such as README files, but LLM-specific documentation is typically warranted. Most coding agents encourage you to use their own format, but all LLMs can easily understand markdown. We created a file called "AI_RULES.md" in our repository root and configured our coding agents to include it.
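For illustration, here is an excerpt of what such a file might contain. These specific rules are a hypothetical sketch, not our actual AI_RULES.md:

```markdown
# AI Rules

## Styling
- Prefer components from our component library when one fits the use case.
- Otherwise, use our CSS-in-JS library with CSS variables from the design system.
- Avoid inline styles except as a last resort, and explain why when you use one.

## Before opening a PR
- Run lint, type checks, and unit tests locally and fix any failures.
```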
One challenge that continues to prove difficult is designing rules that encapsulate the nuance you would expect from a seasoned developer. For example, we noticed that the LLMs our designers were using would often write HTML styled with inline styles. Our codebase uses a component library, so we want the LLM to use components from it when available. If no component suits the use case, we want the LLM to fall back to our CSS-in-JS library using CSS variables from our design system. In our tech stack, then, inline styles are almost never appropriate. But we don't want to say "never use inline styles", because they are occasionally warranted, just rarely.
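In code, that preference ladder looks roughly like the sketch below. The component library import and variable names are hypothetical stand-ins, and styled-components is an assumption:

```tsx
import styled from "styled-components";
// "@our-scope/components" is a hypothetical stand-in for our component library.
import { Button } from "@our-scope/components";

// 1. Preferred: a component from the component library.
const SaveButton = () => <Button variant="primary">Save</Button>;

// 2. Fallback: CSS-in-JS with design-system CSS variables.
const SaveButtonFallback = styled.button`
  background: var(--color-primary);
  padding: var(--spacing-sm) var(--spacing-md);
`;

// 3. Last resort, rarely appropriate: inline styles with hard-coded values.
const SaveButtonInline = () => (
  <button style={{ background: "#3d7ef5", padding: "4px 8px" }}>Save</button>
);
```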
Rely on automation
Checks such as linting, type-checking, and unit tests that the LLM can run locally helped tighten the loop for designers evaluating the LLM's output. Running the same checks in CI provides an extra guardrail.
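For example, a single script the agent (or the designer) can run after every change. The script names and tools here are assumptions, not our exact setup:

```json
{
  "scripts": {
    "lint": "eslint .",
    "typecheck": "tsc --noEmit",
    "test": "vitest run",
    "check": "npm run lint && npm run typecheck && npm run test"
  }
}
```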
One amazing tool our designers have been using is the Puppeteer MCP server. During development, it is possible to take screenshots of the app and provide them to the LLM, but we found it much more effective to give the LLM direct access to the browser, which also gives it access to the browser's developer tools.
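Wiring this up is usually a small addition to the agent's MCP configuration. As a sketch, here is how the reference Puppeteer server can be registered; the exact file location and server package depend on your coding agent:

```json
{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  }
}
```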
We are also weighing how much to invest in automated checks that primarily benefit the LLM, such as a linter rule that warns when inline styles are used. We have had to balance this against how noisy such checks are for human developers, who can understand the nuance. The effort to implement a new lint rule also varies drastically: it can be as simple as configuring an existing rule or as involved as writing one from scratch.
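On the simple end of that spectrum, the inline-styles warning could be a one-rule addition, assuming eslint-plugin-react and a flat ESLint config (a sketch, not our actual configuration):

```ts
// eslint.config.ts
import react from "eslint-plugin-react";

export default [
  {
    files: ["**/*.{jsx,tsx}"],
    plugins: { react },
    rules: {
      // Warn (not error) on the `style` prop: loud enough for the LLM's
      // self-check loop, soft enough for humans who know the rare exceptions.
      "react/forbid-dom-props": ["warn", { forbid: ["style"] }],
    },
  },
];
```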
Review differently
Don't review a designer's PR like you'd review a developer's PR. If you have suggestions, provide them in a way that a designer can give to an LLM. If there are deeper issues, you may want to fix them yourself. If the issues are extensive, you may want to ask the designer to try again and provide guidance on a different approach, or close the PR and assign the ticket to a developer.
AI-driven reviewers like CodeRabbit can help here, as their comments often include instructions you can hand to an LLM agent. However, the suggestions they provide are often inaccurate. For example, they frequently give advice that is correct for older versions of Python but not for the version we use. As a result, we recommend developers screen these review comments for designers and dismiss or resolve any irrelevant ones.
Persistent challenges and where we stand
Our most persistent challenges come from decision points that require knowledge of the codebase. LLMs cannot hold all of that knowledge directly, as it exceeds their context window; instead, they rely on documentation or a "memory" system. Experienced developers will be familiar with the tradeoff between keeping documentation up to date and relying on tribal knowledge. This is complicated further by the fact that the users who would benefit most from curating this documentation are not well-equipped to write it. While the developers on our team also use LLM tools, we tend to provide much more specific prompts, which obviates the need for detailed documentation or memory systems.

AI confidently ignoring the only rule that mattered
This is an ongoing experiment! The process has not been without its frustrations, including many hilarious screenshots shared by our designers of LLMs gaslighting them. We think we have found some specific workflows where LLM-assisted coding by non-engineering team members works really well, but we're continuing to try new things and tweak our approach.