How we reduced Mermaid rendering from three seconds to under 200 ms

Published

May 13, 2026

By Simon Zajdela

Mermaid, Performance, NestJS, Puppeteer, Docker

The problem was not Mermaid. It was the cold start.

The first version of the diagram rendering service did the obvious thing: accept Mermaid code, write it to a temporary file, and call mermaid-cli. It worked. It was simple. It was also slow in the way a tank is slow when you haul it out of the warehouse for every single shot.

The measurement was clear enough: roughly three seconds for one SVG. For an internal or rarely used endpoint, that might be acceptable. For a public diagram service where fast feedback matters, it is not. The issue was not that Mermaid cannot render quickly. The issue was that every request paid for a new process and a fresh Chromium launch.

Version one: the official path, the official pain

The classic approach is straightforward: an HTTP request reaches the server, the server starts mmdc, mmdc starts Chromium, Chromium renders the diagram, the result is returned, and the process exits. It is nicely isolated, but expensive when repeated for every request.

Docker added its own entertainment: Chromium sandboxing, crashpad, system libraries, non-root users, and all the small Linux details that make you negotiate with apt-get like it is a moody office printer. Once it finally worked, it was functional, but not elegant.

Diagram: The difference is not Mermaid syntax. It is the renderer lifecycle.

The optimization: keep Chromium alive

The key change was simple: do not start the renderer for every request. On NestJS startup, open one headless Chromium process, create one page, load the Mermaid runtime from node_modules, and initialize Mermaid once.

When a request arrives, the server no longer calls the CLI. Instead, it sends the Mermaid code into the already open browser context and calls mermaid.render. For SVG output, no screenshot is needed. The browser returns an SVG string, and the server responds with image/svg+xml.
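
The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code. `PageLike` and `BrowserLike` are hypothetical stand-ins that mirror only the Puppeteer calls used here (`newPage`, `addScriptTag`, `evaluate`, `close`), and the `launch` callback is where a real service would wrap `puppeteer.launch()`. The class name and bundle path are assumptions, and `mermaid.render` is assumed to resolve to an object with an `svg` string, as it does in Mermaid 10 and later.

```typescript
// Hypothetical structural types: only the browser calls this sketch needs.
interface PageLike {
  addScriptTag(options: { path: string }): Promise<unknown>;
  evaluate(fn: (...args: any[]) => unknown, ...args: unknown[]): Promise<unknown>;
}
interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Assumed location of the Mermaid browser bundle; adjust to the installed version.
const MERMAID_BUNDLE = "node_modules/mermaid/dist/mermaid.min.js";

class PersistentMermaidRenderer {
  private readonly launch: () => Promise<BrowserLike>;
  private browser: BrowserLike | undefined;
  private page: PageLike | undefined;
  private counter = 0;

  constructor(launch: () => Promise<BrowserLike>) {
    this.launch = launch;
  }

  // Paid once at startup, for example from a NestJS onModuleInit hook.
  async start(): Promise<void> {
    this.browser = await this.launch();
    this.page = await this.browser.newPage();
    await this.page.addScriptTag({ path: MERMAID_BUNDLE });
    await this.page.evaluate(() => {
      // Runs inside the browser page, where the bundle exposes a global.
      (globalThis as any).mermaid.initialize({ startOnLoad: false });
    });
  }

  // Request path: no process spawn, no browser launch, no screenshot.
  async renderSvg(code: string): Promise<string> {
    if (!this.page) throw new Error("renderer not started");
    const id = `diagram-${++this.counter}`; // unique element id per render
    const svg = await this.page.evaluate(
      async (renderId: string, source: string) => {
        const result = await (globalThis as any).mermaid.render(renderId, source);
        return result.svg;
      },
      id,
      code,
    );
    return String(svg);
  }

  // Shutdown, for example from a NestJS onModuleDestroy hook.
  async stop(): Promise<void> {
    if (this.browser) await this.browser.close();
    this.browser = undefined;
    this.page = undefined;
  }
}
```

Injecting `launch` keeps the sketch independent of any particular Puppeteer version and makes it easy to exercise with a fake browser. The controller would then return the string from `renderSvg` with the `image/svg+xml` content type.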

What this means for concurrency

Because the first production version uses a single browser page, I added a small internal queue. If two requests arrive at the same time, their renders do not interleave on the shared page: the first finishes, then the second runs. With render times below 200 ms, that is a perfectly acceptable tradeoff.
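
One way to build such a queue is a promise chain, where each task starts only after the previous one has settled. The sketch below is an assumption about the shape of the idea, not the service's actual implementation; `SerialQueue` and `renderOnSharedPage` are hypothetical names.

```typescript
class SerialQueue {
  // Settles when the most recently queued task has finished (never rejects).
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start the task only after everything queued before it has settled.
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so one failed render
    // does not block the requests waiting behind it.
    this.tail = result.catch(() => undefined);
    // The caller still sees the task's own result or error.
    return result;
  }
}

// Usage, with a hypothetical render function bound to the shared page:
//   const queue = new SerialQueue();
//   const svg = await queue.run(() => renderOnSharedPage(code));
```

The queue has no size limit or timeout. For a public endpoint, a cap on waiting requests would be a sensible addition.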

The next step would be a small page pool. One Chromium process can hold multiple isolated page contexts, for example two or four. That allows parallel rendering without opening a new browser for every diagram. But the queue-based version is calmer, more predictable, and already fast enough for a small public service.
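
A page pool could look roughly like the sketch below. It is a generic pool, not code from the service: `Pool` and its `use` method are hypothetical names, and the pooled items would be pages opened once at startup from the shared Chromium process.

```typescript
class Pool<T> {
  private readonly idle: T[];
  private readonly waiters: Array<(item: T) => void> = [];

  constructor(items: T[]) {
    this.idle = [...items];
  }

  private acquire(): Promise<T> {
    const item = this.idle.pop();
    if (item !== undefined) return Promise.resolve(item);
    // Every item is busy: wait until release() hands one over.
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  private release(item: T): void {
    const next = this.waiters.shift();
    if (next) {
      next(item); // hand the item straight to the longest-waiting caller
    } else {
      this.idle.push(item);
    }
  }

  // Borrow an item for one task and return it even if the task throws.
  async use<R>(task: (item: T) => Promise<R>): Promise<R> {
    const item = await this.acquire();
    try {
      return await task(item);
    } finally {
      this.release(item);
    }
  }
}

// Usage, with hypothetical pages and render function:
//   const pages = new Pool([pageA, pageB]);
//   const svg = await pages.use((page) => renderOn(page, code));
```

A pool of size one behaves like the queue, so the two designs differ only in how many renders may be in flight at once.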

Memory, Docker, and production hygiene

With a long-running Chromium process, the main question is RAM. Cleaning the DOM after each render is a cheap safety measure, especially if Mermaid or the browser context keeps elements around over time. It costs almost nothing and helps avoid slow memory buildup.
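
The cleanup step might be sketched like this. It is an illustration under assumptions, not the service's actual code: `EvaluatingPage` and `renderAndClean` are hypothetical names, and the sketch assumes Mermaid may leave behind an element with the render id or a temporary wrapper with the id prefixed by `d`, which can vary between Mermaid versions.

```typescript
// Hypothetical stand-in for the one page method this sketch uses.
interface EvaluatingPage {
  evaluate(fn: (...args: any[]) => unknown, ...args: unknown[]): Promise<unknown>;
}

async function renderAndClean(
  page: EvaluatingPage,
  id: string,
  code: string,
): Promise<string> {
  try {
    const svg = await page.evaluate(
      async (renderId: string, source: string) => {
        const result = await (globalThis as any).mermaid.render(renderId, source);
        return result.svg;
      },
      id,
      code,
    );
    return String(svg);
  } finally {
    // Runs after successful and failed renders alike.
    await page.evaluate((renderId: string) => {
      const doc = (globalThis as any).document;
      // Assumed leftovers: the rendered element and Mermaid's temporary wrapper.
      for (const elementId of [renderId, `d${renderId}`]) {
        doc.getElementById(elementId)?.remove();
      }
    }, id);
  }
}
```

Putting the cleanup in `finally` matters most for invalid diagrams, since a failed render is the case most likely to leave nodes in the page.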

The Docker image stays reasonable when using system Chromium instead of downloading Puppeteer's bundled browser. The important details are a non-root user, writable tmp directories for Chromium profile/crashpad data, and a minimal set of system libraries. Not glamorous, but exactly the sort of work that separates a demo from a deployed service.
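
On the Node side, those Docker details mostly show up as browser launch options. The sketch below is an assumption about a reasonable setup, not the service's configuration. `headless`, `executablePath`, `userDataDir`, and `args` are standard Puppeteer launch options, and `PUPPETEER_EXECUTABLE_PATH` is an environment variable Puppeteer recognizes. `CHROMIUM_TMP_DIR`, the default paths, and the choice of Chromium switches are hypothetical.

```typescript
interface ChromiumLaunchOptions {
  headless: boolean;
  executablePath: string;
  userDataDir: string;
  args: string[];
}

function buildLaunchOptions(
  env: Record<string, string | undefined>,
): ChromiumLaunchOptions {
  // Hypothetical variable: a directory the non-root user can write to.
  const tmpRoot = env["CHROMIUM_TMP_DIR"] ?? "/tmp/chromium";
  return {
    headless: true,
    // System Chromium installed by the image; the default path is an assumption.
    executablePath: env["PUPPETEER_EXECUTABLE_PATH"] ?? "/usr/bin/chromium",
    // Keep profile data in a writable location.
    userDataDir: `${tmpRoot}/profile`,
    args: [
      // Keep crashpad dumps in a writable location.
      `--crash-dumps-dir=${tmpRoot}/crashpad`,
      // Avoid relying on the small default /dev/shm in containers.
      "--disable-dev-shm-usage",
    ],
  };
}

// Usage: pass the result to puppeteer.launch(buildLaunchOptions(process.env)).
```

Setting `PUPPETEER_SKIP_DOWNLOAD` at install time is the usual way to keep Puppeteer from fetching its bundled browser when a system Chromium is used.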

The result

The practical result was a drop from roughly three seconds to under 200 ms for SVG rendering. CPU barely moves for basic diagrams because the expensive part, browser startup, is no longer on the request path.

This is one of those optimizations where we did not need a complex cache, Redis, or a separate worker system. Moving heavy initialization from request time to startup was enough. Sometimes production-ready means adding less infrastructure and removing repeated stupidity. Keep the tank on the field.