17 Jan 2026

I Cut Vercel's json-render LLM Costs by 89% Using TOON

Mateo Lafalce - Blog

When building json-render, Vercel created an elegant solution for dynamically generating UIs using generative AI. The tool leverages Claude Opus 4.5 to produce structured JSON output that can be rendered as interactive interfaces. However, there was a hidden inefficiency in the output format that was costing developers significantly more than necessary.

The original implementation used JSONL as the output format. JSONL is human-readable and easy to work with, but it repeats every key on every line, and when you're paying 3x more for output tokens than input tokens with Claude Opus 4.5, that verbosity becomes expensive.
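To make the difference concrete, here is a small side-by-side sketch of the same records serialized as JSONL and as TOON. The record shapes and field names are invented for illustration, and the TOON string is hand-written to follow the published TOON tabular syntax rather than produced by any official encoder.

```python
import json

# Invented example data: three UI elements a model might emit.
rows = [
    {"id": 1, "type": "button", "label": "Save"},
    {"id": 2, "type": "button", "label": "Cancel"},
    {"id": 3, "type": "input", "label": "Email"},
]

# JSONL: one complete JSON object per line, keys repeated on every row.
jsonl = "\n".join(json.dumps(r, separators=(",", ":")) for r in rows)

# TOON: the header declares the fields once; rows carry only values.
toon = "rows[3]{id,type,label}:\n" + "\n".join(
    f"  {r['id']},{r['type']},{r['label']}" for r in rows
)

print(len(jsonl), len(toon))  # TOON is shorter; the gap grows with row count
```

The saving comes entirely from not repeating key names, so it compounds as the number of uniform rows increases.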

Source: Claude Docs (17/01/2026)

The Hypothesis

I hypothesized that switching from JSONL to TOON would dramatically reduce costs, even after accounting for the additional context needed to explain the TOON format to the LLM.

The math was compelling: if we could reduce output tokens significantly, the savings would more than offset the small increase in input tokens required to explain TOON formatting.
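The arithmetic behind the hypothesis can be sketched in a few lines. The token counts and the 3x output/input price ratio below are illustrative assumptions for the sake of the model, not benchmark measurements:

```python
# Relative prices: output tokens assumed to cost 3x input tokens.
PRICE_IN = 1.0
PRICE_OUT = 3.0

def cost(input_tokens: int, output_tokens: int) -> float:
    """Total relative cost of one request."""
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

# JSONL: no format explanation needed, but verbose output.
jsonl_cost = cost(input_tokens=1000, output_tokens=2000)

# TOON: pay a small input overhead to explain the format,
# in exchange for a much smaller output.
toon_cost = cost(input_tokens=1000 + 300, output_tokens=800)

print(jsonl_cost, toon_cost)  # prints 7000.0 3700.0
```

Even with the extra 300 input tokens of format instructions in this sketch, the output savings dominate because each saved output token is worth three input tokens.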

The Benchmark

To validate this hypothesis, I created a comprehensive benchmark comparing two implementations:

  1. json-render: Returns JSONL responses
  2. toon-render: Returns TOON responses, then decodes to JSON

I tested both implementations across 10 different UI generation prompts, measuring three critical metrics: token usage, cost per request, and generation latency.
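The benchmark loop itself is simple. The sketch below assumes a hypothetical `generate_ui` client that returns the model's text along with token usage; the function and field names are placeholders, not the actual json-render API:

```python
import time

def run_benchmark(prompts, generate_ui):
    """Run each prompt and record latency plus token usage.

    `generate_ui` is a stand-in for a model client returning an object
    with .text, .input_tokens, and .output_tokens attributes.
    """
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        response = generate_ui(prompt)
        results.append({
            "latency_s": time.perf_counter() - start,
            "input_tokens": response.input_tokens,
            "output_tokens": response.output_tokens,
        })
    return results
```

Running the same loop against the JSONL and TOON implementations with identical prompts gives directly comparable numbers for the three metrics above.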

The Results

The results exceeded my expectations:

Token Efficiency

Cost Savings

Performance

Trade-off

There's one important limitation: unlike JSONL, TOON can't be decoded incrementally as chunks stream in, so the entire response must be generated before decoding. That means no "hot loading" of partial results :(
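The streaming difference follows from the formats themselves: each JSONL line is a complete JSON document, so partial output is usable the moment a newline arrives, while a TOON table only becomes decodable once the whole block is present. A minimal line-buffered JSONL consumer looks like this (the simulated chunks are invented for the example):

```python
import json

def stream_jsonl(chunks):
    """Yield parsed objects as soon as each complete line arrives."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)

# Simulated partial model output: the first object can render
# before the model has finished generating the second.
chunks = ['{"type":"bu', 'tton"}\n{"type":', '"input"}\n']
print(list(stream_jsonl(chunks)))  # prints [{'type': 'button'}, {'type': 'input'}]
```

No equivalent consumer exists for a TOON table mid-generation, which is exactly the trade-off described above.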

The Broader Lesson

This benchmark reveals an important principle for working with LLMs: optimize for compact output formats when output tokens cost more than input tokens. 

Many developers focus on reducing input context, but when output tokens are 3x more expensive, the real opportunity lies in minimizing what the LLM generates. TOON is just one example; the key insight is to think critically about what your output format forces the model to repeat.

Conclusion

By switching from JSONL to TOON, we cut json-render's LLM costs by 89% while producing the same rendered interfaces.

This isn't just about json-render; it's a blueprint for optimizing any LLM application that generates structured output. When output tokens are expensive, compact formats aren't just nice to have; they're essential for building cost-effective AI applications at scale.

Repository 


This blog is open source. See an error? Go ahead and propose a change.