24 Nov 2025

TOON vs. JSON: A Mathematical Evaluation of Byte Efficiency in Structured Data

Mateo Lafalce - Blog

After many years, I'm excited to return to scientific publishing with this research paper. It presents a rigorous mathematical analysis of TOON (Token-Oriented Object Notation) compared to JSON, demonstrating quantifiable efficiency gains in structured data serialization for LLMs.

The research establishes formal byte-length functions for both JSON and TOON, enabling precise comparison across different data structures. Through recursive definitions, we prove that in most cases TOON achieves a strictly positive efficiency delta by eliminating the structural redundancy inherent in JSON.
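To make the idea of a byte-length comparison concrete, here is a minimal Python sketch. It is not the paper's formal byte-length functions and not the official TOON library: `toon_table` is a hypothetical helper that hand-writes a simplified TOON-style tabular layout for a uniform list of flat objects, and the `users` sample data is invented for illustration.

```python
import json


def byte_len(text: str) -> int:
    """UTF-8 byte length of a serialized string."""
    return len(text.encode("utf-8"))


def toon_table(key: str, rows: list[dict]) -> str:
    """Hypothetical helper: simplified TOON-style tabular encoding.

    Assumes every row is a flat dict with the same keys and that no
    value needs quoting or escaping.
    """
    fields = list(rows[0].keys())
    # Header declares the row count and field names once.
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        # Each row is indented by two spaces and carries values only.
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

# Compact JSON (no optional whitespace) as the baseline.
json_text = json.dumps({"users": rows}, separators=(",", ":"))
toon_text = toon_table("users", rows)

# Efficiency delta in this sketch: JSON bytes minus TOON-style bytes.
delta = byte_len(json_text) - byte_len(toon_text)
print(byte_len(json_text), byte_len(toon_text), delta)
```

In this toy case the saving comes from stating the field names once in the header instead of repeating them, with quotes and braces, in every object.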

The analysis reveals one specific case where TOON underperforms: arrays of arrays. In this scenario, TOON actually increases byte usage. For 1,000,000 nested arrays with m=2, TOON would use approximately 2.86 MB more than JSON. Additionally, TOON becomes less efficient as depth (d) increases, because indentation costs accumulate in nested structures.
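The arrays-of-arrays case can be illustrated with a similar sketch. This is an assumption-laden approximation: `toon_like_bytes` is a hypothetical helper that hand-writes a TOON-style list layout (one indented `- [m]: ...` line per inner array), and the `pairs` key and sample values are invented. Its per-array overhead is specific to this layout and is not meant to reproduce the paper's 2.86 MB figure.

```python
import json


def json_bytes(pairs: list[list[int]]) -> int:
    """UTF-8 byte length of compact JSON for {"pairs": [...]}."""
    text = json.dumps({"pairs": pairs}, separators=(",", ":"))
    return len(text.encode("utf-8"))


def toon_like_bytes(pairs: list[list[int]]) -> int:
    """Hypothetical helper: byte length of a TOON-style list layout.

    Each inner array becomes its own line with a two-space indent,
    a list marker, and its own length header.
    """
    lines = [f"pairs[{len(pairs)}]:"]
    for inner in pairs:
        items = ",".join(str(v) for v in inner)
        lines.append(f"  - [{len(inner)}]: {items}")
    return len("\n".join(lines).encode("utf-8"))


pairs = [[1, 2], [3, 4]]

# Positive overhead means the TOON-style layout is larger than JSON here.
overhead = toon_like_bytes(pairs) - json_bytes(pairs)
print(json_bytes(pairs), toon_like_bytes(pairs), overhead)
```

Here every inner array pays for indentation, a list marker, and a length header, while JSON needs only a pair of brackets and a comma, so the overhead grows with the number of inner arrays.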

Since LLM tokenizers build their tokens from the UTF-8 bytes of the input, a reduction in byte count generally translates into lower token consumption.

Full Paper: Read on ResearchGate


