Why TOON?

The data serialization format designed specifically for Large Language Models. Reduce token costs, improve comprehension, and maintain human readability.

The Problem with JSON

JSON was created in the early 2000s as a data interchange format for web applications. It's human-readable, universal, and has served us well for over two decades. But in the age of AI and Large Language Models, JSON has a critical flaw: it's incredibly verbose.

Excessive Quoting

Every key and string value is wrapped in quotes, consuming unnecessary tokens

Key Repetition

In arrays of objects, the same keys are repeated for every single record

Syntactic Clutter

Braces, brackets, colons, and commas everywhere – all counting as tokens

The Cost of Verbosity

In LLM interactions, tokens are currency. Every token you send costs money. When you're working with large datasets or making thousands of API calls, JSON's verbosity directly impacts your budget:

  • Higher API costs: More tokens = more money spent on every request
  • Context window limitations: Verbose data leaves less room for actual content
  • Slower processing: More tokens take longer to generate and process

See the Difference

JSON

411 tokens (verbose)
{
  "users": [
    {
      "id": 1,
      "name": "Alice",
      "role": "admin",
      "active": true
    },
    {
      "id": 2,
      "name": "Bob",
      "role": "user",
      "active": false
    },
    {
      "id": 3,
      "name": "Charlie",
      "role": "editor",
      "active": true
    }
  ]
}

TOON

162 tokens (compact)
users[3]{id,name,role,active}:
  1,Alice,admin,true
  2,Bob,user,false
  3,Charlie,editor,true

60.6% Token Reduction

That's 249 tokens saved on just 3 records!

JSON tokens: 411
TOON tokens: 162
Tokens saved: 249
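The reduction figure is plain arithmetic on the two token counts quoted above. Here is a minimal TypeScript sketch of that calculation; the function name `tokenReduction` is illustrative and not part of any TOON library.

```typescript
// Percentage of tokens saved when a payload shrinks from `before` to `after` tokens.
function tokenReduction(before: number, after: number): number {
  if (before <= 0) throw new RangeError("before must be positive");
  return ((before - after) / before) * 100;
}

// Token counts quoted in the comparison above.
const saved = 411 - 162;
const pct = tokenReduction(411, 162);

console.log(`${saved} tokens saved (${pct.toFixed(1)}%)`); // 249 tokens saved (60.6%)
```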

The TOON Solution

TOON (Token-Oriented Object Notation) was designed from the ground up with LLMs in mind. It's a lossless serialization format that maintains all the structure and data of JSON, but optimizes for token efficiency and LLM comprehension.

Tabular Arrays

For uniform arrays of objects (the most common case in structured data), TOON uses a CSV-style tabular format. Declare the schema once, then list values as rows.

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
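To make the "declare the schema once" idea concrete, here is a minimal sketch of a tabular encoder in TypeScript. It is not the library's implementation: `encodeTable` is a hypothetical helper, it assumes flat rows with identical fields, and it writes values verbatim without any quoting or escaping.

```typescript
type Primitive = string | number | boolean | null;
type Row = Record<string, Primitive>;

// Encode a uniform array of flat objects as a TOON-style table.
// Simplification: values containing commas, colons, or quotes are NOT handled.
function encodeTable(key: string, rows: Row[]): string {
  const first = rows[0];
  if (first === undefined) throw new Error("empty arrays are out of scope for this sketch");

  // The field list comes from the first row; every other row must match it.
  const fields = Object.keys(first);
  for (const row of rows) {
    const sameFields =
      Object.keys(row).length === fields.length && fields.every((f) => f in row);
    if (!sameFields) throw new Error("rows must all share the same fields");
  }

  // Header: key, row count, and field names, declared once.
  const header = `${key}[${rows.length}]{${fields.join(",")}}:`;
  // Body: one indented, comma-separated line per row.
  const lines = rows.map((row) => "  " + fields.map((f) => String(row[f])).join(","));
  return [header, ...lines].join("\n");
}

const table = encodeTable("users", [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "user" },
]);
console.log(table);
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
```

The keys appear once in the header instead of once per record, which is where most of the savings over JSON come from.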

YAML-style Indentation

For nested objects, TOON uses indentation instead of braces. This is familiar to developers and natural for LLMs to parse.

user:
  id: 1
  name: Alice
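The indentation rule can be sketched in a few lines of TypeScript. This is an illustration only: `encodeNested` is a hypothetical helper that assumes two-space indentation, handles only objects and scalar values (no arrays), and applies no quoting.

```typescript
type Scalar = string | number | boolean | null;
interface Nested {
  [key: string]: Scalar | Nested;
}

// Write each key on its own line; nested objects go one indent level deeper.
function encodeNested(obj: Nested, depth = 0): string {
  const pad = "  ".repeat(depth);
  const lines: string[] = [];
  for (const [key, value] of Object.entries(obj)) {
    if (value !== null && typeof value === "object") {
      lines.push(`${pad}${key}:`);
      const inner = encodeNested(value, depth + 1);
      if (inner !== "") lines.push(inner);
    } else {
      lines.push(`${pad}${key}: ${String(value)}`);
    }
  }
  return lines.join("\n");
}

const nested = encodeNested({ user: { id: 1, name: "Alice" } });
console.log(nested);
// user:
//   id: 1
//   name: Alice
```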

Minimal Quoting

TOON quotes strings only when necessary. Numbers, booleans, and plain strings, including strings with internal spaces, need no quotes.

name: Alice Smith
age: 30
city: New York
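A quoting check might look like the sketch below. The rule set here is an approximation based on one reading of the TOON specification, not a copy of it, and `needsQuotes` and `formatValue` are hypothetical names; treat the specification as authoritative.

```typescript
// Decide whether a string value must be quoted to survive a round trip.
// Assumed rules (approximate): quote when the string is empty, padded with
// whitespace, looks like a literal or a number, starts with "-", or contains
// structural characters or the active delimiter.
function needsQuotes(value: string, delimiter = ","): boolean {
  if (value === "") return true;
  if (value !== value.trim()) return true;
  if (["true", "false", "null"].includes(value)) return true;
  if (/^-?\d+(\.\d+)?([eE][+-]?\d+)?$/.test(value)) return true;
  if (value.startsWith("-")) return true;
  if (/[:"\\\[\]{}\n\r\t]/.test(value)) return true;
  return value.includes(delimiter);
}

// Quote only when required; JSON.stringify supplies the quotes and escapes.
function formatValue(value: string): string {
  return needsQuotes(value) ? JSON.stringify(value) : value;
}

console.log(formatValue("Alice Smith")); // Alice Smith
console.log(formatValue("30"));          // "30"
console.log(formatValue("a,b"));         // "a,b"
```

Note the second case: the number 30 is written bare, but the string "30" has to be quoted so it is not read back as a number.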

LLM Guardrails

Explicit array lengths and field declarations help LLMs track structure, reducing errors when generating or validating data.

items[5]{id,qty}:
(the header tells the LLM to expect exactly 5 rows of 2 values each)
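The same declarations make LLM output easy to check mechanically. The sketch below validates a single tabular block against its own header. `checkTable` is a hypothetical helper, and it splits rows on plain commas, so quoted values that contain commas are not handled.

```typescript
interface TableCheck {
  ok: boolean;
  errors: string[];
}

// Compare the declared row count and field count with what was actually written.
function checkTable(toon: string): TableCheck {
  const [header = "", ...rest] = toon.split("\n");
  const match = /^(\w+)\[(\d+)\]\{([^}]*)\}:$/.exec(header.trim());
  if (!match) return { ok: false, errors: ["unrecognized table header"] };

  const declared = Number(match[2]);
  const fieldCount = (match[3] ?? "").split(",").length;
  const rows = rest.filter((line) => line.trim() !== "");

  const errors: string[] = [];
  if (rows.length !== declared) {
    errors.push(`declared ${declared} rows, found ${rows.length}`);
  }
  rows.forEach((row, i) => {
    const cells = row.trim().split(",").length;
    if (cells !== fieldCount) {
      errors.push(`row ${i + 1}: expected ${fieldCount} values, found ${cells}`);
    }
  });
  return { ok: errors.length === 0, errors };
}

// A truncated response: the header promises 3 rows but only 2 arrive.
const check = checkTable("items[3]{id,qty}:\n  1,5\n  2,7");
console.log(check.errors); // [ 'declared 3 rows, found 2' ]
```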

Real-World Performance

Independent benchmarks show TOON's advantages in both token efficiency and LLM comprehension:

Token Efficiency (Uniform Employee Records - 100 items)

CSV (Most Efficient) 46,954 tokens
TOON 49,831 tokens (+6.1% vs CSV)
JSON Compact 78,856 tokens
JSON 126,860 tokens
XML (Least Efficient) 146,444 tokens

TOON adds only 6% overhead vs CSV while providing full JSON structure and validation

LLM Retrieval Accuracy (209 questions across 4 models)

TOON 73.9% accuracy
JSON Compact 70.7% accuracy
JSON 69.7% accuracy
YAML 69.0% accuracy
XML 67.1% accuracy

TOON achieves higher accuracy while using 39.6% fewer tokens than JSON

Efficiency Ranking (Accuracy per 1K Tokens)

1. TOON: 26.9
2. JSON Compact: 22.9
3. YAML: 18.6
4. JSON: 15.3
5. XML: 13.0

This metric balances both accuracy and token cost, showing TOON's overall superiority for LLM interactions.
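The ranking does not spell out its formula, so the sketch below assumes the metric is accuracy (in percent) divided by thousands of tokens, as the name suggests. Under that assumption the scores can be inverted to recover the average token count behind each one; those token counts are back-computed here, not quoted from the benchmark.

```typescript
// Assumed definition: accuracy points earned per 1,000 tokens spent.
function accuracyPer1k(accuracyPct: number, tokens: number): number {
  return accuracyPct / (tokens / 1000);
}

// Inverse: the average token count implied by a published score.
function impliedTokens(accuracyPct: number, score: number): number {
  return (accuracyPct / score) * 1000;
}

// Accuracy and score values taken from the two lists above.
const toonTokens = impliedTokens(73.9, 26.9); // roughly 2,750 tokens
const jsonTokens = impliedTokens(69.7, 15.3); // roughly 4,560 tokens

// Relative token saving of TOON vs JSON implied by those scores.
const fewerPct = (1 - toonTokens / jsonTokens) * 100;
console.log(fewerPct.toFixed(1));
```

The result lands just under 40%, in line with the 39.6% figure quoted above once rounding in the published scores is taken into account.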

When to Use TOON

TOON Excels At

  • Uniform arrays of objects

    Multiple records with the same fields - TOON's sweet spot

  • LLM prompt optimization

    Reduce token costs for data-heavy prompts

  • Structured data exchange with AI

    API responses, database exports, analytics data

  • RAG systems

    Retrieval-augmented generation with structured context

  • Fine-tuning datasets

    Training data with reduced token overhead

  • Agent frameworks

    Compact data exchange in multi-agent systems

Stick with JSON When

  • Deeply nested structures

    Complex trees with 0% tabular eligibility

  • Non-uniform data

    Arrays where objects have different field sets

  • Pure tabular data

    Flat tables with no nesting are better served by CSV, which is more compact than both JSON and TOON

  • Existing JSON pipelines

    No need to convert if you're not using LLMs

  • Browser/client-side only

    JSON is native to JavaScript environments

  • Strict schema validation needed

    JSON Schema has mature tooling
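The first two criteria in each list can be turned into a quick pre-flight check. The heuristic below is only a sketch and is not taken from the TOON specification or library: `isTabular` and `suggestFormat` are hypothetical helpers that treat an array as a good TOON candidate when every element is a flat object with the same keys.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Sorted key list for a flat object, or null if the item is not a flat object.
function flatKeys(item: Json): string[] | null {
  if (item === null || typeof item !== "object" || Array.isArray(item)) return null;
  const flat = Object.values(item).every((v) => v === null || typeof v !== "object");
  return flat ? Object.keys(item).sort() : null;
}

// True when every element is a flat object sharing one set of keys.
function isTabular(items: Json[]): boolean {
  const first = items.length > 0 ? flatKeys(items[0] ?? null) : null;
  if (first === null) return false;
  const signature = JSON.stringify(first);
  return items.every((item) => JSON.stringify(flatKeys(item)) === signature);
}

function suggestFormat(items: Json[]): "toon" | "json" {
  return isTabular(items) ? "toon" : "json";
}

console.log(suggestFormat([{ id: 1, name: "Alice" }, { id: 2, name: "Bob" }])); // toon
console.log(suggestFormat([{ id: 1 }, { id: 2, extra: true }]));                // json
console.log(suggestFormat([{ id: 1, tags: ["a", "b"] }]));                      // json
```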

Hybrid Approach

You don't have to choose one format forever. Many teams use JSON for internal APIs and storage, then convert to TOON only when sending data to LLMs. This gives you the best of both worlds: mature JSON tooling for your application layer, and token-efficient TOON for AI interactions.

Real-World Impact

Cost Savings

$1,500+

Estimated monthly savings for a service processing 1M records/day at GPT-4 pricing

Context Window

2.5x

Fit 2.5x more records in the same context window compared to formatted JSON

Faster Processing

40%

Average reduction in generation time due to fewer tokens to process

Example: E-commerce Analytics

Scenario

  • Analyzing 10,000 order records daily
  • Sending to GPT-4 for insights generation
  • 50 API calls per day
  • Each record: 8 fields average

TOON Savings

JSON tokens/day: ~850K
TOON tokens/day: ~340K
Monthly savings: ~$50
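The dollar figure depends on the per-token price, which the scenario does not state. The sketch below keeps the price as a parameter; `monthlySavingsUsd` is a hypothetical helper, a 30-day month is assumed, and the rate of about $3.27 per million input tokens is back-computed from the ~$50 figure rather than quoted from any price list. Substitute your provider's actual rate.

```typescript
// Monthly cost difference between sending JSON and sending TOON.
function monthlySavingsUsd(
  jsonTokensPerDay: number,
  toonTokensPerDay: number,
  usdPerMillionTokens: number,
  daysPerMonth = 30,
): number {
  const tokensSavedPerMonth = (jsonTokensPerDay - toonTokensPerDay) * daysPerMonth;
  return (tokensSavedPerMonth / 1_000_000) * usdPerMillionTokens;
}

// Scenario figures from above: ~850K JSON tokens/day vs ~340K TOON tokens/day.
// That is 510K tokens saved per day, or 15.3M per 30-day month.
const savings = monthlySavingsUsd(850_000, 340_000, 3.27);
console.log(savings.toFixed(2)); // about 50 dollars at the assumed rate
```

Because the function is linear in price, the same scenario at ten times the rate saves ten times as much.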

Getting Started with TOON

Ready to start saving tokens? TOON is easy to integrate into your existing workflow.

1. Try the Converter

Use our online converter to see how TOON handles your data. Get instant token count comparisons.

2. Install the Library

Add TOON to your project with npm, Python pip, or any of our 15+ language implementations.

npm install @toon-format/toon

3. Integrate & Save

Convert your data to TOON before sending to LLMs. Start saving tokens immediately.

Quick Example (TypeScript/JavaScript)

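import OpenAI from 'openai';
import { encode } from '@toon-format/toon';

// The client reads OPENAI_API_KEY from the environment by default
const openai = new OpenAI();

const data = {
  orders: [
    { id: 1, customer: 'Alice', total: 99.99, status: 'shipped' },
    { id: 2, customer: 'Bob', total: 149.50, status: 'pending' }
  ]
};

// Convert to TOON
const toonData = encode(data);

// Send to LLM (top-level await requires an ES module)
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{
    role: 'user',
    content: `Analyze these orders:\n\n${toonData}`
  }]
});

console.log(response.choices[0].message.content);

// Result: same data, fewer tokens (savings grow with the number of rows)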

Community & Support

Open Source

TOON is fully open source under the MIT license. Contribute, report issues, or suggest improvements on GitHub.


Documentation

Full specification, API documentation, and examples to help you get the most out of TOON.
