TOON Features

Discover the key features that make TOON the most efficient data format for Large Language Models

30-60% Token Reduction

Dramatically reduce token usage compared to standard JSON, cutting costs and fitting more data in context windows.

JSON: 59 tokens
TOON: 24 tokens

LLM-Friendly Guardrails

Explicit array lengths and field declarations help models validate data structure and reduce generation errors.

users[2]{`{id,name,role}`}:
  1,Alice,admin
  2,Bob,user

Minimal Syntax

Removes redundant braces, brackets, and quotes. Only quotes when necessary for ambiguous values.

{"name": "value"}
name: value

Indentation-Based Structure

Uses whitespace like YAML for nested objects, making structure visually clear without extra punctuation.

user:
  id: 123
  profile:
    age: 30

Tabular Arrays

Declare field names once for uniform arrays, then stream data as CSV-style rows for maximum efficiency.

items[3]{`{sku,qty,price}`}:
  A1,2,9.99
  B2,1,14.50
  C3,5,7.25

Optional Key Folding

Collapse single-key wrapper chains into dotted paths to reduce indentation and save even more tokens.

// Without folding
data:
  metadata:
    items[2]: a,b
// With key folding
data.metadata.items[2]: a,b

Format Comparison

Standard JSON

411 tokens
{
  "employees": [
    {
      "id": 1,
      "name": "Alice Johnson",
      "department": "Engineering",
      "salary": 95000,
      "active": true
    },
    {
      "id": 2,
      "name": "Bob Smith",
      "department": "Marketing",
      "salary": 72000,
      "active": true
    },
    {
      "id": 3,
      "name": "Carol White",
      "department": "Engineering",
      "salary": 88000,
      "active": false
    }
  ]
}

TOON Format

162 tokens 60.6% savings
employees[3]{id,name,department,salary,active}:
  1,Alice Johnson,Engineering,95000,true
  2,Bob Smith,Marketing,72000,true
  3,Carol White,Engineering,88000,false
Real-world impact: On a dataset with 100 employee records, TOON uses approximately 3,800 fewer tokens than formatted JSON, translating to significant cost savings when used repeatedly in LLM applications.

Advanced Features

Flexible Delimiters

Choose between comma, tab, or pipe delimiters based on your data characteristics for optimal tokenization.

Comma (default)
items[2]: a,b,c
Tab
items[2 ]: a b c
Pipe
items[2|]: a|b|c

Smart Quoting Rules

TOON only quotes strings when necessary, maximizing token efficiency while maintaining clarity.

✓ Unquoted (safe)
name: John Smith
emoji: hello 👋 world
unicode: café résumé
⚠ Quoted (required)
value: " padded spaces "
csv: "contains, comma"
boolean: "true"

Recursive Tabular Format

Tabular formatting applies recursively to nested arrays, maintaining efficiency at every level.

teams[2]:
  - name: Alpha
    members[2]{id,name}:
      1,Alice
      2,Bob
  - name: Beta
    members[3]{id,name}:
      3,Carol
      4,Dave
      5,Eve

Automatic Type Normalization

Non-JSON types are automatically converted to LLM-safe representations.

Input
NaN → null
Infinity → null
undefined → null
Date → "ISO string"
Numbers
-0 → 0
1e6 → 1000000
BigInt → decimal

Optimized for Specific Use Cases

Best For

Uniform Arrays of Objects

When you have multiple records with identical fields, TOON's tabular format shines with 30-60% token savings.

LLM Prompt Data

Structured data in prompts benefits from explicit lengths and fields that help models validate output.

Token Cost Optimization

Applications where token costs matter, especially with repeated API calls or large datasets.

API Response Compression

Compress API responses with repeated structures before passing to LLMs.

Consider Alternatives When

Deeply Nested Structures

With minimal tabular eligibility (0-30%), JSON compact may be more efficient.

Non-Uniform Data

When objects have varying field sets, TOON's benefits diminish significantly.

Pure Tabular Data

For flat tables without nesting, CSV is simpler and slightly more compact.

Existing JSON Pipelines

If your infrastructure expects JSON, adding conversion may not be worth the complexity.