chatpack: 13x Token Compression for Chat Exports

From manual scripts to automation: A Rust CLI tool for preparing chat exports for LLM analysis

The Problem Nobody Notices

Gemini refused to analyze my chat. "Failed to generate content. Please try again." And I needed to understand what patterns exist in how our community communicates.

I was trying to analyze a private developer group at a programming school where I study. I needed to identify trends in how students communicate and what topics come up most often. I needed to identify trends: why do so few students make it to the end? What changed when the platform was updated? What advice did graduates give?

I exported the chat from Telegram. 34 thousand messages, three years of history. Opened result.json - 26 megabytes. The problem isn't the file size. The problem is that 80% of those 26 megabytes is noise: JSON brackets, "type": "message" keys on every line, metadata about stickers you don't need.

Before this, I solved similar tasks with scripts that searched for keywords to find relevant sections. I sorted by nicknames and dates. The most annoying part - everything had to be done manually, and some data got lost due to this crude initial filtering.

That's how chatpack was born.

What chatpack Does

Telegram JSON (11M tokens)
        ↓
    [chatpack]
        ↓
   CSV (850K tokens) → LLM → Insights

One command:

chatpack tg result.json -o context.csv

Result:

Input: 11.2M tokens (raw Telegram JSON)
Output: 850K tokens (optimized CSV)
Compression: 13x

Now you can upload this CSV to LLM with an actual prompt:

Here's our team's chat history for 2023.

1. Identify the top 5 most discussed topics
2. Who was most active in each quarter?
3. Is there a correlation between time of day and message tone?

Supported Platforms

chatpack tg chat.json      # Telegram
chatpack wa chat.txt       # WhatsApp
chatpack ig message_1.json # Instagram
chatpack dc export.json    # Discord

I started with Telegram work groups. Then I added other messengers - to automate my processes and identify common trends.

Why CSV Compresses Best

Comparison on real data (34K messages):

Format	Tokens	Compression
Raw JSON	11,177,258	baseline
CSV	849,915	13.2x
JSONL	1,029,130	10.9x
JSON (clean)	1,333,586	8.4x

Why such a difference?

// JSON: each message is
{"sender": "Alice", "content": "Hello"}

// CSV: just data
Alice;Hello

JSON wastes tokens on:

Brackets {} and []
Keys "sender":, "content": - for every message
Quotes around every string

CSV uses ; as a delimiter (one character) and writes keys only in the header.

You can also work with CSV in Excel - calculate response times, average message length, build engagement charts.

Merge Consecutive: The Obvious Feature Everyone Ignores

A typical chat looks like this:

Alice: Hey
Alice: How are you?
Alice: Did you see the new project?
Bob: Yeah, I looked
Bob: Pretty good overall

After merge:

Alice: Hey
How are you?
Did you see the new project?
Bob: Yeah, I looked
Pretty good overall

Here's what the actual CSV output looks like:

# Without merge (5 rows, ~45 tokens)
Sender;Content
Alice;Hey
Alice;How are you?
Alice;Did you see the new project?
Bob;Yeah, I looked
Bob;Pretty good overall

# With merge (2 rows, ~35 tokens)
Sender;Content
Alice;Hey
How are you?
Did you see the new project?
Bob;Yeah, I looked
Pretty good overall

5 messages → 2 entries. On 34K messages, this gives 24% reduction before even choosing a format.

But the main benefit isn't tokens. Embedding a single complete thought is processed much better than 5 fragmented pieces of the same thought. For RAG pipelines, this is critical.

Anyone who has worked with large chats eventually arrives at merge. This isn't an insight - it's common knowledge among those who analyze conversations.

Why Not Just Use jq or a Python Script?

You could write jq '.messages[] | {from, text}', but that won't give you merge, date filters, or fix Mojibake in Instagram exports. chatpack handles all the edge cases so you don't have to.

Architecture: Why Traits, Not If-Else

When I wrote the Telegram parser, I built in extensibility from the start:

pub trait ChatParser: Send + Sync {
    fn name(&self) -> &'static str;
    fn parse(&self, file_path: &str) -> Result<Vec<InternalMessage>, Box<dyn Error>>;
}

Adding a new messenger - three steps:

// 1. Implement the parser
pub struct SignalParser;
impl ChatParser for SignalParser {
    fn name(&self) -> &'static str { "Signal" }
    fn parse(&self, path: &str) -> Result<Vec<InternalMessage>, _> {
        // ...
    }
}

// 2. Add to Source enum
pub enum Source { Telegram, WhatsApp, Instagram, Discord, Signal }

// 3. Add to factory
Source::Signal => Box::new(SignalParser::new())

This was a planned decision from day one. The Strategy pattern isn't overengineering when you know you'll be extending.

Practical Use Cases

Engagement Analysis

I needed to add an engagement metric - to identify when a person is most active and how this changed over the years.

chatpack tg group.json -t -o timeline.csv

The -t flag adds timestamps. Now the CSV has a date column for each message:

Timestamp;Sender;Content
2024-01-15 10:30:00;Alice;Good morning!
2024-01-15 10:31:00;Bob;Hey

Upload to Claude: "Build an activity chart by hour for each participant."

An LLM can't understand what someone is doing outside the chat. But general trends - when the platform changed, who was active during which period, how team mood shifted - it identifies perfectly.

HR Patterns

I use filters often when I need to identify patterns for a specific person:

chatpack tg chat.json --from "Ivan" -o ivan.csv

I get all messages from one participant. Then analyze:

Grammar
Humor style
Average message length
Activity times

It's like HR screening before team selection - just automated.

Project Archaeology

The most useful case for me personally:

chatpack tg chat.json --after 2022-01-01 --before 2023-01-01 -o year_2022.csv

I was looking for information about the old platform, former staff and students, advice from graduates about where they got jobs. The LLM handled the task, identifying what I needed. Over time, I formulated concrete ideas for improving the platform.

Technical Details

Performance

Speed: 20-50K messages/sec
Memory: ~3x file size (entire JSON in memory)
Ceiling: files up to 500MB comfortably, 1GB+ needs streaming (not implemented)

Why no streaming? I just needed a library that would solve my problem. Most of the time, message exports don't exceed 1 MB. I consciously chose not to overcomplicate things.

When chatpack Won't Help

Files >1GB - no streaming, will eat your RAM
Media context needed - photos, voice messages are ignored, text only
Real-time analysis - this is a batch tool
Preserving exact JSON structure - chatpack normalizes everything to a flat format

Why Rust

Rust is currently my main stack. I used it because the project fits the criteria:

CLI tool - Rust is ideal
Need speed - check
Need reliability - types prevent mistakes

Thanks to this project, I also learned how to work with WASM and built a web version.

Handling Edge Cases

WhatsApp exports dates in different formats depending on locale:

[1/15/24, 10:30 AM]      // US
[15.01.24, 10:30]        // EU
26.10.2025, 20:40 -      // RU

chatpack auto-detects the format from the first 20 lines and parses correctly.

Instagram stores UTF-8 text as ISO-8859-1 (Mojibake). "Привет" becomes "ÐŸÑ€Ð¸Ð²ÐµÑ‚". The parser fixes this automatically.

WASM: chatpack in the Browser

Not everyone wants to install a CLI. Some people just need to convert one file without touching the terminal. That's why there's chatpack.berektassuly.com.

The workflow is simple:

Drag & drop your export file
Choose output format
Download the result

The key point: your file never leaves your machine. Everything is processed locally in the browser via WebAssembly. No server, no uploads, no privacy concerns.

Repository: github.com/Berektassuly/chatpack-web

Installation and Usage

Pre-built Binaries

Download for your platform from GitHub Releases:

Windows (x64)
macOS (Intel and Apple Silicon)
Linux (x64)

Via Cargo

cargo install chatpack

As a Library

[dependencies]
chatpack = "0.3"

use chatpack::prelude::*;

let parser = create_parser(Source::Telegram);
let messages = parser.parse("chat.json")?;
let merged = merge_consecutive(messages);
write_csv(&merged, "output.csv", &OutputConfig::new())?;

Conclusion

chatpack is a tool I wrote for myself. The problem was specific: analyze chat history through an LLM. The solution turned out to be universal.

If I were starting today, I'd keep the development process the same:

Telegram parser
Trait for extensibility
Other messengers as needed
WASM for those who find CLI overkill

If anyone needs it - feel free to use it in your project. If there are issues, I'll listen to what users need and fix it.

Links:

GitHub: github.com/berektassuly/chatpack
Web: chatpack.berektassuly.com
WASM repo: github.com/Berektassuly/chatpack-web
Author: Mukhammedali Berektassuly

chatpack: How I Compressed 11 Million Tokens Down to 850K for Chat Analysis

The Problem Nobody Notices

What chatpack Does

Supported Platforms

Why CSV Compresses Best

Merge Consecutive: The Obvious Feature Everyone Ignores

Why Not Just Use jq or a Python Script?

Architecture: Why Traits, Not If-Else

Practical Use Cases

Engagement Analysis

HR Patterns

Project Archaeology

Technical Details

Performance

When chatpack Won't Help

Why Rust

Handling Edge Cases

WASM: chatpack in the Browser

Installation and Usage

Pre-built Binaries

Via Cargo

As a Library

Conclusion

Comments

More from this blog

Architecture as LEGO: Building a Testable Rust Service with Blockchain Abstraction

Solana, PostgreSQL, and the Perils of Dual-Writes: A Rust Case Study

Command Palette

The Problem Nobody Notices

What chatpack Does

Supported Platforms

Why CSV Compresses Best

Merge Consecutive: The Obvious Feature Everyone Ignores

Why Not Just Use jq or a Python Script?

Architecture: Why Traits, Not If-Else

Practical Use Cases

Engagement Analysis

HR Patterns

Project Archaeology

Technical Details

Performance

When chatpack Won't Help

Why Rust

Handling Edge Cases

WASM: chatpack in the Browser

Installation and Usage

Pre-built Binaries

Via Cargo

As a Library

Conclusion

Comments

More from this blog