
Give me 60 seconds, and I'll teach you how Cursor indexes code (no BS):

It took 12 months for Cursor to reach $100M ARR.

Cursor IDE is the best I've ever seen.

Here’s how Merkle trees make it possible:

0. Merkle trees 101:

↳ Hierarchical hash trees that fingerprint data blocks

↳ Leaf nodes = hash of code chunks

↳ Parent nodes = hash of child hashes

↳ Root hash = single fingerprint for the entire codebase

Key benefit: Detect any change instantly by comparing root hashes, then walk down only the mismatched branches to find what changed.
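
Here's a minimal Python sketch of the idea. It assumes a plain binary tree over SHA-256 chunk hashes in which an odd node out is paired with itself. That is one common convention, and the post doesn't say how Cursor handles it. `merkle_root` is a hypothetical helper name.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(chunks: list[str]) -> str:
    """Build a binary Merkle tree bottom-up and return the root hash as hex."""
    if not chunks:
        return sha256(b"").hex()
    # Leaf nodes: hash of each code chunk.
    level = [sha256(chunk.encode("utf-8")) for chunk in chunks]
    # Parent nodes: hash of the concatenated child hashes, until one root remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pair an odd node out with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


old = merkle_root(["def add(a, b):\n    return a + b", "class User: pass"])
new = merkle_root(["def add(a, b):\n    return a - b", "class User: pass"])
print(old == new)  # False: editing one chunk changes the root
```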

1. Code chunking strategies:

↳ AST-based splitting: Uses tree-sitter to parse code into logical blocks (functions, classes)

↳ Token limits: Merge sibling AST nodes into chunks that stay under the model's token cap (e.g., ~8k tokens for OpenAI embedding models)

↳ Semantic boundaries: Avoid mid-function splits for better embeddings
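
Here's a rough sketch of AST-based chunking with a token budget. It swaps tree-sitter for Python's built-in `ast` module, so it only handles Python source. It also approximates tokens by counting whitespace-separated words instead of running a real tokenizer. `chunk_source` and `count_tokens` are hypothetical names.

```python
import ast


def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def chunk_source(source: str, max_tokens: int) -> list[str]:
    """Split at top-level statement boundaries (functions, classes, imports),
    then greedily merge adjacent siblings while staying within max_tokens."""
    tree = ast.parse(source)
    lines = source.splitlines()

    # One logical block per top-level node, so nothing is split mid-function.
    blocks = []
    for node in tree.body:
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators]) - 1
        blocks.append("\n".join(lines[start:node.end_lineno]))

    chunks: list[str] = []
    current = ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(current)  # adding this sibling would exceed the cap
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


src = '''
import os

def a():
    return 1

def b():
    return 2

class C:
    def m(self):
        return 3
'''
for chunk in chunk_source(src, max_tokens=8):
    print("---")
    print(chunk)
```

A single function larger than the budget stays whole here, so a real chunker would need to split it further. This sketch also drops comments that sit between top-level statements.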

2. Merkle tree construction:

↳ Local hashing: Compute SHA-256 hashes for all code chunks

↳ Tree sync: Compare root hash with server to identify changed files

↳ Incremental uploads: Only modified chunks get re-embedded

Result: 90% fewer uploads vs. full re-indexing.
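
Here's a sketch of the sync step. It assumes files hash their contents and directories hash their sorted (name, child hash) pairs. `Node`, `build`, and `changed_paths` are hypothetical names, not Cursor's API. The walk skips every subtree whose hash matches the server's, and that skipping is where the bandwidth saving comes from.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    hash: str
    children: dict[str, Node] = field(default_factory=dict)


def build(tree: dict | str) -> Node:
    """Files (str) hash their content; directories hash their sorted children."""
    if isinstance(tree, str):
        return Node(hashlib.sha256(tree.encode("utf-8")).hexdigest())
    children = {name: build(sub) for name, sub in sorted(tree.items())}
    digest = hashlib.sha256()
    for name, child in children.items():
        digest.update(name.encode("utf-8") + b"\0" + child.hash.encode("ascii"))
    return Node(digest.hexdigest(), children)


def changed_paths(local: Node, remote: Node | None, prefix: str = "") -> list[str]:
    """Walk both trees from the root, descending only where hashes differ."""
    if remote is not None and local.hash == remote.hash:
        return []  # identical subtree: skip it entirely
    if not local.children:
        return [prefix]  # a new or modified file
    out: list[str] = []
    for name, child in local.children.items():
        other = remote.children.get(name) if remote else None
        out += changed_paths(child, other, f"{prefix}/{name}" if prefix else name)
    return out


server = build({"src": {"utils.py": "def f(): pass", "api.py": "x = 1"}, "README.md": "hi"})
client = build({"src": {"utils.py": "def f(): return 2", "api.py": "x = 1"}, "README.md": "hi"})
print(changed_paths(client, server))  # ['src/utils.py']
```

This sketch only reports new or modified files. Detecting deletions would also need a pass over paths that exist only on the server.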

3. Embedding and privacy:

↳ Uses OpenAI’s text-embedding-3-small or custom code-specific models

↳ Obfuscates file paths with client-side encryption (e.g., src/utils.py → a1b2/c3d4/e5f6)

↳ No raw code is stored, and embeddings are purged after the request
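
The post describes client-side encryption of paths. The sketch below swaps in HMAC-SHA256 from the Python standard library to show the per-segment idea: each directory and file name maps to a deterministic tag, so the tree's shape survives while the names don't. HMAC is a one-way keyed hash, not reversible encryption. The key and `obfuscate_path` are hypothetical.

```python
import hashlib
import hmac


def obfuscate_path(path: str, key: bytes, length: int = 16) -> str:
    """Replace each path segment with a truncated HMAC-SHA256 tag.

    Same key + same segment -> same tag, so src/a.py and src/b.py still share
    a parent directory on the server while the real names stay hidden.
    Being one-way, the client must keep its own mapping to recover paths.
    """
    tags = [
        hmac.new(key, segment.encode("utf-8"), hashlib.sha256).hexdigest()[:length]
        for segment in path.split("/")
    ]
    return "/".join(tags)


key = b"client-secret-never-sent-to-server"  # hypothetical key kept on the client
print(obfuscate_path("src/utils.py", key))
print(obfuscate_path("src/api.py", key))  # shares the first segment with the line above
```

The post's example shows 4-character segments. Shorter tags are easier to read, but they collide more often, which is why this sketch defaults to 16 hex characters.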

4. RAG for code generation:

Here's what happens when you ask about your codebase.

↳ Query vector DB (Turbopuffer) for relevant chunks

↳ Inject top matches into LLM context

↳ Generate answers using GPT-4 + codebase context
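
Here's a toy version of the retrieval step. It assumes an in-memory index of (chunk, vector) pairs with hand-written 3-dimensional vectors. A real setup would get its vectors from an embedding model and query a vector DB such as Turbopuffer. None of those client APIs are shown here, and `top_k` and `build_prompt` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Rank stored chunks by similarity to the query and keep the best k."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in scored[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Inject the retrieved chunks into the LLM context ahead of the question."""
    context = "\n\n".join(f"# Chunk {i + 1}\n{c}" for i, c in enumerate(chunks))
    return f"Use this code from the repository:\n\n{context}\n\nQuestion: {question}"


index = [
    ("def login(user, pw): ...", [0.9, 0.1, 0.0]),
    ("def hash_password(pw): ...", [0.7, 0.3, 0.1]),
    ("def render_footer(): ...", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of the user's question
prompt = build_prompt("How does login work?", top_k(query, index))
print(prompt)
```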

5. Why Merkle trees?

↳ Bandwidth savings: Sync only delta changes (Git-like)

↳ Cache optimization: Hash-indexed embeddings enable instant re-indexing

↳ Data integrity: Tamper-proof codebase fingerprints
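
The cache point can be sketched as an embedding cache keyed by each chunk's SHA-256 hash, so unchanged chunks never go back to the model. `EmbeddingCache` and `fake_embed` are hypothetical stand-ins for whatever Cursor actually uses.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Cache embeddings by the SHA-256 of the chunk text."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        self.store: dict[str, list[float]] = {}
        self.misses = 0  # how many times the model was actually called

    def get(self, chunk: str) -> list[float]:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed(chunk)  # only new content hits the model
        return self.store[key]


def fake_embed(text: str) -> list[float]:
    # Placeholder for a real embedding model call.
    return [float(len(text)), float(text.count("def"))]


cache = EmbeddingCache(fake_embed)
for chunk in ["def a(): pass", "def b(): pass"]:
    cache.get(chunk)
# Re-index after an edit: only the changed chunk is embedded again.
for chunk in ["def a(): pass", "def b(): return 1"]:
    cache.get(chunk)
print(cache.misses)  # 3
```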

6. Technical challenges:

⚠️ Network overhead: Retries triggered by server load cause traffic spikes

⚠️ AST parsing edge cases: Language-specific syntax quirks

⚠️ Embedding inversion risks: Theoretical code leaks from vectors (mitigated by short TTLs)
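
Here's a minimal sketch of the short-TTL mitigation: a store that forgets each embedding after `ttl_seconds`. `TTLStore` is hypothetical, and the clock is injectable only so the example is deterministic. Expiry limits how long a vector can be read. It does not make the vector itself any harder to invert.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """Keep each embedding only for ttl_seconds after it is stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: dict[str, tuple[float, list[float]]] = {}

    def put(self, key: str, vector: list[float]) -> None:
        self._items[key] = (self.clock() + self.ttl, vector)

    def get(self, key: str) -> Optional[list[float]]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, vector = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: purge so it can't be read later
            return None
        return vector

    def purge_expired(self) -> int:
        """Drop every expired entry and return how many were removed."""
        now = self.clock()
        expired = [k for k, (exp, _) in self._items.items() if now >= exp]
        for k in expired:
            del self._items[k]
        return len(expired)


now = [0.0]  # fake clock for a deterministic example
store = TTLStore(ttl_seconds=60, clock=lambda: now[0])
store.put("chunk-1", [0.1, 0.2])
print(store.get("chunk-1"))  # [0.1, 0.2]
now[0] = 61.0
print(store.get("chunk-1"))  # None
```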

I find Cursor works best with clear guidelines, well-defined tasks, and small to mid-size projects.

With OpenAI buying Windsurf for $3Bn, I'm excited to see where Cursor will go.

Are you using Cursor?

May 13, 2025 at 8:08 AM