ChromaFs: Virtual Filesystem for AI Assistants
Summary
Mintlify built ChromaFs, a virtual filesystem that replaces heavyweight VM sandboxes with lightweight ChromaDB queries. The change cut P90 session boot time from about 46 seconds to about 100 milliseconds and brought the marginal compute cost per conversation to zero.
The Problem
- Traditional approach: Spin up isolated micro-VM sandbox, clone repo
- P90 boot time: ~46 seconds (including GitHub clone + setup)
- Infrastructure cost: At 850K conversations/month, naive approach costs $70,000+/year
- Latency unacceptable: Users stare at loading spinner during session creation
The Solution: ChromaFs
The key insight: the agent doesn't need a real filesystem; it only needs the illusion of one. The documentation was already indexed and chunked in Chroma for search, so Mintlify built ChromaFs to intercept UNIX commands and translate them into ChromaDB queries.
Performance Comparison
- P90 boot time: 46s → 100ms
- Marginal compute cost: $0.0137/conversation → $0
How It Works
1. Bootstrapping the Directory Tree
- Entire file tree stored as gzipped JSON in Chroma collection
- On init: fetch and decompress into in-memory structures
- ls, cd, find resolve in local memory—no network calls
- Tree cached for subsequent sessions
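The bootstrap step can be sketched roughly as follows. This is an illustration under assumptions, not Mintlify's code: the tree layout (directories as nested dicts, files as None), the helper names load_tree, ls, and find, and the sample payload are all hypothetical. The gzipped blob stands in for whatever is fetched from the Chroma collection.

```python
import gzip
import json


def load_tree(blob: bytes) -> dict:
    """Decompress a gzipped JSON payload into a nested dict (assumed layout)."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def _resolve(tree: dict, path: str):
    """Walk the in-memory tree to the node at `path`; '/' is the root."""
    node = tree
    for part in [p for p in path.split("/") if p]:
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node


def ls(tree: dict, path: str = "/") -> list[str]:
    """List a directory purely from memory; no network call is involved."""
    node = _resolve(tree, path)
    if not isinstance(node, dict):
        raise NotADirectoryError(path)
    return sorted(node)


def find(tree: dict, suffix: str, prefix: str = "") -> list[str]:
    """Return every file path ending in `suffix`, walking depth-first."""
    hits = []
    for name, child in sorted(tree.items()):
        full = f"{prefix}/{name}"
        if isinstance(child, dict):
            hits.extend(find(child, suffix, full))
        elif name.endswith(suffix):
            hits.append(full)
    return hits


# Hypothetical payload standing in for the stored tree document.
sample = {"auth": {"oauth.mdx": None, "sso.mdx": None}, "index.mdx": None}
blob = gzip.compress(json.dumps(sample).encode("utf-8"))

tree = load_tree(blob)
print(ls(tree, "/auth"))   # ['oauth.mdx', 'sso.mdx']
print(find(tree, ".mdx"))  # ['/auth/oauth.mdx', '/auth/sso.mdx', '/index.mdx']
```

Once the tree is in memory, every directory operation is a dictionary walk, which is why these commands need no round trip to the database.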
2. Access Control
- Path tree includes isPublic and groups fields
- Before building tree, prune slugs using user's session token
- Files user can't access are excluded from tree entirely
- Built-in RBAC without new infrastructure
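A minimal sketch of the pruning step, under stated assumptions: entries are flat records carrying slug, isPublic, and groups; the user's groups have already been derived from the session token elsewhere; and a file is visible when it is public or shares at least one group with the user. The any-match rule and the function name visible_slugs are assumptions for illustration; the source does not spell out the exact matching semantics.

```python
def visible_slugs(entries: list[dict], user_groups: set[str]) -> list[str]:
    """Keep entries that are public or share at least one group with the user."""
    return [
        e["slug"]
        for e in entries
        if e.get("isPublic", False) or user_groups & set(e.get("groups", []))
    ]


# Hypothetical path-tree entries.
entries = [
    {"slug": "/index.mdx", "isPublic": True, "groups": []},
    {"slug": "/auth/oauth.mdx", "isPublic": False, "groups": ["admin"]},
    {"slug": "/internal/billing.mdx", "isPublic": False, "groups": ["finance"]},
]

print(visible_slugs(entries, {"admin"}))  # ['/index.mdx', '/auth/oauth.mdx']
print(visible_slugs(entries, set()))      # ['/index.mdx']
```

Because pruning happens before the tree is built, a restricted file never appears in ls or find output, so the agent cannot even learn that it exists.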
3. Reassembling Pages from Chunks
- Pages split into chunks for embedding
- cat /auth/oauth.mdx fetches all chunks with matching slug
- Sort by chunk_index, join into full page
- Results cached for repeated reads
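The reassembly logic might look something like this. The PageReader class, the chunk record shape (slug, chunk_index, text), and the fetch_chunks callable are hypothetical; the callable stands in for a real database lookup by slug. Joining with an empty string assumes each chunk carries its own whitespace, which may not hold for every chunking scheme.

```python
class PageReader:
    """Rebuilds full pages from chunks and caches them for repeated reads."""

    def __init__(self, fetch_chunks):
        self._fetch_chunks = fetch_chunks  # callable: slug -> list of chunk dicts
        self._cache = {}
        self.fetch_count = 0  # counts backend lookups, to show the cache working

    def cat(self, slug: str) -> str:
        if slug not in self._cache:
            self.fetch_count += 1
            chunks = self._fetch_chunks(slug)
            if not chunks:
                raise FileNotFoundError(slug)
            # Chunks can arrive in any order, so sort before joining.
            ordered = sorted(chunks, key=lambda c: c["chunk_index"])
            self._cache[slug] = "".join(c["text"] for c in ordered)
        return self._cache[slug]


# Hypothetical chunk store, deliberately out of order.
STORE = [
    {"slug": "/auth/oauth.mdx", "chunk_index": 1, "text": "Use PKCE.\n"},
    {"slug": "/auth/oauth.mdx", "chunk_index": 0, "text": "# OAuth\n"},
    {"slug": "/index.mdx", "chunk_index": 0, "text": "# Welcome\n"},
]

reader = PageReader(lambda slug: [c for c in STORE if c["slug"] == slug])
print(reader.cat("/auth/oauth.mdx"))  # "# OAuth" then "Use PKCE."
reader.cat("/auth/oauth.mdx")         # second read is served from the cache
print(reader.fetch_count)             # 1
```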
4. Grep Optimization
- Built on just-bash (Vercel Labs)—TypeScript reimplementation of bash
- Intercept just-bash's grep, parse flags with yargs-parser
- Chroma acts as coarse filter
- Bulk prefetch matching chunks into Redis cache
- Rewrite grep to target only matched files for fine in-memory execution
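The two-stage search can be illustrated with a small self-contained sketch. Everything here is a stand-in: coarse_filter plays the role of the database query, a plain dict plays the role of the Redis cache, and the caller supplies a literal needle that any match must contain. How the real system derives its coarse query from the grep flags is not described in this summary, so that part is an assumption.

```python
import re

# Hypothetical chunk store.
CHUNKS = [
    {"slug": "/auth/oauth.mdx", "chunk_index": 0,
     "text": "# OAuth\nThe token expires after 1h.\n"},
    {"slug": "/auth/oauth.mdx", "chunk_index": 1, "text": "Refresh it early.\n"},
    {"slug": "/auth/sso.mdx", "chunk_index": 0, "text": "# SSO\nNo token needed.\n"},
    {"slug": "/index.mdx", "chunk_index": 0, "text": "# Welcome\n"},
]


def coarse_filter(needle: str) -> set[str]:
    """Stage 1 stand-in: slugs with at least one chunk containing `needle`."""
    return {c["slug"] for c in CHUNKS if needle in c["text"]}


def prefetch(slugs: set[str], cache: dict) -> None:
    """Bulk-load full pages for the candidate slugs into the cache."""
    for slug in slugs:
        if slug not in cache:
            parts = sorted(
                (c for c in CHUNKS if c["slug"] == slug),
                key=lambda c: c["chunk_index"],
            )
            cache[slug] = "".join(c["text"] for c in parts)


def grep(pattern: str, needle: str, cache: dict) -> list[str]:
    """Stage 2: run the exact regex only over the prefetched candidates."""
    candidates = coarse_filter(needle)
    prefetch(candidates, cache)
    regex = re.compile(pattern)
    hits = []
    for slug in sorted(candidates):
        for lineno, line in enumerate(cache[slug].splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{slug}:{lineno}:{line}")
    return hits


cache = {}
print(grep(r"token\s+expires", "token", cache))
# ['/auth/oauth.mdx:2:The token expires after 1h.']
print(sorted(cache))  # ['/auth/oauth.mdx', '/auth/sso.mdx']
```

The coarse stage may return false positives (here /auth/sso.mdx), which is acceptable because the fine stage applies the real pattern; what matters is that it never drops a file that would have matched.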
Key Takeaways
- Virtual over real: Sometimes the illusion is enough—no need for heavyweight infrastructure
- Leverage existing: Reusing the ChromaDB they already pay for = zero marginal cost
- RBAC built-in: Per-user access control without Linux user groups or isolated containers
- Read-only design: EROFS errors on writes = stateless, no session cleanup, no corruption risk
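The read-only takeaway can be made concrete with a short sketch. The ReadOnlyFs class is hypothetical; it only shows the general idea of rejecting every write with the standard EROFS ("Read-only file system") error code so that no session state can accumulate.

```python
import errno
import os


class ReadOnlyFs:
    """Serves reads from a fixed snapshot and rejects all writes with EROFS."""

    def __init__(self, files: dict[str, str]):
        self._files = dict(files)

    def read(self, path: str) -> str:
        try:
            return self._files[path]
        except KeyError:
            raise FileNotFoundError(
                errno.ENOENT, os.strerror(errno.ENOENT), path
            ) from None

    def write(self, path: str, data: str) -> None:
        # Nothing is ever mutated, so there is nothing to clean up afterwards.
        raise OSError(errno.EROFS, os.strerror(errno.EROFS), path)


fs = ReadOnlyFs({"/index.mdx": "# Welcome\n"})
try:
    fs.write("/index.mdx", "overwritten")
except OSError as exc:
    print(exc.errno == errno.EROFS)  # True
```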
Impact
ChromaFs powers the documentation assistant for hundreds of thousands of users, handling 30,000+ conversations per day. The pattern applies to any team with indexed documentation and agents that need filesystem-like access.
Explored at: 2026-04-03 12:45 UTC