LLM Drift in Long Sessions: Claude 60% vs 85% Integrity
Key finding: In a long Claude coding session (25+ turns), structural integrity fell from 100% to 60% without a continuity layer, but only to 85% with one. Without the layer, the model progressively abandoned architectural patterns it had established earlier in the session.
Methodology
Two transcripts were evaluated using identical task prompts:
- Transcript A: Generated directly in the Claude app
- Transcript B: Generated with Claude via the API, using the Calmkeep continuity layer
Both were audited using a structured "Compliance & Integrity Audit" prompt.
Architecture Laws Established (Turns 1-5)
| Law | Description |
|---|---|
| LAW-01 | Module-Based Architecture (vertical slicing) |
| LAW-02 | Service Layer Owns All DB Access |
| LAW-03 | Org-Scoped Queries - Every query MUST include org_id |
| LAW-04 | Centralized Error Classes (custom AppError hierarchy) |
| LAW-05 | Env Config - Centralized Fail-Fast from config/env.ts |
| LAW-06 | Prisma as Sole ORM (No Raw SQL) |
| LAW-07 | Validation - Schema-First (Zod adopted mid-session) |
| LAW-08 | Single Source of Truth for Shared Logic |
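To make the laws concrete, here is a minimal sketch of how LAW-02, LAW-03, and LAW-04 might fit together. It is not taken from either transcript: the class names, the Project shape, and the in-memory array standing in for a Prisma model are all assumptions made for illustration.

```typescript
// LAW-04 (sketch): one base error class that the rest of the app extends.
// AppError and NotFoundError are hypothetical names, not from the transcripts.
class AppError extends Error {
  readonly statusCode: number;
  readonly code: string;

  constructor(message: string, statusCode: number, code: string) {
    super(message);
    this.statusCode = statusCode;
    this.code = code;
  }
}

class NotFoundError extends AppError {
  constructor(resource: string) {
    super(`${resource} not found`, 404, "NOT_FOUND");
  }
}

// Hypothetical record shape; a real project would use its Prisma model type.
interface Project {
  id: string;
  org_id: string;
  name: string;
}

// LAW-02 (sketch): the service is the only code that touches the data store.
// An in-memory array stands in for the ORM so the example is self-contained.
class ProjectService {
  private readonly rows: Project[];

  constructor(rows: Project[]) {
    this.rows = rows;
  }

  // LAW-03 (sketch): orgId is a required parameter, so a caller cannot
  // issue an unscoped query by accident.
  list(orgId: string): Project[] {
    return this.rows.filter((row) => row.org_id === orgId);
  }

  getById(orgId: string, id: string): Project {
    const matches = this.rows.filter(
      (row) => row.org_id === orgId && row.id === id
    );
    if (matches.length === 0) {
      // A row owned by another org is reported as missing, not as forbidden.
      throw new NotFoundError("Project");
    }
    return matches[0];
  }
}
```

Making the org identifier a required argument on every service method is one way to turn LAW-03 from a convention into something the type checker enforces.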
Results
| Metric | Transcript A (No Continuity) | Transcript B (With Continuity) |
|---|---|---|
| Total AVEs | 8 | 3 |
| Drift Coefficient | 40% | 15% |
| Final Integrity | 60% | 85% |
| Decay Onset Turn | T8 | T23 |
| Post-T14 Backslide | YES | NO |
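The percentages in the table can be reproduced with simple arithmetic. The sketch below assumes that the Drift Coefficient is the AVE count divided by a fixed number of audited checkpoints, and that Final Integrity is 100% minus the Drift Coefficient. The denominator of 20 is inferred from the reported figures (8 AVEs giving 40%, 3 AVEs giving 15%) and is not stated in the audit itself; the function names are illustrative.

```typescript
// ASSUMPTION: 20 audited checkpoints. This value is inferred from the table
// (8 AVEs -> 40%, 3 AVEs -> 15%); the audit does not state its denominator.
const ASSUMED_CHECKPOINTS = 20;

// Drift Coefficient as a percentage of checkpoints that produced an AVE.
function driftCoefficient(
  totalAves: number,
  checkpoints: number = ASSUMED_CHECKPOINTS
): number {
  if (checkpoints <= 0) {
    throw new RangeError("checkpoints must be positive");
  }
  return (totalAves * 100) / checkpoints;
}

// Final Integrity is whatever share of checkpoints stayed compliant.
function finalIntegrity(driftPercent: number): number {
  return 100 - driftPercent;
}

const integrityA = finalIntegrity(driftCoefficient(8)); // Transcript A
const integrityB = finalIntegrity(driftCoefficient(3)); // Transcript B
const gapInPoints = integrityB - integrityA;

console.log(`A: ${integrityA}%  B: ${integrityB}%  gap: ${gapInPoints} points`);
// prints "A: 60%  B: 85%  gap: 25 points"
```

The gap is 25 percentage points. Relative to Transcript A's 60%, that is an improvement of roughly 42%.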
Critical insight: After the Zod migration at T14, Transcript A backslid: new modules reintroduced raw parseInt for pagination instead of going through the Zod validation the session had just adopted. The model "forgot" its own refactor.
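A short sketch of why this backslide matters. The first function mirrors the raw parseInt pattern described above. The second is a hand-rolled stand-in for a schema-first parser, written without Zod so the example stays self-contained. In the audited project that role would be played by a Zod schema. The function names, default values, and the limit cap of 100 are assumptions, not details from the transcripts.

```typescript
type Query = Record<string, string | undefined>;

interface Pagination {
  page: number;
  limit: number;
}

// Anti-pattern (sketch): raw parseInt accepts garbage and returns NaN silently.
function rawPagination(query: Query): Pagination {
  return {
    page: parseInt(query.page ?? "1", 10),
    limit: parseInt(query.limit ?? "20", 10),
  };
}

// Hypothetical helper: accept only strings made entirely of digits,
// within an inclusive range.
function parseBoundedInt(
  raw: string | undefined,
  field: string,
  fallback: number,
  max: number
): number {
  if (raw === undefined) {
    return fallback;
  }
  if (!/^\d+$/.test(raw)) {
    throw new Error(`${field} must be a whole number`);
  }
  const value = Number(raw);
  if (value < 1 || value > max) {
    throw new Error(`${field} must be between 1 and ${max}`);
  }
  return value;
}

// Schema-first stand-in: one shared parser that every module calls.
// The defaults (1, 20) and caps are illustrative assumptions.
function parsePagination(query: Query): Pagination {
  return {
    page: parseBoundedInt(query.page, "page", 1, 1000000),
    limit: parseBoundedInt(query.limit, "limit", 20, 100),
  };
}

// The raw version lets "10abc" through as 10 and turns "oops" into NaN.
console.log(rawPagination({ page: "10abc", limit: "oops" }));
// The shared parser applies defaults for missing fields.
console.log(parsePagination({ page: "2" }));
```

With a single shared parser, a new module cannot quietly accept input that older modules reject, which is the guarantee lost when parseInt reappears.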
Common Violations in Transcript A
- Inline Manual Validation - Body Cast Pattern (repeating pre-Zod anti-pattern)
- Raw parseInt Pagination in Service Layer (ignoring Zod middleware)
- Filter Validation Duplication (two sources of truth)
- roleHierarchy Re-Definition (duplicate from middleware/requireRole.ts)
- Raw Role String Array Check bypassing can() permissions system
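The last two violations can be illustrated with a small sketch. It assumes a four-role hierarchy and a can() helper that compares ranks; the audit names roleHierarchy and can() but does not show their definitions, so the role names, ranks, and signatures below are hypothetical.

```typescript
// Hypothetical roles and ranks; the real ones would live in one shared
// module (the audit mentions middleware/requireRole.ts as the original home).
type Role = "viewer" | "member" | "admin" | "owner";

const roleHierarchy: Record<Role, number> = {
  viewer: 0,
  member: 1,
  admin: 2,
  owner: 3,
};

// Single source of truth (sketch): every permission check goes through can().
function can(userRole: Role, requiredRole: Role): boolean {
  return roleHierarchy[userRole] >= roleHierarchy[requiredRole];
}

// Anti-pattern (sketch): an inline string-array check written in a later
// module. It silently diverges from the hierarchy because "owner" was left out.
function inlineAdminCheck(role: string): boolean {
  return ["admin"].indexOf(role) !== -1;
}

console.log(can("owner", "admin")); // true: owner outranks admin
console.log(inlineAdminCheck("owner")); // false: the inline list forgot owner
```

The two checks disagree for the same user, which is the practical cost of redefining shared logic in place rather than importing it.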
Implications for AI Coding Agents
- Continuity matters: The continuity layer raised final integrity by 25 percentage points (from 60% to 85%)
- Architectural drift is real: Models abandon patterns even after explicitly refactoring toward them
- Self-correction is limited: Violations persist even when the model identifies them itself
- Context window limitations: Long sessions cause "forgetting" of earlier rules
Good news: Transcript B maintained 100% integrity through T20, with only 3 minor violations in T23-24. This shows architectural patterns CAN be maintained with the right context management.