LLM Drift in Long Sessions: Claude 60% vs 85% Integrity

Source: Hacker News / Calmkeep.ai | Type: AI/LLM Evaluation | Rating: ★★★★☆

Key finding: In long Claude coding sessions (25+ turns), structural integrity fell from 100% to 60% without a continuity layer, but only to 85% with one. Without it, the model progressively abandoned the architectural patterns it had established earlier in the session.

Methodology

Two transcripts were evaluated using identical task prompts: Transcript A (no continuity layer) and Transcript B (with continuity layer).

Both were audited using a structured "Compliance & Integrity Audit" prompt.

Architecture Laws Established (Turns 1-5)

| Law | Description |
|---|---|
| LAW-01 | Module-Based Architecture (vertical slicing) |
| LAW-02 | Service Layer Owns All DB Access |
| LAW-03 | Org-Scoped Queries - Every query MUST include org_id |
| LAW-04 | Centralized Error Classes (custom AppError hierarchy) |
| LAW-05 | Env Config - Centralized Fail-Fast from config/env.ts |
| LAW-06 | Prisma as Sole ORM (No Raw SQL) |
| LAW-07 | Validation - Schema-First (Zod adopted mid-session) |
| LAW-08 | Single Source of Truth for Shared Logic |
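
To make a few of these laws concrete, here is a minimal sketch of LAW-02, LAW-03, and LAW-04 working together. It is not taken from the audited transcripts: `ProjectService`, `Project`, and `NotFoundError` are hypothetical names, and an in-memory array stands in for the Prisma client so the example runs without a database.

```typescript
// Hypothetical illustration only; none of these names come from the report.

// LAW-04: one base error class with specific subclasses.
class AppError extends Error {
  readonly statusCode: number;

  constructor(message: string, statusCode: number) {
    super(message);
    this.statusCode = statusCode;
    this.name = new.target.name;
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class NotFoundError extends AppError {
  constructor(resource: string) {
    super(`${resource} not found`, 404);
  }
}

interface Project {
  id: string;
  orgId: string;
  name: string;
}

// LAW-02: only the service touches the data store (an array here, assumed
// to be a Prisma client in a real codebase).
class ProjectService {
  private readonly rows: Project[];

  constructor(rows: Project[]) {
    this.rows = rows;
  }

  // LAW-03: orgId is a required argument, so every read is tenant-scoped.
  list(orgId: string): Project[] {
    return this.rows.filter((row) => row.orgId === orgId);
  }

  get(orgId: string, id: string): Project {
    const found = this.rows.find((row) => row.orgId === orgId && row.id === id);
    if (!found) {
      throw new NotFoundError("Project");
    }
    return found;
  }
}
```

Because `orgId` is part of every method signature, a new module that skips the org scope fails to compile instead of silently drifting from LAW-03.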

Results

| Metric | Transcript A (No Continuity) | Transcript B (With Continuity) |
|---|---|---|
| Total AVEs | 8 | 3 |
| Drift Coefficient | 40% | 15% |
| Final Integrity | 60% | 85% |
| Decay Onset Turn | T8 | T23 |
| Post-T14 Backslide | YES | NO |
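
The report summary does not state how the drift coefficient is calculated. The reported figures are consistent with two simple relationships: final integrity equals 100% minus the drift coefficient, and the drift coefficient equals the AVE count divided by 20 (8/20 = 40%, 3/20 = 15%). The sketch below encodes that reading. The denominator of 20 is inferred from the numbers, not stated in the source, and the function names are hypothetical.

```typescript
// Assumption: the denominator of 20 is back-calculated from the reported
// figures (8 AVEs -> 40%, 3 AVEs -> 15%); the report does not define it.
const INFERRED_DENOMINATOR = 20;

// Drift coefficient as a percentage of the inferred denominator.
function driftCoefficient(aves: number, denominator: number = INFERRED_DENOMINATOR): number {
  if (denominator <= 0) {
    throw new RangeError("denominator must be positive");
  }
  return (aves * 100) / denominator;
}

// Final integrity is whatever share of the architecture did not drift.
function finalIntegrity(driftPercent: number): number {
  return 100 - driftPercent;
}

const integrityA = finalIntegrity(driftCoefficient(8)); // Transcript A
const integrityB = finalIntegrity(driftCoefficient(3)); // Transcript B
```

Under this reading, `integrityA` and `integrityB` reproduce the 60% and 85% in the table.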

Critical insight: After T14 (the Zod migration), Transcript A backslid by reintroducing raw parseInt for pagination in new modules. The model "forgot" its own refactor.
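
The backslide matters because raw `parseInt` accepts input that a schema would reject. The sketch below contrasts the two, under loud assumptions: a hand-written check stands in for the Zod schema so the example has no dependencies, `parsePagination` and `ValidationError` are hypothetical names, and the default page size of 20 and cap of 100 are illustrative values that do not come from the transcripts.

```typescript
// Hypothetical shared helper (LAW-08 style); the manual checks below are a
// stand-in for a Zod schema, not the code from the audited session.
interface Pagination {
  page: number;
  pageSize: number;
}

class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, ValidationError.prototype);
  }
}

function parsePositiveInt(
  raw: string | undefined,
  field: string,
  fallback: number,
  max: number,
): number {
  if (raw === undefined) {
    return fallback;
  }
  // Digits only: parseInt("10abc", 10) would quietly return 10 here.
  if (!/^\d+$/.test(raw)) {
    throw new ValidationError(`${field} must be a positive integer`);
  }
  const value = Number(raw);
  if (value < 1 || value > max) {
    throw new ValidationError(`${field} must be between 1 and ${max}`);
  }
  return value;
}

// One validated entry point that every module reuses for pagination input.
function parsePagination(query: Record<string, string | undefined>): Pagination {
  return {
    page: parsePositiveInt(query.page, "page", 1, Number.MAX_SAFE_INTEGER),
    pageSize: parsePositiveInt(query.pageSize, "pageSize", 20, 100),
  };
}
```

A new module that calls `parsePagination(req.query)` inherits the validation. One that calls `parseInt(req.query.page)` directly accepts `"10abc"` as 10 and turns `"abc"` into `NaN`, which is the kind of regression the audit flagged.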

Common Violations in Transcript A

Implications for AI Coding Agents

Good news: Transcript B maintained 100% integrity through T20, with only 3 minor violations in T23-24. This suggests that architectural patterns can be maintained over long sessions with the right context management.

URL: https://calmkeep.ai/codetestreport