🧪 Autoresearch on an Old Research Idea

5 STAR | Source: ykumar.me | Date: 2026-03-24

Summary

The article applies Andrej Karpathy's Autoresearch concept to a real ML research problem (eCLIP). The author used Claude Code as a research agent that iteratively modified train.py to improve an eval metric, following instructions in program.md.

Core Approach

Autoresearch is a constrained optimization loop with an LLM agent in the middle:
  1. The agent iteratively improves the eval metric by modifying train.py
  2. It reads its instructions from program.md (split into "phases")
  3. It uses scratchpad.md as working memory
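
The loop above can be sketched as a greedy commit-or-revert search. This is a minimal illustration, not the author's setup: the names commit_or_revert, propose, and evaluate are hypothetical, the "state" stands in for the contents of train.py, and the toy objective replaces a real training run. Lower scores are assumed to be better, as with mean rank.

```python
from typing import Callable, List, Tuple, TypeVar

State = TypeVar("State")


def commit_or_revert(
    state: State,
    propose: Callable[[State, int], State],
    evaluate: Callable[[State], float],
    n_experiments: int,
) -> Tuple[State, float, List[str]]:
    """Greedy search: keep a change only if the metric (lower is better) improves."""
    best_score = evaluate(state)
    log: List[str] = []
    for step in range(n_experiments):
        candidate = propose(state, step)   # hypothetical: the agent edits train.py
        score = evaluate(candidate)        # hypothetical: train and run the eval
        if score < best_score:
            state, best_score = candidate, score
            log.append("commit")
        else:
            log.append("revert")           # discard the change, keep the old state
    return state, best_score, log


# Toy stand-ins for the agent and the eval, purely for illustration.
def toy_propose(cfg: dict, step: int) -> dict:
    deltas = [0.5, -0.2, 0.3, -1.0]
    return {"x": cfg["x"] + deltas[step % len(deltas)]}


def toy_evaluate(cfg: dict) -> float:
    return (cfg["x"] - 1.0) ** 2


if __name__ == "__main__":
    final, score, log = commit_or_revert({"x": 0.0}, toy_propose, toy_evaluate, 4)
    print(final, score, log)
```

Because a candidate is only kept when it beats the best score so far, the committed history is monotonically improving; the cost is that the search cannot accept a temporary regression to reach a better region.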

Experiment Setup

Sandboxing

Results

Metric | Baseline | After Autoresearch
Mean Rank | 344.68 | 157.43 (54% reduction)
Experiments | - | 42 total, 13 committed, 29 reverted
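
The headline numbers can be checked with a few lines of arithmetic. The helper name relative_reduction is hypothetical; the inputs are the figures reported above.

```python
def relative_reduction(baseline: float, after: float) -> float:
    """Fraction by which a lower-is-better metric shrank relative to its baseline."""
    return (baseline - after) / baseline


if __name__ == "__main__":
    # Mean rank before and after, as reported in the results.
    print(f"mean rank reduction: {relative_reduction(344.68, 157.43):.1%}")
    # Share of the 42 experiments whose changes were committed.
    print(f"commit rate: {13 / 42:.1%}")
```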

Final Test Results

Metric | Test Score
Mean Rank | 34.30
img→txt R@5 | 53.0%
txt→img R@5 | 51.4%

Key Discoveries

🔴 Biggest Win: Temperature Clamp Bug

The agent immediately found a bug in the code: the learnable temperature parameter was clamped at 2. Relaxing the limit cut the eval mean rank by 113 points, the single biggest win and worth more than all architecture changes combined.
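
Why a tight clamp can hurt is easiest to see on a toy contrastive loss. The sketch below is an assumption-heavy illustration, not the eCLIP code: the summary does not say which quantity was clamped, so the cap is treated here as an upper bound on the multiplicative logit scale, and the similarity values and the functions clamped_scale and info_nce_loss are hypothetical.

```python
import math


def clamped_scale(learned_scale: float, max_scale: float) -> float:
    """Apply an upper clamp to the learnable logit scale (assumed form of the bug)."""
    return min(learned_scale, max_scale)


def info_nce_loss(pos_sim: float, neg_sim: float, n_neg: int, scale: float) -> float:
    """InfoNCE loss for one positive and n_neg equally similar negatives.

    Equals -log softmax of the positive logit, written in a numerically
    stable form: log(1 + n_neg * exp(scale * (neg_sim - pos_sim))).
    """
    return math.log1p(n_neg * math.exp(scale * (neg_sim - pos_sim)))


if __name__ == "__main__":
    # Hypothetical, well-separated embeddings: positive 0.9, negatives 0.1.
    learned = 50.0  # the value the model "wants" to reach (illustrative)
    for cap in (2.0, 100.0):
        scale = clamped_scale(learned, cap)
        loss = info_nce_loss(0.9, 0.1, n_neg=63, scale=scale)
        print(f"cap={cap:>5}: scale={scale:>4}, loss={loss:.4f}")
```

Under these assumptions, a low cap keeps the softmax flat even when the positive pair is clearly separated, so the loss stays far from zero no matter how good the embeddings become; raising the cap lets the model sharpen the distribution.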

🟡 Hyperparameter Tuning

Further gains (-30 mean rank) came from hyperparameter tuning: increasing projection dimension and re-tuning learning rate. The agent acted like a hyperparameter optimization algorithm with reasoning.

🟠 Diminishing Returns

Key Insights

When the search space is clearly defined, the commit-or-revert loop is a surprisingly effective search strategy. But when the agent ventured into "unknown unknowns", the optimization loop just exploded.

Limitations

Tags

AI Research Automation Karpathy Claude Code Machine Learning eCLIP

🔗 Original Article | Karpathy's Autoresearch