karpathy/autoresearch
AI agents automatically running research on single-GPU nanochat training
Stars: 63,257
Forks: 8,887
Watchers: 521
Open Issues: 163
Safety Rating A
No hardcoded secrets, malicious code patterns, suspicious dependencies, or prompt injection attempts were detected. The repository is a straightforward Python/PyTorch project authored by a well-known researcher (Andrej Karpathy). The README contains narrative flavor text describing a fictional future, but this is clearly creative writing and not an attempt to manipulate AI analysis. The setup script fetches uv via curl (a common pattern), which is a minor operational consideration but not a security finding given the tool's legitimacy. Overall the project appears to be a legitimate open-source research tool.
ℹ AI-assisted review, not a professional security audit.
AI Analysis
autoresearch is a framework for autonomous AI-driven machine learning research. It provides a minimal single-GPU LLM training setup (based on nanochat) and a looping agent harness in which an AI agent (e.g., Claude or Codex) iteratively modifies the training code, runs 5-minute experiments, evaluates results via validation bits-per-byte, and keeps or discards each change, repeating this cycle overnight without human intervention. The human-facing interface is a Markdown file, `program.md`, which serves as the agent's instruction set, rather than direct code modification.
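The keep-or-discard cycle described above can be sketched roughly as follows. This is an illustrative outline, not the repository's actual code: the names `run_experiment` and `research_loop`, and the use of a random stub in place of a real 5-minute training run, are all assumptions for the sketch.

```python
# Hypothetical sketch of an autoresearch-style experiment loop:
# try a change, measure validation bits-per-byte, keep it only if it improves.
import random

def run_experiment(change):
    """Stand-in for a short nanochat training run; returns validation
    bits-per-byte for the candidate change (lower is better)."""
    rng = random.Random(change)  # deterministic per change name, for the sketch
    return 1.0 + rng.random() * 0.5

def research_loop(candidates, baseline_bpb):
    """Apply each candidate change; keep it only if validation bpb improves."""
    best_bpb = baseline_bpb
    kept = []
    for change in candidates:
        bpb = run_experiment(change)
        if bpb < best_bpb:       # improvement: commit the change
            best_bpb = bpb
            kept.append(change)
        # otherwise: discard the change and revert to the last good state
    return kept, best_bpb

kept, final_bpb = research_loop(
    ["wider-mlp", "rope-scaling", "lr-warmup"], baseline_bpb=1.40
)
```

The essential property of the loop is monotonicity: the tracked metric can only improve, so an overnight run never ends worse than its baseline.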
Use Cases
- Autonomous overnight hyperparameter and architecture search for small LLMs
- AI-agent-driven research experimentation on a single NVIDIA GPU
- Rapid iteration on GPT-style model training code without manual researcher intervention
- Educational demonstration of agentic AI research workflows
- Platform-specific LLM training optimization (H100, Mac, Windows via forks)
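The validation metric used in these workflows, bits-per-byte, is a standard tokenizer-independent way to compare language models: mean cross-entropy (in nats per token) is converted to bits and normalized by the number of raw bytes the tokens cover. A minimal sketch, assuming the loss is reported in nats per token (the function name is illustrative, not from the repository):

```python
import math

def bits_per_byte(loss_nats_per_token, n_tokens, n_bytes):
    """Convert mean cross-entropy (nats/token) into bits per byte of raw text."""
    bits_per_token = loss_nats_per_token / math.log(2)
    return bits_per_token * n_tokens / n_bytes

# e.g. a loss of ln(4) nats/token (= 2 bits/token) over 1,000 tokens
# covering 4,000 bytes of text:
print(round(bits_per_byte(math.log(4), 1000, 4000), 3))  # → 0.5
```

Normalizing by bytes rather than tokens means the metric stays comparable even if an experiment changes the tokenizer.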
Project Connections
claude-scientific-skills
→claude-scientific-skills provides 136 domain-specific research skills covering bioinformatics, genomics, drug discovery, materials science, and more. These skills are directly applicable as the knowledge base powering the AI agent that runs inside the autoresearch experiment loop.
ClawWork
→Both evaluate AI agents on real-world tasks in autonomous looping harnesses. autoresearch focuses on ML experiment iteration with single-GPU training and validation metrics; ClawWork benchmarks agent performance across 220 professional tasks with a real economic budget constraint.
OpenJarvis
→OpenJarvis provides local-first agent infrastructure with multiple inference backends (Ollama, vLLM, llama.cpp) and energy or cost evaluation as first-class constraints. It can serve as the on-device compute layer hosting the LLM agent driving autoresearch overnight experiment loops.