
uditgoenka/autoresearch


Claude Autoresearch Skill — Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat forever.

3,100 Stars · 234 Forks · 12 Watchers · 3 Open Issues

Shell · MIT License · Last commit Mar 31, 2026 · by @uditgoenka · Published April 3, 2026 · Analyzed 5d ago
Safety Rating: A

The repository contains only Markdown skill/command definition files and shell-based documentation. There are no executable scripts, hardcoded secrets, suspicious dependencies, or obfuscated code. The content instructs Claude Code to use git operations, run user-specified test/benchmark commands, and perform read-only security analysis — all standard agentic coding patterns. No prompt injection attempts were detected; the README content is straightforwardly instructional. The security audit command is explicitly read-only by default, with auto-remediation requiring an opt-in flag.

AI-assisted review, not a professional security audit.

AI Analysis

Claude Autoresearch is a Claude Code skill/plugin that implements an autonomous goal-directed iteration loop for any measurable improvement task. Inspired by Karpathy's autoresearch framework, it generalizes the 'modify → verify → keep/discard → repeat' pattern beyond ML to any domain with a mechanical metric. It ships as 10 slash commands covering autonomous optimization loops, security audits, shipping workflows, bug hunting, error fixing, documentation generation, multi-persona prediction, adversarial refinement, and scenario exploration. Changes are atomic, git-committed before verification, and auto-reverted on failure with results tracked in TSV logs.

Use Cases

  • Autonomously improving code quality metrics (test coverage, bundle size, performance) overnight without human intervention
  • Running iterative security audits using STRIDE/OWASP threat modeling with structured report output
  • Chaining debug → fix workflows to hunt and repair bugs in a codebase systematically
  • Generating and maintaining codebase documentation via an autonomous scout-generate-validate loop
  • Exploring edge cases and generating test scenarios across 12 dimensions for any feature or workflow
  • Converging on architecture or product decisions through adversarial multi-agent debate with blind judge panels
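Because results land in TSV logs, an overnight run's keep/discard outcome can be summarized afterwards with standard tools. A hedged sketch, assuming a simple two-column `iteration<TAB>result` layout (the skill's actual schema may differ):

```shell
# Summarize keep/discard counts from an iteration log.
# The two-column TSV layout below is an illustrative assumption.
LOG="$(mktemp)"
printf 'iteration\tresult\n1\tkeep\n2\tdiscard\n3\tkeep\n' > "$LOG"  # sample log

# Count each result type, skipping the header row.
SUMMARY="$(awk -F'\t' 'NR > 1 { c[$2]++ } END { for (r in c) printf "%s=%d\n", r, c[r] }' "$LOG")"
echo "$SUMMARY"
```

For the sample log above this prints `keep=2` and `discard=1` (order unspecified), which is enough to spot at a glance whether an unattended run made progress.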

Tags

#ai-agents #workflow-automation #code-generation #prompt-management #context-engineering #multi-agent #evaluation #plugin #cli-tool #template #testing #security

Project Connections

Inspired by / successor to

karpathy/autoresearch

Claude Autoresearch explicitly states it is based on Karpathy's autoresearch and generalizes its core loop (modify → verify → keep/discard → repeat) beyond ML training to any domain with a mechanical metric.

Alternative to

garrytan/gstack

Both are Claude Code skill collections implementing structured multi-role/multi-command workflows for AI-assisted development, but gstack focuses on a virtual engineering team with specialist personas while autoresearch focuses on autonomous metric-driven iteration loops.

Alternative to

gsd-build/get-shit-done

Both extend Claude Code with slash-command workflows for autonomous coding tasks; GSD focuses on spec-driven planning and parallel subagent orchestration while autoresearch focuses on iterative metric optimization with automatic rollback.

Complements

One-Man-Company/Skills-ContextManager

Skills-ContextManager provides a local web UI and MCP server for managing and dynamically loading skills into AI agents, making it a natural complement for organizing and deploying autoresearch skills alongside other skill libraries.

Complements

shanraisshan/claude-code-best-practice

The Claude Code best-practices reference documents patterns, hooks, and orchestration techniques that directly apply to configuring and extending autoresearch workflows, serving as a learning companion for users adopting the skill.