uditgoenka/autoresearch
Claude Autoresearch Skill: autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat forever.
3,100 Stars
234 Forks
12 Watchers
3 Open Issues
Safety Rating A
The repository contains only Markdown skill/command definition files and shell-based documentation. There are no executable scripts, hardcoded secrets, suspicious dependencies, or obfuscated code. The content instructs Claude Code to use git operations, run user-specified test/benchmark commands, and perform read-only security analysis — all standard agentic coding patterns. No prompt injection attempts were detected; the README content is straightforwardly instructional. The security audit command is explicitly read-only by default, with auto-remediation requiring an opt-in flag.
Note: AI-assisted review, not a professional security audit.
AI Analysis
Claude Autoresearch is a Claude Code skill/plugin that implements an autonomous goal-directed iteration loop for any measurable improvement task. Inspired by Karpathy's autoresearch framework, it generalizes the 'modify → verify → keep/discard → repeat' pattern beyond ML to any domain with a mechanical metric. It ships as 10 slash commands covering autonomous optimization loops, security audits, shipping workflows, bug hunting, error fixing, documentation generation, multi-persona prediction, adversarial refinement, and scenario exploration. Changes are atomic, git-committed before verification, and auto-reverted on failure with results tracked in TSV logs.
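The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the function names (`run_loop`, `git_commit`, `git_revert_last`), the TSV column names, and the "higher metric is better" rule are all hypothetical, and the real skill is defined in Markdown instructions rather than Python.

```python
import csv
import subprocess
from typing import Callable


def git(*args: str) -> None:
    """Run a git command in the current directory, raising on failure."""
    subprocess.run(["git", *args], check=True, capture_output=True)


def git_commit(message: str) -> None:
    """Stage everything and commit it as one atomic change."""
    git("add", "-A")
    git("commit", "-m", message)


def git_revert_last() -> None:
    """Discard the most recent commit and its working-tree changes."""
    git("reset", "--hard", "HEAD~1")


def run_loop(
    modify: Callable[[int], str],
    measure: Callable[[], float],
    iterations: int,
    log_path: str,
    commit: Callable[[str], None] = git_commit,
    revert: Callable[[], None] = git_revert_last,
) -> float:
    """Modify -> commit -> verify -> keep or revert.

    Assumption: a higher metric is better. Returns the best metric seen.
    """
    best = measure()  # baseline before any change
    with open(log_path, "w", newline="") as fh:
        log = csv.writer(fh, delimiter="\t")
        # Hypothetical column layout for the TSV log.
        log.writerow(["iteration", "description", "metric", "decision"])
        for i in range(1, iterations + 1):
            description = modify(i)  # apply one atomic change
            commit(description)      # commit before verification
            score = measure()        # mechanical metric
            if score > best:
                best, decision = score, "keep"
            else:
                revert()             # roll back the failed attempt
                decision = "discard"
            log.writerow([i, description, score, decision])
    return best
```

In a real repository, `modify` would edit files and `measure` would run the user-specified test or benchmark command and parse a number from its output. The `commit` and `revert` hooks are injectable so the loop can be exercised without touching git.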
Use Cases
- Autonomously improving code quality metrics (test coverage, bundle size, performance) overnight without human intervention
- Running iterative security audits using STRIDE/OWASP threat modeling with structured report output
- Chaining debug → fix workflows to hunt and repair bugs in a codebase systematically
- Generating and maintaining codebase documentation via an autonomous scout-generate-validate loop
- Exploring edge cases and generating test scenarios across 12 dimensions for any feature or workflow
- Converging on architecture or product decisions through adversarial multi-agent debate with blind judge panels
Project Connections
karpathy/autoresearch
Claude Autoresearch explicitly states it is based on Karpathy's autoresearch and generalizes its core loop (modify → verify → keep/discard → repeat) beyond ML training to any domain with a mechanical metric.
garrytan/gstack
Both are Claude Code skill collections implementing structured multi-role/multi-command workflows for AI-assisted development, but gstack focuses on a virtual engineering team with specialist personas while autoresearch focuses on autonomous metric-driven iteration loops.
gsd-build/get-shit-done
Both extend Claude Code with slash-command workflows for autonomous coding tasks; GSD focuses on spec-driven planning and parallel subagent orchestration while autoresearch focuses on iterative metric optimization with automatic rollback.
One-Man-Company/Skills-ContextManager
Skills-ContextManager provides a local web UI and MCP server for managing and dynamically loading skills into AI agents, making it a natural complement for organizing and deploying autoresearch skills alongside other skill libraries.
shanraisshan/claude-code-best-practice
The Claude Code best-practices reference documents patterns, hooks, and orchestration techniques that directly apply to configuring and extending autoresearch workflows, serving as a learning companion for users adopting the skill.