
uditgoenka/autoresearch


Claude Autoresearch Skill — Autonomous goal-directed iteration for Claude Code. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat forever.

3,100 Stars · 234 Forks · 12 Watchers · 3 Open Issues

Shell · MIT License · Last commit Mar 31, 2026 · by @uditgoenka · Published April 3, 2026 · Analyzed 5d ago
Safety Rating: A

The repository contains only Markdown skill/command definition files and shell-based documentation. There are no executable scripts, hardcoded secrets, suspicious dependencies, or obfuscated code. The content instructs Claude Code to use git operations, run user-specified test/benchmark commands, and perform read-only security analysis — all standard agentic coding patterns. No prompt injection attempts were detected; the README content is straightforwardly instructional. The security audit command is explicitly read-only by default, with auto-remediation requiring an opt-in flag.

AI-assisted review, not a professional security audit.

AI Analysis

Claude Autoresearch is a Claude Code skill/plugin that implements an autonomous goal-directed iteration loop for any measurable improvement task. Inspired by Karpathy's autoresearch framework, it generalizes the 'modify → verify → keep/discard → repeat' pattern beyond ML to any domain with a mechanical metric. It ships as 10 slash commands covering autonomous optimization loops, security audits, shipping workflows, bug hunting, error fixing, documentation generation, multi-persona prediction, adversarial refinement, and scenario exploration. Changes are atomic, git-committed before verification, and auto-reverted on failure with results tracked in TSV logs.

Use Cases

  • Autonomously improving code quality metrics (test coverage, bundle size, performance) overnight without human intervention
  • Running iterative security audits using STRIDE/OWASP threat modeling with structured report output
  • Chaining debug → fix workflows to hunt and repair bugs in a codebase systematically
  • Generating and maintaining codebase documentation via an autonomous scout-generate-validate loop
  • Exploring edge cases and generating test scenarios across 12 dimensions for any feature or workflow
  • Converging on architecture or product decisions through adversarial multi-agent debate with blind judge panels
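Because results land in TSV logs, an overnight run's keep/discard outcome can be summarized afterwards with standard tools. A hedged sketch, assuming a simple two-column `iteration<TAB>result` layout (the skill's actual schema may differ):

```shell
# Summarize keep/discard counts from an iteration log.
# The two-column TSV layout below is an illustrative assumption.
LOG="$(mktemp)"
printf 'iteration\tresult\n1\tkeep\n2\tdiscard\n3\tkeep\n' > "$LOG"  # sample log

# Count each result type, skipping the header row.
SUMMARY="$(awk -F'\t' 'NR > 1 { c[$2]++ } END { for (r in c) printf "%s=%d\n", r, c[r] }' "$LOG")"
echo "$SUMMARY"
```

For the sample log above this prints `keep=2` and `discard=1` (order unspecified), which is enough to spot at a glance whether an unattended run made progress.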

Tags

#ai-agents #workflow-automation #code-generation #prompt-management #context-engineering #multi-agent #evaluation #plugin #cli-tool #template #testing #security

Project Connections

Inspired by / successor to

karpathy/autoresearch

Claude Autoresearch explicitly states it is based on Karpathy's autoresearch and generalizes its core loop (modify → verify → keep/discard → repeat) beyond ML training to any domain with a mechanical metric.

Alternative to

garrytan/gstack

Both are Claude Code skill collections implementing structured multi-role/multi-command workflows for AI-assisted development, but gstack focuses on a virtual engineering team with specialist personas while autoresearch focuses on autonomous metric-driven iteration loops.

Alternative to

gsd-build/get-shit-done

Both extend Claude Code with slash-command workflows for autonomous coding tasks; GSD focuses on spec-driven planning and parallel subagent orchestration while autoresearch focuses on iterative metric optimization with automatic rollback.

Complements

One-Man-Company/Skills-ContextManager

Skills-ContextManager provides a local web UI and MCP server for managing and dynamically loading skills into AI agents, making it a natural complement for organizing and deploying autoresearch skills alongside other skill libraries.

Complements

shanraisshan/claude-code-best-practice

The Claude Code best-practices reference documents patterns, hooks, and orchestration techniques that directly apply to configuring and extending autoresearch workflows, serving as a learning companion for users adopting the skill.