jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
Stars: 8,165
Forks: 682
Watchers: 49
Open Issues: 52
Safety Rating A
No hardcoded secrets, malicious code patterns, suspicious dependencies, or prompt injection attempts were found. The repository is a well-structured open source inference server project under Apache 2.0, with clear attribution to upstream dependencies (MLX, mlx-lm, vllm-mlx). The optional API key feature is a user-supplied value at runtime, not embedded in code. No red flags identified.
ℹ AI-assisted review, not a professional security audit.
AI Analysis
oMLX is an LLM inference server optimized for Apple Silicon Macs, featuring continuous batching, tiered KV caching (hot RAM + cold SSD), and a native macOS menu bar app. It provides OpenAI- and Anthropic-compatible API endpoints; supports text LLMs, vision-language models, embeddings, and rerankers; and includes a web-based admin dashboard for real-time monitoring, model management, and benchmarking. KV cache blocks persist across requests and server restarts via SSD offloading, making local LLM serving practical for agentic coding workflows.
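Since the server exposes an OpenAI-compatible endpoint, a client talks to it with a standard chat-completions request body. The sketch below builds such a body; the base URL, port, and model name are assumptions for illustration, not values taken from the oMLX documentation — check the dashboard for the models actually loaded on your machine.

```python
import json

# Assumed local endpoint; configure to match your oMLX server settings.
BASE_URL = "http://localhost:8080/v1"

# Standard OpenAI chat-completions request body. The model name is a
# placeholder for whatever MLX-format model you have downloaded.
payload = {
    "model": "mlx-community/Qwen2.5-7B-Instruct-4bit",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize KV caching in one sentence."},
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```

In practice you would POST this body to `BASE_URL + "/chat/completions"` with any HTTP client, or point an official OpenAI SDK at `BASE_URL` so existing tooling works unchanged.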
Use Cases
- Running local LLMs on Apple Silicon Macs with OpenAI-compatible API
- Serving multiple models concurrently with LRU eviction and model pinning
- Persisting KV cache to SSD to avoid recomputation across long coding sessions (e.g., with Claude Code)
- Downloading and managing MLX-format models from HuggingFace via a web dashboard
- Using MCP (Model Context Protocol) tool calling with locally served models
- Embedding and reranking documents for RAG pipelines on Apple Silicon
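For the embedding and reranking use case, requests follow the familiar OpenAI-style shapes. The sketch below builds the two request bodies; the model names and the reranker schema are assumptions for illustration (rerank endpoint paths and schemas vary between servers), so consult the oMLX dashboard for what is actually served.

```python
import json

# OpenAI-style /v1/embeddings body: the server returns one vector per
# input string. Model name is an assumed placeholder.
embed_body = {
    "model": "mlx-community/bge-small-en-v1.5",
    "input": [
        "KV cache blocks persist on SSD across restarts.",
        "Continuous batching merges concurrent requests.",
    ],
}

# A rerank-style body in the common query-plus-documents shape; the exact
# path and field names depend on the server's reranker API.
rerank_body = {
    "model": "mlx-community/bge-reranker-base",
    "query": "How does oMLX avoid recomputing long prompts?",
    "documents": [
        "SSD-backed tiered KV cache",
        "macOS menu bar management",
        "LRU eviction and model pinning",
    ],
}

print(json.dumps(embed_body))
print(json.dumps(rerank_body))
```

A typical RAG flow embeds the corpus once, retrieves nearest neighbors for the query vector, then sends the candidates through the reranker to order them before prompting the LLM.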
Project Connections
mlx-lm
oMLX directly depends on mlx-lm for its BatchGenerator and LLM inference pipeline on Apple Silicon
mlx-vlm
oMLX uses mlx-vlm for vision-language model inference support
vLLM
oMLX's block-based paged KV cache design is explicitly inspired by vLLM, and it evolved from vllm-mlx v0.1.0
LM Studio
Both provide local LLM serving with GUI management on Apple Silicon, targeting a similar developer audience
Ollama
Both are self-hosted local LLM servers with OpenAI-compatible APIs, though Ollama is cross-platform while oMLX is Apple Silicon-specific with deeper macOS integration