Lemonade by AMD: Local AI for Everyone
TL;DR: AMD releases Lemonade - an open source, fast local LLM server that runs on GPU and NPU, with OpenAI API compatibility and cross-platform support.
What is Lemonade?
Lemonade is a refreshingly fast local AI server built by AMD for GPUs and NPUs. It exists because local AI should be free, open, fast, and private.
Key Features
- Native C++ Backend: Lightweight service that's only 2MB
- One Minute Install: Simple installer that sets up the stack automatically
- OpenAI API Compatible: Works with hundreds of apps out-of-box
- Auto-configures for Hardware: Configures dependencies for your GPU and NPU
- Multi-engine Compatibility: Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more
- Multiple Models at Once: Run more than one model simultaneously
- Cross-platform: Consistent experience across Windows, Linux, and macOS (beta)
- Built-in App: GUI to download, try, and switch models quickly
Capabilities
Text/LLM
Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use with 128 GB unified RAM.
Image Generation
Generate images directly from the local server.
Speech
Transcription and speech generation capabilities built-in.
Unified API
One local service for every modality. Point your app at Lemonade and get chat, vision, image generation, transcription, speech generation, and more with standard APIs.
POST /api/v1/chat/completions
Why It Matters
Lemonade addresses critical needs in the local AI space:
- Privacy: All processing happens locally - no data leaves your machine
- Cost: No API fees - run models on your own hardware
- Accessibility: One-minute install makes local AI approachable
- Compatibility: OpenAI API means easy integration with existing tools
- Performance: AMD GPU/NPU optimization delivers fast inference
With Lemonade, AMD is making local AI more accessible to developers and users who want privacy, control, and cost savings.