🍋 Lemonade: Fast Open Source Local LLM Server

Source: Hacker News (413 points) | ★★★★☆ | 2026-04-02

open-source local-ai llm gpu npu

What is Lemonade?

A refreshingly fast local LLM server that runs on GPUs and NPUs. Open source, private, and ready in minutes on any PC.

Native C++ Backend - Lightweight service that is only 2MB
One Minute Install - Simple installer that sets up the stack automatically
OpenAI API Compatible - Works with hundreds of apps out-of-box
Auto-configures for your hardware - Configures dependencies for your GPU and NPU
Multi-engine compatibility - Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more
Multiple Models at Once - Run more than one model at the same time
Cross-platform - Windows, Linux, and macOS (beta)
Built-in GUI - Download, try, and switch models quickly

One local service for every modality - chat, vision, image generation, transcription, speech generation with standard APIs.

Local AI should be free, open, fast, and private. Lemonade brings enterprise-grade local AI capabilities to any desktop without cloud dependencies.

With 128GB unified RAM, you can load models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.