shannon

MCP-LLM Backend

An elegant, Anthropic-compatible LLM backend with native MCP (Model Context Protocol) support. Run local LLMs with tool-calling capabilities while seamlessly integrating with MCP servers.

Features

  • Anthropic API Compatibility: Drop-in replacement for the Anthropic Messages API
  • Native MCP Support: Automatically discovers and exposes MCP tools
  • Fast Local Inference: Uses llama.cpp for efficient GGUF model serving
  • Plugin Architecture: Extensible design for custom functionality
  • Streaming Support: Full SSE streaming compatible with Anthropic clients
  • Tool Calling: Native tool-use support for models trained for it, with calls routed to MCP servers

Architecture

User Application → Anthropic API → Orchestrator → LLM Engine (llama.cpp)
                                        │
                                        └─→ MCP Manager → MCP Servers

The system acts as an MCP Host, managing connections to multiple MCP servers while using a local LLM to intelligently orchestrate tool calls.
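
The request/response loop can be pictured as the following sketch. This is illustrative pseudostructure only, not this project's actual code; the names engine, mcp_manager, and call_tool are assumptions:

async def orchestrate(messages, tools, engine, mcp_manager):
    while True:
        reply = await engine.generate(messages, tools=tools)
        tool_calls = [b for b in reply.content if b.type == "tool_use"]
        if not tool_calls:
            return reply  # plain answer; nothing left to route
        # Route each call to the MCP server that owns the tool, then
        # feed the results back for another generation pass.
        results = []
        for call in tool_calls:
            output = await mcp_manager.call_tool(call.name, call.input)
            results.append({
                "type": "tool_result",
                "tool_use_id": call.id,
                "content": output,
            })
        messages.append({"role": "assistant", "content": reply.content})
        messages.append({"role": "user", "content": results})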

Installation

# Install dependencies
pip install -e .

# Download a compatible GGUF model (e.g., Llama 3.2 Instruct)
mkdir models
wget -O models/llama-3.2-3b-instruct.gguf [model-url]

Configuration

Edit config.yaml:

models:
  local:
    - name: "local-llama-3.2"
      path: "models/llama-3.2-3b-instruct.gguf"
      context_size: 8192
      gpu_layers: 35

mcp:
  servers:
    - name: "filesystem"
      transport: "stdio"
      command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

Usage

Start the Server

python main.py

The server runs on http://localhost:8000 by default.

Using with Claude Desktop

Configure Claude Desktop to use the local backend:

{
  "apiProvider": "anthropic",
  "apiUrl": "http://localhost:8000/v1",
  "apiKey": "local"
}

Using with Python

from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8000/v1",
    api_key="local"
)

response = client.messages.create(
    model="local-llama-3.2",
    messages=[
        {"role": "user", "content": "List files in /tmp"}
    ],
    tools="auto",  # Automatically use MCP tools
    max_tokens=1024
)

print(response.content)
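
Because the backend advertises full Anthropic-compatible SSE streaming, the SDK's standard streaming helper should also work. A sketch, assuming the server emits the standard event stream:

# Stream tokens as they are generated. Assumes the backend implements
# the Anthropic SSE event protocol as claimed under Features.
with client.messages.stream(
    model="local-llama-3.2",
    messages=[{"role": "user", "content": "Summarize /tmp in one line"}],
    max_tokens=256,
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)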

Using with curl

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-llama-3.2",
    "messages": [
      {"role": "user", "content": "What files are in /tmp?"}
    ],
    "tools": "auto",
    "max_tokens": 1024
  }'

MCP Integration

The backend automatically:

  1. Connects to configured MCP servers on startup
  2. Discovers available tools from each server
  3. Exposes them as Anthropic-format tools to the LLM (see the sketch after this list)
  4. Routes tool calls to the appropriate MCP server
  5. Returns results in the response
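
As an illustration of step 3, a read_file tool from the filesystem server might be surfaced to the model roughly like this. The name, description, and input_schema fields are the standard Anthropic tool format; the specific tool shown is an assumption, not taken from this project's code:

# Illustrative: an MCP tool after conversion to Anthropic tool format.
tool = {
    "name": "read_file",
    "description": "Read the complete contents of a file",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}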

Supported MCP Servers

  • @modelcontextprotocol/server-filesystem - File system access
  • @modelcontextprotocol/server-memory - Memory/knowledge base
  • @modelcontextprotocol/server-postgres - Database access
  • Any MCP-compliant server

API Endpoints

  • POST /v1/messages - Anthropic Messages API
  • GET /v1/models - List available models
  • GET /health - Health check and MCP server status
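
A quick smoke test of the server and its MCP connections (a sketch; assumes the requests package is installed, and the exact response bodies are not documented here):

import requests

# Health check, including MCP server status
print(requests.get("http://localhost:8000/health").json())

# List the models the backend can serve
print(requests.get("http://localhost:8000/v1/models").json())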

Development

Project Structure

src/
├── api/           # Anthropic-compatible API
├── core/          # Orchestration and types
├── llm/           # LLM engine abstraction
├── mcp/           # MCP client and manager
├── adapters/      # Format converters
└── plugins/       # Extension system

Adding a New LLM Engine

Implement the LLMEngine interface:

from src.llm.base import LLMEngine

class MyEngine(LLMEngine):
    async def generate(self, messages, tools, **kwargs):
        # Convert `messages` and `tools` to the engine's native request,
        # run inference, and return the completion.
        pass

Creating Plugins

from src.plugins.base import Plugin

class MyPlugin(Plugin):
    async def on_request(self, context):
        # Pre-process request
        pass
    
    async def on_response(self, context):
        # Post-process response
        pass

Requirements

  • Python 3.10+
  • CUDA-capable GPU (optional, for acceleration)
  • Compatible GGUF model with tool-calling support

License

MIT

Contributing

Contributions welcome! The design prioritizes:

  • Clean abstractions
  • Minimal dependencies
  • Anthropic API compatibility
  • Efficient MCP integration
