ebk
ebk is a powerful eBook metadata management tool with a SQLAlchemy + SQLite database backend. It provides a comprehensive fluent API for programmatic use, a rich Typer-based CLI (with colorized output courtesy of Rich), full-text search with FTS5 indexing, automatic text extraction and chunking for semantic search, hash-based file deduplication, and optional AI-powered features including knowledge graphs and semantic search.
Table of Contents
- Features
- Installation
- Quick Start
- Configuration
- CLI Usage
- Python API
- Integrations
- Architecture
- Development
- Contributing
- License
- Documentation
- Stay Updated
- Support
Features
- SQLAlchemy + SQLite Backend: Robust database with normalized schema, proper relationships, and FTS5 full-text search
- Fluent Python API: Comprehensive programmatic interface with method chaining and query builders
- Typer + Rich CLI: A colorized, easy-to-use command-line interface
- Automatic Text Extraction: Extract and index text from PDFs, EPUBs, and plaintext files
  - PyMuPDF (primary) with pypdf fallback for PDFs
  - ebooklib with HTML parsing for EPUBs
  - Automatic chunking (500-word overlapping chunks) for semantic search
- Hash-based Deduplication: SHA256-based file deduplication
  - Same file (same hash) = skipped
  - Same book, different format = added as additional format
  - Hash-prefixed directory storage for scalability
- Full-Text Search: Fast FTS5-powered search across titles, descriptions, and extracted text
- Import from Multiple Sources:
  - Calibre libraries (reads metadata.opf files)
  - Individual ebook files with auto-metadata extraction
  - Batch import with progress tracking
- Cover Extraction: Automatic cover extraction and thumbnail generation
  - PDFs: First page rendered as image
  - EPUBs: Cover from metadata or naming patterns
- AI-Powered Features (optional):
  - LLM Provider Abstraction: Support for multiple LLM backends (Ollama, OpenAI-compatible APIs)
  - Metadata Enrichment: Auto-generate tags, categories, and enhanced descriptions using LLMs
  - Local & Remote LLM: Connect to local Ollama or remote GPU servers
  - Knowledge Graph: NetworkX-based concept extraction and relationship mapping
  - Semantic Search: Vector embeddings for similarity search (with TF-IDF fallback)
  - Reading Companion: Track reading sessions with timestamps
  - Question Generator: Generate active recall questions
- Web Server Interface:
  - FastAPI-based REST API for library management
  - URL-based navigation with filters, pagination, and sorting
  - Clickable covers and file formats to open books
  - Book details modal with comprehensive metadata display
- Flexible Exports:
  - Export to ZIP archives
  - Hugo-compatible Markdown with multiple organization options
  - Jinja2 template support for customizable export formats
- Integrations (optional):
  - Streamlit Dashboard: Interactive web interface
  - MCP Server: AI assistant integration
  - Visualizations: Network graphs for analysis
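The 500-word overlapping chunking mentioned above can be sketched in a few lines. This is a minimal sketch, not ebk's implementation: the README does not state the overlap size, so the 50-word default and the `chunk_words` name are assumptions for illustration.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words (assumed value), so a sentence
    cut at a chunk boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

Each chunk can then be embedded (or TF-IDF vectorized) independently, which keeps inputs within model limits while the overlap preserves context across boundaries.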
Installation
Basic Installation
pip install ebk
From Source
git clone https://github.com/queelius/ebk.git
cd ebk
pip install .
With Optional Features
# Quote the extras so shells such as zsh don't treat the brackets as a glob
# With Streamlit dashboard
pip install "ebk[streamlit]"
# With visualization tools
pip install "ebk[viz]"
# With all optional features
pip install "ebk[all]"
# For development
pip install "ebk[dev]"
Note: Requires Python 3.10+
Quick Start
1. Initialize Configuration
# Create default configuration file at ~/.config/ebk/config.json
ebk config init
# View current configuration
ebk config show
# Set default library path
ebk config set library.default_path ~/my-library
2. Create and Populate Library
# Initialize a new library
ebk init ~/my-library
# Import a single ebook with auto-metadata extraction
ebk import book.pdf ~/my-library
# Import from Calibre library
ebk import-calibre ~/Calibre/Library --output ~/my-library
# Search using full-text search
ebk search "python programming" ~/my-library
# List books with filtering
ebk list ~/my-library --author "Knuth" --limit 20
# Get statistics
ebk stats ~/my-library
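The feature list says imports are deduplicated by SHA256 hash and stored in hash-prefixed directories. Here is a minimal standard-library sketch of that idea; the `files/<first two hex chars>/` layout, the function names, and the in-memory `known_hashes` set are assumptions (a real library would look hashes up in its database). Matching a different format of the same book to an existing record is not covered.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large PDFs never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def storage_path(library: Path, digest: str, suffix: str) -> Path:
    # Hypothetical layout: <library>/files/<first 2 hex chars>/<hash><suffix>
    # The prefix directory keeps any single folder from growing too large.
    return library / "files" / digest[:2] / f"{digest}{suffix}"


def import_file(library: Path, src: Path, known_hashes: set[str]) -> Optional[Path]:
    """Copy src into the library unless identical bytes were imported before."""
    digest = sha256_of(src)
    if digest in known_hashes:
        return None  # same hash means same file: skip it
    dest = storage_path(library, digest, src.suffix.lower())
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    known_hashes.add(digest)
    return dest
```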
3. Launch Web Interface
# Start web server (uses config defaults)
ebk serve ~/my-library
# Custom port and host
ebk serve ~/my-library --port 8080 --host 127.0.0.1
# Auto-open browser
ebk config set server.auto_open_browser true
ebk serve ~/my-library
4. AI-Powered Metadata Enrichment
# Configure LLM provider
ebk config set llm.provider ollama
ebk config set llm.model llama3.2
ebk config set llm.host localhost
# Enrich library metadata using LLM
ebk enrich ~/my-library
# Enrich with all features
ebk enrich ~/my-library --generate-tags --categorize --enhance-descriptions
# Use remote GPU server
ebk enrich ~/my-library --host 192.168.1.100
Configuration
ebk uses a centralized configuration system stored at ~/.config/ebk/config.json. This configuration file manages settings for LLM providers, web server, CLI defaults, and library preferences.
Configuration File Structure
{
  "llm": {
    "provider": "ollama",
    "model": "llama3.2",
    "host": "localhost",
    "port": 11434,
    "api_key": null,
    "temperature": 0.7,
    "max_tokens": null
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8000,
    "auto_open_browser": false,
    "page_size": 50
  },
  "cli": {
    "verbose": false,
    "color": true,
    "page_size": 50
  },
  "library": {
    "default_path": null
  }
}
Configuration Management
# Initialize configuration (creates default config file)
ebk config init
# View current configuration
ebk config show
# Edit configuration in your default editor
ebk config edit
# Set specific values
ebk config set llm.provider ollama
ebk config set llm.model mistral
ebk config set server.port 8080
ebk config set library.default_path ~/my-library
# Get specific value
ebk config get llm.model
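`ebk config set` takes a dotted key such as `server.port` that addresses a field in the nested JSON file shown above. A minimal sketch of how such keys can map onto that structure; the helper names and the rule that values are parsed as JSON when possible (so `8080` becomes an integer and `true` a boolean) are assumptions, not ebk's documented behavior.

```python
import json
from typing import Any


def parse_value(raw: str) -> Any:
    """Assumed rule: "8080" -> 8080, "true" -> True, "null" -> None, else a string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw  # e.g. "mistral" or "~/books" stay plain strings


def set_dotted(config: dict[str, Any], key: str, value: Any) -> None:
    """Set config["a"]["b"] for key "a.b", creating missing sections."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
        if not isinstance(node, dict):
            raise ValueError(f"{part!r} is a value, not a section")
    node[leaf] = value


def get_dotted(config: dict[str, Any], key: str) -> Any:
    """Walk the nested dicts one key segment at a time."""
    node: Any = config
    for part in key.split("."):
        node = node[part]
    return node
```

Usage: `set_dotted(cfg, "server.port", parse_value("8080"))` followed by writing `cfg` back with `json.dump`.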
LLM Provider Configuration
Configure LLM providers for metadata enrichment:
# Local Ollama (default)
ebk config set llm.provider ollama
ebk config set llm.host localhost
ebk config set llm.port 11434
ebk config set llm.model llama3.2
# Remote GPU server
ebk config set llm.host 192.168.1.100
# OpenAI-compatible API (future)
ebk config set llm.provider openai
ebk config set llm.api_key sk-...
ebk config set llm.model gpt-4
CLI Overrides
All commands support CLI arguments that override configuration defaults:
# These override config settings
ebk serve ~/library --port 9000 --host 127.0.0.1
ebk enrich ~/library --host 192.168.1.50 --model mistral
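One common way to implement this precedence: a flag the user did not pass arrives as `None` and falls back to the configured value. This sketch assumes that convention; `resolve` is a hypothetical helper, not part of ebk.

```python
from typing import Any, Optional


def resolve(config_section: dict[str, Any], **cli_args: Optional[Any]) -> dict[str, Any]:
    """Return config values with any explicitly passed CLI flags layered on top."""
    merged = dict(config_section)  # copy so the loaded config is not mutated
    for name, value in cli_args.items():
        if value is not None:  # None means "flag not given on the command line"
            merged[name] = value
    return merged
```

For example, `resolve(config["server"], port=9000, host=None)` keeps the configured host but uses port 9000.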
CLI Usage
ebk uses Typer with Rich for a beautiful, colorized CLI experience.
General CLI Structure
ebk --help # See all available commands
ebk <command> --help # See specific command usage
ebk --verbose <command> # Enable verbose output
Database Commands
Core library management with SQLAlchemy + SQLite backend:
# Initialize library
ebk init ~/my-library
# Import books
ebk import book.pdf ~/my-library
ebk import ~/books/*.epub ~/my-library
ebk import-calibre ~/Calibre/Library --output ~/my-library
# Search with full-text search (FTS5)
ebk search "machine learning" ~/my-library
ebk search "author:Knuth" ~/my-library
# List and filter
ebk list ~/my-library
ebk list ~/my-library --author "Knuth" --language en --limit 20
ebk list ~/my-library --format pdf --rating 4
# Statistics
ebk stats ~/my-library
ebk stats ~/my-library --format json
# Manage reading status
ebk rate ~/my-library <book-id> 5
ebk favorite ~/my-library <book-id>
ebk tag ~/my-library <book-id> --add "must-read" "technical"
# Remove books
ebk purge ~/my-library --rating 1 --confirm
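ebk's search is backed by SQLite FTS5. To show what that means under the hood, here is a self-contained sketch using Python's built-in `sqlite3` module (which needs an SQLite build with FTS5, as most Python distributions ship). The `books_fts` table and its columns are hypothetical, not ebk's schema; the sketch does show that `author:Knuth` is valid FTS5 column-filter syntax, although ebk may interpret such prefixes on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one FTS5 row per book, one column per searchable field
conn.execute("CREATE VIRTUAL TABLE books_fts USING fts5(title, author, description)")
conn.executemany(
    "INSERT INTO books_fts (title, author, description) VALUES (?, ?, ?)",
    [
        ("The Art of Computer Programming", "Donald Knuth", "Fundamental algorithms"),
        ("Fluent Python", "Luciano Ramalho", "Python programming idioms"),
        ("Pattern Recognition and Machine Learning", "Christopher Bishop",
         "Machine learning textbook"),
    ],
)


def search(query: str) -> list[str]:
    """Run an FTS5 MATCH query; bare terms are ANDed, `col:term` limits to one column."""
    rows = conn.execute(
        "SELECT title FROM books_fts WHERE books_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in rows]
```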
Web Server
Launch FastAPI-based web interface:
# Start server (uses config defaults)
ebk serve ~/my-library
# Custom host and port
ebk serve ~/my-library --host 127.0.0.1 --port 8080
# Auto-open browser
ebk serve ~/my-library --auto-open
# Configure defaults in config
ebk config set server.port 8080
ebk config set server.auto_open_browser true
AI-Powered Features
Enrich metadata using LLMs:
# Basic enrichment (uses config settings)
ebk enrich ~/my-library
# Full enrichment
ebk enrich ~/my-library \
--generate-tags \
--categorize \
--enhance-descriptions \
--assess-difficulty
# Enrich specific book
ebk enrich ~/my-library --book-id 42
# Use remote GPU server
ebk enrich ~/my-library --host 192.168.1.100 --model mistral
# Dry run (preview changes without saving)
ebk enrich ~/my-library --dry-run
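`ebk enrich` talks to the configured LLM for you. To see roughly what a request to a local or remote Ollama server looks like, here is a standard-library sketch against Ollama's `/api/generate` endpoint. The prompt wording and the `build_tag_request`/`generate_tags` helpers are hypothetical; ebk's own prompts and response parsing are not shown in this README.

```python
import json
import urllib.request


def build_tag_request(title: str, host: str = "localhost", port: int = 11434,
                      model: str = "llama3.2",
                      temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a non-streaming Ollama generate request."""
    prompt = (
        "Suggest 5 short subject tags for the book below. "
        "Reply with a JSON array of strings only.\n"
        f"Title: {title}"
    )
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        f"http://{host}:{port}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_tags(title: str, **kwargs) -> list[str]:
    """Send the request; the model's text is in the "response" field."""
    req = build_tag_request(title, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # May raise if the model ignores the requested JSON format
    return json.loads(body["response"])
```

Calling `generate_tags("Introduction to Algorithms", host="192.168.1.100")` requires a running Ollama server with the model pulled.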
Configuration Management
Manage global configuration:
# Initialize configuration
ebk config init
# View configuration
ebk config show
ebk config show --section llm
# Edit in default editor
ebk config edit
# Set values
ebk config set llm.model llama3.2
ebk config set server.port 8080
ebk config set library.default_path ~/books
# Get values
ebk config get llm.model
Export and Advanced Features
# Export library
ebk export zip ~/my-library ~/backup.zip
ebk export json ~/my-library ~/metadata.json
# Virtual libraries (filtered views)
ebk vlib create ~/my-library "python-books" --subject Python
ebk vlib list ~/my-library
# Notes and annotations
ebk note add ~/my-library <book-id> "Great chapter on algorithms"
ebk note list ~/my-library <book-id>
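`ebk export zip` bundles a library into a single archive. The sketch below shows one plausible layout using only the standard library; the `metadata.json` plus `files/` structure and the shape of the `books` records (with library-relative paths under `"files"`) are assumptions, not ebk's actual export format.

```python
import json
import zipfile
from pathlib import Path


def export_zip(books: list[dict], files_root: Path, dest: Path) -> None:
    """Write metadata.json plus every referenced ebook file into one ZIP archive."""
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Metadata first, so the archive describes its own contents
        zf.writestr("metadata.json", json.dumps(books, indent=2, ensure_ascii=False))
        for book in books:
            for rel in book.get("files", []):
                # Store each file under files/ using its library-relative path
                zf.write(files_root / rel, arcname=f"files/{rel}")
```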
Documentation
Comprehensive documentation is available at: https://queelius.github.io/ebk/
Documentation Contents
- Getting Started
- User Guide
- Advanced Topics
- Development
Python API
ebk provides a comprehensive SQLAlchemy-based API for programmatic library management:
from pathlib import Path
from ebk.library_db import Library
# Open or create a library (pathlib does not expand "~", so call expanduser())
lib = Library.open(Path("~/my-library").expanduser())
# Import books with automatic metadata extraction
book = lib.add_book(
    Path("book.pdf"),
    metadata={"title": "My Book", "creators": ["Author Name"]},
    extract_text=True,
    extract_cover=True,
)
# Fluent query interface
results = (lib.query()
    .filter_by_language("en")
    .filter_by_author("Knuth")
    .filter_by_subject("Algorithms")
    .order_by("title", desc=False)
    .limit(20)
    .all())
# Full-text search (FTS5)
results = lib.search("machine learning", limit=50)
# Get book by ID
book = lib.get_book(42)
print(f"{book.title} by {', '.join(a.name for a in book.authors)}")
# Update reading status
lib.update_reading_status(book.id, "reading", progress=50, rating=4)
# Add tags
lib.add_tags(book.id, ["must-read", "technical"])
# Get statistics
stats = lib.stats()
print(f"Total books: {stats['total_books']}")
print(f"Total authors: {stats['total_authors']}")
print(f"Languages: {', '.join(stats['languages'])}")
# Query the underlying SQLAlchemy session directly for custom filters
from ebk.db.models import Book, Author
from sqlalchemy import and_
books = lib.session.query(Book).join(Book.authors).filter(
    and_(
        Author.name.like("%Knuth%"),
        Book.language == "en",
    )
).all()
# Always close when done
lib.close()
# Or use a context manager
with Library.open(Path("~/my-library").expanduser()) as lib:
    results = lib.search("Python programming")
    for book in results:
        print(book.title)
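The `lib.query()` chain works because each filter method returns the query object itself, and nothing runs until `.all()`. Below is a minimal in-memory sketch of that pattern; `BookQuery` and `BookRecord` are hypothetical stand-ins, whereas ebk's builder works against its SQLAlchemy database rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BookRecord:
    title: str
    authors: list[str]
    language: str


@dataclass
class BookQuery:
    books: list[BookRecord]
    filters: list[Callable[[BookRecord], bool]] = field(default_factory=list)
    order_attr: Optional[str] = None
    descending: bool = False
    max_results: Optional[int] = None

    def filter_by_language(self, lang: str) -> "BookQuery":
        self.filters.append(lambda b: b.language == lang)
        return self  # returning self is what makes chaining possible

    def filter_by_author(self, name: str) -> "BookQuery":
        needle = name.lower()  # case-insensitive substring match
        self.filters.append(lambda b: any(needle in a.lower() for a in b.authors))
        return self

    def order_by(self, attr: str, desc: bool = False) -> "BookQuery":
        self.order_attr, self.descending = attr, desc
        return self

    def limit(self, n: int) -> "BookQuery":
        self.max_results = n
        return self

    def all(self) -> list[BookRecord]:
        # Filters are only applied here, when the chain is finally executed
        rows = [b for b in self.books if all(f(b) for f in self.filters)]
        if self.order_attr is not None:
            rows.sort(key=lambda b: getattr(b, self.order_attr), reverse=self.descending)
        return rows if self.max_results is None else rows[: self.max_results]
```

Deferring execution to `.all()` is what lets a SQL-backed builder combine every filter into a single query.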
AI-Powered Metadata Enrichment
import asyncio

from ebk.ai.llm_providers.ollama import OllamaProvider
from ebk.ai.metadata_enrichment import MetadataEnrichmentService


async def main():
    # Initialize provider (local or remote)
    provider = OllamaProvider.remote(
        host="192.168.1.100",
        model="llama3.2",
    )
    service = MetadataEnrichmentService(provider)

    async with provider:
        # Generate tags
        tags = await service.generate_tags(
            title="Introduction to Algorithms",
            authors=["Cormen", "Leiserson"],
            description="Comprehensive algorithms textbook",
        )
        # Categorize
        categories = await service.categorize(
            title="Introduction to Algorithms",
            subjects=["Algorithms", "Data Structures"],
        )
        # Enhance description
        description = await service.enhance_description(
            title="Introduction to Algorithms",
            text_sample="Chapter 1: The Role of Algorithms...",
        )
        print(tags, categories, description)


# `async with` and `await` must run inside a coroutine
asyncio.run(main())
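Because the service methods are coroutines, several books can in principle be enriched concurrently inside a single `async with` block using `asyncio.gather`. The sketch below uses a hypothetical `FakeProvider` with canned output so it runs without a model server; whether ebk's `OllamaProvider` handles concurrent requests well is not stated here, so treat the concurrency as an assumption to verify.

```python
import asyncio


class FakeProvider:
    """Hypothetical stand-in for an LLM provider; returns canned tags."""

    def __init__(self) -> None:
        self.is_open = False

    async def __aenter__(self) -> "FakeProvider":
        self.is_open = True  # a real provider would open its HTTP session here
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.is_open = False  # ...and close it here, even if an error occurred

    async def generate_tags(self, title: str) -> list[str]:
        if not self.is_open:
            raise RuntimeError("provider used outside 'async with'")
        await asyncio.sleep(0.01)  # simulate network latency
        return [word.lower() for word in title.split()[:2]]


async def tag_many(titles: list[str]) -> dict[str, list[str]]:
    async with FakeProvider() as provider:
        # gather runs all requests concurrently and keeps results in input order
        results = await asyncio.gather(*(provider.generate_tags(t) for t in titles))
    return dict(zip(titles, results))
```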
See CLAUDE.md for architectural details, and the API documentation for a complete reference.
Contributing
Contributions are welcome! Here’s how to get involved:
- Fork the Repo
- Create a Branch for your feature or fix
- Commit & Push your changes
- Open a Pull Request describing the changes
We appreciate code contributions, bug reports, and doc improvements alike.
License
Distributed under the MIT License.
Integrations
ebk follows a modular architecture: the core library stays lightweight, and extra functionality ships as optional integrations:
Streamlit Dashboard
pip install "ebk[streamlit]"
streamlit run ebk/integrations/streamlit/app.py
MCP Server (AI Assistants)
pip install "ebk[mcp]"
# Configure your AI assistant to use the MCP server
Visualizations
pip install "ebk[viz]"
# Visualization tools will be available as a separate script
# Documentation coming soon in integrations/viz/
See the Integrations Guide for detailed setup instructions.
Architecture
ebk is designed with a clean, layered architecture:
- Core Library (ebk.library): Fluent API for all operations
- CLI (ebk.cli): Typer-based commands using the fluent API
- Import/Export (ebk.imports, ebk.exports): Modular format support
- Integrations (integrations/): Optional add-ons (web UI, AI, viz)
This design ensures the core remains lightweight while supporting powerful extensions.
Development
# Clone the repository
git clone https://github.com/queelius/ebk.git
cd ebk
# Create virtual environment
make venv
# Install in development mode
make setup
# Run tests
make test
# Check coverage
make coverage
Stay Updated
- GitHub: https://github.com/queelius/ebk
- Website: https://metafunctor.com
Support
- Issues: Open an Issue on GitHub
- Contact: lex@metafunctor.com
Happy eBook managing! 📚✨