active library

netloom

Declarative language for constructing complex networks from structured data

Started 2026

Resources & Distribution

Source Code

GitHub Repository

Package Registries


What is netloom?

netloom is a YAML-based DSL that describes how to build a weighted directed graph from structured documents. You declare node types (the units of your graph) and link types (how nodes relate), and netloom constructs a NetworkX DiGraph for analysis.

The output is a directed graph you can analyze with standard tools: community detection, centrality measures, shortest paths, visualization.

import netloom

G = netloom.build("config.yaml")   # returns nx.DiGraph

Or from the command line:

netloom build config.yaml                            # -> output.graphml
netloom build config.yaml -o graph.json --format json
netloom build config.yaml --format gexf
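Because the result is a plain nx.DiGraph, the analyses listed above are ordinary NetworkX calls. netloom has no implementation yet, so this sketch hand-builds a tiny graph as a stand-in for what netloom.build might return. The node names and weights are invented, and storing similarity in a `weight` edge attribute is an assumption. `louvain_communities` needs NetworkX 2.8 or later.

```python
import networkx as nx

# Hypothetical stand-in for netloom.build(...): node names and weights are invented.
G = nx.DiGraph()
edges = [
    ("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),  # tight cluster 1
    ("d", "e", 0.9), ("e", "f", 0.8), ("d", "f", 0.7),  # tight cluster 2
    ("c", "d", 0.35),                                    # weak bridge between them
]
for u, v, w in edges:
    # A symmetric similarity link is stored as one edge in each direction.
    G.add_edge(u, v, weight=w)
    G.add_edge(v, u, weight=w)

# Community detection: Louvain on the undirected view, weighted by similarity.
communities = nx.community.louvain_communities(G.to_undirected(), weight="weight", seed=42)

# Centrality: betweenness would treat weights as distances, so it is left
# unweighted here (hop counts). The bridge endpoints c and d score highest.
betweenness = nx.betweenness_centrality(G)

# Shortest path by hop count from one cluster to the other.
path = nx.shortest_path(G, "a", "f")

print(sorted(sorted(c) for c in communities))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(path)                                    # ['a', 'c', 'd', 'f']
```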

Why not just use a vector DB?

Vector databases are fast nearest-neighbor lookup engines. They answer “what’s similar to X?” netloom answers a different question: “what is the structure of similarity across my corpus?”

	Vector DB	Custom Python	netloom
Computes	Single embedding, cosine/L2	Anything you code	Multi-field similarity with composition
Scaling	ANN indexes, millions of docs	Depends	O(n^2) pairwise, practical to ~10K docs
Retrieval	Fast nearest-neighbor	Custom	Graph-aware: communities, hubs, bridges
Metadata	Filter only (WHERE clauses)	Custom	First-class similarity participant
Configuration	Code it	Code it	Declarative YAML

netloom wins when:

Metadata fields (tags, authors, categories) should contribute to similarity, not just filter results
You care about graph structure: which documents are hubs, which bridge two communities
You want declarative control over how similarity is composed from multiple signals
You have multiple node types in the same graph (heterogeneous networks)
You need directed relationships like citations or containment alongside similarity

Sweet spot: Small-to-medium corpora (<10K documents) where you want to understand structure, not just retrieve.
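The O(n^2) figure comes from comparing every pair of nodes: for 10,000 documents that is 10,000 * 9,999 / 2 = 49,995,000 pairs per link type. The sketch below shows that pairwise loop and one way several signals could compose into a single edge weight. The README does not specify netloom's composition rule, so the weighted sum, the weights, and the function names here are all assumptions.

```python
import math
from itertools import combinations

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(a: set, b: set) -> float:
    """len(A & B) / len(A | B), with 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def composed_edges(docs, w_text=0.7, w_tags=0.3, min_weight=0.3):
    """Hypothetical composition: weighted sum of text cosine and tag Jaccard.

    Visits all n * (n - 1) / 2 unordered pairs, hence O(n^2).
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(docs), 2):
        score = w_text * cosine(a["vec"], b["vec"]) + w_tags * jaccard(set(a["tags"]), set(b["tags"]))
        if score >= min_weight:
            edges.append((i, j, round(score, 3)))
    return edges

docs = [
    {"vec": [1.0, 0.0], "tags": ["auth", "redis"]},
    {"vec": [1.0, 0.0], "tags": ["auth"]},
    {"vec": [0.0, 1.0], "tags": ["ui"]},
]
print(composed_edges(docs))  # [(0, 1, 0.85)]
```

A vector DB would only see the `vec` field here; the tag overlap is what lifts the first pair's weight from 0.7 to 0.85.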

Source formats

netloom ingests structured data from multiple formats. A single config can combine multiple sources:

source:
  - path: data/conversations.jsonl
    format: jsonl
  - path: data/papers/
    format: json
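Reading those two sources amounts to one JSON object per line from the .jsonl file and one object per file from the directory. A minimal standard-library sketch under that assumption; `load_source` and `load_all` are hypothetical names, not netloom's API.

```python
import json
import tempfile
from pathlib import Path

def load_source(path: str, fmt: str) -> list[dict]:
    """Read records from one `source:` entry (hypothetical helper)."""
    p = Path(path)
    if fmt == "jsonl":
        # One JSON object per non-blank line.
        with p.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]
    if fmt == "json":
        # A directory holds one document per *.json file; a single file is one document.
        files = sorted(p.glob("*.json")) if p.is_dir() else [p]
        return [json.loads(f.read_text(encoding="utf-8")) for f in files]
    raise ValueError(f"unsupported format: {fmt}")

def load_all(sources: list[dict]) -> list[dict]:
    """Concatenate records from every entry in a `source:` list."""
    records = []
    for src in sources:
        records.extend(load_source(src["path"], src["format"]))
    return records

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "conversations.jsonl").write_text('{"id": 1}\n{"id": 2}\n', encoding="utf-8")
    (root / "papers").mkdir()
    (root / "papers" / "p1.json").write_text('{"paper_id": "p1"}', encoding="utf-8")
    records = load_all([
        {"path": str(root / "conversations.jsonl"), "format": "jsonl"},
        {"path": str(root / "papers"), "format": "json"},
    ])

print(records)  # [{'id': 1}, {'id': 2}, {'paper_id': 'p1'}]
```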

Format	Example
JSONL	`data/conversations.jsonl`
JSON files	`data/papers/*.json`
YAML	`data/config.yaml` (single or multi-document)
Markdown + frontmatter	`notes/*.md` (YAML frontmatter + body)
Plain markdown	`docs/*.md` (headings and sections extracted as structured data)
Plain text	`corpus/*.txt` (whole file becomes `body`)

Markdown is treated as structured data: the top-level (#) heading becomes title, each ## section becomes an entry in a sections list, and the content becomes body. Every record gets a _meta block with full provenance (source path, timestamps, content hash).
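A rough sketch of that mapping for a Markdown file without frontmatter, using only the standard library. The record shape follows the description above, but treating body as the full file text, the key names inside _meta, and the choice of SHA-256 are assumptions.

```python
import hashlib
import re
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def markdown_record(path: Path) -> dict:
    """Turn a plain Markdown file into a structured record (hypothetical helper)."""
    data = path.read_bytes()
    text = data.decode("utf-8")
    # The first level-1 heading becomes the title; fall back to the file name.
    title_match = re.search(r"^# (.+)$", text, flags=re.MULTILINE)
    # Split on level-2 headings; every piece after the first is one section.
    sections = []
    for chunk in re.split(r"^## ", text, flags=re.MULTILINE)[1:]:
        heading, _, content = chunk.partition("\n")
        sections.append({"heading": heading.strip(), "content": content.strip()})
    return {
        "title": title_match.group(1).strip() if title_match else path.stem,
        "sections": sections,
        "body": text,  # assumption: body is the full file text
        "_meta": {
            "source": str(path),
            "modified": datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat(),
            "sha256": hashlib.sha256(data).hexdigest(),
        },
    }

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.md"
    note.write_text("# Redis notes\n\nIntro.\n\n## Keys\nPrefix changed.\n\n## Fix\nUse sess:\n", encoding="utf-8")
    rec = markdown_record(note)

print(rec["title"], [s["heading"] for s in rec["sections"]])  # Redis notes ['Keys', 'Fix']
```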

Core abstractions

Defaults reduce repetition across node types:

defaults:
  embed:
    model: tfidf
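One plausible reading of how defaults apply is a recursive merge in which anything a node sets overrides the default. The README does not state the merge rules, so recursing into nested mappings and letting node values win are assumptions in this sketch.

```python
def merge_defaults(defaults: dict, node: dict) -> dict:
    """Recursively merge `defaults` under `node`; node values win (assumed semantics)."""
    merged = dict(defaults)
    for key, value in node.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"embed": {"model": "tfidf"}}
node = {"from": ".", "embed": {"field": "title"}}
print(merge_defaults(defaults, node))
# {'embed': {'model': 'tfidf', 'field': 'title'}, 'from': '.'}
```

Under this reading a node that sets its own `model:` keeps it, and the defaults dict itself is never mutated.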

Nodes define the units of your graph. A single source document can produce multiple node types:

nodes:
  conversation:
    from: .
    fields:
      title: { pluck: title }
      tags: { pluck: tags }
    embed:
      field: title              # inherits model: tfidf from defaults

  user_turn:
    from: turns
    where: { role: user }
    fields:
      text: { pluck: text }
    embed:
      field: text
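The from and where keys read naturally as: start at a key inside each source document (. for the document itself), treat each element there as a candidate, and keep those whose fields equal the where values; pluck then copies one field. This sketch assumes exactly those semantics (exact-match where, top-level keys only, `default` used when a field is missing); `extract_nodes` is a hypothetical name.

```python
def extract_nodes(doc: dict, spec: dict) -> list[dict]:
    """Build node attribute dicts from one source document (assumed semantics)."""
    # `from: .` means the document itself; otherwise a top-level key holding a list.
    # Nested/dotted paths are not handled in this sketch.
    items = [doc] if spec["from"] == "." else doc.get(spec["from"], [])
    where = spec.get("where", {})
    nodes = []
    for item in items:
        if all(item.get(k) == v for k, v in where.items()):
            nodes.append({
                name: item.get(rule["pluck"], rule.get("default"))
                for name, rule in spec["fields"].items()
            })
    return nodes

doc = {
    "title": "Debug authentication middleware",
    "tags": ["debugging", "auth"],
    "turns": [
        {"role": "user", "text": "Tokens are rejected"},
        {"role": "assistant", "text": "Checking Redis"},
        {"role": "user", "text": "Can you fix it?"},
    ],
}
user_turn = {"from": "turns", "where": {"role": "user"}, "fields": {"text": {"pluck": "text"}}}
print(extract_nodes(doc, user_turn))
# [{'text': 'Tokens are rejected'}, {'text': 'Can you fix it?'}]
```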

Links define relationships between nodes – similarity, attribute overlap, structural containment, and foreign-key references:

links:
  intent_similarity:
    between: [user_turn, user_turn]
    method: cosine
    min: 0.3

  tag_overlap:
    between: [conversation, conversation]
    method: jaccard
    field: tags

  contains_turns:
    between: [conversation, user_turn]
    method: parent

  cites:
    between: [paper, paper]
    method: reference
    field: references
    target_field: paper_id
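The two directed methods need no similarity score. A sketch of how they might resolve to edges, assuming each child node records its parent's id in a `parent_id` attribute and that a reference field holds a list of target ids. The README only states the edge directions; the attribute names, the fixed weight of 1.0, and silently dropping references to nodes outside the corpus are assumptions.

```python
def parent_edges(children: list[dict]) -> list[tuple[str, str, float]]:
    """One parent -> child edge per child node (assumes a `parent_id` attribute)."""
    return [(c["parent_id"], c["id"], 1.0) for c in children]

def reference_edges(nodes: list[dict], field: str, target_field: str) -> list[tuple[str, str, float]]:
    """One source -> target edge per value in `field` that matches a node's `target_field`.

    Source and target are the same node type here (paper -> paper), so both
    are identified by `target_field`. Unresolvable references are dropped.
    """
    known = {n[target_field] for n in nodes}
    edges = []
    for src in nodes:
        for ref in src.get(field, []):
            if ref in known:
                edges.append((src[target_field], ref, 1.0))
    return edges

papers = [
    {"paper_id": "p1", "references": ["p2", "p9"]},  # p9 is not in the corpus
    {"paper_id": "p2", "references": []},
    {"paper_id": "p3", "references": ["p1", "p2"]},
]
print(reference_edges(papers, "references", "paper_id"))
# [('p1', 'p2', 1.0), ('p3', 'p1', 1.0), ('p3', 'p2', 1.0)]

# Hypothetical turn ids, just to show the parent -> child direction.
turns = [{"id": "conv-1/t0", "parent_id": "conv-1"}, {"id": "conv-1/t4", "parent_id": "conv-1"}]
print(parent_edges(turns))
# [('conv-1', 'conv-1/t0', 1.0), ('conv-1', 'conv-1/t4', 1.0)]
```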

Network controls graph construction:

network:
  min: 0.3
  communities:
    algorithm: louvain
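A sketch of what the network block could do once link edges exist, using NetworkX (2.8 or later for Louvain): drop edges whose weight falls below network.min, then run Louvain and store each node's community as a node attribute. Applying min to every edge after the per-link thresholds, the `weight` and `community` attribute names, and running Louvain on the undirected view are all assumptions; `finalize` is a hypothetical name.

```python
import networkx as nx

def finalize(G: nx.DiGraph, min_weight: float = 0.3, seed: int = 0) -> nx.DiGraph:
    """Prune weak edges, then label nodes with Louvain communities (assumed behaviour)."""
    weak = [(u, v) for u, v, w in G.edges(data="weight", default=0.0) if w < min_weight]
    G.remove_edges_from(weak)
    communities = nx.community.louvain_communities(G.to_undirected(), weight="weight", seed=seed)
    for label, members in enumerate(communities):
        for node in members:
            G.nodes[node]["community"] = label
    return G

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("a", "b", 0.9), ("b", "a", 0.9),
    ("c", "d", 0.8), ("d", "c", 0.8),
    ("b", "c", 0.1),  # below min: pruned
])
finalize(G)
print(G.number_of_edges())  # 4
```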

Link methods

Method	Description
`cosine`	Cosine similarity on embedding vectors
`jaccard`	Jaccard set similarity on list fields
`dice`	Dice coefficient on list fields
`overlap`	Overlap coefficient on list fields
`exact`	Boolean equality (1.0 or 0.0)
`numeric`	Gaussian kernel similarity on number fields
`parent`	Structural containment (directed: parent->child)
`reference`	Foreign-key lookup (directed: source->target)

The symmetric methods (cosine, jaccard, dice, overlap, exact, numeric) produce an edge in each direction. Parent and reference produce a single directed edge.
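The set and scalar metrics are short formulas. This sketch uses their textbook definitions; the README does not give netloom's conventions for empty sets or the Gaussian bandwidth, so returning 0.0 for empty inputs and the `sigma` parameter are assumptions.

```python
import math

def jaccard(a: set, b: set) -> float:
    """len(A & B) / len(A | B)"""
    return len(a & b) / len(a | b) if a or b else 0.0

def dice(a: set, b: set) -> float:
    """2 * len(A & B) / (len(A) + len(B))"""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def overlap(a: set, b: set) -> float:
    """len(A & B) / min(len(A), len(B))"""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def exact(x, y) -> float:
    """1.0 on equality, else 0.0."""
    return 1.0 if x == y else 0.0

def numeric(x: float, y: float, sigma: float = 1.0) -> float:
    """Gaussian kernel exp(-(x - y)^2 / (2 * sigma^2)); 1.0 when x == y."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

a = {"debugging", "auth", "redis"}
b = {"auth", "redis", "deploy", "ci"}
print(jaccard(a, b), dice(a, b), overlap(a, b))  # 2/5, 4/7, 2/3
```

Overlap is the most forgiving of the three: a small tag set fully contained in a larger one scores 1.0.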

Full example

Given a corpus of conversation JSON documents like this:

{
  "id": "conv-2024-0142",
  "title": "Debug authentication middleware",
  "created_at": "2024-11-15T09:23:00Z",
  "model": "claude-sonnet-4-20250514",
  "turns": [
    {
      "role": "user",
      "text": "The auth middleware is rejecting valid tokens after the Redis upgrade",
      "timestamp": "2024-11-15T09:23:00Z"
    },
    {
      "role": "assistant",
      "text": "Let me check the Redis connection config and token validation logic.",
      "timestamp": "2024-11-15T09:23:05Z",
      "tool_calls": ["read_file", "grep"]
    },
    {
      "role": "tool",
      "text": "Contents of auth/middleware.py...",
      "timestamp": "2024-11-15T09:23:06Z"
    },
    {
      "role": "assistant",
      "text": "Found it -- the Redis key prefix changed from 'session:' to 'sess:' in v7. The token lookup is using the old prefix.",
      "timestamp": "2024-11-15T09:23:15Z"
    },
    {
      "role": "user",
      "text": "Ah that makes sense, we upgraded Redis last week. Can you fix it?",
      "timestamp": "2024-11-15T09:23:30Z"
    }
  ],
  "tags": ["debugging", "auth", "redis"],
  "project": "backend-api",
  "outcome": "resolved",
  "tools_used": ["read_file", "grep", "edit"]
}

This netloom config builds a heterogeneous graph where conversations and individual turns are separate node types, connected by semantic similarity, tag overlap, and structural containment:

defaults:
  embed:
    model: tfidf

source:
  path: data/conversations/
  format: json

nodes:
  conversation:
    from: .
    fields:
      id: { pluck: id }
      title: { pluck: title }
      tags: { pluck: tags }
      project: { pluck: project }
    embed:
      field: title

  user_turn:
    from: turns
    where: { role: user }
    fields:
      text: { pluck: text }
      timestamp: { pluck: timestamp }
    embed:
      field: text
      chunking:
        method: sentences
        max_tokens: 256
      aggregate: mean

  assistant_turn:
    from: turns
    where: { role: assistant }
    fields:
      text: { pluck: text }
      tools: { pluck: tool_calls, default: [] }
    embed:
      field: text

links:
  user_intent_similarity:
    between: [user_turn, user_turn]
    method: cosine
    min: 0.3

  cross_role_similarity:
    between: [user_turn, assistant_turn]
    method: cosine
    min: 0.4

  tag_overlap:
    between: [conversation, conversation]
    method: jaccard
    field: tags

  same_project:
    between: [conversation, conversation]
    method: exact
    field: project

  contains_user_turns:
    between: [conversation, user_turn]
    method: parent

  contains_assistant_turns:
    between: [conversation, assistant_turn]
    method: parent

network:
  min: 0.3
  communities:
    algorithm: louvain

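Each source document produces one conversation node, one user_turn node per user turn, and one assistant_turn node per assistant turn. Turns with role: tool match neither where filter, so they produce no nodes. The resulting graph has: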

Semantic edges between turns with similar content (cosine similarity)
Attribute edges between conversations sharing tags (Jaccard) or the same project (exact match)
Structural edges connecting conversations to their constituent turns (parent links)
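For the sample document, the counts follow directly from the where filters: two user turns, two assistant turns, and a tool turn that belongs to no node type. A quick check, assuming one parent edge per child node as the parent method describes:

```python
from collections import Counter

# Roles of the five turns in the sample conversation, in order.
roles = ["user", "assistant", "tool", "assistant", "user"]

counts = Counter(roles)
node_counts = {
    "conversation": 1,
    "user_turn": counts["user"],            # where: { role: user }
    "assistant_turn": counts["assistant"],  # where: { role: assistant }
}
# contains_user_turns + contains_assistant_turns: one conversation -> turn edge per turn node.
parent_edge_count = node_counts["user_turn"] + node_counts["assistant_turn"]

print(node_counts)        # {'conversation': 1, 'user_turn': 2, 'assistant_turn': 2}
print(parent_edge_count)  # 4
```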

Plugin architecture

netloom uses a registry pattern for all pluggable components. Built-in providers and user-written providers are structurally identical:

netloom/
  embeddings/tfidf.py                 # built-in, no heavy deps
  embeddings/ollama.py                # local Ollama models
  embeddings/openai.py                # OpenAI API
  embeddings/sentence_transformers.py # HuggingFace models
  metrics/cosine.py                   # built-in
  metrics/jaccard.py                  # built-in
  metrics/dice.py                     # built-in
  metrics/overlap.py                  # built-in
  metrics/exact.py                    # built-in
  metrics/numeric.py                  # built-in
  chunking/sentences.py               # built-in
  chunking/paragraphs.py              # built-in
  chunking/fixed_tokens.py            # built-in
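The user_turn node in the full example combines chunking (method: sentences, max_tokens: 256) with aggregate: mean. One plausible reading: split the text into sentences, pack whole sentences into chunks of at most max_tokens, embed each chunk, and average the chunk vectors into one node vector. Whitespace word counts standing in for tokens, the regex sentence splitter, and the toy embedding are assumptions; a single sentence longer than the limit becomes its own oversized chunk.

```python
import re

def sentence_chunks(text: str, max_tokens: int = 256) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, size = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())  # word count as a stand-in for tokens
        if current and size + n > max_tokens:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(sentence)
        size += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors (aggregate: mean)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def toy_embed(chunk: str) -> list[float]:
    """Hypothetical 2-d embedding: word count and character count."""
    return [float(len(chunk.split())), float(len(chunk))]

text = "The auth middleware rejects tokens. It started after the Redis upgrade. Can you fix it?"
chunks = sentence_chunks(text, max_tokens=10)
node_vector = mean_vector([toy_embed(c) for c in chunks])
print(chunks)
# ['The auth middleware rejects tokens.', 'It started after the Redis upgrade. Can you fix it?']
```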

Install optional providers via extras:

pip install "netloom[openai]"
pip install "netloom[ollama]"

Or write your own:

from netloom import register_embedding

@register_embedding("my-model")
class MyEmbedding:
    def embed(self, text: str) -> list[float]:
        ...

Then reference it in the DSL:

embed:
  field: text
  model: my-model

Use cases

Conversation analysis: Nodes are conversations and turns. Links are semantic similarity, topic overlap, temporal proximity, structural containment.
Paper citation networks: Nodes are papers. Links are citation (reference), co-authorship (jaccard), topic similarity (cosine).
Codebase analysis: Nodes are files, functions, modules. Links are imports, call graphs, semantic similarity.
Multi-modal documents: Nodes are text chunks, images, tables from the same document. Links are co-occurrence and cross-modal similarity.
E-commerce catalogs: Nodes are products. Links are semantic description similarity, shared categories, price-range proximity (numeric).

Status

Design phase. The DSL specification is in docs/spec.md. No implementation code yet. We’re refining the design before building.

License

MIT
