neuraparse
Production-grade agentic document-to-dataset pipeline with GraphRAG support.
⚠️ Alpha Release: This is an early alpha version (0.1.0a1). APIs may change. Feedback and contributions welcome!
What is neuraparse?
neuraparse transforms documents into high-quality datasets for:
- GraphRAG systems (entity extraction, graph neighborhoods, hierarchical summaries)
- Retrieval evaluation (graded relevance, cross-document ranking, multi-context ranking)
- LLM fine-tuning (QA pairs, instruction datasets, summarization)
- Agentic workflows (memory, tool usage, knowledge graphs)
Key Features
- Multi-format ingestion: web pages, PDFs, Office docs, Markdown, plain text
- Hierarchical parsing: layout-aware DocumentTree (sections, paragraphs, metadata)
- GraphRAG-ready: DocumentGraph with structural + semantic nodes (entities, summaries)
- 10+ dataset recipes: RAG chunks, QA pairs, entity knowledge, graded relevance, cross-document ranking
- Profile system: bundle recipes into workflows (graphrag, eval_ranking, eval_advanced)
- Real LLM integration: OpenAI, Anthropic (Claude), Ollama (local models)
- Production-ready: 39 tests, type hints, comprehensive error handling
Installation
```shell
# Basic installation
pip install neuraparse

# With LLM providers
pip install neuraparse[llm-openai]     # OpenAI GPT-4/3.5
pip install neuraparse[llm-anthropic]  # Anthropic Claude
pip install neuraparse[llm-ollama]     # Local Ollama models
pip install neuraparse[llm-all]        # All LLM providers

# With document parsing
pip install neuraparse[pdf]            # PDF support
pip install neuraparse[office]         # DOCX support
pip install neuraparse[recipes-yaml]   # YAML recipe configs

# Full installation
pip install neuraparse[llm-all,pdf,office,recipes-yaml]
```
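You can check which optional backends ended up in your environment with a small stdlib probe. This is a generic pattern, not part of neuraparse; the mapping from extras to import names below is an assumption (check the project's packaging metadata for the real dependency names):

```python
from importlib.util import find_spec

# Hypothetical mapping from extras to the import names of their backing SDKs.
extras = {
    "llm-openai": "openai",
    "llm-anthropic": "anthropic",
    "llm-ollama": "ollama",
}

# find_spec returns None when a top-level package is not installed.
available = {extra: find_spec(pkg) is not None for extra, pkg in extras.items()}
print(available)
```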
Quick Start
1. Ingest a document
```shell
# From a web page
neuraparse ingest https://example.com/article.html

# From a local file
neuraparse ingest path/to/document.pdf

# From markdown
neuraparse ingest path/to/notes.md
```
2. Build a document graph
```shell
neuraparse build-graph <document_id>
```
This creates a DocumentGraph with:
- Structural nodes: DOCUMENT → SECTION → PARAGRAPH hierarchy
- Semantic nodes: ENTITY (keywords), SUMMARY (section summaries)
- Edges: parent_of, next_sibling, mentions, summarizes
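The node and edge types above can be mirrored with a minimal adjacency-list sketch. This uses plain dicts and tuples for illustration only; it is not the library's actual DocumentGraph class, and all identifiers are made up:

```python
# Stand-in for the DocumentGraph described above (illustrative only).
nodes = {
    "doc:1": {"type": "DOCUMENT"},
    "sec:1": {"type": "SECTION", "title": "Intro"},
    "par:1": {"type": "PARAGRAPH", "text": "neuraparse builds datasets."},
    "ent:neuraparse": {"type": "ENTITY", "name": "neuraparse"},
    "sum:1": {"type": "SUMMARY", "text": "Overview of the pipeline."},
}
edges = [
    ("doc:1", "parent_of", "sec:1"),
    ("sec:1", "parent_of", "par:1"),
    ("par:1", "mentions", "ent:neuraparse"),
    ("sum:1", "summarizes", "sec:1"),
]

def children(node_id):
    """Follow parent_of edges one level down the hierarchy."""
    return [dst for src, rel, dst in edges if src == node_id and rel == "parent_of"]

print(children("doc:1"))  # ['sec:1']
```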
3. Generate datasets
Option A: Run a single recipe
```shell
# Generate RAG chunks
neuraparse run-recipe <document_id> --recipe examples/rag_chunks.json

# Generate QA pairs with OpenAI
neuraparse run-recipe <document_id> --recipe examples/recipe_with_openai.json

# Generate a graded relevance dataset
neuraparse run-recipe <document_id> --recipe examples/graded_relevance.json
```
Option B: Run a profile (multiple recipes)
```shell
# GraphRAG profile (6 recipes: chunks, QA, summaries, entities, neighborhoods, relevance)
neuraparse run-profile <document_id> --profile graphrag

# Evaluation ranking profile (2 recipes: section_relevance, multi_context_ranking)
neuraparse run-profile <document_id> --profile eval_ranking

# Advanced evaluation profile (3 recipes: graded_relevance, cross_doc_ranking, entity_context_ranking)
neuraparse run-profile <document_id> --profile eval_advanced
```
Available Recipes
| Recipe | Description | Output Format |
|---|---|---|
| `rag_chunks` | Paragraph chunks for RAG | `{chunk_id, text, metadata}` |
| `basic_qa` | QA pairs per paragraph | `{question, answer, context}` |
| `outline_summary` | Hierarchical section summaries | `{section, summary, level}` |
| `entity_knowledge` | Entity-centric knowledge aggregation | `{entity, mentions, contexts}` |
| `graph_neighborhood` | Paragraph + graph context | `{paragraph, siblings, summary}` |
| `section_relevance` | Binary relevance pairs | `{query, context, label}` |
| `multi_context_ranking` | Multi-context ranking | `{query, contexts: [{text, label}]}` |
| `graded_relevance` | Graded relevance (0-3) | `{query, context, grade}` |
| `cross_document_ranking` | Cross-document ranking | `{query, contexts: [{text, label, source_doc}]}` |
| `entity_context_ranking` | Entity + summary ranking | `{query, contexts: [{text, label, type}]}` |
LLM Integration
OpenAI
```json
{
  "kind": "basic_qa",
  "params": {
    "llm": {
      "provider": "openai",
      "model": "gpt-4",
      "api_key": "sk-...",
      "temperature": 0.7,
      "max_tokens": 512
    }
  }
}
```
Instead of hard-coding `api_key`, you can omit it and set the `OPENAI_API_KEY` environment variable.
Anthropic (Claude)
```json
{
  "kind": "outline_summary",
  "params": {
    "llm": {
      "provider": "anthropic",
      "model": "claude-3-5-sonnet-20241022",
      "temperature": 0.5,
      "max_tokens": 1024
    }
  }
}
```
Ollama (Local)
```json
{
  "kind": "basic_qa",
  "params": {
    "llm": {
      "provider": "ollama",
      "model": "llama3.2",
      "base_url": "http://localhost:11434",
      "temperature": 0.6
    }
  }
}
```
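The `provider` field in these config blocks suggests a simple dispatch on provider name. Here is a hedged sketch of that pattern; the stub functions stand in for real SDK calls and are not neuraparse's actual implementation:

```python
# Stubs standing in for real OpenAI / Anthropic / Ollama client calls.
def call_openai(cfg, prompt):
    return f"[openai:{cfg['model']}] {prompt}"

def call_anthropic(cfg, prompt):
    return f"[anthropic:{cfg['model']}] {prompt}"

def call_ollama(cfg, prompt):
    return f"[ollama:{cfg['model']}] {prompt}"

PROVIDERS = {"openai": call_openai, "anthropic": call_anthropic, "ollama": call_ollama}

def complete(llm_cfg, prompt):
    """Route a prompt to the handler named by llm_cfg['provider']."""
    try:
        handler = PROVIDERS[llm_cfg["provider"]]
    except KeyError:
        raise ValueError(f"unknown provider: {llm_cfg.get('provider')!r}")
    return handler(llm_cfg, prompt)

print(complete({"provider": "ollama", "model": "llama3.2"}, "hello"))  # [ollama:llama3.2] hello
```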
Architecture

```
Raw Documents   (Web, PDF, DOCX, Markdown, Text)
      |  Ingestion
      v
DocumentTree    (hierarchical: sections, paragraphs, metadata)
      |  Graph Building
      v
DocumentGraph   (nodes: DOCUMENT, SECTION, PARAGRAPH, ENTITY, SUMMARY)
      |  Recipe Execution
      v
Datasets        (RAG chunks, QA pairs, rankings, evaluations)
```
Advanced Usage
Custom Profiles
Create my_profiles.json:
```json
{
  "profiles": {
    "my_custom_profile": [
      "rag_chunks",
      "graded_relevance",
      "entity_context_ranking"
    ]
  }
}
```
Run it:
```shell
neuraparse run-profile <document_id> --profile my_custom_profile --profiles-config my_profiles.json
```
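Under the hood, resolving a profile name to its recipe list is a small lookup over the config file. A minimal sketch with the stdlib, assuming the `{"profiles": {...}}` layout shown above (this is not neuraparse's actual loader):

```python
import json

# Config text shaped like my_profiles.json above.
config_text = """
{"profiles": {"my_custom_profile": ["rag_chunks", "graded_relevance", "entity_context_ranking"]}}
"""
config = json.loads(config_text)

def resolve_profile(config, name):
    """Return the recipe list for a profile, with a helpful error if missing."""
    profiles = config.get("profiles", {})
    if name not in profiles:
        raise KeyError(f"profile {name!r} not found; known: {sorted(profiles)}")
    return profiles[name]

print(resolve_profile(config, "my_custom_profile"))
```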
Python API
```python
from neuraparse.core.ingestion import ingest_from_url
from neuraparse.core.graph_builder import build_document_graph
from neuraparse.recipes import execute_recipe, execute_profile

# Ingest
doc = ingest_from_url("https://example.com/article.html", base_dir="./data")

# Build graph
graph = build_document_graph(doc.id, base_dir="./data")

# Run a single recipe
output_path = execute_recipe(
    config_path="examples/rag_chunks.json",
    graph=graph,
    base_dir="./data",
    document_id=doc.id,
)

# Or run a profile
outputs = execute_profile(
    profile_name="graphrag",
    graph=graph,
    base_dir="./data",
    document_id=doc.id,
)
```
Testing
```shell
# Run all tests
pytest

# Run with coverage
pytest --cov=neuraparse --cov-report=html

# Run a specific test file
pytest tests/test_advanced_eval_recipes.py -v
```
Current status: 39 passed, 1 skipped.
Documentation
- Full Documentation (coming soon)
- Recipe Guide (coming soon)
- LLM Integration Guide (coming soon)
- Examples
Roadmap
- Core ingestion + parsing + graph building
- 10+ dataset recipes
- Profile system
- Real LLM integration (OpenAI, Anthropic, Ollama)
- Advanced evaluation recipes (graded relevance, cross-doc ranking)
- Multi-document graph merging
- Streaming ingestion for large documents
- Web UI for graph visualization
- PyPI package release
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! Please:
- Fork the repo
- Create a feature branch
- Add tests for new features
- Ensure all tests pass (`pytest`)
- Submit a pull request
Acknowledgments
Built with modern 2025 GraphRAG and agentic data pipeline patterns, inspired by:
- Microsoft GraphRAG
- LlamaIndex
- LangChain
- Recent ACL/NAACL/ICLR papers on retrieval evaluation
Download files
Source Distribution
Built Distribution
File details
Details for the file neuraparse-0.1.0a1.tar.gz.
File metadata
- Download URL: neuraparse-0.1.0a1.tar.gz
- Upload date:
- Size: 43.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `aa1d2d39aad10327b748007208ec9ed7e7581da73352024dc7b01152ebc54d72` |
| MD5 | `a875aaf0168356c3d4163841c7e2e16e` |
| BLAKE2b-256 | `fcb4917fc50eae420999bb52b0805705fc18ab0e3c26f014a7c0a40c732108b5` |
File details
Details for the file neuraparse-0.1.0a1-py3-none-any.whl.
File metadata
- Download URL: neuraparse-0.1.0a1-py3-none-any.whl
- Upload date:
- Size: 53.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `8f7f53a89d9df7aa4d2d6b65590ae9e6c864cbfafeb89f34d96856578730c8ba` |
| MD5 | `e2cbc8452117a8d8d33c230bd154e402` |
| BLAKE2b-256 | `a7a6fd3eb89cf422502785172b605cac7b77c1cc9e84f759c6bd3ca63b1a2f00` |