Simple semantic search for Notion HTML exports using AI embeddings

Notion Archive

A simple Python library for adding semantic search to Notion HTML exports.

What is Notion Archive?

Notion Archive parses your exported Notion workspace, generates embeddings for its content, and lets you search that content in natural language instead of by exact keywords. It is deliberately a basic tool.

What it does

  • Parses Notion HTML exports
  • Generates embeddings using OpenAI or local models
  • Stores them in a vector database (ChromaDB)
  • Provides basic search functionality
  • Extracts some metadata (tags, titles, workspace structure)

Installation

pip install notion-archive

How to use it

1. Export your Notion workspace

  1. In Notion, go to Settings & Members → Settings
  2. Click "Export all workspace content"
  3. Choose "HTML" format (not Markdown)
  4. Download and unzip the file
  5. You'll get a folder like Export-abc123.../
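Before indexing, it can help to confirm that the unzipped folder actually contains HTML pages. The sketch below is not part of notion-archive; `count_html_files` is a hypothetical helper that uses only the standard library, and the temporary folder stands in for a real export.

```python
import tempfile
from pathlib import Path


def count_html_files(export_dir):
    """Count .html files anywhere under an unzipped export folder."""
    root = Path(export_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a folder (did you unzip the export?)")
    return sum(1 for path in root.rglob("*.html") if path.is_file())


# Demo on a throwaway folder that mimics an export with one page.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Notes").mkdir()
    (Path(tmp) / "Notes" / "page.html").write_text("<html></html>")
    (Path(tmp) / "readme.md").write_text("# not html")
    html_count = count_html_files(tmp)

print(html_count)
```

A count of zero usually means the export was made in Markdown format or the path points at the wrong folder.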

2. Use the library

from notion_archive import NotionArchive

# Initialize with persistent storage
archive = NotionArchive(
    embedding_model="text-embedding-3-large",
    db_path="./my_archive"  # Saves data permanently
)

# Add your export
archive.add_export('./Export-abc123-def456-etc')

# Build the index (skipped automatically if one already exists)
archive.build_index()

# Search (reuses the stored index, so no rebuild is needed)
results = archive.search("meeting notes")
for result in results:
    print(f"{result['title']}: {result['content'][:100]}...")

To force a rebuild:

archive.build_index(force_rebuild=True)  # Rebuilds even if index exists
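The search loop above treats each result as a dict with `title` and `content` keys. As a small sketch built on that assumption, a helper like the hypothetical `format_result` below keeps snippets on one line and only appends an ellipsis when the content was actually cut. The sample data is a stand-in, not real library output.

```python
def format_result(result, width=100):
    """Return 'title: snippet' for one search result dict."""
    title = result.get("title", "(untitled)")
    # Collapse newlines and repeated spaces so the snippet stays on one line.
    snippet = " ".join(result.get("content", "").split())
    if len(snippet) > width:
        snippet = snippet[:width].rstrip() + "..."
    return f"{title}: {snippet}"


# Stand-in data shaped like the results used in the loop above.
sample_results = [
    {"title": "Weekly sync", "content": "Agenda:\n  - roadmap\n  - hiring"},
    {"title": "Retro", "content": "x" * 150},
]
for result in sample_results:
    print(format_result(result))
```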

Embedding Models

from notion_archive import NotionArchive

# OpenAI models (require an API key; usage is billed)
archive = NotionArchive(embedding_model="text-embedding-3-large")
# or
archive = NotionArchive(embedding_model="text-embedding-3-small")

# Local model (free, but slower)
archive = NotionArchive(embedding_model="all-MiniLM-L6-v2")
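One way to choose between these options in a script is to fall back to the local model when no OpenAI key is configured. This is only a sketch: `pick_embedding_model` is a hypothetical helper, and it assumes the key is supplied through the `OPENAI_API_KEY` environment variable, as in the "Common issues" section.

```python
import os


def pick_embedding_model(env=None):
    """Return an OpenAI model name if a key is set, else the local model."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "text-embedding-3-small"
    return "all-MiniLM-L6-v2"


print(pick_embedding_model())
```

The returned name can then be passed as `embedding_model` when creating the archive.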

How it works

  1. You export your Notion workspace as HTML
  2. The parser extracts text and basic metadata
  3. Text gets chunked and turned into embeddings
  4. Embeddings are stored in ChromaDB
  5. Search queries get embedded and matched against stored chunks
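The steps above can be illustrated with a toy, standard-library-only sketch. It is not the library's implementation: word counts stand in for a real embedding model, a plain list stands in for ChromaDB, and every function name here is made up for the example. Because the stand-in embedding only counts words, it matches keywords rather than meaning; a real model is what makes the search semantic.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase word counts (a real model returns a dense vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(pages, max_words=50):
    """Chunk every page and store each chunk alongside its vector."""
    index = []
    for title, text in pages.items():
        for chunk in chunk_text(text, max_words):
            index.append({"title": title, "content": chunk, "vector": embed(chunk)})
    return index


def search(index, query, limit=3):
    """Embed the query and return the most similar stored chunks."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vector, entry["vector"]), reverse=True)
    return ranked[:limit]


pages = {
    "Standup": "Meeting notes from the Monday standup about the release plan",
    "Recipes": "Slow cooked tomato sauce with basil and garlic",
}
index = build_index(pages)
top = search(index, "meeting notes", limit=1)
print(top[0]["title"])  # prints "Standup"
```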

Limitations

  • Only works with HTML exports (not live Notion)
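  • No incremental updates: when your export changes, you have to rebuild the index with build_index(force_rebuild=True)
  • Metadata extraction is basic (tags, titles, workspace structure)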
  • Search quality depends on your embedding model choice
  • Large workspaces can be expensive with OpenAI models
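To get a feel for the cost point above, a back-of-the-envelope estimate can be computed before building an index. This is a rough sketch under stated assumptions: about 4 characters per token is a common rule of thumb for English text, not an exact figure, and the price must come from OpenAI's current pricing page. The value used below is a placeholder, not a real price, and `estimate_embedding_cost` is a hypothetical helper.

```python
def estimate_embedding_cost(total_chars, usd_per_million_tokens, chars_per_token=4.0):
    """Roughly estimate the cost of embedding total_chars characters of text."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    tokens = total_chars / chars_per_token
    return tokens / 1_000_000 * usd_per_million_tokens


PLACEHOLDER_PRICE = 1.0  # NOT a real price; look up the current rate for your model
estimate = estimate_embedding_cost(4_000_000, PLACEHOLDER_PRICE)
print(f"Estimated cost: ${estimate:.2f}")
```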

API

# Initialize
archive = NotionArchive(embedding_model="model-name", db_path="./archive_db")

# Add export folder
archive.add_export("./path/to/export")

# Build search index (skipped if one already exists)
archive.build_index()

# Force rebuild if needed
archive.build_index(force_rebuild=True)

# Check if index exists
if archive.has_index():
    print("Ready to search!")

# Search
results = archive.search("query", limit=10)

# Get info
stats = archive.get_stats()
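The calls above can be combined into a small guard that reports whether a build actually ran, which is useful when builds are slow or billed. This is a sketch: `ensure_index` is a hypothetical helper that relies only on the `has_index()` and `build_index(force_rebuild=...)` calls listed above, and `FakeArchive` is a stand-in so the example runs without the library or an export.

```python
def ensure_index(archive, force=False):
    """Build the index only when needed; return True if a build was run."""
    if force:
        archive.build_index(force_rebuild=True)
        return True
    if archive.has_index():
        return False
    archive.build_index()
    return True


class FakeArchive:
    """Stand-in exposing the same two methods as the API listing above."""

    def __init__(self):
        self.built = False
        self.build_calls = 0

    def has_index(self):
        return self.built

    def build_index(self, force_rebuild=False):
        self.built = True
        self.build_calls += 1


fake = FakeArchive()
print(ensure_index(fake))  # first call builds the index
print(ensure_index(fake))  # second call finds the index and skips
```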

Requirements

  • Python 3.8+
  • A Notion workspace exported as HTML
  • OpenAI API key if using OpenAI models

Common issues

"No documents found" - Make sure you exported as HTML, not Markdown, and pointed to the unzipped folder.

"OpenAI API error" - Set your API key: export OPENAI_API_KEY=sk-your-key-here

"Memory error" - Large workspaces need lots of RAM. Try using a smaller embedding model or chunking your export.

License

MIT License - see LICENSE file for details.


A simple tool for adding semantic search to your Notion exports.
