notion-haystack: Export Notion pages to Haystack Documents

This Python package lets you export your Notion pages as Haystack Documents. All it needs is a Notion API token.

Because the Notion API is rate limited, this component automatically retries failed requests, waiting for the rate limit to reset before each retry. This is especially useful when exporting a large number of pages. The component also uses asyncio to issue requests concurrently, which can significantly speed up the export.
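The retry-and-wait behaviour combined with concurrent requests can be illustrated with a small, self-contained sketch. This is not the package's actual implementation: `RateLimited`, `fetch_with_retry`, `export_all`, and `make_fake_fetch` are hypothetical names, and the fake fetch function stands in for a real Notion API call so the example runs without a token.

```python
import asyncio


class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


async def fetch_with_retry(fetch, page_id, max_attempts=5):
    """Call fetch(page_id), waiting and retrying whenever it is rate limited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fetch(page_id)
        except RateLimited as err:
            if attempt == max_attempts:
                raise
            # Wait for the period the server asked for before trying again.
            await asyncio.sleep(err.retry_after)


async def export_all(fetch, page_ids, max_concurrency=3):
    """Fetch all pages concurrently, capping the number of in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(page_id):
        async with semaphore:
            return await fetch_with_retry(fetch, page_id)

    # gather() returns results in the same order as page_ids.
    return await asyncio.gather(*(worker(pid) for pid in page_ids))


def make_fake_fetch():
    """Stand-in for an API call: the first request for each page is rate limited."""
    seen = set()

    async def fake_fetch(page_id):
        if page_id not in seen:
            seen.add(page_id)
            raise RateLimited(retry_after=0.01)
        return f"content of {page_id}"

    return fake_fetch


if __name__ == "__main__":
    print(asyncio.run(export_all(make_fake_fetch(), ["a", "b", "c"])))
```

The semaphore is a design choice of this sketch: capping concurrency keeps a large export from triggering the rate limit more often than necessary.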

Installation

pip install notion-haystack

Usage

To use this package, you will need a Notion API token. You can follow the steps outlined in the Notion documentation to create a new Notion integration, connect it to your pages, and obtain your API token.

For the integration to access a page and its child pages, add it under the page's 'Connections' setting in Notion.

The following minimal example demonstrates how to export a list of pages to Haystack Documents:

from notion_haystack import NotionExporter

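exporter = NotionExporter(api_token="<your-token>")
result = exporter.run(page_ids=["<list-of-page-ids>"])

# Haystack 2.x components return a dict of outputs; the Documents are expected under
# "documents", the output name the pipeline example connects to the splitter
exported_pages = result["documents"]

# exported_pages is a list of Haystack Documents, one per Notion page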

The following example shows how to use the NotionExporter inside an indexing pipeline:

from haystack import Pipeline
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
# In Haystack 2.x the in-memory store lives in the in_memory submodule
from haystack.document_stores.in_memory import InMemoryDocumentStore

from notion_haystack import NotionExporter

document_store = InMemoryDocumentStore()
exporter = NotionExporter(api_token='YOUR_NOTION_API_KEY')
splitter = DocumentSplitter()
writer = DocumentWriter(document_store=document_store)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=exporter, name="exporter")
indexing_pipeline.add_component(instance=splitter, name="splitter")
indexing_pipeline.add_component(instance=writer, name="writer")

indexing_pipeline.connect("exporter.documents", "splitter.documents")
indexing_pipeline.connect("splitter", "writer")

indexing_pipeline.run(data={"exporter": {"page_ids": ["your_page_id"]}})
# The pages will now be indexed in the document store

The NotionExporter class takes the following arguments:

  • api_token: Your Notion API token. Notion's documentation explains how to obtain one.
  • export_child_pages: Whether to recursively export all child pages of the provided page ids. Defaults to False.
  • extract_page_metadata: Whether to extract metadata from the page and add it as frontmatter to the markdown. Extracted metadata includes the title, author, path, URL, last editor, and last editing time of the page. Defaults to False.
  • exclude_title_containing: If specified, pages whose titles contain this string are excluded. This can be useful, for example, to skip archived pages. Defaults to None.
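To make the extract_page_metadata option more concrete, here is a rough sketch of what adding metadata as frontmatter to markdown can look like. The function `add_frontmatter` and the field names are hypothetical; the exact keys and layout the package emits are not documented here.

```python
import json


def add_frontmatter(markdown: str, metadata: dict) -> str:
    """Prepend metadata to markdown as a YAML-style frontmatter block."""
    lines = ["---"]
    for key, value in metadata.items():
        # json.dumps double-quotes the value, so titles containing ": " stay valid.
        lines.append(f"{key}: {json.dumps(str(value))}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown


if __name__ == "__main__":
    # Illustrative values only.
    print(add_frontmatter("# Roadmap", {"title": "Roadmap: Q3", "author": "Ada"}))
```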

The NotionExporter.run method takes the following arguments:

  • page_ids: A list of page ids to export. If export_child_pages is True, all child pages of these pages will be exported as well.
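How page_ids, export_child_pages, and exclude_title_containing interact can be sketched over an in-memory page tree. This is an illustration only, not the package's code: `collect_page_ids`, the `children` mapping, and the `titles` mapping are hypothetical. It also assumes that the children of an excluded page are skipped, which the description above does not specify.

```python
def collect_page_ids(root_ids, children, titles,
                     export_child_pages=False, exclude_title_containing=None):
    """Return the ids of the pages that would be exported, in depth-first order."""
    selected = []
    seen = set()
    stack = list(reversed(root_ids))
    while stack:
        page_id = stack.pop()
        if page_id in seen:
            continue
        seen.add(page_id)
        title = titles.get(page_id, "")
        if exclude_title_containing and exclude_title_containing in title:
            # Assumption: an excluded page's children are skipped as well.
            continue
        selected.append(page_id)
        if export_child_pages:
            # Push children reversed so they are visited in their listed order.
            stack.extend(reversed(children.get(page_id, [])))
    return selected


# Hypothetical workspace: root -> (a -> a1), b
children = {"root": ["a", "b"], "a": ["a1"]}
titles = {"root": "Home", "a": "Projects", "a1": "Old plan (archived)", "b": "Notes"}

if __name__ == "__main__":
    print(collect_page_ids(["root"], children, titles, export_child_pages=True,
                           exclude_title_containing="archived"))
```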
