Notion ETL

A Python package for extracting, transforming, and loading data from Notion using Polars DataFrames and the Notion API Client.

The package provides a simple API for loading Notion databases into Polars DataFrames, in either raw or cleaned form, for efficient data manipulation and analysis.

Installation

The package is available on PyPI and can be installed using pip:

pip install notion-etl

Usage

Authentication

Create a Notion integration and get your Notion API key. You can find instructions on how to do this in the Notion API documentation. Remember to share the pages and databases you want to access with your integration.

To authenticate, set your Notion API key as an environment variable:

export NOTION_TOKEN=secret_...

You can also pass the token to the loader explicitly in your code:

import os
from notion_etl.loader import NotionDataLoader

loader = NotionDataLoader(os.environ["NOTION_TOKEN"])
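
If the environment variable is missing, os.environ["NOTION_TOKEN"] raises a bare KeyError. As a minimal sketch, a small helper can check for the token first and fail with a clearer message. The name read_notion_token is hypothetical and is not part of notion_etl; the only assumption about the package is the constructor call shown above.

```python
import os


def read_notion_token(env_var: str = "NOTION_TOKEN") -> str:
    """Return the token stored in env_var, or raise a readable error.

    Hypothetical helper for illustration; not part of notion_etl.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it before creating the loader."
        )
    return token


# Intended use, following the constructor call shown above:
#   loader = NotionDataLoader(read_notion_token())
```

Failing early this way keeps a missing or empty token from surfacing later as a less obvious authentication error.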

Loading Data from a Notion Database

Use the NotionDataLoader class to load data from a Notion database. The get_database method retrieves the database and its records.

The database id can be found in the URL of the database page. For example, in the URL https://www.notion.so/your_workspace/Database-Name-1234567890abcdef1234567890abcdef, the database id is 1234567890abcdef1234567890abcdef.

from notion_etl.loader import NotionDataLoader

loader = NotionDataLoader()
database = loader.get_database("database_id")
database.records # List of records in the database
database.to_dataframe() # Convert to clean Polars DataFrame
database.to_dataframe(clean=False) # Convert to raw Polars DataFrame
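
Because the id is the trailing 32 hexadecimal characters of the URL path, it can be pulled out programmatically instead of copied by hand. The following is a sketch only: extract_notion_id is a hypothetical helper, not part of notion_etl, and it assumes the URL follows the pattern described above.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper, not part of notion_etl.
# Assumes the id is the last 32 hex characters of the final URL path segment.
_TRAILING_ID = re.compile(r"[0-9a-fA-F]{32}$")


def extract_notion_id(url: str) -> str:
    """Return the 32-character id at the end of a Notion URL path."""
    # urlparse separates out the query string (for example "?v=..."),
    # so only the path is searched.
    last_segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    match = _TRAILING_ID.search(last_segment)
    if match is None:
        raise ValueError(f"No 32-character hexadecimal id found in {url!r}")
    return match.group(0).lower()


url = "https://www.notion.so/your_workspace/Database-Name-1234567890abcdef1234567890abcdef"
print(extract_notion_id(url))  # 1234567890abcdef1234567890abcdef
```

The returned string can then be passed to loader.get_database(...) in place of the hard-coded id.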

Loading Data from a Notion Page

To load data from a Notion page, use the get_page_contents method. The page contents can be converted to a Polars DataFrame, plain text, or Markdown.

As with a database, the page id can be found in the URL of the page. For example, in the URL https://www.notion.so/your_workspace/Page-Name-1234567890abcdef1234567890abcdef, the page id is 1234567890abcdef1234567890abcdef.

from notion_etl.loader import NotionDataLoader

loader = NotionDataLoader()
page = loader.get_page_contents("page_id")
print(page.as_plain_text()) # Print the page content as plain text
print(page.as_markdown()) # Print the page content as Markdown
# Convert to a Polars DataFrame with one row per block in the page
page.as_dataframe()
