Notion Client extension to import a Notion database into a pandas dataframe

Notion2Pandas

Notion2Pandas is a Python 3 package that extends the capabilities of the excellent notion-sdk-py by Ramnes. It imports a Notion database into a pandas dataframe, and writes a dataframe back to Notion, with a single line of code in each direction.

Installation

pip install notion2pandas

Usage

  • Import the Notion2PandasClient class.
import os  # used below to read the token and database ID from the environment

from notion2pandas import Notion2PandasClient
  • Create an instance by passing your authentication token.
n2p = Notion2PandasClient(auth=os.environ["NOTION_TOKEN"])
  • Use the 'from_notion_DB_to_dataframe' method to get the data into a dataframe.
df = n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"])
  • When you're done working with your dataframe, use the 'update_notion_DB_from_dataframe' method to save the data back to Notion.
n2p.update_notion_DB_from_dataframe(os.environ["DATABASE_ID"], df)
  • If you need a filtered or sorted database, create your filter / sort object with the structure below and pass it to the from_notion_DB_to_dataframe method:
published_filter = {
    "filter": {
        "property": "Published",
        "checkbox": {"equals": True},
    }
}

df = n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"], published_filter)
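
A filter can be combined with a sort in the same dictionary. The following is a minimal sketch, assuming the dictionary is forwarded unchanged to Notion's database query endpoint, which accepts "filter" and "sorts" keys. build_query is a hypothetical helper, not part of notion2pandas, and "Name" is an illustrative property name.

```python
def build_query(filters=None, sorts=None):
    """Build a query dict with optional 'filter' and 'sorts' keys (hypothetical helper)."""
    query = {}
    if filters:
        # A single condition is passed as-is; several are combined with "and".
        query["filter"] = filters[0] if len(filters) == 1 else {"and": list(filters)}
    if sorts:
        query["sorts"] = [
            {"property": name, "direction": direction} for name, direction in sorts
        ]
    return query


published = {"property": "Published", "checkbox": {"equals": True}}
query = build_query(filters=[published], sorts=[("Name", "ascending")])
print(query)

# With a real client (not run here):
# df = n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"], query)
```

Building the dictionary in one place keeps the filter and sort definitions reusable across several calls.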

PageID and Row_Hash

The pandas dataframe contains two columns that are not in the original database: PageID and Row_Hash. PageID is the ID of the Notion page for that entry. Row_Hash is a value calculated from the field values of the entry. The update_notion_DB_from_dataframe function uses it to determine whether a row in the dataframe has been modified; if it has not, no API call to Notion is made for that row. Changing the values in these two columns can lead to malfunctions, so please do not modify them!
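
To illustrate the idea behind Row_Hash, here is a minimal sketch of hash-based change detection. The hashing scheme (SHA-256 over a JSON dump of the fields) and the helper names row_hash and changed_rows are assumptions made for this example; the calculation Notion2Pandas actually performs may differ.

```python
import hashlib
import json


def row_hash(row):
    """Hash a row's field values, ignoring the bookkeeping columns (illustrative only)."""
    fields = {k: v for k, v in row.items() if k not in ("PageID", "Row_Hash")}
    payload = json.dumps(fields, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_rows(rows):
    """Return the rows whose current hash differs from the stored Row_Hash."""
    return [row for row in rows if row_hash(row) != row["Row_Hash"]]


rows = [
    {"PageID": "page-1", "Title": "Draft", "Published": False},
    {"PageID": "page-2", "Title": "Release notes", "Published": True},
]
for row in rows:
    row["Row_Hash"] = row_hash(row)  # hash stored when the data is read

rows[0]["Published"] = True  # edit one row after reading

print([row["PageID"] for row in changed_rows(rows)])  # ['page-1']
```

Only the edited row is reported, which is why an untouched row needs no API call, and why overwriting the stored hash by hand would defeat the check.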

Utility functions

Notion2PandasClient is a class that extends Client from notion_client, so every feature of notion_client is available. In addition to the functions for importing and exporting dataframes, I've added some convenience functions that wrap notion_client functionality so it can be used more directly. These are:

  • get_database_columns(database_ID)
  • create_page(page_ID)
  • update_page(page_ID)
  • retrieve_page(page_ID)
  • delete_page(page_ID)
  • retrieve_block(block_ID)
  • retrieve_block_children_list(block_ID)
  • update_block(block_ID, field, field_value_updated)

read_write_lambdas

Notion2Pandas can transform a Notion database into a pandas dataframe without being told how to parse the data. If the default parsing does not give you what you need, you can specify your own. In Notion2Pandas, each Notion data type is associated with a tuple of two functions: one for reading the data and one for writing it.

In this example, I'm changing the functions for reading and writing dates so that I can work only with the start date.

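n2p.date_read_write_lambdas = (
    # Read: keep only the start date, or '' when the date is empty.
    lambda notion_property:
        notion_property.get('date').get('start')
        if notion_property.get('date') is not None
        else '',
    # Write: send only the start date, or clear the date when the cell is ''.
    lambda row_value:
        {'date': {'start': row_value} if row_value != '' else None},
)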
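
The read/write pair above can be checked on sample data before it is attached to a client. In this sketch the pair is stored in a local tuple so it runs without a Notion connection; the sample payloads are made-up values shaped like Notion date properties.

```python
# Same read/write pair as above, kept in a local tuple so no Notion client is needed.
date_read_write_lambdas = (
    lambda notion_property:
        notion_property.get('date').get('start')
        if notion_property.get('date') is not None
        else '',
    lambda row_value:
        {'date': {'start': row_value} if row_value != '' else None},
)
read_date, write_date = date_read_write_lambdas

# Sample payloads shaped like Notion date properties (values are made up).
filled = {'type': 'date', 'date': {'start': '2024-05-01', 'end': '2024-05-03'}}
empty = {'type': 'date', 'date': None}

print(read_date(filled))         # 2024-05-01
print(read_date(empty) == '')    # True
print(write_date('2024-05-01'))  # {'date': {'start': '2024-05-01'}}
print(write_date(''))            # {'date': None}
```

Note that the end date is dropped on read and never written back, which is the intended effect of working with the start date only.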

My suggestion for changing the read and write functions is to take the original function directly from the Notion2Pandas.py code and modify it until you get the desired result. These are the names of the tuples for each Notion data type:

Notion Data        Lambda Function
Title              title_read_write_lambdas
Rich Text          rich_text_read_write_lambdas
Check box          checkbox_read_write_lambdas
Number             number_read_write_lambdas
Date               date_read_write_lambdas
Select             select_read_write_lambdas
Multi Select       multi_select_read_write_lambdas
Status             status_read_write_lambdas
Email              email_read_write_lambdas
People             people_read_write_lambdas
Phone number       phone_number_read_write_lambdas
URL                url_read_write_lambdas
Relation           relation_read_write_lambdas
Roll Up            rollup_read_write_lambdas
Files              files_read_write_lambdas
Formula            formula_read_write_lambdas
String             string_read_write_lambdas
Unique ID          unique_id_read_write_lambdas
Button             button_read_write_lambdas
Created by         created_by_read_write_lambdas
Created time       created_time_read_write_lambdas
Last edited by     last_edited_by_read_write_lambdas
Last edited time   last_edited_time_read_write_lambdas
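
The same pattern applies to the other tuples. As a sketch, the pair below represents a Multi Select property as a comma-separated string. It assumes the write function should return a dictionary keyed by the property type, as in the date example, and that option names contain no commas; it is not the default parsing used by Notion2Pandas.

```python
# Hypothetical pair: Multi Select <-> comma-separated string.
multi_select_pair = (
    # Read: join the option names into one string ('' when nothing is selected).
    lambda notion_property:
        ', '.join(option.get('name') for option in notion_property.get('multi_select') or []),
    # Write: split the string back into a list of named options.
    lambda row_value:
        {'multi_select': [{'name': name.strip()}
                          for name in row_value.split(',') if name.strip()]},
)
read_tags, write_tags = multi_select_pair

# Sample payload shaped like a Notion multi-select property (values are made up).
sample = {'type': 'multi_select',
          'multi_select': [{'name': 'python'}, {'name': 'notion'}]}

print(read_tags(sample))            # python, notion
print(write_tags('python, notion')) # {'multi_select': [{'name': 'python'}, {'name': 'notion'}]}

# With a real client (not run here):
# n2p.multi_select_read_write_lambdas = multi_select_pair
```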

Adding and removing rows

If you add a row to the dataframe and then update the Notion database from it, Notion2Pandas adds the new row to the database. Conversely, if you remove a row from the dataframe, Notion2Pandas does not delete the corresponding page during the update. In that case, you need to call the delete_page method yourself.
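
One way to handle removed rows is to remember the PageID values right after reading and compare them with the PageID values left in the dataframe. The sketch below works on plain lists of IDs; removed_page_ids and delete_removed_pages are hypothetical helpers, and FakeClient is a stand-in that only records delete_page calls so the example runs without Notion.

```python
def removed_page_ids(original_ids, current_ids):
    """Return the PageIDs present when the data was read but missing now, in original order."""
    current = set(current_ids)
    return [page_id for page_id in original_ids if page_id not in current]


def delete_removed_pages(client, original_ids, current_ids):
    """Call client.delete_page for each row dropped from the dataframe."""
    removed = removed_page_ids(original_ids, current_ids)
    for page_id in removed:
        client.delete_page(page_id)
    return removed


class FakeClient:
    """Stand-in for Notion2PandasClient that only records delete_page calls."""

    def __init__(self):
        self.deleted = []

    def delete_page(self, page_id):
        self.deleted.append(page_id)


client = FakeClient()
removed = delete_removed_pages(client, ["a", "b", "c"], ["a", "c"])
print(removed)  # ['b']
```

With a real dataframe, original_ids would be list(df["PageID"]) captured right after from_notion_DB_to_dataframe, and current_ids the same column after rows have been dropped.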

Roadmap

For the upcoming releases, I plan to add:

  • Managing the limit of 2700 API calls in 15 minutes
  • Asynchronous client version of notion2pandas

Support

Notion2Pandas is an open-source project; anyone can contribute to the project by reporting issues or proposing merge requests. I will commit to evaluating every proposal and responding to all. If you disagree with the decisions made and the direction the project may take, you are free to fork the project, and you will have my blessing!
