Reading rich text information from a Notion database and performing simple NLP analysis.

Notion Rich Text Data Analysis

Notion NLP

To read text from a Notion database and perform natural language processing analysis.

Introduction

When flomo first came out, I built a database in Notion to implement similar functionality. I have been recording my thoughts and summaries there for a few years and have accumulated a fair amount of text. flomo's roaming feature does not quite fit my needs, so I wanted to write my own small tool that reads the data through the Notion API and runs NLP analysis on it.

Last year I wrote a demo in a notebook, put it on hold for a while, and later improved it. It currently supports batch analysis tasks: you can add multiple databases and properties in the configuration file, set filter and sort criteria for each task, and the tool outputs the keywords ranked by TF-IDF together with the corresponding sentences and paragraphs in markdown.

For example, I have added the following tasks myself:

  • Reflections from the last year
  • Summary optimisation for the year
  • Self-caution for all periods
  • List for the week

Pipeline

flowchart TB

A[(Notion Database)] --> B([read rich text via API]) --> C([split word / cleaning / word-phrase mapping]) --> D[/calculate TF-IDF/] --> E[[Output the top-n keywords and their corresponding sentences in markdown format]]
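
The TF-IDF and output steps of the flowchart can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions, not the code notion-nlp actually runs: the function names `tokenize`, `top_keywords`, and `to_markdown` are hypothetical, each input string is treated as one document, and a regular expression stands in for jieba/pkuseg word segmentation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a stand-in for jieba/pkuseg)."""
    return [word for word in re.findall(r"\w+", text.lower()) if len(word) > 1]


def top_keywords(documents, top_n=3):
    """Return up to top_n (keyword, score, sentences) tuples ranked by summed TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: the number of documents each word occurs in.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[word])
            scores[word] += tf * idf
    results = []
    for word, score in scores.most_common(top_n):
        # Collect every input text that contains the keyword.
        sentences = [doc for doc, tokens in zip(documents, tokenized) if word in tokens]
        results.append((word, score, sentences))
    return results


def to_markdown(results):
    """Render each keyword as a heading followed by its sentences as bullets."""
    lines = []
    for word, _score, sentences in results:
        lines.append(f"## {word}")
        lines.extend(f"- {sentence}" for sentence in sentences)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    notes = [
        "notion api token",
        "notion database filter",
        "notion database sort",
    ]
    print(to_markdown(top_keywords(notes, top_n=3)))
```

A word that appears in every text (here "notion") gets an IDF of zero, so it drops to the bottom of the ranking, which is the behaviour you want from a keyword extractor.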

Installation

python3.8 -m pip install notion-nlp

Quick use

For the configuration file, refer to configs/config.sample.yaml (hereinafter "config"). Rename it to config.yaml to use it as your own configuration file.

Get the integration token

On the Notion integrations page, create a new integration, copy its token, and then fill the token into your config.yaml file.

graphic tutorial in tango website / graphic tutorial in markdown format

Add integration to database/get database ID

Open the Notion database page in your browser, or click Share and copy the link. The database ID is the long, random-looking string in the URL. Fill it in as database_id under the corresponding task in the config.

graphic tutorial in tango website / graphic tutorial in markdown format
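
If you would rather not pick the ID out of the URL by eye, a small helper can do it. This sketch assumes the usual Notion URL shape, where the ID is a 32-character hexadecimal string at the end of the path and the `?v=` query parameter identifies a view rather than the database. The function name `extract_database_id` and the example URL are hypothetical.

```python
import re
from urllib.parse import urlparse


def extract_database_id(url):
    """Return the 32-character hex ID at the end of a Notion URL path, or None."""
    # urlparse drops the query string, so the ?v=... view ID is ignored.
    path = urlparse(url).path
    # Notion may show the ID with or without dashes; strip them before matching.
    match = re.search(r"([0-9a-f]{32})$", path.replace("-", "").lower())
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up URL for illustration only.
    example = (
        "https://www.notion.so/myworkspace/"
        "0123456789abcdef0123456789abcdef?v=fedcba9876543210fedcba9876543210"
    )
    print(extract_database_id(example))
```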

Configure filtering and sorting of database entries with the extra parameter

A task's extra parameter is used to filter and sort the database entries. See the Notion filter API for its format and content. The config.sample.yaml file already provides two sample configurations.
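
To give a feel for what such a filter and sort looks like, here is a minimal sketch that builds (but does not send) a Notion database query request using only the standard library. It assumes that a task's extra maps directly onto the body of the Notion "query a database" endpoint, which this README does not confirm. The helper name `build_query_request`, the placeholder token, and the placeholder database ID are hypothetical.

```python
import json
import urllib.request

# Value for the Notion-Version header; adjust it to the API version you target.
NOTION_VERSION = "2022-06-28"


def build_query_request(token, database_id, extra):
    """Build a POST request for the Notion database query endpoint without sending it."""
    url = f"https://api.notion.com/v1/databases/{database_id}/query"
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": NOTION_VERSION,
        "Content-Type": "application/json",
    }
    body = json.dumps(extra).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Entries created within the past year, newest first.
extra = {
    "filter": {
        "timestamp": "created_time",
        "created_time": {"past_year": {}},
    },
    "sorts": [{"timestamp": "created_time", "direction": "descending"}],
}

# Placeholder credentials; replace them with your own token and database ID.
request = build_query_request(
    "secret_placeholder", "0123456789abcdef0123456789abcdef", extra
)

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    # Passing the request to urllib.request.urlopen would actually send it.
```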

Run all tasks

python3.8 -m notion-nlp run-all-task --config-file ${Your Config file Path}

Development

You are welcome to fork the project and add new features or fix bugs.

  • After cloning the project, use the create_python_env_in_new_machine.sh script to create a Poetry virtual environment.

  • After finishing development, use the invoke command to run a series of formatting tasks, including the black and isort tasks defined in task.py.

    invoke check
    
  • After committing the formatted changes, run the unit tests and check coverage.

    poetry run tox
    

Note

  • Two word segmentation tools are built in: jieba and pkuseg. (I am considering adding language detection to automatically select the most suitable segmentation tool for each language.)

    • jieba is used by default.
    • pkuseg cannot be installed with poetry and must be installed manually with pip. It is also slow and memory-hungry: in testing, a VPS with less than 1 GB of memory needed virtual memory (swap) to run it.
  • TF-IDF on its own is a rather simple analysis method. I am considering integrating an LLM API (such as ChatGPT) for further analysis.

License and Copyright

  • MIT License
    • The MIT License is a permissive open-source software license. This means that anyone is free to use, copy, modify, and distribute your software, as long as they include the original copyright notice and license in their derivative works.

    • However, the MIT License comes with no warranty or liability, meaning that you cannot be held liable for any damages or losses arising from the use or distribution of your software.

    • By using this software, you agree to the terms and conditions of the MIT License.

Download files

  • Source distribution: notion_nlp-1.0.3.tar.gz (18.3 kB)
  • Built distribution: notion_nlp-1.0.3-py3-none-any.whl (17.6 kB, Python 3)
