Detect duplicated pages in a Notion database and optionally delete them

Project description

Purpose

Detect the duplicated pages in a Notion database and optionally delete the dupes

What's a duplicated page?

It's a page with both the same title and the same last_edited_time as another page.
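The definition above can be sketched in a few lines of Python. This is only an illustration, not the package's actual code: the function name find_duplicates and the flat page dictionaries are assumptions (real Notion page objects nest the title under their properties).

```python
from typing import Iterable, Iterator


def find_duplicates(pages: Iterable[dict]) -> Iterator[dict]:
    """Yield every page whose (title, last_edited_time) pair was already seen."""
    seen = set()
    for page in pages:
        # Hypothetical flat layout: real Notion pages nest the title deeper.
        key = (page["title"], page["last_edited_time"])
        if key in seen:
            yield page  # same title and same timestamp as an earlier page
        else:
            seen.add(key)


pages = [
    {"id": "1", "title": "Recipe", "last_edited_time": "2013-07-05T01:34:00.000Z"},
    {"id": "2", "title": "Recipe", "last_edited_time": "2013-07-05T01:34:00.000Z"},
    {"id": "3", "title": "Recipe", "last_edited_time": "2014-01-01T00:00:00.000Z"},
]
dupe_ids = [page["id"] for page in find_duplicates(pages)]
print(dupe_ids)
```

Page 3 shares the title but not the timestamp, so only page 2 is reported. The first copy of each pair is kept, which is what you want when the dupes are later deleted.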

Motivation

I recently decided to move away from Evernote (after being a subscriber since 2008). My reason? They started to jack up their price to a level that wasn't justifiable to me.

The price of the yearly subscription went from $35 in 2022, to $50 in 2023 and for this year they want $130! </RANT>

After I imported many pages from Evernote, I ended up with 100s if not 1000s of duplicated pages.

This script solved the problem!

Install

pip install notion-duplicates

Prerequisites

You first need to create an integration in Notion, which will give you a token:

  • Go to https://www.notion.so/my-integrations

  • Click on [ + New Integration ]

  • Specify a name, say: notion_duplicates

  • Click on Show under Internal Integration Secret and copy the secret which looks like:

    • secret_WhGbvv7jUxt88WXYZDlhxoiBtgtzGXBqPrVSA00aaBo
  • That's the value to use as NOTION_TOKEN
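
As a rough sketch of how such a token is typically used, the snippet below reads NOTION_TOKEN from the environment and builds the HTTP headers the Notion API expects. Reading the token from an environment variable is an assumption here (the text above only names NOTION_TOKEN), and notion_headers is a hypothetical helper, not part of the package.

```python
import os


def notion_headers(token: str) -> dict:
    """Build the headers sent with every Notion API request."""
    return {
        "Authorization": f"Bearer {token}",  # the integration secret
        "Notion-Version": "2022-06-28",      # a published Notion API version
        "Content-Type": "application/json",
    }


# Assumption: the secret is exported as the NOTION_TOKEN environment variable.
token = os.environ.get("NOTION_TOKEN", "")
if not token:
    print("NOTION_TOKEN is not set")
else:
    headers = notion_headers(token)
    print("Headers ready:", sorted(headers))
```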

Next, you need to connect the notion_duplicates integration with your Notion database.

Finally, you need your database_id that can easily be extracted from your database URL:

It's the 32 characters between the last / and the ? in the URL. In the example below, database_id=a769a042d8f544ce860ba408d295ab28

https://www.notion.so/a769a042d8f544ce860ba408d295ab28?v=8603013e8753451cb46496a62e6ac55f
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
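
If you prefer not to copy the id by hand, the extraction can be sketched in Python. The helper name extract_database_id is hypothetical, and the regular expression assumes the id is 32 hexadecimal characters at the end of the last path segment, as in the URL above.

```python
import re
from urllib.parse import urlparse


def extract_database_id(url: str) -> str:
    """Return the 32-character id that sits between the last / and the ?."""
    path = urlparse(url).path  # the ?v=... query string is not part of the path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    match = re.search(r"[0-9a-fA-F]{32}$", last_segment)
    if match is None:
        raise ValueError(f"no 32-character id found in {url!r}")
    return match.group(0)


url = "https://www.notion.so/a769a042d8f544ce860ba408d295ab28?v=8603013e8753451cb46496a62e6ac55f"
database_id = extract_database_id(url)
print(database_id)
```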

Usage

Help (-h)

notion_duplicates -h
usage: notion_duplicates [-h] [-m [MAX_PAGE_COUNT]] [-D] [-M [MAX_DELETE_PAGE_COUNT]] database_id

Detect duplicated pages in a Notion database and optionally delete them

positional arguments:
  database_id           Notion database on which to conduct the duplicate search. See README.md for more details

optional arguments:
  -h, --help            show this help message and exit
  -m [MAX_PAGE_COUNT], --max_page_count [MAX_PAGE_COUNT]
                        Maximum number of pages to scan for duplicated pages (default: None)
  -D, --delete          Do the actual deletion (set in_trash=True) (default: False)
  -M [MAX_DELETE_PAGE_COUNT], --max_delete_page_count [MAX_DELETE_PAGE_COUNT]
                        Maximum number of pages to delete (default: None)
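
To make the option semantics concrete, here is a minimal argparse sketch that produces a command line of the same shape as the help text above. It is a reconstruction for illustration only, not the package's source; in particular, parsing the two counts as integers is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="notion_duplicates",
        description="Detect duplicated pages in a Notion database and optionally delete them",
        # Appends "(default: ...)" to each option's help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "database_id",
        help="Notion database on which to conduct the duplicate search",
    )
    # nargs="?" renders as [MAX_PAGE_COUNT]; type=int is an assumption.
    parser.add_argument("-m", "--max_page_count", nargs="?", type=int, default=None,
                        help="Maximum number of pages to scan for duplicated pages")
    parser.add_argument("-D", "--delete", action="store_true",
                        help="Do the actual deletion (set in_trash=True)")
    parser.add_argument("-M", "--max_delete_page_count", nargs="?", type=int, default=None,
                        help="Maximum number of pages to delete")
    return parser


# Scan at most 500 pages and delete at most 10 dupes.
args = build_parser().parse_args(
    ["-m", "500", "-D", "-M", "10", "a769a042d8f544ce860ba408d295ab28"]
)
print(args.max_page_count, args.delete, args.max_delete_page_count, args.database_id)
```

Without -D nothing is deleted, so a first run without it is a safe way to review what would be removed.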

Example with no duplicate

notion_duplicates a769a042d8f544ce860ba408d295ab28
Iterated over 3 pages in the database:a769a042d8f544ce860ba408d295ab28. Found 0 duplicated page(s) and deleted 0 page(s)
Elapased time:0.12 seconds

Example showing duplicates only (no deletion)

notion_duplicates 5ae487a972e345b09450c181150a7AAA
Scanned 100 in 0.61 secs or 164 pages/sec
Scanned 200 in 1.52 secs or 131 pages/sec
Scanned 300 in 2.22 secs or 135 pages/sec
Scanned 400 in 3.02 secs or 132 pages/sec
Scanned 500 in 3.63 secs or 138 pages/sec
This page is a dupe -> title:(1) Facebook | last_edited:2013-07-05T01:34:00.000Z | url:https://www.notion.so/1-Facebook-a7df306435694572be8460ac45b75950
This page is a dupe -> title:Patio Lounger RE 11.2in Nicollet : Target | last_edited:2013-07-04T23:09:00.000Z | url:https://www.notion.so/Patio-Lounger-RE-11-2in-Nicollet-Target-706e30effb4345b4b50ee0db3328ebbb
This page is a dupe -> title:ÄPPLARÖ Drop-leaf table - IKEA | last_edited:2013-07-04T23:03:00.000Z | url:https://www.notion.so/PPLAR-Drop-leaf-table-IKEA-9fe474b0f5424c499f3fe78aeb005deb
Reached max page count
Iterated over 521 pages in the database:5ae487a972e345b09450c181150a7AAA. Found 3 duplicated page(s) and deleted 0 page(s)
Elapased time:4.52 seconds
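
The scan above walks the database in batches. The Notion database query endpoint returns results together with has_more and next_cursor, so a scan with an optional page limit can be sketched as below. The names iter_pages and fake_fetch are hypothetical, and the fetch function is a stand-in for the real HTTP call so the sketch runs offline; the package itself may be structured differently.

```python
from typing import Callable, Iterator, Optional


def iter_pages(
    fetch: Callable[[Optional[str]], dict],
    max_page_count: Optional[int] = None,
) -> Iterator[dict]:
    """Yield pages batch by batch, following next_cursor until has_more is False."""
    cursor = None
    count = 0
    while True:
        response = fetch(cursor)
        for page in response["results"]:
            if max_page_count is not None and count >= max_page_count:
                return  # reached max page count
            count += 1
            yield page
        if not response.get("has_more"):
            return
        cursor = response["next_cursor"]


def fake_fetch(cursor: Optional[str]) -> dict:
    """Stand-in for a database query; returns two canned batches."""
    batches = {
        None: {"results": [{"id": "a"}, {"id": "b"}], "has_more": True, "next_cursor": "c2"},
        "c2": {"results": [{"id": "c"}], "has_more": False, "next_cursor": None},
    }
    return batches[cursor]


all_ids = [page["id"] for page in iter_pages(fake_fetch)]
limited_ids = [page["id"] for page in iter_pages(fake_fetch, max_page_count=2)]
print(all_ids, limited_ids)
```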

Example deleting duplicates (use -D)

notion_duplicates -D 5ae487a972e345b09450c181150a7AAA
Scanned 100 in 0.61 secs or 164 pages/sec
Scanned 200 in 1.52 secs or 131 pages/sec
Scanned 300 in 2.22 secs or 135 pages/sec
Scanned 400 in 3.02 secs or 132 pages/sec
Scanned 500 in 3.63 secs or 138 pages/sec
DELETING dupe page -> title:(1) Facebook | last_edited:2013-07-05T01:34:00.000Z | url:https://www.notion.so/1-Facebook-a7df306435694572be8460ac45b75950
DELETING dupe page -> title:Patio Lounger RE 11.2in Nicollet : Target | last_edited:2013-07-04T23:09:00.000Z | url:https://www.notion.so/Patio-Lounger-RE-11-2in-Nicollet-Target-706e30effb4345b4b50ee0db3328ebbb
DELETING dupe page -> title:ÄPPLARÖ Drop-leaf table - IKEA | last_edited:2013-07-04T23:03:00.000Z | url:https://www.notion.so/PPLAR-Drop-leaf-table-IKEA-9fe474b0f5424c499f3fe78aeb005deb
Iterated over 521 pages in the database:5ae487a972e345b09450c181150a7AAA. Found 3 duplicated page(s) and deleted 3 page(s)
Elapased time:4.77 seconds
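
For readers curious what "set in_trash=True" amounts to at the API level, the sketch below builds (but does not send) a request to the Notion "update page" endpoint. This is an assumption about how the deletion could be performed, not the package's confirmed implementation; build_trash_request and the placeholder token are hypothetical.

```python
import json
import urllib.request


def build_trash_request(page_id: str, token: str) -> urllib.request.Request:
    """Build a PATCH request that moves one page to the trash."""
    body = json.dumps({"in_trash": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.notion.com/v1/pages/{page_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": "2022-06-28",
            "Content-Type": "application/json",
        },
    )


# Page id taken from the first dupe URL above; the token is a placeholder.
request = build_trash_request("a7df306435694572be8460ac45b75950", "secret_placeholder")
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Pages moved to the trash this way can still be restored from Notion's trash, so the deletion is not immediately permanent.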

Download files

Source Distribution

notion_duplicates-0.6.0.tar.gz (3.8 kB view details)

Uploaded Source

Built Distribution

notion_duplicates-0.6.0-py3-none-any.whl (4.8 kB view details)

Uploaded Python 3

File details

Details for the file notion_duplicates-0.6.0.tar.gz.

File metadata

  • Download URL: notion_duplicates-0.6.0.tar.gz
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.9.5 Darwin/23.4.0

File hashes

Hashes for notion_duplicates-0.6.0.tar.gz
Algorithm    Hash digest
SHA256       a671797315a0af0161012695aca78d03292706e394124560c82bf9d97ad103d5
MD5          e732bbf1e714130a7535b4a2e9a7fad0
BLAKE2b-256  35230fac009a0199344c11dd2188f73e3e03226dbb4dcb187642e514b00b3860

File details

Details for the file notion_duplicates-0.6.0-py3-none-any.whl.

File hashes

Hashes for notion_duplicates-0.6.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       e83dc031b071c27b93f84b49f19dba1f6413d852936e5139b517be985884715f
MD5          cc4e03f2df20cddf2977b32c623a186c
BLAKE2b-256  1f5e055946e2f33532b90c06616edfb42aa0d538c900fc8779a6f12d3027ff8f
