# Instapaper Scraper
A Python tool to scrape all your saved Instapaper bookmarks and export them to various formats.
## Features
- Scrapes all bookmarks from your Instapaper account.
- Supports scraping from specific folders.
- Exports data to CSV, JSON, or a SQLite database.
- Securely stores your session for future runs.
- Modern, modular, and tested architecture.
## Getting Started

### 1. Requirements

- Python 3.9+
### 2. Installation

This package is available on PyPI and can be installed with `pip`:

```bash
pip install instapaper-scraper
```
### 3. Usage

Run the tool from the command line, specifying your desired output format:

```bash
# Scrape and export to the default CSV format
instapaper-scraper

# Scrape and export to JSON
instapaper-scraper --format json

# Scrape and export to a SQLite database with a custom name
instapaper-scraper --format sqlite --output my_articles.db
```
## Configuration

### Authentication
The script authenticates using one of the following methods, in order of priority:

1. **Command-line arguments**: Provide your username and password directly when running the script:

   ```bash
   instapaper-scraper --username your_username --password your_password
   ```

2. **Session files** (`.session_key`, `.instapaper_session`): The script attempts to load these files in the following order:

   a. The path specified by the `--session-file` or `--key-file` arguments.
   b. Files in the current working directory (e.g., `./.session_key`).
   c. Files in the user's configuration directory (`~/.config/instapaper-scraper/`).

   After the first successful login, the script creates an encrypted `.instapaper_session` file and a `.session_key` file to reuse your session securely.

3. **Interactive prompt**: If no other method is available, the script will prompt you for your username and password.
**Note on security**: Your session file (`.instapaper_session`) and the encryption key (`.session_key`) are stored with secure permissions (read/write for the owner only) to protect your credentials.
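Owner-only permissions of this kind can be illustrated with a short sketch (an illustration of the general approach, not the package's actual code; the function name is hypothetical):

```python
import os


def write_private(path: str, data: bytes) -> None:
    """Write a file readable and writable only by its owner (mode 0600)."""
    # Pass the mode to os.open so the file is created restricted,
    # then chmod to make the final permissions independent of the umask.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.chmod(path, 0o600)
```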
### Folder Configuration
You can define and quickly access your Instapaper folders using a `config.toml` file. The scraper looks for this file in the following locations (in order of precedence):

1. The path specified by the `--config-path` argument.
2. `config.toml` in the current working directory.
3. `~/.config/instapaper-scraper/config.toml`
Here is an example `config.toml`:

```toml
# Default output filename for non-folder mode
output_filename = "home-articles.csv"

[[folders]]
key = "ml"
id = "1234567"
slug = "machine-learning"
output_filename = "ml-articles.json"

[[folders]]
key = "python"
id = "7654321"
slug = "python-programming"
output_filename = "python-articles.db"
```
- `output_filename` (top-level): The default output filename to use when not in folder mode.
- `key`: A short alias for the folder.
- `id`: The folder ID from the Instapaper URL.
- `slug`: The human-readable part of the folder URL.
- `output_filename` (folder-specific): A preset output filename for scraped articles from this specific folder.
When a `config.toml` file is present and no `--folder` argument is provided, the scraper will prompt you to select a folder. You can also specify a folder directly using the `--folder` argument with its key, ID, or slug. Use `--folder=none` to explicitly disable folder mode and scrape all articles.
### Command-line Arguments

| Argument | Description |
|---|---|
| `--config-path <path>` | Path to the configuration file. Searches `~/.config/instapaper-scraper/config.toml` and `config.toml` in the current directory by default. |
| `--folder <value>` | Specify a folder by key, ID, or slug from your `config.toml`. Requires a configuration file to be loaded. Use `none` to explicitly disable folder mode. If a configuration file is not found or fails to load and this option is used (not set to `none`), the program will exit. |
| `--format <format>` | Output format (`csv`, `json`, `sqlite`). Default: `csv`. |
| `--output <filename>` | Specify a custom output filename. |
| `--username <user>` | Your Instapaper account username. |
| `--password <pass>` | Your Instapaper account password. |
## Output Formats

You can control the output format using the `--format` argument. The supported formats are:

- `csv` (default): Exports data to `output/bookmarks.csv`.
- `json`: Exports data to `output/bookmarks.json`.
- `sqlite`: Exports data to an `articles` table in `output/bookmarks.db`.

Use `--output <filename>` to specify a custom output filename. If the `--format` flag is omitted, the script defaults to `csv`.
## Opening Articles in Instapaper

The output data includes a unique `id` for each article. To open an article directly in Instapaper's reader view, append this ID to the base URL:

```
https://www.instapaper.com/read/<article_id>
```
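For example, a reader-view URL can be built from an exported `id` like so (a minimal sketch; the helper name is our own):

```python
BASE_URL = "https://www.instapaper.com/read/"


def reader_url(article_id: str) -> str:
    """Return the Instapaper reader-view URL for an exported article id."""
    return f"{BASE_URL}{article_id}"


print(reader_url("999901234"))  # https://www.instapaper.com/read/999901234
```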
## How It Works

The tool is designed with a modular architecture for reliability and maintainability.

- **Authentication**: The `InstapaperAuthenticator` handles secure login and session management.
- **Scraping**: The `InstapaperClient` iterates through all pages of your bookmarks, fetching the metadata for each article with robust error handling and retries.
- **Data Collection**: All fetched articles are aggregated into a single list.
- **Export**: Finally, the collected data is written to a file in your chosen format (`.csv`, `.json`, or `.db`).
## Example Output

### CSV (`output/bookmarks.csv`)

```csv
id,title,url
999901234,"Article 1",https://www.example.com/page-1/
999002345,"Article 2",https://www.example.com/page-2/
```

### JSON (`output/bookmarks.json`)

```json
[
  {
    "id": "999901234",
    "title": "Article 1",
    "url": "https://www.example.com/page-1/"
  },
  {
    "id": "999002345",
    "title": "Article 2",
    "url": "https://www.example.com/page-2/"
  }
]
```
### SQLite (`output/bookmarks.db`)

A SQLite database file is created with an `articles` table containing `id`, `title`, and `url` columns.
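You can inspect the exported database with Python's built-in `sqlite3` module (a sketch assuming the `articles` schema described above; the function name is our own):

```python
import sqlite3


def list_articles(db_path):
    """Return (id, title, url) rows from an exported bookmarks database."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute("SELECT id, title, url FROM articles").fetchall()
```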
## Development & Testing

This project uses `pytest` for testing, `black` for code formatting, and `ruff` for linting.

### Setup

To install the development dependencies:

```bash
pip install -e .[dev]
```

### Running the Scraper

To run the scraper directly without installing the package:

```bash
python -m src.instapaper_scraper.cli
```

### Testing

To run the tests, execute the following command from the project root:

```bash
pytest
```

To check test coverage:

```bash
pytest --cov=src/instapaper_scraper --cov-report=term-missing
```

### Code Quality

To format the code with `black`:

```bash
black .
```

To check for linting errors with `ruff`:

```bash
ruff check .
```

To automatically fix linting errors:

```bash
ruff check . --fix
```
## Disclaimer

This script requires valid Instapaper credentials. Use it responsibly and in accordance with Instapaper's Terms of Service.

## License

This project is licensed under the terms of the GNU General Public License v3.0. See the LICENSE file for the full license text.