A minimalistic, recursive web crawling library for Python.
Memorious
The solitary and lucid spectator of a multiform, instantaneous and almost intolerably precise world.
-- Funes the Memorious, Jorge Luis Borges
memorious is a lightweight web scraping toolkit. It supports scrapers that
collect structured or unstructured data, and it is designed to:
- Make crawlers modular and simple tasks reusable
- Provide utility functions for common tasks such as data storage and HTTP session management
- Integrate crawlers with the Aleph and FollowTheMoney ecosystem
- Get out of your way as much as possible
memorious is part of the OpenAleph suite but can be used standalone as well.
Design
When writing a scraper, you often need to paginate through an index page, then download an HTML page for each result, and finally parse that page and insert or update a record in a database.
memorious handles this by managing a set of crawlers, each of which
can be composed of multiple stages. Each stage is implemented using a
Python function, which can be reused across different crawlers.
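To make the stage model concrete, here is a toy, self-contained model of the workflow described above (paginate, fetch, parse, store). It is not the memorious API: `Context`, `run_crawler`, the `(context, data)` stage signature, and the in-memory `INDEX_PAGES`, `DETAIL_PAGES`, and `DATABASE` are hypothetical stand-ins chosen so the sketch runs without network access. Each stage is a plain Python function that emits items, and the runner hands every emitted item to the next stage.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


class Context:
    """Hypothetical stand-in for a crawler context: it records what a stage emits."""

    def __init__(self) -> None:
        self.emitted: List[dict] = []

    def emit(self, data: dict) -> None:
        self.emitted.append(data)


Stage = Callable[[Context, dict], None]

# Fake site and database so the sketch runs without network access.
INDEX_PAGES = {1: ["a", "b"], 2: ["c"]}
DETAIL_PAGES = {"a": "<h1>Alpha</h1>", "b": "<h1>Beta</h1>", "c": "<h1>Gamma</h1>"}
DATABASE: Dict[str, str] = {}


def paginate(context: Context, data: dict) -> None:
    # Walk every index page and emit one item per result.
    for page, slugs in INDEX_PAGES.items():
        for slug in slugs:
            context.emit({"page": page, "slug": slug})


def fetch(context: Context, data: dict) -> None:
    # "Download" the detail page for one result.
    context.emit({**data, "html": DETAIL_PAGES[data["slug"]]})


def parse(context: Context, data: dict) -> None:
    # Extract a record from the HTML.
    title = data["html"].removeprefix("<h1>").removesuffix("</h1>")
    context.emit({"slug": data["slug"], "title": title})


def store(context: Context, data: dict) -> None:
    # Insert or update: keyed by slug, so a re-run overwrites instead of duplicating.
    DATABASE[data["slug"]] = data["title"]


def run_crawler(stages: Dict[str, Tuple[Stage, Optional[str]]], start: str) -> None:
    """Run stages breadth-first, feeding each emitted item to the next stage."""
    queue: Deque[Tuple[str, dict]] = deque([(start, {})])
    while queue:
        name, data = queue.popleft()
        func, next_stage = stages[name]
        context = Context()
        func(context, data)
        if next_stage is not None:
            queue.extend((next_stage, item) for item in context.emitted)


CRAWLER = {
    "paginate": (paginate, "fetch"),
    "fetch": (fetch, "parse"),
    "parse": (parse, "store"),
    "store": (store, None),
}
run_crawler(CRAWLER, "paginate")
print(DATABASE)  # {'a': 'Alpha', 'b': 'Beta', 'c': 'Gamma'}
```

Because `fetch` only relies on the `slug` key, the same function could be wired into a second crawler unchanged, which is the kind of reuse this design aims for.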
The basic steps for writing a Memorious crawler are:
- Create a YAML crawler configuration file
- Define the crawler's stages
- Write custom code for stage operations (optional)
- Test, rinse, repeat
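During the "test, rinse, repeat" loop, a common mistake is a stage that routes its output to a stage name that does not exist. The sketch below checks for that in a crawler definition represented as a Python dict, standing in for what PyYAML's `yaml.safe_load` would return from the YAML file. The key names (`name`, `pipeline`, `method`, `params`, `handle`, `pass`), the method names, and the `check_pipeline` helper are assumptions for illustration; consult the memorious documentation for the real configuration schema.

```python
from typing import Dict, List

# A crawler definition as yaml.safe_load might return it from a YAML file.
# Key names and method names are assumptions for illustration only.
CONFIG = {
    "name": "example",
    "pipeline": {
        "init": {
            "method": "seed",
            "params": {"urls": ["https://example.com/"]},
            "handle": {"pass": "fetch"},
        },
        "fetch": {"method": "fetch", "handle": {"pass": "parse"}},
        "parse": {"method": "parse", "handle": {"pass": "store"}},
        "store": {"method": "directory"},
    },
}


def check_pipeline(config: dict) -> List[str]:
    """Return wiring problems in a crawler definition; empty means it looks sane."""
    problems: List[str] = []
    stages: Dict[str, dict] = config.get("pipeline") or {}
    if not stages:
        problems.append("pipeline has no stages")
    for name, stage in stages.items():
        if "method" not in stage:
            problems.append(f"stage {name!r} has no method")
        # Every handle rule must route to a stage that exists.
        for rule, target in (stage.get("handle") or {}).items():
            if target not in stages:
                problems.append(
                    f"stage {name!r} routes {rule!r} to unknown stage {target!r}"
                )
    return problems


print(check_pipeline(CONFIG))  # []

typo = {"pipeline": {"fetch": {"method": "fetch", "handle": {"pass": "prase"}}}}
print(check_pipeline(typo))
# ["stage 'fetch' routes 'pass' to unknown stage 'prase'"]
```

A check like this is cheap to run before every crawl and catches typos that would otherwise only surface once a stage tries to hand off its data.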
Documentation
The documentation for Memorious is available at docs.investigraph.dev/lib/memorious.
Feel free to edit the source files in the `docs` folder and send pull requests with improvements.
To serve the documentation locally, run `mkdocs serve`.
License and Copyright
memorious, (C) -2024 Organized Crime and Corruption Reporting Project
memorious, (C) 2025 Data and Research Center – DARC
memorious4, (C) 2026 Data and Research Center – DARC
memorious4 is licensed under the AGPLv3 or later license.
Prior to version 4.0.0, memorious was released under the MIT license.
Download files
File details
Details for the file memorious4-4.0.1.tar.gz.
File metadata
- Download URL: memorious4-4.0.1.tar.gz
- Upload date:
- Size: 72.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.4.1 CPython/3.13.5 Linux/6.12.74+deb13+1-amd64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c47ab5355f8afda60706bfc257e2c05c792189a60c0f0ddf9ababe407e5bfe3e |
| MD5 | c0b85da539ec67646e74257587810222 |
| BLAKE2b-256 | 4afc94fe76a99d590574a47764ffdfc94ceff0ae90acacd7bccca903c243ad7c |
File details
Details for the file memorious4-4.0.1-py3-none-any.whl.
File metadata
- Download URL: memorious4-4.0.1-py3-none-any.whl
- Upload date:
- Size: 90.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.4.1 CPython/3.13.5 Linux/6.12.74+deb13+1-amd64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 162c55ec2daec5e3edfd50329b1e3049d6b83dfdc695b0e4dfee7089f0be0eae |
| MD5 | 37aa3a51d4bfdd26bf87e11b65995842 |
| BLAKE2b-256 | f264371d1bf14e76a7101d2f3e0c030d34c277ba894bdc5832f29ee05360e34b |
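The SHA256 digests listed above let you confirm that a downloaded file is intact before installing it. A minimal sketch using Python's standard `hashlib` module; the file is read in chunks so large archives never need to fit in memory. The helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file hashes tables for memorious4 4.0.1.
EXPECTED_SHA256 = {
    "memorious4-4.0.1.tar.gz": "c47ab5355f8afda60706bfc257e2c05c792189a60c0f0ddf9ababe407e5bfe3e",
    "memorious4-4.0.1-py3-none-any.whl": "162c55ec2daec5e3edfd50329b1e3049d6b83dfdc695b0e4dfee7089f0be0eae",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its contents match the listed digest."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify(Path("memorious4-4.0.1.tar.gz"))` returns `True` only when the local sdist matches the published digest. pip's hash-checking mode (`--require-hashes` with pinned `--hash=sha256:...` entries in a requirements file) performs the same kind of check at install time.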