
A minimalistic, recursive web crawling library for Python.


Memorious

The solitary and lucid spectator of a multiform, instantaneous and almost intolerably precise world.

-- Funes the Memorious, Jorge Luis Borges

memorious is a lightweight web scraping toolkit. It supports scrapers that collect structured or unstructured data. Its goals are to:

  • Make crawlers modular and simple tasks reusable
  • Provide utility functions for common tasks such as data storage and HTTP session management
  • Integrate crawlers with the Aleph and FollowTheMoney ecosystem
  • Get out of your way as much as possible

memorious is part of the OpenAleph suite but can be used standalone as well.

Design

When writing a scraper, you often need to paginate through an index page, then download an HTML page for each result, and finally parse that page and insert or update a record in a database.

memorious handles this by managing a set of crawlers, each of which can be composed of multiple stages. Each stage is implemented using a Python function, which can be reused across different crawlers.
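The idea of composing a crawler from reusable stage functions can be sketched in plain Python. This is only an illustration of the pattern, not the memorious API: the `emit` callback, the `run_crawler` helper, the stage signatures, and the `FAKE_INDEX` data are all assumptions made up for this example, and no network access takes place.

```python
from collections import deque
from typing import Any, Callable

# Illustrative types only (assumed, not taken from memorious): a stage gets
# the incoming data plus an `emit` callback that hands data to the next stage.
Emit = Callable[[dict[str, Any]], None]
Stage = Callable[[dict[str, Any], Emit], None]

# Stand-in for a website: index page number -> result URLs on that page.
FAKE_INDEX = {1: ["/doc/a", "/doc/b"], 2: ["/doc/c"]}


def paginate(data: dict[str, Any], emit: Emit) -> None:
    """Walk through the index pages and emit one item per result URL."""
    for page in sorted(FAKE_INDEX):
        for url in FAKE_INDEX[page]:
            emit({"url": url})


def seed_single(data: dict[str, Any], emit: Emit) -> None:
    """Alternative entry stage that emits a single known URL."""
    emit({"url": "/doc/z"})


def fetch(data: dict[str, Any], emit: Emit) -> None:
    """Pretend to download the page; a real stage would make an HTTP request."""
    emit({**data, "html": f"<h1>{data['url']}</h1>"})


def parse(data: dict[str, Any], emit: Emit) -> None:
    """Extract a record from the fetched HTML."""
    title = data["html"].removeprefix("<h1>").removesuffix("</h1>")
    emit({"url": data["url"], "title": title})


def run_crawler(stages: list[Stage]) -> list[dict[str, Any]]:
    """Run the stages in order and collect what the last stage emits."""
    records: list[dict[str, Any]] = []
    queue = deque([(0, {})])  # (index of the stage to run, data for it)
    while queue:
        index, data = queue.popleft()
        if index == len(stages):
            records.append(data)  # past the last stage: treat as stored
            continue
        # Bind the next stage index now so each emit targets the right stage.
        stages[index](data, lambda item, nxt=index + 1: queue.append((nxt, item)))
    return records


# Two crawlers sharing the same fetch and parse stages.
index_records = run_crawler([paginate, fetch, parse])
single_records = run_crawler([seed_single, fetch, parse])
```

Here `fetch` and `parse` are written once and used by both crawlers; only the first stage differs. In memorious itself the wiring between stages is declared in the crawler configuration rather than in a Python list.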

The basic steps of writing a Memorious crawler:

  1. Create a YAML crawler configuration file
  2. Add different stages
  3. Write code for stage operations (optional)
  4. Test, rinse, repeat
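As a rough sketch of the steps above, a crawler description can be thought of as named stages that point at each other. The example below writes such a description as a Python dict and checks that every link points at a defined stage, which is the kind of mistake the "test, rinse, repeat" step tends to surface. The key names (`name`, `pipeline`, `method`, `handle`) and the helper `find_broken_links` are assumptions chosen for illustration; consult the documentation for the actual YAML schema.

```python
# Hypothetical crawler description, written as a Python dict for illustration.
# In memorious this information lives in a YAML file; the key names here are
# assumptions and may not match the real schema.
CRAWLER = {
    "name": "example_crawler",
    "pipeline": {
        "init": {"method": "seed", "handle": {"pass": "fetch"}},
        "fetch": {"method": "fetch", "handle": {"pass": "parse"}},
        "parse": {"method": "parse", "handle": {"store": "store"}},
        "store": {"method": "store"},
    },
}


def find_broken_links(crawler: dict) -> list[tuple[str, str]]:
    """Return (stage, target) pairs whose handler names an undefined stage."""
    pipeline = crawler["pipeline"]
    broken = []
    for stage_name, stage in pipeline.items():
        # A stage without a "handle" mapping is a terminal stage.
        for target in stage.get("handle", {}).values():
            if target not in pipeline:
                broken.append((stage_name, target))
    return broken


# Same description with a typo in one handler target.
BROKEN_CRAWLER = {
    "name": "broken_crawler",
    "pipeline": {
        "init": {"method": "seed", "handle": {"pass": "fetchh"}},
        "fetch": {"method": "fetch"},
    },
}

ok_result = find_broken_links(CRAWLER)
broken_result = find_broken_links(BROKEN_CRAWLER)
```

Running a check like this before starting a crawl catches misspelled stage names early instead of partway through a run.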

Documentation

The documentation for Memorious is available at docs.investigraph.dev/lib/memorious. Feel free to edit the source files in the docs folder and send pull requests for improvements.

To serve the documentation locally, run mkdocs serve

License and Copyright

memorious, (C) -2024 Organized Crime and Corruption Reporting Project

memorious, (C) 2025 Data and Research Center – DARC

memorious4, (C) 2026 Data and Research Center – DARC

memorious4 is licensed under the AGPLv3 or later license.

Prior to version 4.0.0, memorious was released under the MIT license.

See NOTICE and LICENSE.
