memorious

A minimalistic, recursive web crawling library for Python.

These details have not been verified by PyPI

Project links

Homepage

Intended Audience
- Developers
Operating System
- OS Independent
Programming Language
- Python

Project description

The solitary and lucid spectator of a multiform, instantaneous and almost intolerably precise world.

—Funes the Memorious, Jorge Luis Borges

memorious is a lightweight web scraping toolkit. It supports scrapers that collect structured or unstructured data. It aims to:

- Make crawlers modular and simple tasks reusable
- Provide utility functions for common tasks such as data storage and HTTP session management
- Integrate crawlers with the Aleph and FollowTheMoney ecosystem
- Get out of your way as much as possible

Design

When writing a scraper, you often need to paginate through an index page, then download an HTML page for each result, and finally parse that page and insert or update a record in a database.

memorious handles this by managing a set of crawlers, each of which can be composed of multiple stages. Each stage is implemented using a Python function, which can be re-used across different crawlers.
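
As a rough illustration of a stage written as a plain Python function, the sketch below assumes a two-argument (context, data) calling convention and a context.emit(data=...) call for handing results to the next stage. StubContext, parse_titles, and the data keys are hypothetical names chosen for illustration; the stub stands in for the context object the library would supply, so the example runs without memorious installed.

```python
import re


class StubContext:
    """Hypothetical stand-in for the context object passed to a stage."""

    def __init__(self, params=None):
        self.params = params or {}
        self.emitted = []

    def emit(self, rule="pass", data=None):
        # Record what the stage hands on instead of queueing a next stage.
        self.emitted.append((rule, dict(data or {})))


def parse_titles(context, data):
    """Stage sketch: pull <h2> titles out of an HTML page, emit one record each."""
    html = data.get("html", "")
    for title in re.findall(r"<h2>(.*?)</h2>", html):
        context.emit(data={"url": data.get("url"), "title": title.strip()})


context = StubContext()
page = {
    "url": "https://example.com/index",
    "html": "<h2> First </h2><p>x</p><h2>Second</h2>",
}
parse_titles(context, page)
for rule, record in context.emitted:
    print(rule, record)
```

Because the function only depends on its two arguments, the same stage could be reused by any crawler that passes it a page in this shape.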

The basic steps of writing a Memorious crawler:

1. Create a YAML crawler configuration file
2. Add different stages
3. Write code for stage operations (optional)
4. Test, rinse, repeat
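
The first two steps can be pictured with the following sketch, which mirrors the general shape of a crawler configuration as a Python dictionary and checks that every handler points at a defined stage. The layout (a pipeline mapping of stage names to a method, optional params, and handle rules) is an assumption made for illustration, not a verified schema; the crawler name, URL, path, and the check_pipeline helper are hypothetical. In practice the configuration would live in a YAML file rather than in Python.

```python
# Hypothetical crawler definition, written as a dict for illustration only.
crawler = {
    "name": "example_crawler",
    "pipeline": {
        "init": {
            "method": "seed",
            "params": {"urls": ["https://example.com/index"]},
            "handle": {"pass": "fetch"},
        },
        "fetch": {"method": "fetch", "handle": {"pass": "parse"}},
        "parse": {
            "method": "parse",
            "handle": {"fetch": "fetch", "store": "store"},
        },
        "store": {"method": "directory", "params": {"path": "/tmp/example"}},
    },
}


def check_pipeline(config):
    """Return (stage, rule, target) entries whose target stage is undefined."""
    stages = config["pipeline"]
    problems = []
    for name, stage in stages.items():
        for rule, target in stage.get("handle", {}).items():
            if target not in stages:
                problems.append((name, rule, target))
    return problems


print(check_pipeline(crawler))
```

A check like this is one cheap way to catch a misspelled stage name before running a crawl.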

Documentation

The documentation for Memorious is available at alephdata.github.io/memorious. Feel free to edit the source files in the docs folder and send pull requests for improvements.

To build the documentation, run `make html` inside the docs folder.

You’ll find the resulting HTML files in /docs/_build/html.

Release history

This version

2.6.5

Jan 10, 2024

2.6.4

Aug 29, 2023

2.6.3

Jul 12, 2023

2.6.2

May 4, 2023

2.5.0

Feb 28, 2022

2.4.5

Oct 27, 2021

2.4.4

Oct 22, 2021

2.4.3

Oct 21, 2021

2.4.2

Sep 22, 2021

2.4.1

Sep 2, 2021

2.4.0

Sep 2, 2021

2.3.4

Jul 2, 2021

2.3.3

Jul 1, 2021

2.3.2

Jul 1, 2021

2.3.1

Jul 1, 2021

2.3.0

Jun 22, 2021

2.2.0

May 4, 2021

2.1.1

Apr 8, 2021

2.1.0

Apr 8, 2021

2.0.0

Mar 30, 2021

1.9.0

Jan 20, 2021

1.8.4

Dec 8, 2020

1.8.3

Oct 27, 2020

1.8.2

Sep 27, 2020

1.8.0

Jul 10, 2020

1.7.4

Jul 1, 2020

1.7.3

Jul 1, 2020

1.7.2

Jun 30, 2020

1.7.1

Jun 30, 2020

1.7.0

Jun 24, 2020

1.6.2

Jun 10, 2020

1.6.1

May 28, 2020

1.6.0

May 4, 2020

1.5.6

Apr 23, 2020

1.5.5

Apr 6, 2020

1.5.4

Mar 8, 2020

1.5.3

Mar 8, 2020

1.5.2

Jan 30, 2020

1.5.1

Jan 30, 2020

1.5.0

Jan 29, 2020

1.4.3

Jan 27, 2020

1.4.2

Jan 19, 2020

1.4.1

Dec 6, 2019

1.4.0

Dec 4, 2019

1.3.0

Dec 3, 2019

1.2.10

Dec 3, 2019

1.2.9

Oct 22, 2019

1.2.8

Oct 22, 2019

1.2.5

Oct 21, 2019

1.2.4

Oct 21, 2019

1.2.3

Oct 3, 2019

1.2.1

Sep 21, 2019

1.2.0

Sep 19, 2019

1.1.3

Sep 3, 2019

1.1.2

Aug 2, 2019

1.1.1

Aug 1, 2019

1.1.0

Aug 1, 2019

1.0.0

Jul 30, 2019

0.14.2

Jul 12, 2019

0.14.1

Jul 12, 2019

0.14.0

Jul 12, 2019

0.13.0

Jul 10, 2019

0.12.0

May 2, 2019

0.11.1

Apr 5, 2019

0.11.0

Mar 17, 2019

0.10.1

Mar 5, 2019

0.10.0

Jan 31, 2019

0.9.2

Jan 9, 2019

0.9.1

Jan 9, 2019

0.9.0

Jan 9, 2019

0.8.0

Dec 28, 2018

0.7.20

Dec 14, 2018

0.7.19

Oct 10, 2018

0.7.18

Sep 2, 2018

0.7.17

Aug 27, 2018

0.7.16

Aug 27, 2018

0.7.15

Aug 27, 2018

0.7.14

Aug 23, 2018

0.7.13

Aug 19, 2018

0.7.12

Aug 19, 2018

0.7.11

Aug 19, 2018

0.7.10

Aug 18, 2018

0.7.9

Aug 15, 2018

0.7.8

Aug 10, 2018

0.7.7

Aug 10, 2018

0.7.6

Aug 1, 2018

0.7.4

Jul 18, 2018

0.7.3

Jul 17, 2018

0.7.2

Jul 17, 2018

0.7.1

Jul 14, 2018

0.7.0

Jul 14, 2018

0.6.1

Jul 9, 2018

0.6.0

Jul 9, 2018

0.5.5

May 21, 2018

0.5.4

May 16, 2018

0.5.3

May 15, 2018

0.5.2

May 4, 2018

0.5.1

May 4, 2018

0.5.0

May 1, 2018

0.4.12

Apr 13, 2018

0.4.11

Apr 12, 2018

0.4.10

Apr 12, 2018

0.4.9

Apr 11, 2018

0.4.8

Apr 11, 2018

0.4.7

Apr 11, 2018

0.4.6

Mar 28, 2018

0.4.5

Mar 28, 2018

0.4.4

Mar 14, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

memorious-2.6.5.tar.gz (41.0 kB)

Uploaded Jan 10, 2024 Source

Built Distribution

memorious-2.6.5-py2.py3-none-any.whl (52.4 kB)

Uploaded Jan 10, 2024 Python 2, Python 3

File details

Details for the file memorious-2.6.5.tar.gz.

File metadata

Download URL: memorious-2.6.5.tar.gz
Upload date: Jan 10, 2024
Size: 41.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for memorious-2.6.5.tar.gz
Algorithm	Hash digest
SHA256	`5690d32309cc7a269190bd157df7b6a4c9f9f9e896367ea1ba02d483c211e76d`
MD5	`69beecfbb546ca35eff82b47771f8ef6`
BLAKE2b-256	`508597ec7c1f8bdd90f73347b3972a5b6c663f5995e7a49e4cd3f73c46af8510`
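
One way to use these digests is to recompute the SHA256 of the downloaded file and compare it with the published value. The sketch below relies only on the standard library; the helper names sha256_of and verify are hypothetical, and the file path is assumed to point at a local copy of the source distribution.

```python
import hashlib

# Published SHA256 digest for memorious-2.6.5.tar.gz, copied from the table above.
EXPECTED = "5690d32309cc7a269190bd157df7b6a4c9f9f9e896367ea1ba02d483c211e76d"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True when the file's SHA256 matches the expected digest."""
    return sha256_of(path) == expected.lower()
```

Calling verify("memorious-2.6.5.tar.gz") on a local download should return True only if the file matches the published digest.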

File details

Details for the file memorious-2.6.5-py2.py3-none-any.whl.

File metadata

Download URL: memorious-2.6.5-py2.py3-none-any.whl
Upload date: Jan 10, 2024
Size: 52.4 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for memorious-2.6.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`5997259e0e5e3e92012bd87d506dfd947f8900c53e8c5717696169a523c48780`
MD5	`3c64862426bff79ca7744238b1057dcd`
BLAKE2b-256	`70dcf8543dbc42b92a041bfa59a5aa57e61e9a8906cd8bde1bcbe6fcca51dbe1`
