Data collection manager
aswan
Collect and organize data into a T1 data depot. Named after the Aswan Dam.
Collect and compress data from the internet for later parsing
- quick, parallel, customizable to collect
- compressed to store
- quick to sync with a remote store
  - sync to continue collecting
  - sync to parse
- immutable collection
To Set Up a Remote
Set the environment variables ASWAN_AUTH_HEX and ASWAN_AUTH_PASS as required by the zimmauth package, and set ASWAN_REMOTE to the name of the default remote.
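
Before syncing, it can help to confirm that all three variables are present. The following is a minimal sketch, not part of aswan or zimmauth: the variable names come from the text above, while `missing_remote_vars` is a hypothetical helper that only checks for presence and does not validate the values.

```python
import os

# Variable names are taken from the text above; the helper itself is hypothetical.
REQUIRED_VARS = ("ASWAN_AUTH_HEX", "ASWAN_AUTH_PASS", "ASWAN_REMOTE")


def missing_remote_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_remote_vars()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("remote configuration looks complete")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.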
Concepts
- objects
  - saved by collection events
- events
  - collection
  - registration (v2: registration for parsing)
  - (v2) parsing
- runs
  - manual run vs automated run
    - makes manual adding of urls easy but revertible
  - has unique id
  - generates events
  - linked to a specific version of the code
    - ideally commit hash + pip freeze
- statuses
  - determined by base status + runs integrated
  - contains
    - what urls need to be collected
    - (v2) what collected objects need to be parsed
  - sqlite file, constantly trimmed
Structure
- objects
  - 00, 01, ...
- runs
  - run-hash
    - context.yaml
      - commit-hash, pip-freeze, ...
    - events.zip
- statuses
  - status-hash
    - context.yaml
      - parent-status, integrated
    - db.sqlite.zip
- current-run
  - context.yaml
  - events
    - these to be compressed into ../runs
  - status.sqlite
- there is a 'TEST' status
  - whatever is based on it cannot be integrated
  - a test run can be made on it...
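
To make the layout above concrete, here is a small sketch that creates the four top-level directories and builds the path for a finished run. The directory names are taken from the list; `init_depot` and `run_dir` are hypothetical helpers for illustration and are not claimed to match aswan's internals.

```python
from pathlib import Path
import tempfile

# Directory names follow the layout listed above; the functions are illustrative only.
TOP_LEVEL_DIRS = ("objects", "runs", "statuses", "current-run")


def init_depot(root):
    """Create the empty top-level directories of a depot under root."""
    root = Path(root)
    for name in TOP_LEVEL_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def run_dir(root, run_hash):
    """Path where a finished run with the given hash would be stored."""
    return Path(root) / "runs" / run_hash


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        depot = init_depot(tmp)
        print(sorted(p.name for p in depot.iterdir()))
        print(run_dir(depot, "run-hash").relative_to(depot))
```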
When starting a run:
- check if current-run is empty
  - if not, fail with
- find latest status
  - if it has not integrated all past runs, create a new status that has
- start collection (+ registration)
- either stops or breaks, all events and objects are saved to disk
- if properly stops, move and compress stuff
  - based on one that was the starter, and current run id
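
Two of the steps above, the emptiness check on current-run and the final move-and-compress, can be sketched with the standard library. This is an illustration under assumptions rather than aswan's implementation: the function names are hypothetical, the error message is a placeholder because the notes do not give one, and only the events directory is archived (status handling is omitted).

```python
from pathlib import Path
import shutil
import tempfile
import zipfile


def ensure_current_run_empty(root):
    """Fail if current-run still holds files from an earlier run."""
    current = Path(root) / "current-run"
    if current.exists() and any(current.iterdir()):
        # Placeholder message; the original notes leave the failure text open.
        raise RuntimeError(f"{current} is not empty")


def archive_events(root, run_hash):
    """Compress current-run/events into runs/<run_hash>/events.zip, then reset current-run."""
    root = Path(root)
    current = root / "current-run"
    events = current / "events"
    target = root / "runs" / run_hash
    target.mkdir(parents=True, exist_ok=True)
    archive = target / "events.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(events.rglob("*")):
            if path.is_file():
                # Store paths relative to the events directory.
                zf.write(path, path.relative_to(events).as_posix())
    # Leave an empty current-run behind so the next run can start.
    shutil.rmtree(current)
    current.mkdir()
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "current-run" / "events").mkdir(parents=True)
        (Path(tmp) / "current-run" / "events" / "event-1").write_text("demo")
        print(archive_events(tmp, "run-hash").name)
        ensure_current_run_empty(tmp)  # passes: current-run was reset
```

Resetting current-run only after the archive is written means an interrupted run leaves its events on disk, in line with the note that everything is saved whether the run stops or breaks.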
Pre v1.0 laundry list
- parallelize push / pull
- parsing/connection/broken session error docs
- transferring / ignoring cookies
- template projects
  - oddsportal
    - updating thingy, based on latest match in season
  - footy
  - rotten
  - boxoffice