Initial Dev Upload version
Big data, small databases.
Trough is designed to split a very large dataset into small SQLite databases that can be queried quickly and easily. The system favors simplicity and ease of setup and administration.
Configuration/Ansible Deployment system
- configures hosts
- deploys 4 types of worker nodes: a read worker, a write worker, a consul worker and a synchronizer worker.
Read worker
- performs read requests on SQLite files
- receives SQL in HTTP POST requests; returns an error for non-SELECT queries
- returns an error if it is queried for a nonexistent database
- one reader per httpd thread
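The read-side checks above can be sketched as follows. This is only an illustration, not Trough's actual code: the function name `run_read_query`, the `QueryError` exception, and the `<segment id>.sqlite` file layout are all assumptions made for the example, and the prefix test for SELECT is deliberately naive.

```python
import json
import os
import sqlite3


class QueryError(Exception):
    """Raised when a request cannot be served by the read worker (hypothetical)."""


def run_read_query(db_dir, segment_id, sql):
    """Run a SELECT against one segment database and return the rows as JSON."""
    # Reject anything that does not start with SELECT (a naive check).
    if not sql.lstrip().lower().startswith("select"):
        raise QueryError("only SELECT queries are allowed")

    # Assumed layout: one file per segment, named <segment id>.sqlite.
    path = os.path.join(db_dir, f"{segment_id}.sqlite")
    # sqlite3 would silently create a missing file, so check first.
    if not os.path.isfile(path):
        raise QueryError(f"no such database: {segment_id}")

    # mode=ro opens the file read-only as a second line of defence.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        conn.row_factory = sqlite3.Row
        rows = [dict(row) for row in conn.execute(sql)]
    finally:
        conn.close()
    return json.dumps(rows)
```

Opening the database read-only means that a statement that slips past the prefix check still cannot modify the file.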
Write worker
- performs write requests to SQLite files
- receives SQL in HTTP POST requests; returns an error for queries other than INSERT or UPDATE
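A minimal sketch of the write-side check, assuming the worker already holds an open `sqlite3` connection to the writable segment. The names `run_write_query` and `ALLOWED_WRITE_VERBS` are hypothetical, and looking only at the first word of the statement is a simplification.

```python
import sqlite3

# Verbs accepted by the write worker, taken from the list above.
ALLOWED_WRITE_VERBS = ("insert", "update")


def run_write_query(conn, sql):
    """Execute one INSERT or UPDATE and return the number of rows changed."""
    words = sql.split(None, 1)
    verb = words[0].lower() if words else ""
    if verb not in ALLOWED_WRITE_VERBS:
        raise ValueError("only INSERT and UPDATE queries are allowed")
    # Using the connection as a context manager commits on success
    # and rolls back if the statement raises.
    with conn:
        cursor = conn.execute(sql)
    return cursor.rowcount
```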
Consul Agent (Server mode)
- Vanilla consul agent running in server mode
- 3 dedicated server consul nodes
- Every Read Node and Write Node runs consul agent (a gossip agent)
- SQL queries are sent as HTTP POST requests directly to a Read Node, which is located by way of DNS
- for example:
  - request: POST http://128100.trough.service.archive-it.org/
  - body: SELECT count(id) FROM crawled_urls WHERE host = 'www.example.com' GROUP BY status_code;
  - the response is returned as JSON
- Health Checks:
- make http
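From the client side, the DNS-routed query described above might look like the sketch below. The helper names `build_query_request` and `send_query` are hypothetical. The host name pattern is copied from the example, and sending the SQL as the raw UTF-8 request body is an assumption about the wire format.

```python
import json
import urllib.request


def build_query_request(segment_id, sql, domain="trough.service.archive-it.org"):
    """Build a POST request addressed to the Read Node for one segment."""
    # DNS resolves <segment id>.<domain> to a node that holds the segment.
    url = f"http://{segment_id}.{domain}/"
    # Assumption: the SQL text is the raw request body.
    return urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")


def send_query(request):
    """Send the request and decode the JSON response (needs a reachable node)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the addressing logic easy to check without a running cluster.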
Consul Agent (Local mode)
- Vanilla consul agent running in local mode
Synchronizer (Server mode)
- A small process that runs on at least two nodes for failover redundancy (dedicated or shared with Read workers)
- Synchronizers elect a leader, the “Lead Synchronizer”, via Consul
- Local Synchronizer pushes tags to consul when segments are available
- Local Synchronizer pushes a key/value entry for the current host recording its stored data versus its quota
- Lead Synchronizer assigns segments to other nodes based on stored/quota ratio
- Lead Synchronizer assigns one or more Read Nodes for each read-only segment (replication is configurable)
- Lead Synchronizer assigns one Write Node for each writable segment
- Lead Synchronizer discovers total segment pool from HDFS (file listing?)
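One way the Lead Synchronizer's ratio-based assignment could work is sketched below. The text only says that assignment is based on the stored/quota ratio, so the greedy largest-first strategy, the function name `assign_segments`, and the shape of the input dictionaries are all assumptions for illustration.

```python
def assign_segments(segments, hosts, replication=1):
    """Map each segment to the `replication` hosts with the lowest stored/quota ratio.

    segments: dict of segment id -> size in bytes
    hosts: dict of host name -> {"stored": bytes, "quota": bytes}
    """
    # Track usage in a copy so the caller's numbers are not modified.
    stored = {name: info["stored"] for name, info in hosts.items()}
    assignments = {}
    # Place large segments first, a common greedy heuristic.
    for segment_id, size in sorted(segments.items(), key=lambda item: -item[1]):
        # Rank hosts by current fill ratio; the name breaks ties deterministically.
        ranked = sorted(hosts, key=lambda name: (stored[name] / hosts[name]["quota"], name))
        chosen = ranked[:replication]
        for name in chosen:
            stored[name] += size
        assignments[segment_id] = chosen
    return assignments
```

Under this sketch a writable segment would be assigned with `replication=1`, and a read-only segment with whatever replication factor is configured.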
Synchronizer (Local mode)
- the local synchronizer should always run under nice, so that it interferes as little as possible with query times.
- checks whether databases have been removed from the Consul manifest and, if so, deletes them locally.
- reads the Consul manifest, pulls down SQLite files from HDFS, and verifies each against its checksum
- sets up files to respond on /health/[ID]; caches the comparison of local file checksums against HDFS, and the server returns 200 or 500 depending on the result
- periodically reports the databases that are actually available on the local node, in case of partial failure.
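The cached checksum comparison behind /health/[ID] can be sketched like this. The source does not say which checksum is used, so SHA-256 is an assumption, as are the names `file_checksum` and `HealthCache`. The expected checksum is passed in directly rather than fetched from HDFS.

```python
import hashlib


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class HealthCache:
    """Caches checksum comparisons so health checks do not re-read files."""

    def __init__(self):
        self._status = {}

    def refresh(self, segment_id, local_path, expected_checksum):
        """Recompute the comparison for one segment and cache the result."""
        try:
            healthy = file_checksum(local_path) == expected_checksum
        except OSError:
            # A missing or unreadable file counts as unhealthy.
            healthy = False
        self._status[segment_id] = healthy
        return healthy

    def status_code(self, segment_id):
        """HTTP status for /health/[ID]: 200 if the cached check passed, else 500."""
        return 200 if self._status.get(segment_id) else 500
```

Because the health endpoint only reads the cache, frequent Consul health checks stay cheap even for large database files.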
- Runs a Read process
- Runs a Write process
- Runs a Synchronizer process (Local mode)
- Runs a Consul Agent (Local mode)
- Runs a Consul Agent (Server mode)
- Runs a Consul Agent (Local mode)
- Runs a Synchronizer process (Server mode)