Trough

Big data, small databases.

Big data is really just lots and lots of little data.

If you split a large dataset into lots of small SQL databases sharded on a well-chosen key, they can work in concert to create a database system that can query very large datasets.
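As a rough illustration of the idea, the sketch below splits rows into one SQLite file per shard key and then runs the same query against every shard, combining the results. It uses only Python's standard `sqlite3` module and is not Trough's API; the function names, the `events` table, and the sample rows are assumptions made up for the example.

```python
import os
import sqlite3
import tempfile
from collections import defaultdict


def write_shards(rows, directory):
    """Write (shard_key, value) rows into one SQLite file per shard key."""
    by_key = defaultdict(list)
    for key, value in rows:
        by_key[key].append((value,))
    paths = {}
    for key, values in by_key.items():
        path = os.path.join(directory, f"{key}.sqlite")
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS events (value INTEGER)")
            conn.executemany("INSERT INTO events (value) VALUES (?)", values)
            conn.commit()
        finally:
            conn.close()
        paths[key] = path
    return paths


def query_all(paths, sql):
    """Run the same query on every shard and collect the rows per shard key."""
    results = {}
    for key, path in paths.items():
        conn = sqlite3.connect(path)
        try:
            results[key] = conn.execute(sql).fetchall()
        finally:
            conn.close()
    return results


def demo():
    # Hypothetical sample data: (shard key, value) pairs.
    rows = [("a", 1), ("b", 10), ("a", 2), ("c", 100), ("b", 20)]
    with tempfile.TemporaryDirectory() as directory:
        paths = write_shards(rows, directory)
        per_shard = query_all(paths, "SELECT SUM(value) FROM events")
        # Combine the per-shard partial sums into one total.
        return sum(result[0][0] for result in per_shard.values())
```

Each shard answers the query independently, so the per-shard work can be spread across machines and only the small partial results need to be merged.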

Worst-case performance is important

A key insight when working with large datasets is that the performance of monolithic big data tools is largely tied to having the full dataset completely loaded and running in a production-quality cluster.

Trough is designed to have very predictable performance characteristics: simply determine your sharding key, determine your largest shard, load it into a sqlite database locally, and you already know your worst-case performance scenario.
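The procedure described above can be tried locally with a few lines of Python. The sketch below picks the largest shard file by size on disk and times a query against it. It uses only the standard library and is not part of Trough; `make_shard`, the `events` table, and the sample query are hypothetical stand-ins for your own data, and file size is only a proxy for "largest shard".

```python
import os
import sqlite3
import tempfile
import time


def make_shard(path, row_count):
    """Hypothetical helper: create a sample shard with row_count rows."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE events (payload TEXT)")
        conn.executemany(
            "INSERT INTO events (payload) VALUES (?)",
            (("x" * 100,) for _ in range(row_count)),
        )
        conn.commit()
    finally:
        conn.close()
    return path


def largest_shard(paths):
    """Return the path of the biggest file, as a proxy for the largest shard."""
    return max(paths, key=os.path.getsize)


def time_query(path, sql, repeats=3):
    """Return the slowest wall-clock time, in seconds, over several runs."""
    conn = sqlite3.connect(path)
    try:
        worst = 0.0
        for _ in range(repeats):
            start = time.perf_counter()
            conn.execute(sql).fetchall()
            worst = max(worst, time.perf_counter() - start)
        return worst
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        small = make_shard(os.path.join(directory, "small.sqlite"), 10)
        large = make_shard(os.path.join(directory, "large.sqlite"), 5000)
        target = largest_shard([small, large])
        worst = time_query(target, "SELECT COUNT(*) FROM events")
        return os.path.basename(target), worst
```

Timings measured this way depend on the local disk and cache state, so they are best read as an estimate of the worst case rather than a guarantee.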

Designed to leverage storage, not RAM

Rather than having huge CPU and memory requirements to deliver performant queries over large datasets, Trough relies on flat sqlite files, which are easily distributed to a cluster and queried against.

Reliable parts, reliable whole

Each piece of technology in the stack was carefully selected and load tested to ensure that your data stays reliably up and reliably queryable. The code is small enough for one programmer to audit.

Ease of installation

One of the worst parts of setting up a big data system is usually choosing sensible defaults and deploying it to staging and production environments. Trough has been designed to require as little configuration as possible.

An example Ansible deployment specification has been removed from the trough repo but can be found at https://github.com/internetarchive/trough/tree/cc32d3771a7/ansible. It is designed for a cluster of Ubuntu 16.04 Xenial nodes.
