
Grab and rinse financial and economic data.

Project description


Operation Pluto is a data pipeline that collects and cleans financial and economic data. It focuses on the Hong Kong, U.S. and China markets.

The pipeline is written in Python and organized with the Luigi framework.

Available Data

Currently connected data sources:

Hong Kong

United States

China

  • ?

Master Data

Pipeline Organization

  • Crawling websites, back-filling past data, and constructing file directories are all done as code.

  • One table in a data source corresponds to one target file.

  • Pipeline tasks are stateful. Source files are overwritten as little as possible.
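The "one table, one target file" and "overwrite as little as possible" rules can be illustrated with a minimal, standard-library-only sketch. It mimics the way Luigi treats a task as complete once its output target exists, but it is not this project's code and does not use Luigi's API; the names `run_once` and `produce` are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def run_once(target: Path, produce: Callable[[], str]) -> bool:
    """Write produce() to target unless target already exists.

    Returns True if the file was written, False if the task was skipped.
    Hypothetical helper: an illustration of the idea, not the project's API.
    """
    if target.exists():
        # The target file is the task's state: if it is there, do nothing.
        return False

    target.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary file in the same directory, then rename it, so a
    # failed run never leaves a half-written target behind.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(produce())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

Calling `run_once` twice with the same target writes the file on the first call and leaves it untouched on the second, which is the behavior the bullets above describe.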

Prerequisites

Getting Started

Install Python 3.5 and clone this repository:

# Clone this repository
$ git clone https://github.com/hydra-lab/operation-pluto

Install Python dependencies:

# Installing with Conda may not work
$ pip install -r requirements.txt

Set up the Luigi configuration file:

# Rename luigi.cfg.sample to luigi.cfg
$ mv luigi.cfg.sample luigi.cfg

Configure proxies in luigi.cfg if you are behind one:

[proxies]
https = https://username:password@hostname:port/
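For illustration, a `[proxies]` section like the one above can be read with Python's standard `configparser` and turned into a scheme-to-URL mapping. This is a sketch under assumptions: how the project itself reads `luigi.cfg` is not shown here, and the function name `load_proxies` is hypothetical.

```python
import configparser
from typing import Dict


def load_proxies(cfg_text: str) -> Dict[str, str]:
    """Return the [proxies] section of a luigi.cfg-style string as a dict.

    Hypothetical helper, not part of the project or of Luigi.
    """
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("proxies"):
        return {}
    return dict(parser.items("proxies"))


# Example: the resulting mapping has the shape that HTTP client libraries
# commonly accept for proxy settings, e.g. {"https": "https://..."}.
sample = "[proxies]\nhttps = https://username:password@hostname:port/\n"
proxies = load_proxies(sample)
```

A configuration without a `[proxies]` section yields an empty dict, so the same code path works whether or not a proxy is configured.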

Test the installation. New data should be extracted and parsed into the test/data folder:

$ python -m luigi --module main RunMock --local-scheduler
$ ls test/data

High-level job orchestration is done in main.py. For example, RunAll() is the wrapper class that initializes the whole data directory and triggers all processing tasks. In production, tasks should be run on a Luigi server. Because the Luigi daemon will not run on Windows, simply start the server directly:

# Run Luigi server on http://localhost:8082
$ luigid
# Run task on Luigi server
$ python -m luigi --module main RunAll

Schedule the pipeline to run periodically with Task Scheduler or cron. On Windows, set up run.sh as follows:

# Script on Windows
start luigid
python -m luigi --module main RunAll
cmd "/c taskkill /IM "luigid.exe" /T /F"
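The same start, run, stop sequence can also be sketched in Python, which avoids platform-specific commands such as `taskkill`. This is a minimal sketch, not part of the project: the function name `run_with_scheduler` and the `startup_wait` delay are assumptions, and `terminate()` stops only the server process itself, not any child processes it may have started.

```python
import subprocess
import time
from typing import Sequence


def run_with_scheduler(
    server_cmd: Sequence[str],
    task_cmd: Sequence[str],
    startup_wait: float = 2.0,
) -> int:
    """Start a server process, run one task command, then stop the server.

    Returns the task command's exit code. Hypothetical helper.
    """
    server = subprocess.Popen(server_cmd)
    try:
        # Give the server a moment to start listening (assumed delay).
        time.sleep(startup_wait)
        return subprocess.run(task_cmd).returncode
    finally:
        # Always stop the server, even if the task failed or raised.
        server.terminate()
        try:
            server.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()


# Intended usage, mirroring the script above:
# run_with_scheduler(
#     ["luigid"],
#     ["python", "-m", "luigi", "--module", "main", "RunAll"],
# )
```

Returning the task's exit code lets Task Scheduler or cron report a failed run.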

License


This project is licensed under the GNU Affero General Public License, Version 3.0. See LICENSE for the full license text.


Source Distribution

Operation-Pluto-0.1.1.tar.gz (35.2 kB view details)

Uploaded Source

File details

Details for the file Operation-Pluto-0.1.1.tar.gz.

File metadata

File hashes

Hashes for Operation-Pluto-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f36742d2f2e356930834baa49c01752caab7feba02db5a0c78adcc72b9e361ea
MD5 d22c3a4657d1785306c9573ad41e030a
BLAKE2b-256 4caba31bf07f801d4596cfecec1144b425b0a3125d4de0c1bb3f5eabbf974d4e
