# Energy Dashboard Command Line Interface (edc)

Command Line Interface for the Energy Dashboard.

**PRE-ALPHA**: although this is the master branch, this project has not been released yet. Stand by...
All example commands, installation steps, etc. assume a Linux (Ubuntu) installation and use the apt package manager.
## Prerequisites

### Install basic dependencies

```bash
sudo apt install parallel
sudo apt install build-essential
sudo apt install git
```
### Install git-lfs (git large file store)

git-lfs is used for storing the database files, which are basically binary blobs that are updated periodically. Rather than store every revision of the db blobs in the git repository, which would bloat it considerably, the blobs are offloaded to git-lfs.

See the git-lfs project for installation instructions. Example:

```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
```
### Install conda/anaconda

You don't strictly need anaconda for this toolchain to work. If you prefer mucking with python virtualenv directly, then go for it. I find that anaconda works really well with other parts of this toolchain, namely Jupyter Notebooks. All the examples and documentation assume you are using anaconda.

See the anaconda website for current instructions. Example:

```bash
wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh
chmod +x Anaconda3-2019.07-Linux-x86_64.sh
./Anaconda3-2019.07-Linux-x86_64.sh
```
## Installation

There's a good conda tutorial online that I refer back to whenever I forget the commands and concepts.

First, create a conda environment. It can be named anything; I'll call this one `edc-cli`:

```bash
conda update conda
conda create -n edc-cli python=3 numpy jupyter pandas
conda activate edc-cli
```
Then install the energy-dashboard-client:

```bash
pip install -U energy-dashboard-client
```
## Setup

The energy-dashboard-client has two commands to get you up and running with an energy-dashboard:

- `clone`: literally uses git to clone the energy-dashboard repo to your local machine
- `update`: pulls down all the submodules to your local machine

Note: if you only want a subset of the submodules installed on your local machine, you can use `git submodule deinit data/[name-of-submodule-to-remove]`.

As always, let me know if you need better tooling around this or any other aspect of this project.

```bash
mkdir foo
cd foo
edc clone
cd energy-dashboard
edc update
```
At this point you should have a working environment.

Verify that you have files:

```
$ tree -L 1
.
├── data
├── docs
├── LICENSE
├── notebooks
├── README.md
└── run.sh
```
Verify that `edc` works:

```
$ edc --help
Usage: edc [OPTIONS] COMMAND [ARGS]...

  Command Line Interface for the Energy Dashboard. This tooling collects
  information from a number of data feeds, imports that data, transforms
  it, and inserts it into a database.

Options:
  --ed-dir TEXT                   Energy Dashboard directory (defaults to cwd)
  --log-level [CRITICAL|ERROR|WARNING|INFO|DEBUG]
  --help                          Show this message and exit.

Commands:
  clone    Clone energy-dashboard locally
  feed     Manage individual 'feed' (singular).
  feeds    Manage the full set of data 'feeds' (plural).
  license  Show the license (GPL v3).
  update   Update the submodules
```
Verify that you can list out the data feeds:

```
$ edc feeds list | head
data-oasis-atl-ruc-zone-map
data-oasis-cbd-nodal-grp-cnstr-prc
data-oasis-cmmt-rmr-dam
data-oasis-atl-sp-tie
data-oasis-prc-mpm-cnstr-cmp-dam
data-oasis-trns-curr-usage-all-all
data-oasis-ene-baa-mkt-events-rtd-all
data-oasis-ene-eim-transfer-limit-all-all
data-oasis-as-results-dam
data-oasis-ene-wind-solar-summary
```
Using `find`, we can verify that we don't have any data files such as .zip, .xml, or .sql in the tree, but that we do have the state files:

```
$ find data/ | grep state | head
data/data-oasis-atl-ruc-zone-map/sql/state.txt
data/data-oasis-atl-ruc-zone-map/xml/state.txt
data/data-oasis-atl-ruc-zone-map/zip/state.txt
data/data-oasis-cbd-nodal-grp-cnstr-prc/sql/state.txt
data/data-oasis-cbd-nodal-grp-cnstr-prc/xml/state.txt
data/data-oasis-cbd-nodal-grp-cnstr-prc/zip/state.txt
data/data-oasis-cmmt-rmr-dam/sql/state.txt
data/data-oasis-cmmt-rmr-dam/xml/state.txt
data/data-oasis-cmmt-rmr-dam/zip/state.txt
data/data-oasis-atl-sp-tie/sql/state.txt
```
Now verify what databases you have downloaded:

```
$ find data/ | grep "\.db$" | head
data/data-oasis-atl-gen-cap-lst/db/data-oasis-atl-gen-cap-lst_00.db
data/data-oasis-sld-adv-fcst-rtd/db/data-oasis-sld-adv-fcst-rtd_01.db
data/data-oasis-sld-adv-fcst-rtd/db/data-oasis-sld-adv-fcst-rtd_00.db
data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_00.db
data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_03.db
data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_01.db
data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_05.db
data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_04.db
data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_02.db
data/data-oasis-ene-flex-ramp-dc-rtd-all/db/data-oasis-ene-flex-ramp-dc-rtd-all_00.db
```
I'll go over this in more detail below, but the reason there are multiple database files for a given data feed is that the feed publishes multiple formats (argh!) and I have not yet sorted out how to deal with that. More on this later.
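For example, here is a quick, hypothetical helper (not part of edc) for spotting feeds that carry more than one database file, assuming only the `data/<feed>/db/<feed>_NN.db` layout shown in the listing above:

```python
from pathlib import Path

def multi_db_feeds(data_dir):
    """Return names of feeds whose db/ directory holds more than one
    .db file, i.e. feeds whose source publishes multiple formats."""
    feeds = []
    for feed in sorted(Path(data_dir).iterdir()):
        # A missing db/ directory simply yields no matches.
        if len(list((feed / "db").glob("*.db"))) > 1:
            feeds.append(feed.name)
    return feeds
```

Run over the `data/` tree above, this would flag feeds like data-oasis-sld-sf-eval-dmd-fcst, which has six database files.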
## Use Cases

### Create Jupyter Notebook

TODO. This is what most users of this project want to do.
### Process Data Feeds

At a high level, a data feed is simply a url and some instructions for processing it. The url is stored in `manifest.json`, and the processing instructions are stored in the `./src` directory. The `./src` directory contains python files that handle downloading, parsing, constructing sql insert statements, and inserting the data into a sqlite3 database. See the section on Add New Data Feed for more details on the construction of a data feed.
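To make that concrete, here is a minimal sketch of reading a feed's url out of its `manifest.json`. The only assumption is that the manifest contains a `url` field; the real manifest schema may hold more:

```python
import json
from pathlib import Path

def feed_url(feed_dir):
    """Read a feed's manifest.json and return the source url."""
    manifest = json.loads((Path(feed_dir) / "manifest.json").read_text())
    return manifest["url"]
```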
Data feeds are processed in stages; you can think of this as vertical-by-horizontal processing.

Horizontally, the process is very simple: we move from downloading a resource through the stages until we insert the records into a database. The DATABASE is the final product.

```
DOWNLOAD -> EXTRACT -> PARSE -> SQL -> INSERT -> *DATABASE*
```
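In code, the horizontal flow is just function composition: each stage's output is the next stage's input. This toy sketch uses placeholder stage functions; the real work lives in each feed's `./src` scripts:

```python
def run_pipeline(url, stages):
    """Thread a resource through the stages in order; the output of
    each stage becomes the input of the next."""
    artifact = url
    for stage in stages:
        artifact = stage(artifact)
    return artifact  # the final product: the database

# Placeholder stages; each wraps its input to show the data flow.
download = lambda url:  ("zip", url)
extract  = lambda zipd: ("xml", zipd)
parse    = lambda xml:  ("rows", xml)
to_sql   = lambda rows: ("sql", rows)
insert   = lambda sql:  ("db", sql)

result = run_pipeline("http://example.com/resource",
                      [download, extract, parse, to_sql, insert])
```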
Vertically, the process is also very simple: each stage processes all the artifacts from the previous stage. However, we need this to be robust and restartable. Machines crash, or you may need to stop for some reason. I've also had the case where I had an error in the logic of a given stage and needed to start over from scratch. How do we do this here? Easy: a state file per stage:

```
DOWNLOAD -> EXTRACT -> PARSE -> SQL -> INSERT -> *DATABASE*
 ./zip/     ./xml/     ./sql/             ./db/
 state.txt  state.txt  state.txt          state.txt
```
Each state.txt contains a list of artifacts that have already been processed. Originally I gave these distinct names: ./zip/downloaded.txt, ./xml/unzipped.txt, etc. But after working with this for a few days, it is easier to just `cat` out `[dir]/state.txt` rather than remember what each state file is named.

Each stage in the pipeline looks at the artifacts in the previous stage, compares that list with the list of previously processed artifacts in its state.txt file, and then hands the delta of new files to the processing code.
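That delta computation can be sketched like this, assuming each stage directory lists already-processed artifact names, one per line, in its state.txt (the file names and layout follow the diagram above; this is an illustration, not edc's actual code):

```python
from pathlib import Path

def new_artifacts(prev_stage_dir, stage_dir):
    """Compare the previous stage's artifacts against this stage's
    state.txt and return only the not-yet-processed files."""
    state = Path(stage_dir) / "state.txt"
    done = set(state.read_text().splitlines()) if state.exists() else set()
    artifacts = {p.name for p in Path(prev_stage_dir).iterdir()
                 if p.name != "state.txt"}
    return sorted(artifacts - done)
```

If the stage crashes partway through, only the names recorded in state.txt are skipped on the next run; everything else is retried.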
So, to restart a given stage, you just delete the stage directory. This deletes all the generated artifacts and the state file. Voila, you are ready to start over. Note: if you delete a stage, you may want to delete the subsequent stages, too.

Here's the command that does this for you:

```bash
edc feed [feed name] reset [stage]
```
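Under the hood, that reset is conceptually just directory deletion. Here is a hypothetical sketch (not edc's actual implementation) that deletes the chosen stage and, per the note above, every later one as well:

```python
import shutil
from pathlib import Path

STAGES = ["zip", "xml", "sql", "db"]  # pipeline order, per the diagram above

def reset_from(feed_dir, stage):
    """Delete the given stage directory and all downstream stage
    directories; their state.txt files go with them."""
    for name in STAGES[STAGES.index(stage):]:
        shutil.rmtree(Path(feed_dir) / name, ignore_errors=True)
```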
And to process a given stage:

```bash
edc feed [feed name] proc [stage]
```
Here's the scenario: I've been writing the code for this project on my laptop, but it does not have the horsepower to crunch all this data into the various sqlite databases in a reasonable amount of time. So I'm firing up a desktop machine to perform the heavy lifting. Here's what that process looks like; it is the same process any researcher wanting to replicate my work would follow.

Clone and update:

```bash
mkdir foo
cd foo
edc clone
cd energy-dashboard
edc update
```
At this point, we have the energy-dashboard project, but we don't want to re-download all the previously downloaded files from their original source. In the case of CAISO OASIS, that would simply take too long (I've calculated the upper bound as 152 days, though in reality it took about 3 weeks to download the resources). Instead, we can pull these previously downloaded artifacts from one of the public S3 buckets I've mirrored them to.

Example:

```bash
# edc feed [feed-name] s3restore
edc feed data-oasis-atl-ruc-zone-map s3restore
```

To grab the artifacts from the entire set of feeds:

```bash
edc feeds list | xargs -L 1 -I {} edc feed {} s3restore
```
### Add New Data Feed

TODO
## Show Help

### edc

```
Usage: edc [OPTIONS] COMMAND [ARGS]...

  Command Line Interface for the Energy Dashboard. This tooling collects
  information from a number of data feeds, imports that data, transforms
  it, and inserts it into a database.

Options:
  --config-dir TEXT     Config file directory
  --debug / --no-debug  Enable debug logging
  --help                Show this message and exit.

Commands:
  config   Manage config file.
  feed     Manage individual 'feed' (singular).
  feeds    Manage the full set of data 'feeds' (plural).
  license  Show the license (GPL v3).
```
### config

```
Usage: edc config [OPTIONS] COMMAND [ARGS]...

  Manage config file.

Options:
  --help  Show this message and exit.

Commands:
  show    Show the config
  update  Update config
```
### feed

```
Usage: edc feed [OPTIONS] COMMAND [ARGS]...

  Manage individual 'feed' (singular).

Options:
  --help  Show this message and exit.

Commands:
  archive    Archive feed to tar.gz
  create     Create new feed
  download   Download from source url
  invoke     Invoke a shell command in the feed directory
  proc       Process a feed through the stages
  reset      Reset feed to reprocess stage
  restore    Restore feed from tar.gz
  s3archive  Archive feed to S3 bucket
  s3restore  Restore feed zip files from S3 bucket
  status     Show feed status
```
### feeds

```
Usage: edc feeds [OPTIONS] COMMAND [ARGS]...

  Manage the full set of data 'feeds' (plural).

Options:
  --help  Show this message and exit.

Commands:
  list    List feeds
  search  Search feeds (NYI)
```
### license

```
edc : Energy Dashboard Command Line Interface
Copyright (C) 2019 Todd Greenwood-Geer (Enviro Software Solutions, LLC)

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.
```
## Usage

Examples:

```bash
edc feed invoke data-oasis-atl-lap-all "git st"
edc feed invoke data-oasis-atl-lap-all "ls"
edc feed invoke data-oasis-atl-lap-all "cat manifest.json"
edc feed invoke data-oasis-atl-lap-all "head manifest.json"
edc feeds list
edc feeds list | grep atl
edc feeds list | grep atl | edc feed invoke "head manifest.json"
edc feeds list | grep atl | edc feed invoke "head manifest.json" -
edc feeds list | grep atl | xargs -L 1 -I {} edc feed invoke {} "head manifest.json"
edc feeds list | grep atl | xargs -L 1 -I {} edc feed invoke {} "jq . < manifest.json"
edc feeds list | grep atl | xargs -L 1 -I {} edc feed invoke {} "jq .url < manifest.json"
edc feeds list | grep mileage | xargs -L 1 -I {} edc feed invoke {} "echo {}; sqlite3 db/{}.db 'select count(*) from oasis'"
edc feeds list | xargs -L 1 -I {} edc feed invoke {} "echo {}; sqlite3 db/{}.db 'select count(*) from oasis'"
edc feeds list | grep atl | xargs -L 1 -I {} edc feed status {}
edc feeds list | grep atl | xargs -L 1 -I {} edc feed status --header {}
edc feeds list | grep mileage | xargs -L 1 -I {} edc feed status --header {}
edc feeds list | xargs -L 1 -I {} edc feed invoke {} "./src/10_down.py"
edc feed archive data-oasis-as-mileage-calc-all
edc feed archive data-oasis-as-mileage-calc-all | xargs -L 1 -I {} tar -tvf {}
edc feed reset data-oasis-as-mileage-calc-all --stage xml --stage db
edc feed s3restore data-oasis-as-mileage-calc-all --outdir=temp --service=wasabi
edc feed s3archive data-oasis-as-mileage-calc-all
```
## Onboarding

Some quick notes on how I onboarded 'data-oasis-as-mileage-calc-all':

```bash
edc feed proc data-oasis-as-mileage-calc-all
edc feed s3archive data-oasis-as-mileage-calc-all --service wasabi
edc feed s3archive data-oasis-as-mileage-calc-all --service digitalocean
edc feed status data-oasis-as-mileage-calc-all --header
edc feed invoke data-oasis-as-mileage-calc-all "git st"
edc feed invoke data-oasis-as-mileage-calc-all "git log"
edc feed invoke data-oasis-as-mileage-calc-all "git show HEAD"
```
## Author

Todd Greenwood-Geer (Enviro Software Solutions, LLC)

## Notes

This project uses submodules, and this page has been useful: https://github.blog/2016-02-01-working-with-submodules/