Python library for loading/cleaning data used in Valorum training
# Valorum
This package provides a simplified interface to datasets that we use frequently.
As of now (1-1-18), this package is not registered on PyPI. To install it, we recommend the following steps from the command line:
1. Clone or download this repository
2. Change directory into the cloned repo
3. Run `pip install -e .`. To install into a different Python environment, invoke that environment's pip executable by its full path.
## Loading data
To see a list of available datasets run
```
import valorum
valorum.data.available()
```
To load one of the listed datasets run
```
df = valorum.data.load("dataset_name")
```
where `dataset_name` is replaced by one of the names returned by `valorum.data.available()`.
The first time you load a dataset, valorum fetches the data from an online source and saves a local copy to your hard drive. Subsequent requests to load that dataset (even in different Python sessions) first attempt to read the local copy and fetch from online only if necessary.
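The cache-first behavior described above can be illustrated with a minimal sketch. This is not valorum's actual implementation: the names `load_cached` and `fake_fetch`, the JSON file format, and the temporary cache directory are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def load_cached(name, fetch, cache_dir):
    """Return dataset `name`, reading a local copy when one exists.

    Hypothetical helper: valorum's real cache location and file format
    are not documented here, so JSON in `cache_dir` is an assumption.
    """
    path = Path(cache_dir) / f"{name}.json"
    if path.exists():
        # A previous call (possibly in another session) already saved the data.
        return json.loads(path.read_text())
    data = fetch()  # stand-in for the online download
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data


calls = []  # records how many times the "download" actually runs


def fake_fetch():
    # Made-up data standing in for a remote source.
    calls.append(1)
    return {"year": [2016, 2017], "value": [1.5, 2.5]}


cache_dir = tempfile.mkdtemp()
first = load_cached("example", fake_fetch, cache_dir)   # fetches, then saves
second = load_cached("example", fake_fetch, cache_dir)  # served from disk
print(len(calls))
```

Only the first call triggers the fetch; the second is answered from the saved file, which is why a load in a later session can skip the download.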
## Configuration
The valorum library is configurable.
To see a list of valid configuration options run
```
import valorum
valorum.data.config.describe_options()
```
To set a configuration option use `valorum.data.options["section.option"] = value`.
For example, to set the BLS `api_key` option, call:
```
import valorum
valorum.data.options["bls.api_key"] = "MY_API_KEY"
```
## Developer docs
### Contributing datasets
To contribute a dataset, implement a function `_retrieve_{name}` inside the file `data/retrieve.py`. This function is responsible for obtaining the data, either "by hand" (data hard-coded into the function) or from an online source. It must return a pandas DataFrame containing the data.
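As a rough illustration of the "by hand" case, a retrieval function might look like the sketch below. The dataset name `example`, the column names, and the values are all made up for this example; how valorum discovers and registers the function is not covered here.

```python
import pandas as pd


def _retrieve_example():
    """Hypothetical dataset built "by hand" from hard-coded values."""
    data = {
        "year": [2015, 2016, 2017],  # made-up values, for illustration only
        "value": [1.0, 2.0, 3.0],
    }
    # The only stated requirement: return a pandas DataFrame.
    return pd.DataFrame(data)


df = _retrieve_example()
print(df.shape)
```

A function that downloads its data would follow the same shape, with the hard-coded dictionary replaced by whatever code fetches and parses the remote source.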