A library to query the Rich Data Services API framework developed by MTNA
RDS Python
WARNING: THIS PROJECT IS IN EARLY DEVELOPMENT STAGE. CONTENT OR CODE SHOULD ONLY BE USED FOR TESTING OR EVALUATION PURPOSES.
This python module utilizes MTNA's Rich Data Services API to quickly and efficiently access data sets and metadata stored in our repository. Through RDS, you can easily perform complex queries and tabulations on the data you are interested in while also getting back any relevant metadata.
RDS greatly simplifies the long process of finding the data, cleaning and transforming it, and converting it into a dataframe. All of this is done in a single step using our queries. This lets you focus on analyzing and visualizing the data instead of managing it.
References
RDS API Documentation | Examples | Contributing | Developer Documentation | Changelog
Announcements
Version v0.2.18 released
This version of RDS Python adds support for supplying an API key when connecting to an instance of RDS.
Installation
Using pip
Use the package manager pip to install RDS Python:
pip install mtna-rds
Usage
Through the RDS API, you are able to query for records of data as well as perform a tabulation. Both a simple query and a tabulation accept options for grouping, ordering, and filtering the data, as well as for specifying whether metadata should be returned.
The data returned by a query/tabulation will be contained within an RdsResults object. This object has three properties: the records of data, which can be used to build a dataframe for a graph or chart; the column names for each column of data in the records; and a collection of metadata in JSON format that provides information for better analysis of your data.
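As a minimal sketch of working with these properties, the snippet below pairs each record with its column names. It uses a stand-in object with made-up values so that it runs without a server. The `records` and `columns` names match the properties used later in this README, while `metadata` is an assumed name for the third property and `records_as_dicts` is a hypothetical helper, not part of RDS Python.

```python
from types import SimpleNamespace

# Stand-in for an RdsResults object, with illustrative values only.
# `metadata` is an assumed attribute name for the JSON metadata.
results = SimpleNamespace(
    columns=["year_of_birth", "amount_born"],
    records=[[1901, 120], [1902, 135]],
    metadata={},
)


def records_as_dicts(results):
    """Pair each record's values with the matching column names."""
    return [dict(zip(results.columns, row)) for row in results.records]


rows = records_as_dicts(results)
print(rows[0])
```

A list of dictionaries like this can be convenient when pandas is not available; with pandas, the records and columns can be passed to a dataframe directly.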
Select Query
Imagine that you would like to get some demographic data for the United States. You look through our Catalog and see that we have the data you are interested in. The first thing you need to do to access this data is to establish a link to the demographic dataset that we host in our repository. To do this, you connect to the server and obtain a DataProduct using the ID of the catalog that contains the dataproduct and the ID of the dataproduct that contains the demographic information.
from rds import Server
server = Server("domain", "api_key")
catalog = server.get_catalog("catalog_id")
dataproduct = catalog.get_dataproduct("dataproduct_id")
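If you would rather not hard-code the domain and API key, one option is to read them from the environment before creating the Server. The sketch below is only an illustration: the variable names `RDS_DOMAIN` and `RDS_API_KEY` and the helper `load_rds_settings` are assumptions and are not defined by RDS Python.

```python
import os


def load_rds_settings(env=None):
    """Return (domain, api_key) from a mapping, defaulting to os.environ.

    RDS_DOMAIN and RDS_API_KEY are hypothetical variable names.
    """
    env = os.environ if env is None else env
    domain = env.get("RDS_DOMAIN")
    api_key = env.get("RDS_API_KEY")
    if not domain or not api_key:
        raise ValueError("Set RDS_DOMAIN and RDS_API_KEY before connecting")
    return domain, api_key


# A plain dict is passed here so the example runs anywhere.
domain, api_key = load_rds_settings(
    {"RDS_DOMAIN": "domain", "RDS_API_KEY": "api_key"}
)

# The values can then be passed to Server as shown above:
# from rds import Server
# server = Server(domain, api_key)
```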
Once the DataProduct is created, you can perform your query and get back the results, which contain records that can be placed in a dataframe. If you wanted to know how many people were born in each year between 1900 and 1950, you could perform the following query.
results = dataproduct.select(cols=["year_of_birth", "amount_born:count(*)"], where=["year_of_birth>1900"], orderby=["year_of_birth"], groupby=["year_of_birth"], limit=50)
This query tells RDS that we want the year of birth for each record as well as the number of records with that year of birth (where we are renaming the column to "amount_born"). We then filter for everyone born after 1900. We also group the data by year of birth so that only a single record is returned per year, and order the result by year. Setting the limit to 50 ensures we only get data for the 50 years after 1900, that is, through 1950 (assuming there are no missing years of data).
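To make the combination of filtering, grouping, ordering, and limiting concrete, here is a rough plain-Python equivalent of what the query computes, run on a few made-up records. This is only an illustration of the semantics described above, not how RDS evaluates the query; `count_births_by_year` is a hypothetical helper.

```python
from collections import Counter


def count_births_by_year(records, after_year, limit):
    """Filter, group and count, order by year, then apply the limit."""
    counts = Counter(
        r["year_of_birth"] for r in records if r["year_of_birth"] > after_year
    )
    ordered = sorted(counts.items())  # one (year, count) pair per year
    return ordered[:limit]


# Made-up sample records for illustration only.
sample = [
    {"year_of_birth": y}
    for y in (1899, 1900, 1901, 1901, 1903, 1902, 1903, 1903)
]

result = count_births_by_year(sample, after_year=1900, limit=2)
print(result)  # [(1901, 2), (1902, 1)]
```

Note that the records for 1899 and 1900 are dropped by the filter, and 1903 is dropped by the limit.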
After obtaining the data, you can pull out the records and columns and place them directly into a dataframe for use in a graph or chart. Below we demonstrate by building a simple line plot of people born per year, using the pandas package for the dataframe and seaborn and matplotlib for the plot.
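import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Build a dataframe from the records and column names returned by the query
dataframe = pd.DataFrame(results.records, columns=results.columns)
# Plot the first column (year of birth) against the second (amount born)
sns.lineplot(data=dataframe, x=dataframe.columns[0], y=dataframe.columns[1])
plt.show()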
Tabulation Query
A tabulation query is used almost identically to a select query, except that it takes different parameters, as a tabulation is more useful for examining the relationships between columns of data.
If you wanted to know the number of males and females for each race in the census, you would perform the tabulation query below.
results = dataproduct.tabulate(dims=["sex", "race"], measure=["count(*)"], orderby=["race"], inject=True)
You can think of the parameter dims as the dimensions of a tabulation table, and the parameter measure as the value that you want in each cell of the table. One new thing you may notice is the inject parameter. This signifies that we want to replace any "coded" values with their more readable labels. Sex is an example of a "coded" value, as the data is often coded as "1" for male and "2" for female. Since "1" and "2" would not be very descriptive in a chart, RDS gives you the ability to replace them with what the codes actually mean.
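The idea behind dims, measure, and inject can be sketched in plain Python on made-up records: count the rows for each combination of dimension values, optionally swapping codes for labels. The label mapping, the sample data, and the `tabulate_counts` helper are assumptions for illustration and do not reflect how RDS implements tabulation.

```python
from collections import Counter

# Hypothetical code-to-label mapping for the "sex" column.
SEX_LABELS = {"1": "Male", "2": "Female"}


def tabulate_counts(records, dims, labels=None):
    """Count records per combination of dims, replacing codes with labels."""
    labels = labels or {}
    counts = Counter()
    for record in records:
        # Look up a label for each coded value, falling back to the raw value.
        key = tuple(labels.get(d, {}).get(record[d], record[d]) for d in dims)
        counts[key] += 1
    return dict(counts)


# Made-up sample records for illustration only.
sample = [
    {"sex": "1", "race": "A"},
    {"sex": "2", "race": "A"},
    {"sex": "1", "race": "A"},
    {"sex": "2", "race": "B"},
]

table = tabulate_counts(sample, ["sex", "race"], labels={"sex": SEX_LABELS})
print(table)
```

Calling the helper without `labels` leaves the raw codes in place, which mirrors what you would see without label injection.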
Metadata
Metadata can be directly asked for on any of our resources. This includes the server, catalogs, dataproducts, variables, classifications, and codes. The metadata contains extensive information on what the resource is and what it is used for.
About
This project is developed and maintained by MTNA.
More detailed documentation about what the current version of RDS can do can be found here
If you are interested in using the RDS framework directly, you can visit our site here.
Software
Compatible with Python 2.7 and Python 3.6 and higher.
If using Python 3, it is recommended that you use pandas dataframes when working with any records returned from an RDS query.
There are no dependencies required to run RDS Python.
License