A library to query the Rich Data Services API framework developed by MTNA

Project description

RDS Python

WARNING: THIS PROJECT IS IN EARLY DEVELOPMENT STAGE. CONTENT OR CODE SHOULD ONLY BE USED FOR TESTING OR EVALUATION PURPOSES.


This Python module uses MTNA's Rich Data Services (RDS) API to quickly and efficiently access data sets and metadata stored in our repository. Through RDS, you can perform complex queries and tabulations on the data you are interested in and get back any relevant metadata along with it.

RDS greatly simplifies the long process of finding the data, cleaning and transforming it, and converting it into a dataframe. All of this is done in a single step using our queries, which lets you focus on analyzing and visualizing the data instead of managing it.

References

RDS API Documentation | Examples | Contributing | Developer Documentation | Changelog

Announcements

Version v0.2.12 released

This version of RDS Python lets you take advantage of our database framework through select queries, tabulation queries, and metadata retrieval. All features of our query system are available through this Python API.
{release date}

Installation

Using pip

Use the pip package manager to install RDS Python:

pip install mtna-rds

Usage

Through the RDS API, you are able to query for records of data as well as perform a tabulation. Both a select query and a tabulation accept options for grouping, ordering, and filtering the data, and for specifying whether metadata should be returned.

The data returned by a query or tabulation is contained within an RdsResults object. This object has three properties: the records of data, which can be used to build a dataframe for a graph or chart; the column names for each column of data in the records; and a collection of metadata in JSON format that provides information to support better analysis of your data.
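As a rough illustration of how records and column names fit together, the sketch below pairs them up using only the standard library, which may be handy when pandas is not available. `StandInResults` is a hypothetical stand-in for an RdsResults object, the sample values are invented, and the attribute name `metadata` is an assumption made for this sketch.

```python
from collections import namedtuple

# Hypothetical stand-in for an RdsResults object; the real class comes from the rds package.
# The attribute name "metadata" is an assumption made for this sketch.
StandInResults = namedtuple("StandInResults", ["records", "columns", "metadata"])

results = StandInResults(
    records=[[1901, 12], [1902, 15]],          # invented sample rows
    columns=["year_of_birth", "amount_born"],  # one name per value in each record
    metadata={},                               # metadata left empty in this sketch
)


def records_as_dicts(results):
    """Pair each record's values with the column names."""
    return [dict(zip(results.columns, record)) for record in results.records]


rows = records_as_dicts(results)
print(rows[0]["amount_born"])
```

Each row can then be read by column name instead of by position.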

Select Query

Imagine that you would like to get some demographic data for the United States. You look through our Catalog and see that we have the data you are interested in. The first thing you need to do to access this data is establish a link to the demographic dataset that we host in our repository. To do this, you connect to the server, get the catalog by its ID, and then get the DataProduct by the ID of the dataproduct that contains the demographic information.

from rds import Server

server = Server("domain")
catalog = server.get_catalog("catalog_id")
dataproduct = catalog.get_dataproduct("dataproduct_id")

Once the DataProduct is created, you can perform your query and get back the results, whose records can be loaded into a dataframe. If you wanted to know how many people were born in each year after 1900, up to 1950, you could perform the following query.

results = dataproduct.select(
    cols=["year_of_birth", "amount_born:count(*)"],
    where=["year_of_birth>1900"],
    orderby=["year_of_birth"],
    groupby=["year_of_birth"],
    limit=50,
)

This query tells RDS that we want the year of birth for each record as well as the number of records with that year of birth (renaming that column to "amount_born"). We then filter for everyone born after 1900. We group the data by year of birth so that only a single record is returned per year, and order the results by year. Because the filter is strictly greater than 1900, setting the limit to 50 returns the years 1901 through 1950 (assuming there are no missing years of data).
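To make the filter, group, order, and limit steps concrete, here is a minimal local sketch of the same logic in plain Python. It only imitates the query's semantics on an invented list of birth years; it is not how RDS executes the query, and `count_births_by_year` is a hypothetical helper, not part of the rds package.

```python
from collections import Counter


def count_births_by_year(birth_years, after=1900, limit=50):
    """Imitate the select query locally: filter, group and count, order, then limit."""
    # where year_of_birth > after, groupby year_of_birth, count(*)
    counts = Counter(year for year in birth_years if year > after)
    # orderby year_of_birth
    ordered = sorted(counts.items())
    # limit
    return [[year, amount] for year, amount in ordered[:limit]]


# Invented sample data: one entry per person.
sample = [1899, 1900, 1901, 1901, 1903, 1950, 1951]
records = count_births_by_year(sample, limit=3)
print(records)
```

In this sample, 1900 itself is excluded by the strict comparison, and the gap at 1902 shows why the limit only bounds the range of years when no years are missing.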

After obtaining the data, you can pull out the records and columns and place them directly into a dataframe for use in a graph or chart. Below we demonstrate by building a simple line plot of people born per year, using pandas for the dataframe and seaborn with matplotlib for the plot.

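import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Build a dataframe from the returned records, labeled with the returned column names
dataframe = pd.DataFrame(results.records, columns=results.columns)

# Plot the number of people born (second column) against year of birth (first column)
sns.lineplot(data=dataframe, x=dataframe.columns[0], y=dataframe.columns[1])
plt.show()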

Tabulation Query

A tabulation query is used almost identically to a select query, but it takes different parameters because a tabulation is better suited to examining the relationships between columns of data.

If you wanted to know the number of males and females for each race in the census, you would perform the tabulation query below.

results = dataproduct.select(dims=["sex", "race"], measure=["count(*)"], orderby=["race"], inject=True)

You can think of the dims parameter as the dimensions of a tabulation table, and the measure parameter as the value that you want in each cell of the table. One new parameter here is inject, which signifies that we want to replace any "coded" values with their more readable labels. Sex is an example of a "coded" value, since the data is often stored as "1" for male and "2" for female. Because "1" and "2" would not be very descriptive in a chart, RDS gives you the ability to replace them with what the codes actually mean.
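The idea of dimensions, a measure, and label injection can be sketched locally in plain Python. This is only an illustration on invented rows, not the RDS implementation; `tabulate_counts` and `SEX_LABELS` are hypothetical names, and the race values "A" and "B" are placeholders.

```python
from collections import Counter

# Hypothetical code list; in RDS the labels would come from the classification metadata.
SEX_LABELS = {"1": "Male", "2": "Female"}


def tabulate_counts(rows, inject=False):
    """Count rows per (sex, race) cell, ordered by race, optionally replacing sex codes."""
    cells = Counter((sex, race) for sex, race in rows)  # dims: sex, race; measure: count
    table = []
    for (sex, race), count in sorted(cells.items(), key=lambda item: (item[0][1], item[0][0])):
        if inject:
            sex = SEX_LABELS.get(sex, sex)  # unknown codes are left untouched
        table.append([sex, race, count])
    return table


# Invented sample rows of (sex code, race placeholder).
sample = [("1", "A"), ("2", "A"), ("1", "A"), ("2", "B")]
labeled = tabulate_counts(sample, inject=True)
print(labeled)
```

With inject left off, the same table keeps the raw codes "1" and "2" in the first column.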

Metadata

Metadata can be requested directly from any of our resources. This includes the server, catalogs, dataproducts, variables, classifications, and codes. The metadata contains extensive information on what the resource is and what it is used for.

About

This project is developed and maintained by MTNA.

More detailed documentation on what the current version of RDS can do can be found here.

If you are interested in using the RDS framework directly, you can visit our site here.

Software

Compatible with Python 2.7 and Python 3.6 and higher.

If using Python 3, we recommend using pandas dataframes when working with any records returned from an RDS query.

There are no dependencies required to run RDS Python.

License

Apache 2.0
