Python utility to get open data from some popular websites
Overview
pyopendata is a Python utility that offers a unified API for reading data from various open data sources around the world and returns the results as pandas.DataFrame objects. It covers:
CKAN websites (data.gov, data.go.jp, etc.)
Installation
pip install pyopendata
Basic Usage
This section explains how to retrieve data from a website that uses the CKAN API. To access a CKAN website, create a DataStore instance by passing the CKAN URL to the DataStore class.
In this example, we’re going to retrieve the ‘California Unemployment Statistics’ data from data.gov. The target URL is:
We can read the above URL as:
CKAN API URL: https://catalog.data.gov/dataset
Package ID: california-unemployment-statistics
Resource ID: ffd05307-4528-4d15-a370-c16222119227
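The breakdown above can be sketched with the standard library alone. This is a minimal illustration, assuming the conventional CKAN page layout `<base>/dataset/<package-id>/resource/<resource-id>`. The helper name `split_ckan_url` and the example URL (assembled from the three parts listed above) are assumptions for illustration and are not part of pyopendata.

```python
from urllib.parse import urlsplit


def split_ckan_url(url):
    """Split a CKAN resource page URL into (base URL, package ID, resource ID).

    Hypothetical helper: assumes the path has the form
    /dataset/<package-id>/resource/<resource-id>.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 4 or segments[0] != "dataset" or segments[2] != "resource":
        raise ValueError("not a CKAN resource URL: %r" % url)
    base = "%s://%s" % (parts.scheme, parts.netloc)
    return base, segments[1], segments[3]


# Example URL assembled from the parts listed above (assumed layout).
url = (
    "https://catalog.data.gov/dataset/california-unemployment-statistics"
    "/resource/ffd05307-4528-4d15-a370-c16222119227"
)
base, package_id, resource_id = split_ckan_url(url)
```

The base URL is what gets passed to DataStore, while the package ID and resource ID are the values used with the get methods in the steps that follow.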
>>> import pyopendata as pyod
>>> store = pyod.DataStore('http://catalog.data.gov/')
>>> store
CKANStore (http://catalog.data.gov)
DataStore.search performs a keyword search and returns a list of packages. You can select a target package by indexing into the list.
>>> packages = store.search('Unemployment Statistics')
>>> packages
[annual-survey-of-school-system-finances (1 resource),
current-population-survey (1 resource),
federal-aid-to-states (1 resource),
current-population-survey-labor-force-statistics (2 resources),
dataferrett (1 resource),
mass-layoff-statistics (1 resource),
unemployment-rate (3 resources),
consolidated-federal-funds-report (1 resource),
annual-survey-of-state-and-local-government-finances (1 resource),
local-area-unemployment-statistics (2 resources)]
>>> packages[0]
annual-survey-of-school-system-finances (1 resource)
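For context, CKAN sites generally expose keyword search through the Action API endpoint `package_search`. The sketch below only builds such a request URL. Whether pyopendata calls this exact endpoint internally is an assumption, and `package_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urljoin


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL (hypothetical helper)."""
    # Normalise the base so that urljoin appends instead of replacing a path.
    endpoint = urljoin(base_url.rstrip("/") + "/", "api/3/action/package_search")
    # 'q' is the search text, 'rows' the maximum number of packages returned.
    return endpoint + "?" + urlencode({"q": query, "rows": rows})


search_url = package_search_url("http://catalog.data.gov/", "Unemployment Statistics")
```

Building the URL separately from sending the request keeps the sketch runnable offline; fetching it would return a JSON document listing the matching packages.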
Alternatively, if you already know the package name, pass it to the store's get method to retrieve the package directly.
>>> package = store.get('california-unemployment-statistics')
>>> package
Resource ID: ffd05307-4528-4d15-a370-c16222119227
Resource Name: Comma Separated Values File
Resource URL: https://data.lacity.org/api/views/5zrb-xqhf/rows.csv?accessType=DOWNLOAD
Format: CSV, Size: None
A package has resources (files) that contain the actual data. Use the package's get method to retrieve a resource by its ID.
>>> resource = package.get('ffd05307-4528-4d15-a370-c16222119227')
>>> resource
Resource ID: ffd05307-4528-4d15-a370-c16222119227
Resource Name: Comma Separated Values File
Resource URL: https://data.lacity.org/api/views/5zrb-xqhf/rows.csv?accessType=DOWNLOAD
Format: CSV, Size: None
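The package-to-resource lookup can be pictured with plain dictionaries. The sketch below assumes a trimmed-down record in the shape of the `result` object returned by CKAN's `package_show` action, with field values copied from the output above. The function name `find_resource` is hypothetical and is not pyopendata's implementation.

```python
# Hypothetical, trimmed-down package record shaped like the "result" object
# of CKAN's package_show action; values are copied from the output above.
package_record = {
    "name": "california-unemployment-statistics",
    "resources": [
        {
            "id": "ffd05307-4528-4d15-a370-c16222119227",
            "name": "Comma Separated Values File",
            "url": "https://data.lacity.org/api/views/5zrb-xqhf/rows.csv?accessType=DOWNLOAD",
            "format": "CSV",
        },
    ],
}


def find_resource(record, resource_id):
    """Return the resource dict with the given ID, or raise KeyError."""
    for res in record["resources"]:
        if res["id"] == resource_id:
            return res
    raise KeyError(resource_id)


res = find_resource(package_record, "ffd05307-4528-4d15-a370-c16222119227")
```

Note that the resource URL points at a different host (data.lacity.org) than the catalog itself: the catalog stores metadata, and the file is fetched from wherever the publisher hosts it.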
Once you have the resource, use its read method to read the data as a pandas DataFrame.
>>> df = resource.read()
>>> df.head()
Year Period Area Unemployment Rate Labor Force \
0 2013 Jan California 10.4% 18556500
1 2013 Jan Los Angeles County 10.9% 4891500
2 2013 Jan Los Angeles City 12% 1915600
3 2013 Feb California 9.699999999999999% 18648300
4 2013 Feb Los Angeles County 10.3% 4924000
Employment Unemployment Adjusted Preliminary
0 16631900 1924600 Not Adj Not Prelim
1 4357800 533800 Not Adj Not Prelim
2 1684800 230800 Not Adj Not Prelim
3 16835900 1812400 Not Adj Not Prelim
4 4418000 506000 Not Adj Not Prelim
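In the output above, the Unemployment Rate column holds strings such as `10.4%`, and one value carries floating-point noise (`9.699999999999999%`). A minimal cleanup sketch follows, using a small DataFrame rebuilt from the rows shown above as a stand-in for the real result of `resource.read()`. Rounding to one decimal place is an assumption about the intended precision.

```python
import pandas as pd

# Sample rows copied from the df.head() output above.
df = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2013, 2013],
    "Period": ["Jan", "Jan", "Jan", "Feb", "Feb"],
    "Area": [
        "California", "Los Angeles County", "Los Angeles City",
        "California", "Los Angeles County",
    ],
    "Unemployment Rate": ["10.4%", "10.9%", "12%", "9.699999999999999%", "10.3%"],
})

# Strip the trailing percent sign, convert to float, and round away the
# floating-point noise (one decimal place is an assumed precision).
df["Unemployment Rate"] = (
    df["Unemployment Rate"].str.rstrip("%").astype(float).round(1)
)
```

With the column numeric, the usual pandas operations such as sorting, grouping by Area, or plotting work without further conversion.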
Alternatively, pass raw=True to get the raw data instead of a DataFrame.
>>> raw = resource.read(raw=True)
>>> raw[:100]
'Year,Period,Area,Unemployment Rate,Labor Force,Employment,Unemployment,Adjusted,Preliminary\n2013,Jan'
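Raw CSV text like this can be parsed manually with pandas when custom options are needed. The sketch below uses a short stand-in string, reconstructed from the header and first row shown above, rather than the real download. The bytes check is an assumption, since the type returned by `raw=True` may differ between Python versions.

```python
import io

import pandas as pd

# Stand-in for resource.read(raw=True): the header and the first data row,
# reconstructed from the outputs shown above.
raw = (
    "Year,Period,Area,Unemployment Rate,Labor Force,Employment,"
    "Unemployment,Adjusted,Preliminary\n"
    "2013,Jan,California,10.4%,18556500,16631900,1924600,Not Adj,Not Prelim\n"
)

# If raw were bytes rather than str, decode it first (UTF-8 is assumed).
text = raw.decode("utf-8") if isinstance(raw, bytes) else raw

# Parse the CSV text from an in-memory buffer.
parsed = pd.read_csv(io.StringIO(text))
```

Parsing the raw text yourself is useful when you want to pass options such as `dtype` or `usecols` to `pandas.read_csv`.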
Source Distribution
File details
Details for the file pyopendata-0.0.2.tar.gz.
File metadata
- Download URL: pyopendata-0.0.2.tar.gz
- Upload date:
- Size: 28.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5a2cf9f8d86273d8f443ce938b9e7c7be6c5807de11577709704ba2921f4d4cb
MD5 | 5dc897984a5e7db9adb97af55c5a9c7a
BLAKE2b-256 | 6ca45a21d96b0344987bb3ae892de15d84bc582107a1cb43c1ba18cfb1911b5f