Python utility to get open data from some popular websites

Overview

pyopendata is a Python utility that offers a unified API for reading data from various open data sources around the world and returns the results as pandas.DataFrame objects.

Documentation

http://pyopendata.readthedocs.org/

Installation

pip install pyopendata

Basic Usage

This section explains how to retrieve data from a website that uses the CKAN API. You can create a DataStore instance to access a CKAN website by passing the CKAN URL to the DataStore class.

In this example, we’re going to retrieve the ‘California Unemployment Statistics’ data from data.gov. The target URL is:

We can read the above URL as:

>>> import pyopendata as pyod

>>> store = pyod.DataStore('http://catalog.data.gov/')
>>> store
CKANStore (http://catalog.data.gov)

DataStore.search performs a keyword search and returns a list of packages. You can select a target package by indexing into the list.

>>> packages = store.search('Unemployment Statistics')
>>> packages
[annual-survey-of-school-system-finances (1 resource),
 current-population-survey (1 resource),
 federal-aid-to-states (1 resource),
 current-population-survey-labor-force-statistics (2 resources),
 dataferrett (1 resource),
 mass-layoff-statistics (1 resource),
 unemployment-rate (3 resources),
 consolidated-federal-funds-report (1 resource),
 annual-survey-of-state-and-local-government-finances (1 resource),
 local-area-unemployment-statistics (2 resources)]

>>> packages[0]
annual-survey-of-school-system-finances (1 resource)
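
For readers curious about what a keyword search against a CKAN site involves, the following is a minimal sketch that uses only the standard library. It is not pyopendata's implementation. It assumes the site exposes the standard CKAN Action API endpoint api/3/action/package_search, and the helper names build_search_url and summarize_packages are hypothetical. The sample response is a hand-built stand-in that reuses two package names from the listing above, so the snippet runs without network access.

```python
from urllib.parse import urlencode


def build_search_url(base_url, keyword, rows=10):
    """Build a CKAN Action API URL for a keyword package search (hypothetical helper)."""
    query = urlencode({"q": keyword, "rows": rows})
    # Normalize the trailing slash so both 'http://host' and 'http://host/' work.
    return base_url.rstrip("/") + "/api/3/action/package_search?" + query


def summarize_packages(response):
    """Reduce a decoded package_search response to (name, resource count) pairs."""
    if not response.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [
        (pkg["name"], len(pkg.get("resources", [])))
        for pkg in response["result"]["results"]
    ]


# Hand-built stand-in for a decoded JSON response; only the fields used
# by summarize_packages are included.
sample_response = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "mass-layoff-statistics", "resources": [{}]},
            {"name": "unemployment-rate", "resources": [{}, {}, {}]},
        ],
    },
}

url = build_search_url("http://catalog.data.gov/", "Unemployment Statistics")
summary = summarize_packages(sample_response)
print(url)
print(summary)
```

In real use, the URL would be fetched and the JSON body decoded before it is passed to summarize_packages; DataStore.search hides those steps.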

Alternatively, if you already know the package name, pass it to get to retrieve the package directly.

>>> package = store.get('california-unemployment-statistics')
>>> package
Resource ID: ffd05307-4528-4d15-a370-c16222119227
Resource Name: Comma Separated Values File
Resource URL: https://data.lacity.org/api/views/5zrb-xqhf/rows.csv?accessType=DOWNLOAD
Format: CSV, Size: None

A package has resources (files) that contain the actual data. Use the get method with a resource ID to retrieve a resource.

>>> resource = package.get('ffd05307-4528-4d15-a370-c16222119227')
>>> resource
Resource ID: ffd05307-4528-4d15-a370-c16222119227
Resource Name: Comma Separated Values File
Resource URL: https://data.lacity.org/api/views/5zrb-xqhf/rows.csv?accessType=DOWNLOAD
Format: CSV, Size: None
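
The package-to-resource relationship can be pictured as a dictionary holding a list of resource entries. The sketch below assumes the shape that CKAN's package_show action returns (a "resources" list whose entries carry "id", "name", "url" and "format"); find_resource is a hypothetical helper, not part of pyopendata. The metadata values are copied from the output shown above.

```python
def find_resource(package, resource_id):
    """Return the resource entry with the given ID (hypothetical helper)."""
    for resource in package.get("resources", []):
        if resource.get("id") == resource_id:
            return resource
    raise KeyError("no resource with id %r" % resource_id)


# Package metadata in the assumed CKAN shape, using the values shown above.
package_meta = {
    "name": "california-unemployment-statistics",
    "resources": [
        {
            "id": "ffd05307-4528-4d15-a370-c16222119227",
            "name": "Comma Separated Values File",
            "url": "https://data.lacity.org/api/views/5zrb-xqhf/rows.csv?accessType=DOWNLOAD",
            "format": "CSV",
        }
    ],
}

found = find_resource(package_meta, "ffd05307-4528-4d15-a370-c16222119227")
print(found["name"], found["format"])
```

Looking a resource up by ID rather than by position keeps the code stable if the publisher later adds or reorders files in the package.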

Once you have the resource, use the read method to load its data as a pandas DataFrame.

>>> df = resource.read()
>>> df.head()
   Year Period                Area   Unemployment Rate  Labor Force  \
0  2013    Jan          California               10.4%     18556500
1  2013    Jan  Los Angeles County               10.9%      4891500
2  2013    Jan    Los Angeles City                 12%      1915600
3  2013    Feb          California  9.699999999999999%     18648300
4  2013    Feb  Los Angeles County               10.3%      4924000

   Employment  Unemployment Adjusted Preliminary
0    16631900       1924600  Not Adj  Not Prelim
1     4357800        533800  Not Adj  Not Prelim
2     1684800        230800  Not Adj  Not Prelim
3    16835900       1812400  Not Adj  Not Prelim
4     4418000        506000  Not Adj  Not Prelim

Or you can get raw data by specifying raw=True.

>>> raw = resource.read(raw=True)
>>> raw[:100]
'Year,Period,Area,Unemployment Rate,Labor Force,Employment,Unemployment,Adjusted,Preliminary\n2013,Jan'
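
The raw form is handy when you want to control parsing yourself. As a minimal sketch, assuming the raw value is text (bytes would need decoding first), the standard csv module can parse it, and the percentage strings in the Unemployment Rate column can be converted to numbers. The helper parse_rate is hypothetical, and raw_sample is rebuilt from the header and two of the rows shown above rather than downloaded.

```python
import csv
import io

# Two rows rebuilt from the DataFrame output above, in the raw CSV layout.
raw_sample = (
    "Year,Period,Area,Unemployment Rate,Labor Force,Employment,"
    "Unemployment,Adjusted,Preliminary\n"
    "2013,Jan,California,10.4%,18556500,16631900,1924600,Not Adj,Not Prelim\n"
    "2013,Feb,California,9.699999999999999%,18648300,16835900,1812400,"
    "Not Adj,Not Prelim\n"
)


def parse_rate(text):
    """Convert a string such as '10.4%' to a float rounded to one decimal place."""
    return round(float(text.rstrip("%")), 1)


rows = list(csv.DictReader(io.StringIO(raw_sample)))
rates = [parse_rate(row["Unemployment Rate"]) for row in rows]
print(rates)
```

Rounding to one decimal place also tidies values such as 9.699999999999999%, which appear in the published file.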
