Skip to main content

Python bindings for NOAA's National Centers for Environomental Information webservices

Project description

This module provides tools to request data from the Climate Data Online webservices provided by NOAA’s National Centers for Environmental information (formerly the National Center for Climate Data). Install with:

pip install pyncei

Getting started

To use the NCEI webservices, you’ll need a token. The token is a 32-character string; users can request one here. Pass the token to pyncei.NCEIReader() to get started:

from pyncei.NCEIReader import NCEIReader

token = 'AnExampleTokenFromTheNCEIWebsite'
ncei = NCEIReader(token)

NCEIReader includes functions corresponding to each of the endpoints described on the CDO website. Query parameters specified by CDO can be passed as arguments:

ncei.get_stations(location='FIPS:11')
ncei.get_data(dataset='GHCND',
              station=['COOP:010957'],
              datatype=['TMIN','TMAX'],
              startdate='2015-03-01',
              enddate='2016-03-01')

The table below provides some information about the different endpoints. More information about query parameters for each endpoint is available at the NCO website.

NCO Endpoint

NCO Query Parameter

Argument

datasets

datasetid

dataset

datacategories

datacategoryid

datacategory

datatypes

datatypeid

datatype

locationcategories

locationcategoryid

locationcategory

locations

locationid

location

stations

stationid

station

data

Note that id fields used by CDO have been renamed here. For example, datasetid has been renamed dataset, and locationid has been renamed location. Unlike CDO, which accepts only ids, NCEIReader will accept either ids or name strings. If names are used, NCEIReader attempts to map the name strings to valid id using NCEIReader.map_term(), called manually here:

ncei.map_term('District of Columbia', 'locations')

('FIPS:11', True)

When the mapping function fails to find an exact match, it throws an exception containing a list of similar values that can be used to refine the original query. You can also search the available terms for each endpoint using NCEIReader.find_in_endpoint():

ncei.find_in_endpoint('District of Columbia', 'locations')

['FIPS:11 => District of Columbia',
 'FIPS:11001 => District of Columbia County, DC']

Or NCEIReader.find_all():

ncei.find_all('temperature')

[('datacategories', 'ANNTEMP', 'Annual Temperature'),
 ('datacategories', 'AUTEMP', 'Autumn Temperature'),...]

You can search by city, state, zip code, data type, etc. If the search term is None, NCEIReader.find_in_endpoint() will list ALL available ids for the given endpoint.

The NCEIReader.find_all() function searches across all endpoints and can be useful in locating a specific dataset or data type if you have no idea what’s available or where to look.

The mapping functions uses a set of .csv files included with the package. These files can be updated using the NCEIReader.refresh_lookup() function:

ncei.refresh_lookups()

Queries are cached for one day by default. Users can change this behavior using the cache parameter when initializing an NCEIReader object. This parameter specifies the number of seconds pages should persist in the cache; a value of zero disables the cache entirely.

Example: Return data from a station

import csv
from datetime import date

from pyncei.NCEIReader import NCEIReader


# Initialize NCEIReader object using your token string
ncei = NCEIReader('AnExampleTokenFromTheNCEIWebsite')
ncei.debug = True  # this flag produces verbose output

# Set the parameters you're looking for. You can use ncei.find_all() or
# ncei_find_in_endpoint() to search the available parameters if you don't
# know what to use.
mindate = '1966-01-01'  # either yyyy-mm-dd or a datetime object
maxdate = '2015-12-31'
datatypes = ['TMIN', 'TMAX']
dataset = 'GHCND'

# You can manually verify parameters if you're so inclined
for datatype in datatypes:
    ncei.map_name(datatype, 'datatypes')

# Get all DC stations operating between mindate and maxdate. The date
# parameters in station queries are a little odd. According to the docs,
# queries will return stations with data from on/before the enddate and
# on/after the startdate. If both parameters are included, the result set
# seems to include all stations that EITHER have data from on/before the
# startdate OR have data on/after the enddate.
stations = ncei.get_stations(location='District of Columbia',
                             dataset=dataset,
                             datatype=datatypes,
                             enddate=mindate)

# Filter out stations no longer operating using maxdate
stations = [station for station in stations
            if station['maxdate'] >= maxdate]

# Find the station with the best data coverage in the result set
stations.sort(key=lambda s:s['datacoverage'], reverse=True)
station = stations[0]
minyear = int(station['mindate'][:4])

# Get temperature data for the the lifetime of the station. Note that for the
# data endpoint, you can't request more than one year's worth of data at a
# time.
year = date.today().year - 1
results = []
while year >= minyear:
    results.extend(ncei.get_data(dataset=dataset,
                                 station=station['id'],
                                 datatype=datatypes,
                                 startdate=date(year, 1, 1),
                                 enddate=date(year, 12, 31)))
    year -= 1

# Write results to csv
fn = station['id'].replace(':', '') + '.csv'
with open(fn, 'wb') as f:
    writer = csv.writer(f, delimiter=',', quotechar='"')
    keys = ('date', 'datatype', 'value')
    writer.writerow(keys)
    for row in results:
        row['date'] = row['date'].split('T')[0].replace('-', '')
        writer.writerow([row[key] for key in keys])

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyncei-0.1.tar.gz (11.2 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page