Python utilities for working with inaturalist-open-data

pyinaturalist-open-data

This is a work in progress and not yet complete!

pyinaturalist-open-data is a Python library and CLI tool for working with inaturalist-open-data. Its goal is to make it easy to import and use this dataset in a Python application backed by any SQLAlchemy-compatible database engine (SQLite by default), or simply for local data exploration.

See the CLI in action here or on asciinema: asciicast

Installation

Install with pip:

pip install pyinaturalist-open-data

Or for local development:

git clone https://github.com/JWCook/pyinaturalist-open-data.git
cd pyinaturalist-open-data
pip install poetry && poetry install

Usage

This package provides the command pynat. See --help for commands and options:

Usage: pynat [OPTIONS] COMMAND [ARGS]...

  Commands for working with inaturalist open data

Options:
  -v, --verbose  Show more detailed output
  --help         Show this message and exit.

Commands:
  db    Load contents of CSV files into a database
  dl    Download and extract inaturalist open data archive
  init  Just create tables (if they don't already exist) without populating...
  load  Download and load all data into a database.

Run everything

The simplest command is load, which runs all steps:

  1. Download and extract the dataset
  2. Create database tables and indices
  3. Load the data into the database

Options:

Usage: pynat load [OPTIONS]

Options:
  -d, --download-dir TEXT  Alternate path for downloads
  -u, --uri TEXT           Alternate database URI to connect to
  --help                   Show this message and exit.

By default, this will create a new SQLite database. Alternatively, you can provide a URI for any supported database.
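As a rough illustration of what a database URI looks like, the sketch below assembles one in the common SQLAlchemy form `dialect://user:password@host:port/database`. The helper `build_db_uri`, the file name `inat.db`, and the PostgreSQL credentials are all hypothetical and not part of this package.

```python
from urllib.parse import quote_plus


def build_db_uri(dialect, database, user=None, password=None, host=None, port=None):
    """Assemble a SQLAlchemy-style URI: dialect://user:password@host:port/database."""
    auth = ""
    if user:
        auth = quote_plus(user)
        if password:
            # Percent-encode characters such as '@' or '/' so they don't break the URI
            auth += ":" + quote_plus(password)
        auth += "@"
    location = host or ""
    if port:
        location += f":{port}"
    return f"{dialect}://{auth}{location}/{database}"


# Hypothetical targets, for illustration only
sqlite_uri = build_db_uri("sqlite", "inat.db")
postgres_uri = build_db_uri(
    "postgresql", "inat", user="me", password="p@ss/word", host="localhost", port=5432
)
print(sqlite_uri)
print(postgres_uri)
```

The resulting string can then be passed with the `-u` option, for example `pynat load -u sqlite:///inat.db`. A non-SQLite database will typically also need its own driver package installed.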

Run individual steps

Other commands are available if you only want to run one of those steps at a time.
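If you want to script the individual steps, one option is to build the command lines in Python and hand them to `subprocess`. This is only a sketch: the helper names `pynat_commands` and `run_steps` and the path `/tmp/inat` are made up for illustration, and it uses only the `-d`, `-u`, and `-t` flags listed in the help output for `dl` and `db`.

```python
import shlex
import subprocess


def pynat_commands(download_dir=None, uri=None, table=None):
    """Return argv lists for running the download and load steps separately."""
    dl = ["pynat", "dl"]
    db = ["pynat", "db"]
    if download_dir:
        # Both steps need to agree on where the extracted files live
        dl += ["-d", download_dir]
        db += ["-d", download_dir]
    if uri:
        db += ["-u", uri]
    if table:
        db += ["-t", table]
    return [dl, db]


def run_steps(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Hypothetical download directory; only the taxon table is loaded
commands = pynat_commands(download_dir="/tmp/inat", table="taxon")
for argv in commands:
    print(shlex.join(argv))
```

Calling `run_steps(commands)` would execute the two commands; it is left out here because it requires `pynat` to be installed and starts a large download.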

dl command:

Usage: pynat dl [OPTIONS]

  Download and extract all files in the inaturalist open data archive

Options:
  -d, --download-dir TEXT  Alternate path for downloads
  --help                   Show this message and exit

Note: Both dl and load will reuse local data if it already exists and is up to date.

db command:

Usage: pynat db [OPTIONS]

  Load contents of CSV files into a database. Also creates tables and
  indexes, if they don't already exist.

Options:
  -d, --download-dir TEXT         Alternate path for downloads
  -i, --init                      Just initialize the database with tables
                                  + indexes without loading data
  -t, --tables [observation|photo|taxon|user]
                                  Load only these specific tables
  -u, --uri TEXT                  Alternate database URI to connect to

  --help                          Show this message and exit.

Note: This can take a long time to run. Depending on the database type, you will likely get better performance with database-specific bulk loading tools (for example, psql with COPY for PostgreSQL).
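To give a feel for what bulk loading means, here is a minimal sketch that uses Python's built-in `sqlite3` module rather than psql: it inserts every row of a delimited file in a single transaction with `executemany`. It is not how this package loads data. The `bulk_load` helper, the tab delimiter, and the sample columns `taxon_id` and `name` are assumptions for illustration.

```python
import csv
import io
import sqlite3


def bulk_load(conn, table, lines, delimiter="\t"):
    """Load delimited rows (header line first) into `table` in one transaction."""
    reader = csv.reader(lines, delimiter=delimiter)
    columns = next(reader)  # the first line holds the column names
    col_defs = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commits once at the end, or rolls back on error
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
        cursor = conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader
        )
    return cursor.rowcount


# Illustrative sample data; the real files are far larger
sample = io.StringIO("taxon_id\tname\n1\tAnimalia\n2\tPlantae\n")
conn = sqlite3.connect(":memory:")
loaded = bulk_load(conn, "taxon", sample)
print(loaded)
```

For a real file, pass an open file handle (opened with `newline=""`) instead of the `StringIO` object. Committing once per file, rather than once per row, is where most of the speedup comes from.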

Python package

To use as a Python package instead of a CLI tool:

from pyinaturalist_open_data import download_metadata, load_all

download_metadata()
load_all()

Full package documentation on readthedocs will be coming soon.
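Once the data is loaded, a quick way to check the result in a SQLite database is to count the rows in each table. The sketch below uses only the standard library; `table_row_counts` is a hypothetical helper, and the in-memory `taxon` table is a stand-in for the real database file, whose location is not covered here.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in a SQLite database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Stand-in database with made-up columns, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO taxon (name) VALUES (?)", [("Animalia",), ("Plantae",)])
counts = table_row_counts(conn)
print(counts)
```

To inspect real data, replace `":memory:"` with the path to the SQLite file that was created and skip the stand-in table setup.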

Planned features

Some features I would ideally like to add to this:

Download files

Source Distribution

pyinaturalist-open-data-0.1.1.tar.gz (11.6 kB view details)

Uploaded Source

Built Distribution

pyinaturalist_open_data-0.1.1-py3-none-any.whl (15.6 kB view details)

Uploaded Python 3

File details

Details for the file pyinaturalist-open-data-0.1.1.tar.gz.

File metadata

  • Download URL: pyinaturalist-open-data-0.1.1.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.6 CPython/3.9.5 Linux/5.4.0-1047-azure

File hashes

Hashes for pyinaturalist-open-data-0.1.1.tar.gz
Algorithm Hash digest
SHA256 bda40cbec21cb80076deba0651c696824614141a5b85f3f634f3d0b50bf2a689
MD5 92550b25937c1c54051715deb17a7f4e
BLAKE2b-256 9d0372888af6a897ea6a93be6a86dc383efc3361a28accffc3bae3289778a41f

File details

Details for the file pyinaturalist_open_data-0.1.1-py3-none-any.whl.

File hashes

Hashes for pyinaturalist_open_data-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b896272cc0fcde98fa6c42b7abb8e488778ab4246c95184c01819951fa1c4750
MD5 50ee13fe348f03aadd7d6f5fc2b43a94
BLAKE2b-256 936ebdcdd3639d816abecf40213729fce1b8c8617d5fcd89f8d3c92462923359
