Skip to main content

A Python client for accessing USDA Soil Data Access (SDA) web service

Project description

soildb

PyPI version License: MIT

Python client for the USDA-NRCS Soil Data Access (SDA) web service and other National Cooperative Soil Survey data sources.

Overview

soildb provides Python access to:

  • Soil Survey Data: USDA Soil Data Access (SDA) web service for SSURGO/STATSGO
  • Laboratory Data: NCSS Kellogg Soil Survey Laboratory (KSSL) characterization data
  • Bulk Downloads: Complete SSURGO/STATSGO datasets from Web Soil Survey
  • Multiple Backends: Query data from SDA web service, local SQLite snapshots, or GeoPackage files

Query soil survey data via web service or local database, export to pandas/polars DataFrames, and handle spatial queries.

Installation

pip install soildb

For spatial functionality:

pip install soildb[spatial]

For all optional features support:

pip install soildb[all]

Features

Soil Survey Data (SDA)

  • Query SSURGO/STATSGO data from NRCS Soil Data Access web service
  • Build custom SQL queries with fluent interface
  • Spatial queries with points, bounding boxes, and polygons
  • Bulk data fetching with automatic pagination and chunking
  • Export to pandas and polars DataFrames

Laboratory Characterization Data

  • Access NCSS Kellogg Soil Survey Laboratory (KSSL) pedon data
  • Query via SDA web service or local SQLite snapshot databases
  • Full horizon-level data with lab analyses
  • Structured object models for nested pedon data
  • Support for flexible column selection

Web Soil Survey Downloads

  • Download complete SSURGO datasets as ZIP files
  • Download STATSGO (general soil map) data
  • Concurrent downloads with progress tracking
  • Automatic file extraction and organization
  • State-wide and custom area selections

Multi-Backend Support

  • Query from SDA web service (live data)
  • Query from local SQLite snapshots (offline analysis)
  • Support for GeoPackage files with spatial features
  • Unified interface across all backends
  • Async I/O for high performance and concurrency

Quick Start

Query Builder

Build and execute custom SQL queries with the fluent interface:

from soildb import Query

query = (Query()
        .select("mukey", "muname", "musym")
        .from_("mapunit")
        .inner_join("legend", "mapunit.lkey = legend.lkey")
        .where("areasymbol = 'IA109'")
        .limit(5))

# Inspect the generated SQL
print(query.to_sql())

# Execute and get results
import asyncio
from soildb import SDAClient

async def main():
    result = await SDAClient().execute(query)
    return result.to_pandas()

df = asyncio.run(main())
print(df.head())
SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
    mukey                                             muname  musym
0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
1  408339        Colo silty clay loam, 0 to 2 percent slopes    133
2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B
3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
4  408348          Harpster silt loam, 0 to 2 percent slopes   1595

Async vs Synchronous Usage

All soildb functions have both async and synchronous versions. For most use cases, the synchronous .sync() version is simpler and easier to use.

Synchronous Usage

For simple scripts and interactive use, soildb provides synchronous versions of all async functions:

from soildb import get_mapunit_by_areasymbol

# Synchronous usage - no async/await needed!
mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
Found 80 map units
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { vertical-align: right; } </style>
mukey musym muname mukind muacres areasymbol areaname
0 408333 1032 Spicer silty clay loam, 0 to 2 percent slopes Consociation 1834 IA109 Kossuth County, Iowa
1 408334 107 Webster clay loam, 0 to 2 percent slopes Consociation 46882 IA109 Kossuth County, Iowa
2 408335 108 Wadena loam, 0 to 2 percent slopes Consociation 807 IA109 Kossuth County, Iowa
3 408336 108B Wadena loam, 2 to 6 percent slopes Consociation 1103 IA109 Kossuth County, Iowa
4 408337 1133 Colo silty clay loam, channeled, 0 to 2 percen... Consociation 1403 IA109 Kossuth County, Iowa

The .sync methods automatically manage SDA client connections for you. For multiple calls, consider reusing a client:

from soildb import SDAClient, get_mapunit_by_areasymbol

client = SDAClient()
mapunits1 = get_mapunit_by_areasymbol.sync("IA109", client=client)
mapunits2 = get_mapunit_by_areasymbol.sync("IA113", client=client)
client.close()

Convenience Functions

soildb provides high-level functions for common tasks:

from soildb import get_mapunit_by_areasymbol

mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
Found 80 map units
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey musym muname mukind muacres areasymbol areaname
0 408333 1032 Spicer silty clay loam, 0 to 2 percent slopes Consociation 1834 IA109 Kossuth County, Iowa
1 408334 107 Webster clay loam, 0 to 2 percent slopes Consociation 46882 IA109 Kossuth County, Iowa
2 408335 108 Wadena loam, 0 to 2 percent slopes Consociation 807 IA109 Kossuth County, Iowa
3 408336 108B Wadena loam, 2 to 6 percent slopes Consociation 1103 IA109 Kossuth County, Iowa
4 408337 1133 Colo silty clay loam, channeled, 0 to 2 percen... Consociation 1403 IA109 Kossuth County, Iowa

If you have suggestions for new convenience functions please file a feature request on GitHub.

Spatial Queries

Query soil data by location with points, bounding boxes, or polygons:

from soildb import spatial_query

# Point query
response = spatial_query.sync(
    geometry="POINT(-93.6 42.0)",
    table="mupolygon"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
Point query found 1 results
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey areasymbol musym nationalmusym muname mukind
0 411278 IA169 1314 fsz1 Hanlon-Spillville complex, channeled, 0 to 2 p... Complex

Bulk Data Fetching

Retrieve large datasets efficiently with automatic pagination and chunking:

from soildb import fetch_by_keys, get_mukey_by_areasymbol

# Get mukeys for survey areas
areas = ["IA109", "IA113", "IA117"]
all_mukeys = get_mukey_by_areasymbol.sync(areas)

print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")

# Fetch components in chunks automatically
response = fetch_by_keys.sync(
    all_mukeys, 
    "component", 
    key_column="mukey", 
    chunk_size=100,
    columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
Found 410 mukeys across 3 areas
Fetching 410 keys in 5 chunks of 100
Fetched 1067 component records
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey cokey compname localphase comppct_r
0 408333 25562547 Kingston <NA> 2
1 408333 25562548 Okoboji <NA> 5
2 408333 25562549 Spicer <NA> 90
3 408333 25562550 Madelia <NA> 3
4 408334 25562837 Okoboji <NA> 5
5 408334 25562838 Glencoe <NA> 3
6 408334 25562839 Canisteo <NA> 2
7 408334 25562840 Webster <NA> 85
8 408334 25562841 Nicollet <NA> 5
9 408335 25562135 Biscay <NA> 1

The component table has a hierarchical relationship:

  • mukey (map unit key) is the parent
  • cokey (component key) is the child

So when fetching components, you typically want to filter by mukey to get all components for specific map units.

Use the fetch_by_keys() function with the "mukey" as the key_column to achieve this with automatic pagination over chunks with 100 rows each (or specify your own chunk_size).

Bulk Downloads (Web Soil Survey)

Download complete SSURGO and STATSGO datasets as ZIP files from the USDA Web Soil Survey portal:

from soildb import download_wss

# Download specific survey areas
paths = download_wss.sync(
    areasymbols=["IA109", "IA113"],
    dest_dir="./ssurgo_data",
    extract=True
)
print(f"Downloaded {len(paths)} survey areas")

# Download all survey areas for a state
paths = download_wss.sync(
    where_clause="areasymbol LIKE 'IA%'",
    dest_dir="./iowa_ssurgo",
    extract=True,
    remove_zip=True  # Clean up ZIP files after extraction
)

# Download STATSGO (general soil map) data
paths = download_wss.sync(
    areasymbols=["IA"],
    db="STATSGO",
    dest_dir="./iowa_statsgo",
    extract=True
)

Each extracted survey area directory contains:

  • tabular/ - Pipe-delimited TXT files with soil data tables
  • spatial/ - ESRI shapefiles with map unit polygons and boundaries

Use Cases:

  • SDA: Live queries, filtered data, programmatic access to current data
  • WSS Downloads: Complete offline datasets, bulk data for analysis, static snapshots updated annually

Async Usage

For performance-critical applications, use async functions directly with concurrent requests:

import asyncio
from soildb import fetch_by_keys, get_mukey_by_areasymbol

async def concurrent_example():
    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    all_mukeys = await get_mukey_by_areasymbol(areas)
    
    # Fetch components concurrently with automatic pagination
    response = await fetch_by_keys(
        all_mukeys,
        "component",
        key_column="mukey",
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "comppct_r"]
    )
    return response.to_pandas()

# Run async function
df = asyncio.run(concurrent_example())

For more async patterns, see the Async Programming Guide.

Examples

See the examples/ directory and documentation for detailed usage patterns.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

soildb-0.7.0.tar.gz (282.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

soildb-0.7.0-py3-none-any.whl (174.7 kB view details)

Uploaded Python 3

File details

Details for the file soildb-0.7.0.tar.gz.

File metadata

  • Download URL: soildb-0.7.0.tar.gz
  • Upload date:
  • Size: 282.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for soildb-0.7.0.tar.gz
Algorithm Hash digest
SHA256 86e8e7c9161fee55d9388d4cc8468573e26614b66aa265fe4d7ec18a271c5ac3
MD5 469fd5d1ddf26e70ef35992aefca4b15
BLAKE2b-256 2d498bb5597b2fce9caa7d7b7f1990779137d50bf44fd6d4b85ef7308fb908e3

See more details on using hashes here.

Provenance

The following attestation bundles were made for soildb-0.7.0.tar.gz:

Publisher: pypi-release.yml on brownag/py-soildb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file soildb-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: soildb-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 174.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for soildb-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a9de9c645f4d70b493f2a95c0da117f6bb1dd4b21344f40468832620376a29bc
MD5 bf96da75cbb3ead63d54f6f5b2abc451
BLAKE2b-256 0015bac81d8b8151468ce22592ad61948a2b32809f80f98db08b83713d7b9d4e

See more details on using hashes here.

Provenance

The following attestation bundles were made for soildb-0.7.0-py3-none-any.whl:

Publisher: pypi-release.yml on brownag/py-soildb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page