Skip to main content

A Python client for accessing USDA Soil Data Access (SDA) web service

Project description

soildb

PyPI version License: MIT

Python client for the USDA-NRCS Soil Data Access (SDA) web service and other National Cooperative Soil Survey data sources.

Overview

soildb provides Python access to the USDA Soil Data Access (SDA) web service https://sdmdataaccess.nrcs.usda.gov/.

Query soil survey data, export to pandas/polars DataFrames, and handle spatial queries.

Installation

pip install soildb

For spatial functionality:

pip install soildb[spatial]

For all optional features support:

pip install soildb[all]

Features

  • Query soil survey data from SDA
  • Export to pandas and polars DataFrames
  • Build custom SQL queries with fluent interface
  • Spatial queries with points, bounding boxes, and polygons
  • Bulk data fetching with automatic pagination
  • Async I/O for high performance and concurrency

Quick Start

Query Builder

This is a basic example of building a custom query and getting the results:

from soildb import Query
    
query = (Query()
        .select("mukey", "muname", "musym")
        .from_("mapunit")
        .inner_join("legend", "mapunit.lkey = legend.lkey")
        .where("areasymbol = 'IA109'")
        .limit(5))
    
# inspect query
print(query.to_sql())

result = await soildb.SDAClient().execute(query)

df = result.to_pandas()
print(df.head())
SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
    mukey                                             muname  musym
0  408337  Colo silty clay loam, channeled, 0 to 2 percen...   1133
1  408339        Colo silty clay loam, 0 to 2 percent slopes    133
2  408340        Colo silty clay loam, 2 to 4 percent slopes   133B
3  408345  Clarion loam, 9 to 14 percent slopes, moderate...  138D2
4  408348          Harpster silt loam, 0 to 2 percent slopes   1595

Async Setup

You may have noticed that we need to await the query execution result.

All soildb functions are async. Here's how to run them in different environments like Jupyter notebooks, VSCode, or regular Python scripts.

Basic Async Execution

import asyncio
import soildb

async def main():
    # Your async code here
    mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
    df = mapunits.to_pandas()
    return df

# Handle different environments
try:
    # Check if there's already an event loop (Jupyter, etc.)
    loop = asyncio.get_running_loop()
    import nest_asyncio
    nest_asyncio.apply()
    result = loop.run_until_complete(main())
except RuntimeError:
    # No existing loop, use asyncio.run()
    result = asyncio.run(main())

result

For comprehensive async usage, see the Async Programming Guide.

Synchronous Usage

For simple scripts and interactive use, soildb provides synchronous versions of all async functions:

from soildb import get_mapunit_by_areasymbol

# Synchronous usage - no async/await needed!
mapunits = get_mapunit_by_areasymbol.sync("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
Found 80 map units
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { vertical-align: right; } </style>
mukey musym muname mukind muacres areasymbol areaname
0 408333 1032 Spicer silty clay loam, 0 to 2 percent slopes Consociation 1834 IA109 Kossuth County, Iowa
1 408334 107 Webster clay loam, 0 to 2 percent slopes Consociation 46882 IA109 Kossuth County, Iowa
2 408335 108 Wadena loam, 0 to 2 percent slopes Consociation 807 IA109 Kossuth County, Iowa
3 408336 108B Wadena loam, 2 to 6 percent slopes Consociation 1103 IA109 Kossuth County, Iowa
4 408337 1133 Colo silty clay loam, channeled, 0 to 2 percen... Consociation 1403 IA109 Kossuth County, Iowa

The .sync methods automatically manage SDA client connections for you. For multiple calls, consider reusing a client:

from soildb import SDAClient, get_mapunit_by_areasymbol

client = SDAClient()
mapunits1 = get_mapunit_by_areasymbol.sync("IA109", client=client)
mapunits2 = get_mapunit_by_areasymbol.sync("IA113", client=client)
client.close()

Convenience Functions

soildb provides several high-level functions for common tasks:

mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
Found 80 map units
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey musym muname mukind muacres areasymbol areaname
0 408333 1032 Spicer silty clay loam, 0 to 2 percent slopes Consociation 1834 IA109 Kossuth County, Iowa
1 408334 107 Webster clay loam, 0 to 2 percent slopes Consociation 46882 IA109 Kossuth County, Iowa
2 408335 108 Wadena loam, 0 to 2 percent slopes Consociation 807 IA109 Kossuth County, Iowa
3 408336 108B Wadena loam, 2 to 6 percent slopes Consociation 1103 IA109 Kossuth County, Iowa
4 408337 1133 Colo silty clay loam, channeled, 0 to 2 percen... Consociation 1403 IA109 Kossuth County, Iowa

If you have suggestions for new convenience functions please file a feature request on GitHub.

Spatial Queries

soildb also offers support for queries by location via spatial_query(). You can specify arbitrary geometry to target several spatial and tabular types of results.

from soildb import spatial_query

# Point query (synchronous)
response = spatial_query.sync(
    geometry="POINT (-93.6 42.0)",
    table="mupolygon",
    spatial_relation="intersects"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")

# Or async
import asyncio

async def spatial_query_example():
    from soildb import spatial_query
    
    async with soildb.SDAClient() as client:
        response = await spatial_query(
            geometry="POINT (-93.6 42.0)",
            table="mupolygon",
            spatial_relation="intersects",
            client=client
        )
        df = response.to_pandas()
        print(f"Point query found {len(df)} results")
        return df

result = asyncio.run(spatial_query_example())
Point query found 1 results
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey areasymbol musym nationalmusym muname mukind
0 411278 IA169 1314 fsz1 Hanlon-Spillville complex, channeled, 0 to 2 p... Complex

Bulk Data Fetching

soildb makes it easy to retrieve large datasets efficiently, using concurrent requests and built-in functions that automatically handle pagination.

from soildb import fetch_by_keys, get_mukey_by_areasymbol

# Get mukeys for multiple areas (synchronous)
areas = ["IA109", "IA113", "IA117"]
all_mukeys = []
for area in areas:
    mukeys = get_mukey_by_areasymbol.sync([area])
    all_mukeys.extend(mukeys)

print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")

# Fetch data in chunks automatically (synchronous)
response = fetch_by_keys.sync(
    all_mukeys, 
    "component", 
    key_column="mukey", 
    chunk_size=100,
    columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")

# Or async
import asyncio

async def bulk_fetch_example():
    from soildb import fetch_by_keys, get_mukey_by_areasymbol
    
    # Get mukeys for multiple areas concurrently
    areas = ["IA109", "IA113", "IA117"]
    mukeys_tasks = [
        get_mukey_by_areasymbol([area]) 
        for area in areas
    ]
    
    # Execute all mukey requests concurrently
    mukeys_results = await asyncio.gather(*mukeys_tasks)
    
    # Flatten the results (each task returns a list)
    all_mukeys = []
    for mukeys in mukeys_results:
        all_mukeys.extend(mukeys)
    
    print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")
    
    # Fetch data in chunks automatically
    response = await fetch_by_keys(
        all_mukeys, 
        "component", 
        key_column="mukey", 
        chunk_size=100,
        columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
    )
    df = response.to_pandas()
    print(f"Fetched {len(df)} component records")
    return df

result = asyncio.run(bulk_fetch_example())
Found 410 mukeys across 3 areas
Fetching 410 keys in 5 chunks of 100
Fetched 1067 component records
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
mukey cokey compname localphase comppct_r
0 408333 25562547 Kingston <NA> 2
1 408333 25562548 Okoboji <NA> 5
2 408333 25562549 Spicer <NA> 90
3 408333 25562550 Madelia <NA> 3
4 408334 25562837 Okoboji <NA> 5
5 408334 25562838 Glencoe <NA> 3
6 408334 25562839 Canisteo <NA> 2
7 408334 25562840 Webster <NA> 85
8 408334 25562841 Nicollet <NA> 5
9 408335 25562135 Biscay <NA> 1

The component table has a hierarchical relationship:

  • mukey (map unit key) is the parent
  • cokey (component key) is the child

So when fetching components, you typically want to filter by mukey to get all components for specific map units.

The specialized fetch_component_by_mukey() convenience function handles this, but above we use the lower-level fetch_by_keys() with the "mukey" as the key_column to achieve the same result and demonstrate pagination over chunks with 100 rows each.

Examples

See the examples/ directory and documentation for detailed usage patterns.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

soildb-0.3.0.tar.gz (136.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

soildb-0.3.0-py3-none-any.whl (73.4 kB view details)

Uploaded Python 3

File details

Details for the file soildb-0.3.0.tar.gz.

File metadata

  • Download URL: soildb-0.3.0.tar.gz
  • Upload date:
  • Size: 136.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for soildb-0.3.0.tar.gz
Algorithm Hash digest
SHA256 6dc4ef044327b1fac224a91c16267e5ac89938fd46a98a3420d464571ea525a9
MD5 295ce41e0e7ab196e25a7b741da624fc
BLAKE2b-256 67af1852b12e03ae183c5e7c33e2f4944711355ecd20f891d3740fce6f2359ed

See more details on using hashes here.

Provenance

The following attestation bundles were made for soildb-0.3.0.tar.gz:

Publisher: pypi-release.yml on brownag/py-soildb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file soildb-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: soildb-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 73.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for soildb-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a589ec0f8a4935c9da82130ec0f9ffea6b3c9e2b12815fc1cc757dbeee33d6b6
MD5 ffdb084d0b7d6b25032ed55bc6613904
BLAKE2b-256 7d32da905c3b5ad0de443b0af1165fbeb59a748279ae9bd2a90837b76d5e86f9

See more details on using hashes here.

Provenance

The following attestation bundles were made for soildb-0.3.0-py3-none-any.whl:

Publisher: pypi-release.yml on brownag/py-soildb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page