Unofficial Python SDK & CLI for Source Coop

Unofficial Source Coop Python SDK and CLI

An unofficial Python SDK and command-line interface for Source Coop, a platform for sharing geospatial data.

DISCLAIMER: This project is not officially affiliated with, authorized by, endorsed by, or in any way connected to Source Coop or Radiant Earth. This is an independent project created by Samapriya Roy to simplify interaction with Source Coop's public APIs and services.

Features

This package serves both as a software development kit (SDK) for Python applications and as a command-line interface (CLI):

  • As an SDK: Import the package in your Python applications to interact with Source Coop programmatically
  • As a CLI: Use the command-line interface for quick operations without writing code

The package provides functionality for:

  • Authenticating with Source Coop
  • Browsing and searching repositories
  • Retrieving account and organization profiles
  • Listing and summarizing repository contents
  • Downloading data from repositories with concurrent and multipart support

Installation

Install the package using pip to get both the Python SDK and the command-line tool:

pip install source-coop

Command-Line Interface

The CLI provides several commands to interact with Source Coop. Each command has specific options that can be discovered using the --help flag.

Authentication

Authentication is required for accessing certain features of Source Coop. The CLI provides commands to log in and verify your identity.

# Log in to Source Coop (will prompt for credentials)
source-coop login

# Check who you're logged in as
source-coop whoami

When you run the login command, you are prompted for your Source Coop email and password. After a successful login, the session cookies are saved in your user directory (~/.config/source-coop/cookies.json) and reused by later commands. The whoami command displays your profile information, which confirms that the login worked.

Exploring Repositories

Source Coop hosts numerous geospatial data repositories. These commands help you discover and search through available datasets.

# List recent repositories
source-coop repos

# List featured repositories
source-coop repos --featured

# Search for repositories
source-coop repos --search "sentinel"

# Export repository list to CSV
source-coop repos --export csv --output my-repos.csv

The repos command without arguments lists recent repositories with details like title, account, tags, and publication date. The --featured flag shows highlighted repositories chosen by Source Coop. Use the search functionality to find repositories containing specific keywords. You can export the results to CSV, JSON, or Parquet format for further analysis or record-keeping.
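Once the list has been exported, it can be filtered with the standard library alone. The sketch below is illustrative only: it makes no assumption about which columns the export contains, and the `title` and `account` headers and the sample rows are invented stand-ins for the contents of my-repos.csv.

```python
import csv
import io


def rows_matching(csv_text, keyword):
    """Return rows (as dicts) where any column contains the keyword, ignoring case."""
    needle = keyword.lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if any(needle in str(value or "").lower() for value in row.values())
    ]


# Invented sample standing in for the text of an exported CSV file.
sample = "title,account\nSentinel-2 mosaics,example-org\nField boundaries,another-org\n"
matches = rows_matching(sample, "sentinel")
print(len(matches))
```

For a real export, read the file with `Path("my-repos.csv").read_text()` and pass the text to `rows_matching`.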

Account Information

These commands provide details about user profiles and organization memberships on Source Coop.

# Get profile information for a user or organization
source-coop profile username

# List members of an organization
source-coop members organization_name

The profile command displays information about a specific user or organization, including their bio and associated repositories. For organizations, the members command reveals the organizational structure by listing all members and their roles, which is useful for understanding who contributes to a specific group's repositories.

Repository Operations

Once you've found repositories of interest, these commands help you explore their contents and download data. You can use web URLs and S3 URLs interchangeably; the CLI handles the conversion for you.

# Summarize repository contents (using web URL)
source-coop summarize https://source.coop/account/repository

# Filter by file type (using S3 URL)
source-coop summarize s3://account/repository --file-type .tif

# Download repository contents (using web URL)
source-coop download https://source.coop/account/repository

# Download with custom options (using S3 URL)
source-coop download s3://account/repository \
  --file-type .csv \
  --output-dir ./data \
  --threads 20 \
  --multipart 16

The summarize command gives a quick overview of a repository's contents, showing file counts, sizes, and types without downloading anything, so you can see what a repository holds before committing to a download.

The download command efficiently retrieves files from a repository. It supports concurrent downloads to speed up the process and multipart downloads for large files. You can filter by file type to download only the data you need and specify an output directory to organize your downloads.
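If you want to drive the CLI from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. This is a minimal sketch that uses only the flags shown above; `build_download_command` and `run_download` are hypothetical helper names, and the assumption that the CLI exits with a non-zero status on failure has not been verified.

```python
import subprocess


def build_download_command(url, file_type=None, output_dir=None, threads=None, multipart=None):
    """Assemble the argv list for `source-coop download` from optional settings."""
    command = ["source-coop", "download", url]
    if file_type is not None:
        command += ["--file-type", file_type]
    if output_dir is not None:
        command += ["--output-dir", str(output_dir)]
    if threads is not None:
        command += ["--threads", str(threads)]
    if multipart is not None:
        command += ["--multipart", str(multipart)]
    return command


def run_download(url, **options):
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_download_command(url, **options), check=True)


print(build_download_command("s3://account/repository", file_type=".csv", threads=20))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.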

CLI Help

For detailed help on any command:

source-coop --help
source-coop command --help  # e.g., source-coop download --help

Each command has additional options not covered here. Use the help system to discover all available features.

Python SDK Usage

The SDK provides programmatic access to Source Coop's features, allowing you to incorporate them into your own Python applications.

Client Setup

The main entry point for the SDK is the SourceCoopClient class, which handles both API and S3 operations.

from source_coop import SourceCoopClient

# Create a client (will use stored credentials if available)
client = SourceCoopClient()

# Check if authenticated
if client.is_authenticated():
    print("Client is authenticated")
else:
    print("Client is not authenticated, use login_to_source_coop() first")

When you initialize a client, it automatically looks for stored cookies from previous logins. The is_authenticated() method helps you verify whether the client has valid authentication credentials before attempting operations that might require them. This is useful for scripts that need to check authentication status before proceeding.
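In a script it can be convenient to fail early instead of printing a message and carrying on. The sketch below wraps the `is_authenticated()` check in a guard; `require_authenticated`, `NotLoggedInError`, and `FakeClient` are hypothetical names, and `FakeClient` only stands in for `SourceCoopClient` so the example runs without the package or a network connection.

```python
class NotLoggedInError(RuntimeError):
    """Raised when a client has no usable stored credentials."""


def require_authenticated(client):
    """Return the client unchanged, or raise NotLoggedInError if it is not authenticated."""
    if not client.is_authenticated():
        raise NotLoggedInError("Not authenticated. Run 'source-coop login' first.")
    return client


class FakeClient:
    """Stand-in for SourceCoopClient, used here only for demonstration."""

    def __init__(self, authenticated):
        self._authenticated = authenticated

    def is_authenticated(self):
        return self._authenticated


client = require_authenticated(FakeClient(authenticated=True))
```

With the real package installed, the last line would become `client = require_authenticated(SourceCoopClient())`.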

Authentication

If your application needs to log in programmatically, you can use the authentication functions directly.

from source_coop import SourceCoopClient, login_to_source_coop

# Log in and save credentials
cookies = login_to_source_coop("email@example.com", "password")

# Create a client with the cookies
client = SourceCoopClient(cookies)

The login_to_source_coop function handles the authentication flow with Source Coop's servers and returns cookies that can be used to create an authenticated client. These cookies are also saved to your filesystem for future use. This approach is particularly useful for automated scripts or applications where you want to handle authentication programmatically rather than through the CLI.
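For automated scripts, it is usually safer to keep the password out of the source file. The sketch below reads the credentials from environment variables before calling `login_to_source_coop`; the variable names `SOURCE_COOP_EMAIL` and `SOURCE_COOP_PASSWORD` and both helper functions are assumptions made for this example, not something the package defines.

```python
import os


def credentials_from_env(env=None):
    """Read the email and password from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    email = env.get("SOURCE_COOP_EMAIL")
    password = env.get("SOURCE_COOP_PASSWORD")
    if not email or not password:
        raise RuntimeError("Set SOURCE_COOP_EMAIL and SOURCE_COOP_PASSWORD first")
    return email, password


def login_from_env():
    """Log in with credentials taken from the environment and return a client."""
    # Imported lazily so credentials_from_env works without the package installed.
    from source_coop import SourceCoopClient, login_to_source_coop

    email, password = credentials_from_env()
    cookies = login_to_source_coop(email, password)
    return SourceCoopClient(cookies)
```

Calling `login_from_env()` requires the package and network access; `credentials_from_env` can be tested on its own by passing a plain dictionary.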

Working with Repositories

The API client provides methods for discovering and searching repositories.

# Get repositories
repos_data = client.api.get_repositories(limit=20)

# Search repositories
search_results = client.api.get_repositories(search="landsat")

# Get featured repositories
featured = client.api.get_repositories(featured=True)

These methods return structured data about repositories, including metadata like titles, descriptions, tags, and publication dates. The limit parameter controls how many results are returned, which is useful for pagination. The search parameter filters repositories based on keywords, helping you find relevant datasets. The featured flag returns repositories that have been highlighted by Source Coop for their significance or quality.
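When you are interested in several topics, the search call can be repeated per keyword. The sketch below does that with `client.api.get_repositories(search=...)` and returns the raw results untouched, since the exact shape of the returned data is not documented here. `search_keywords`, `FakeApi`, and `FakeClient` are hypothetical names; the fake classes only record calls so the example runs offline, and their return value is invented.

```python
def search_keywords(client, keywords):
    """Run one repository search per keyword and return the raw results keyed by keyword."""
    return {keyword: client.api.get_repositories(search=keyword) for keyword in keywords}


class FakeApi:
    """Stand-in for client.api that records calls instead of contacting Source Coop."""

    def __init__(self):
        self.calls = []

    def get_repositories(self, **kwargs):
        self.calls.append(kwargs)
        return {"query": kwargs.get("search")}  # invented return shape


class FakeClient:
    """Stand-in for SourceCoopClient exposing only the `api` attribute."""

    def __init__(self):
        self.api = FakeApi()


fake = FakeClient()
results = search_keywords(fake, ["landsat", "sentinel"])
print(list(results))
```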

Accessing Account Information

The API client also provides methods for accessing account information.

# Get your own account info
profile = client.api.whoami()

# Get another account's profile
user_profile = client.api.get_profile("username")

# Get organization members
members = client.api.get_members("organization_name")

The whoami method returns information about your own account, which is useful for verifying authentication and accessing your profile data. The get_profile method retrieves public information about any account, which can help you learn more about data providers. For organizational accounts, the get_members method shows the membership structure, revealing who contributes to an organization's repositories.

Working with S3 Data

The S3 client provides methods for interacting with the actual data in repositories.

# List objects in a repository
s3_url = "s3://account/repository"
s3_objects = client.s3.list_objects(s3_url)

# List with filtering by file type
tiff_objects = client.s3.list_objects(s3_url, file_type=".tif")

# Get summary statistics
objects, summary = client.s3.list_objects_with_summary(s3_url)
print(f"Total files: {summary['total_files']}")
print(f"Total size: {summary['total_size_human']}")

The list_objects method retrieves detailed information about files in a repository, including their names, sizes, and download URLs. You can filter by file extension to focus on specific data types. The list_objects_with_summary method provides additional aggregated statistics about the repository, such as total file count, total size, and a breakdown by file type. This is useful for understanding the composition of a repository before downloading any files.
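If you need a breakdown that the summary does not provide, the object list can be aggregated locally. The sketch below assumes each object is a dictionary with a `key` (path) and a `size` in bytes, as used in the PMTiles example later in this README; `size_by_extension` is a hypothetical helper and the sample objects are invented.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def size_by_extension(objects):
    """Sum object sizes per file extension, given dicts with 'key' and 'size' entries."""
    totals = defaultdict(int)
    for obj in objects:
        extension = PurePosixPath(obj["key"]).suffix.lower() or "(none)"
        totals[extension] += obj["size"]
    return dict(totals)


# Invented sample standing in for the result of client.s3.list_objects(...).
sample = [
    {"key": "tiles/a.pmtiles", "size": 100},
    {"key": "tiles/b.pmtiles", "size": 50},
    {"key": "README", "size": 5},
]
print(size_by_extension(sample))
```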

URL Handling (Web URLs and S3 URLs)

Source Coop supports both web URLs (https://source.coop/account/repository) and S3-style URLs (s3://account/repository). The SDK handles both formats seamlessly, so you can use whichever is more convenient.

from source_coop import SourceCoopClient

client = SourceCoopClient()

# You can use regular web URLs directly with most methods
repository_url = "https://source.coop/account/repository"
objects, summary = client.s3.list_objects_with_summary(repository_url)

# Or you can use S3-style URLs if you prefer
s3_url = "s3://account/repository"
objects, summary = client.s3.list_objects_with_summary(s3_url)

# If you need to convert between formats for any reason:
from source_coop.s3 import SourceCoopS3
converted_url = SourceCoopS3.convert_repo_url_to_s3_url(repository_url)

Most SDK methods automatically handle URL conversion internally, so you can provide either format. This makes the SDK more intuitive to use since you can simply copy and paste URLs from your browser. Behind the scenes, the SDK converts web URLs to the S3 format when necessary, but this is transparent to you as a user.
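To make the relationship between the two URL forms concrete, here is an illustrative conversion written with the standard library. It is not the SDK's implementation; it only encodes the mapping shown above (https://source.coop/account/repository to s3://account/repository), and `web_url_to_s3_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def web_url_to_s3_url(url):
    """Convert https://source.coop/<account>/<repository> to s3://<account>/<repository>."""
    if url.startswith("s3://"):
        return url  # already in S3 form
    parsed = urlparse(url)
    if parsed.netloc != "source.coop":
        raise ValueError(f"Not a Source Coop web URL: {url}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"Expected an account and a repository in: {url}")
    return "s3://" + "/".join(parts)


print(web_url_to_s3_url("https://source.coop/fiboa/japan"))
```

In practice, prefer `SourceCoopS3.convert_repo_url_to_s3_url`, which may handle cases this sketch does not.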

Real-World Example: Downloading PMTiles Files

This complete example demonstrates downloading PMTiles files from a real public repository. The code uses web URLs directly without needing to manually convert them to S3 URLs.

from pathlib import Path

from source_coop import SourceCoopClient
from source_coop.commands.download import download_s3_objects


def main():
    # Repository URL - this is a real public repository with PMTiles data
    repository_url = "https://source.coop/fiboa/japan"

    # Output directory
    output_dir = "downloaded-pmtiles"

    # Create a client
    client = SourceCoopClient()

    # Check if we're authenticated
    if not client.is_authenticated():
        print("Not authenticated. Please run 'source-coop login' first")
        return

    print(f"Downloading PMTiles from: {repository_url}")

    # List objects and get summary, filtering for .pmtiles files
    # Note: We're using the web URL directly here - no manual conversion needed!
    print("Listing PMTiles files...")
    objects, summary = client.s3.list_objects_with_summary(repository_url, file_type=".pmtiles")

    if not objects:
        print("No PMTiles files found in the repository")
        return

    # Display summary
    total_files = summary['total_files']
    total_size = summary['total_size_human']
    print(f"Found {total_files} PMTiles files (Total size: {total_size})")

    # Display first few files
    print("\nFiles to download:")
    for obj in objects[:5]:
        print(f"- {obj['key']} ({client.s3.human_readable_size(obj['size'])})")
    if len(objects) > 5:
        print(f"- ... and {len(objects) - 5} more files")

    # Create output directory if it doesn't exist
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Download the files
    successful = download_s3_objects(
        objects,
        output_dir,
        multipart_count=8,  # Use multipart downloads for large files
        max_concurrent=10   # Use up to 10 concurrent downloads
    )

    print(f"Successfully downloaded {successful} PMTiles files to {output_dir}")


if __name__ == "__main__":
    main()

This example demonstrates several key features:

  1. Direct use of web URLs: The code uses a standard web URL directly with SDK methods
  2. File type filtering: It only lists files with the .pmtiles extension
  3. User-friendly output: It shows a summary and previews files before downloading
  4. Efficient downloading: It uses multipart downloading for large files and handles up to 10 concurrent downloads

You can run this exact code to download real PMTiles files from a public repository. The SDK handles all the complexities of URL conversion, API interactions, and efficient downloading behind the scenes.

The download_s3_objects function downloads multiple files concurrently while showing progress bars. The max_concurrent parameter controls how many files are downloaded at the same time, while multipart_count determines how many parts a large file is split into for parallel downloading. Both settings can shorten download times, especially for repositories with many files or very large files.
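The idea behind multipart downloading can be sketched independently of the SDK: split the file into byte ranges, fetch the ranges in parallel, and join them in order. This is a generic illustration, not the code inside `download_s3_objects`; `split_ranges`, `fetch_in_parts`, and the `fetch_range` callback are hypothetical, and an in-memory byte string stands in for HTTP range requests.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, parts):
    """Split `size` bytes into up to `parts` contiguous inclusive (start, end) ranges."""
    if size <= 0:
        return []
    parts = max(1, min(parts, size))
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for index in range(parts):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def fetch_in_parts(size, parts, fetch_range, max_workers=4):
    """Fetch every range in parallel and join the pieces in their original order."""
    ranges = split_ranges(size, parts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the chunks join correctly.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


payload = bytes(range(10))
# Stand-in for an HTTP range request: slice the in-memory payload instead.
result = fetch_in_parts(len(payload), 3, lambda start, end: payload[start:end + 1])
print(split_ranges(10, 3))
```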

Configuration

The package stores configuration in the following location:

  • Authentication cookies: ~/.config/source-coop/cookies.json

This file contains the authentication tokens that let you access Source Coop without re-entering your credentials. The stored cookies are loaded automatically when you create a new client.
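A script can check for the cookie file before creating a client. The sketch below builds the documented path and tests whether the file holds non-empty JSON; the JSON structure itself is not assumed, and both function names are hypothetical. A present file does not prove the session is still valid, so `is_authenticated()` remains the check to rely on.

```python
import json
from pathlib import Path


def cookie_file(home=None):
    """Return the documented cookie path under the given (or current) home directory."""
    home = Path.home() if home is None else Path(home)
    return home / ".config" / "source-coop" / "cookies.json"


def has_stored_cookies(home=None):
    """True if the cookie file exists and contains parseable, non-empty JSON."""
    path = cookie_file(home)
    try:
        return bool(json.loads(path.read_text(encoding="utf-8")))
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON all count as "no cookies".
        return False


print(cookie_file())
```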

License

This project is licensed under the Apache License 2.0; see the file headers for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • This package is developed by Samapriya Roy
  • Thanks to Source Coop for providing a platform for sharing geospatial data
