Skip to main content

Generating and Augmenting Agricultural Synthetic Data in Africa

Project description

SAGDA: Synthetic Agriculture Data in Africa

SAGDA (Synthetic Agriculture Data in Africa) is a Python library for generating, augmenting, and validating synthetic agricultural data for African regions. The library allows users to create realistic agricultural datasets, augment them using machine learning techniques, and validate the generated data against real data. SAGDA can also integrate real-world data from APIs like NASA POWER and OpenWeatherMap to enhance synthetic data generation.

Key Features

  • Generate Synthetic Agricultural Data: Create time-series datasets with geospatial, climate, soil, crop yield, and fertilizer information.
  • Augment Data: Augment real agricultural datasets using machine learning techniques such as random sampling, interpolation, linear regression, autoencoders, and GANs.
  • Validate Data: Validate synthetic data using statistical tests (e.g., KS test) and compare it against real data.
  • API Integration: Fetch real-world climate data using NASA POWER and OpenWeatherMap APIs, or generate synthetic data based on user-specified parameters.

Installation

You can install SAGDA from PyPI using pip:

pip install sagda

Dependencies

  • pandas
  • numpy
  • requests
  • scipy
  • tensorflow
  • scikit-learn

Usage Examples

1. Generate Synthetic Agricultural Data

You can generate synthetic data using either random generation or real-world data from APIs (e.g., NASA POWER, OpenWeatherMap).

Basic Usage (Without API)

from sagda import generate

# Generate synthetic data without API
data = generate(
    num_records=12,
    start_date="2020-01-01",
    end_date="2020-12-31",
    lat=34.0522,
    lon=-118.2437,
    crop_type="corn",
    frequency='monthly'
)

print(data)

Usage with NASA POWER API

from sadga import generate

# Generate synthetic data using NASA POWER API
data = generate(
    start_date="2020-01-01",
    end_date="2020-12-31",
    lat=34.0522,
    lon=-118.2437,
    use_nasa=True,
    nasa_api_key="your_nasa_api_key",
    frequency='monthly'
)

print(data)

2. Augment Agricultural Data

You can augment real data using techniques like random sampling, linear regression, autoencoders, or GANs.

Linear Regression-Based Augmentation

from sadga import augment
import pandas as pd

# Example real data
real_data = pd.DataFrame({
    'soil_ph': [6.5, 6.3, 6.7],
    'temperature_min': [12, 14, 13],
    'rainfall': [40, 50, 45],
    'fertilizer_n': [100, 90, 95],
    'yield_kg_per_ha': [3200, 3100, 3300]
})

# Augment the data using linear regression
augmented_data = augment(
    real_data=real_data,
    num_augmented_records=50,
    start_date="2021-01-01",
    end_date="2021-12-31",
    technique='linear_regression',
    lat=34.0522,
    lon=-118.2437
)

print(augmented_data)

3. Validate Synthetic Data

Validate the generated synthetic data by comparing it with real-world data.

Validation Example

from sadga import validate
import pandas as pd

# Example real data
real_data = pd.DataFrame({
    'soil_ph': [6.5, 6.3, 6.7],
    'temperature_min': [12, 14, 13],
    'rainfall': [40, 50, 45],
    'fertilizer_n': [100, 90, 95],
    'yield_kg_per_ha': [3200, 3100, 3300]
})

# Example synthetic data
synthetic_data = pd.DataFrame({
    'soil_ph': [6.4, 6.5, 6.6],
    'temperature_min': [12.5, 13.0, 14.0],
    'rainfall': [43, 44, 46],
    'fertilizer_n': [101, 96, 99],
    'yield_kg_per_ha': [3205, 3105, 3295]
})

# Validate the synthetic data
validation_report = validate(synthetic_data, real_data=real_data)
print(validation_report)

4. API Integration

SADGA integrates with NASA POWER and OpenWeatherMap APIs for fetching real-world climate data. You can pass your API keys as parameters.

NASA POWER API Example

from sadga import generate

# Generate data using NASA POWER API
data = generate(
    start_date="2020-01-01",
    end_date="2020-12-31",
    lat=34.0522,
    lon=-118.2437,
    use_nasa=True,
    nasa_api_key="your_nasa_api_key",
    frequency='monthly'
)

print(data)

Project Structure

SADGA/
│
├── sagda/                   # Main package directory
│   ├── __init__.py          # Initialize the package
│   ├── data_generation.py   # Generate synthetic agricultural data
│   ├── data_augmentation.py # Augment real data with synthetic data
│   ├── data_validation.py   # Validate synthetic data
│   ├── api_utils.py         # API integration functions
│   ├── utils.py             # Helper functions
├── tests/                   # Test directory
│   ├── test_data_generation.py # Test synthetic data generation
│   ├── test_data_augmentation.py # Test data augmentation
│   ├── test_data_validation.py # Test data validation
├── setup.py                 # Configuration for PyPI
├── README.md                # Project documentation
├── LICENSE                  # License file
├── MANIFEST.in              # Include non-Python files
├── requirements.txt         # Dependencies

Contributing

We welcome contributions! Please fork the repository, create a new branch for your feature or bug fix, and submit a pull request.


License

This project is licensed under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sagda-0.1.1.tar.gz (8.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sagda-0.1.1-py3-none-any.whl (8.4 kB view details)

Uploaded Python 3

File details

Details for the file sagda-0.1.1.tar.gz.

File metadata

  • Download URL: sagda-0.1.1.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.9

File hashes

Hashes for sagda-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f36ed5b78ce811c23e900016bd82298733a461b81bf1e47e0500945e408d37a6
MD5 2545fae47e4b138fc92feb1c766ff9c4
BLAKE2b-256 62b1c4a6250f58f4503353c993a12eb927029b60910412a65f7736e66b350b27

See more details on using hashes here.

File details

Details for the file sagda-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: sagda-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.9

File hashes

Hashes for sagda-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1a6bf4b87154c3c32c862c24e819618de4e00ce54ca4e4646ee2b1ddf09c5f61
MD5 fe7ab13e4eb01adee4f4e8b49cb2c9f5
BLAKE2b-256 421adb3603c139f3a876b2ce006bcca7a2a6fc94bf74f990c70cbd769567f494

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page