Generating and Augmenting Agricultural Synthetic Data in Africa
SAGDA: Synthetic Agriculture Data in Africa
SAGDA (Synthetic Agriculture Data in Africa) is a Python library for generating, augmenting, and validating synthetic agricultural data for African regions. The library allows users to create realistic agricultural datasets, augment them using machine learning techniques, and validate the generated data against real data. SAGDA can also integrate real-world data from APIs like NASA POWER and OpenWeatherMap to enhance synthetic data generation.
Key Features
- Generate Synthetic Agricultural Data: Create time-series datasets with geospatial, climate, soil, crop yield, and fertilizer information.
- Augment Data: Augment real agricultural datasets using machine learning techniques such as random sampling, interpolation, linear regression, autoencoders, and GANs.
- Validate Data: Validate synthetic data using statistical tests (e.g., KS test) and compare it against real data.
- API Integration: Fetch real-world climate data using NASA POWER and OpenWeatherMap APIs, or generate synthetic data based on user-specified parameters.
Installation
You can install SAGDA from PyPI using pip:
pip install sagda
Dependencies
- pandas
- numpy
- requests
- scipy
- tensorflow
- scikit-learn
Usage Examples
1. Generate Synthetic Agricultural Data
You can generate synthetic data using either random generation or real-world data from APIs (e.g., NASA POWER, OpenWeatherMap).
Basic Usage (Without API)
from sagda import generate
# Generate synthetic data without API
data = generate(
    num_records=12,
    start_date="2020-01-01",
    end_date="2020-12-31",
    lat=34.0522,
    lon=-118.2437,
    crop_type="corn",
    frequency="monthly",
)
print(data)
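To make the idea of random generation concrete, here is a minimal standard-library sketch of how monthly records of this shape could be produced. It is not SAGDA's implementation: the function names (`monthly_dates`, `sketch_generate`), the value ranges, and the choice of distributions are all assumptions made for illustration.

```python
import random
from datetime import date


def monthly_dates(start: date, end: date):
    """Yield the first day of each month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def sketch_generate(start, end, lat, lon, crop_type, seed=0):
    """Hypothetical stand-in for random generation; ranges are illustrative only."""
    rng = random.Random(seed)  # seeded so repeated runs give the same records
    records = []
    for d in monthly_dates(start, end):
        records.append({
            "date": d.isoformat(),
            "lat": lat,
            "lon": lon,
            "crop_type": crop_type,
            "soil_ph": round(rng.uniform(5.5, 7.5), 2),
            "temperature_min": round(rng.gauss(13.0, 2.0), 1),
            # Clamp at zero because rainfall cannot be negative.
            "rainfall": round(max(0.0, rng.gauss(45.0, 10.0)), 1),
        })
    return records


records = sketch_generate(
    date(2020, 1, 1), date(2020, 12, 31), 34.0522, -118.2437, "corn"
)
print(len(records), records[0])
```

A monthly frequency over one calendar year yields twelve records, which matches `num_records=12` in the example above.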
Usage with NASA POWER API
from sagda import generate
# Generate synthetic data using NASA POWER API
data = generate(
    start_date="2020-01-01",
    end_date="2020-12-31",
    lat=34.0522,
    lon=-118.2437,
    use_nasa=True,
    nasa_api_key="your_nasa_api_key",
    frequency="monthly",
)
print(data)
2. Augment Agricultural Data
You can augment real data using techniques like random sampling, linear regression, autoencoders, or GANs.
Linear Regression-Based Augmentation
from sagda import augment
import pandas as pd
# Example real data
real_data = pd.DataFrame({
    'soil_ph': [6.5, 6.3, 6.7],
    'temperature_min': [12, 14, 13],
    'rainfall': [40, 50, 45],
    'fertilizer_n': [100, 90, 95],
    'yield_kg_per_ha': [3200, 3100, 3300]
})
# Augment the data using linear regression
augmented_data = augment(
    real_data=real_data,
    num_augmented_records=50,
    start_date="2021-01-01",
    end_date="2021-12-31",
    technique='linear_regression',
    lat=34.0522,
    lon=-118.2437
)
print(augmented_data)
3. Validate Synthetic Data
Validate the generated synthetic data by comparing it with real-world data.
Validation Example
from sagda import validate
import pandas as pd
# Example real data
real_data = pd.DataFrame({
    'soil_ph': [6.5, 6.3, 6.7],
    'temperature_min': [12, 14, 13],
    'rainfall': [40, 50, 45],
    'fertilizer_n': [100, 90, 95],
    'yield_kg_per_ha': [3200, 3100, 3300]
})
# Example synthetic data
synthetic_data = pd.DataFrame({
    'soil_ph': [6.4, 6.5, 6.6],
    'temperature_min': [12.5, 13.0, 14.0],
    'rainfall': [43, 44, 46],
    'fertilizer_n': [101, 96, 99],
    'yield_kg_per_ha': [3205, 3105, 3295]
})
# Validate the synthetic data
validation_report = validate(synthetic_data, real_data=real_data)
print(validation_report)
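For readers unfamiliar with the KS test mentioned under Key Features, the sketch below computes the two-sample Kolmogorov-Smirnov statistic per column in plain Python, using two of the columns from the example above. It is meant to show what such a comparison measures, not how `validate` is implemented or what its report contains; `ks_statistic` is a hypothetical helper. In practice, `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


real = {"soil_ph": [6.5, 6.3, 6.7], "rainfall": [40, 50, 45]}
synthetic = {"soil_ph": [6.4, 6.5, 6.6], "rainfall": [43, 44, 46]}

# One statistic per column: 0.0 means identical empirical distributions,
# 1.0 means the two samples do not overlap at all.
report = {col: ks_statistic(synthetic[col], real[col]) for col in real}
print(report)
```

Samples of three values per column are far too small for the test to be informative; real validation would need many more records.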
4. API Integration
SAGDA integrates with the NASA POWER and OpenWeatherMap APIs to fetch real-world climate data. You can pass your API keys as parameters.
NASA POWER API Example
from sagda import generate
# Generate data using NASA POWER API
data = generate(
    start_date="2020-01-01",
    end_date="2020-12-31",
    lat=34.0522,
    lon=-118.2437,
    use_nasa=True,
    nasa_api_key="your_nasa_api_key",
    frequency="monthly",
)
print(data)
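To give a sense of what a climate request to NASA POWER might look like, the sketch below only builds a query URL and makes no network call. It is not taken from SAGDA: `build_power_url` is a hypothetical helper, and the endpoint path, the `AG` community value, and the parameter codes (`T2M`, `PRECTOTCORR`) are assumptions based on the public POWER daily point API that should be checked against the current POWER documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint for the POWER daily point API; verify before relying on it.
POWER_DAILY_POINT = "https://power.larc.nasa.gov/api/temporal/daily/point"


def build_power_url(lat, lon, start_date, end_date,
                    parameters=("T2M", "PRECTOTCORR")):
    """Build a query URL from ISO dates (YYYY-MM-DD) and a coordinate pair."""
    query = {
        "parameters": ",".join(parameters),
        "community": "AG",  # assumed: the agroclimatology community
        "latitude": lat,
        "longitude": lon,
        # The API is assumed to expect compact YYYYMMDD dates.
        "start": start_date.replace("-", ""),
        "end": end_date.replace("-", ""),
        "format": "JSON",
    }
    # safe="," keeps the comma-separated parameter list readable in the URL.
    return f"{POWER_DAILY_POINT}?{urlencode(query, safe=',')}"


url = build_power_url(34.0522, -118.2437, "2020-01-01", "2020-12-31")
print(url)
```

The resulting URL could then be fetched with `requests.get(url, timeout=30)`, since `requests` is already listed among the dependencies.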
Project Structure
SAGDA/
│
├── sagda/ # Main package directory
│ ├── __init__.py # Initialize the package
│ ├── data_generation.py # Generate synthetic agricultural data
│ ├── data_augmentation.py # Augment real data with synthetic data
│ ├── data_validation.py # Validate synthetic data
│ ├── api_utils.py # API integration functions
│ ├── utils.py # Helper functions
├── tests/ # Test directory
│ ├── test_data_generation.py # Test synthetic data generation
│ ├── test_data_augmentation.py # Test data augmentation
│ ├── test_data_validation.py # Test data validation
├── setup.py # Configuration for PyPI
├── README.md # Project documentation
├── LICENSE # License file
├── MANIFEST.in # Include non-Python files
├── requirements.txt # Dependencies
Contributing
We welcome contributions! Please fork the repository, create a new branch for your feature or bug fix, and submit a pull request.
License
This project is licensed under the MIT License.
File details
Details for the file sagda-0.1.1.tar.gz.
File metadata
- Download URL: sagda-0.1.1.tar.gz
- Upload date:
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f36ed5b78ce811c23e900016bd82298733a461b81bf1e47e0500945e408d37a6 |
| MD5 | 2545fae47e4b138fc92feb1c766ff9c4 |
| BLAKE2b-256 | 62b1c4a6250f58f4503353c993a12eb927029b60910412a65f7736e66b350b27 |
File details
Details for the file sagda-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sagda-0.1.1-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a6bf4b87154c3c32c862c24e819618de4e00ce54ca4e4646ee2b1ddf09c5f61 |
| MD5 | fe7ab13e4eb01adee4f4e8b49cb2c9f5 |
| BLAKE2b-256 | 421adb3603c139f3a876b2ce006bcca7a2a6fc94bf74f990c70cbd769567f494 |