Commands to download datasets from the Open Data portal of the Kingdom of Saudi Arabia (KSA).
open-ksa
Overview
open-ksa is a Python package designed to facilitate downloading and managing datasets from the Kingdom of Saudi Arabia's Open Data portal. It provides a set of utilities to fetch organization information, retrieve dataset resources, and download data files efficiently. This package is ideal for developers and data engineers looking to programmatically access and work with KSA open data.
Installation
Install the package via pip:
pip install open-ksa
Key Features
- Fetch organization details and metadata.
- Retrieve dataset IDs and resources for specific organizations.
- Download dataset resources with support for common file formats (CSV, XLSX, XLS).
- Manage SSL connections with a custom adapter for reliable HTTPS requests.
- Concurrent downloading with progress indication.
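The concurrent-download feature can be pictured with a small, generic sketch. This is not the package's implementation: `download_all` and `fake_fetch` are hypothetical names, and the "download" is simulated so the example runs offline. It shows one common pattern, a thread pool that reports progress as each task finishes and records failures instead of aborting the batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4, show_progress=True):
    """Run fetch(url) for every URL in a thread pool.

    Returns a dict mapping each URL to its result, or to the exception
    raised while fetching it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed download should not stop the rest
                results[url] = exc
            if show_progress:
                print(f"[{done}/{len(futures)}] {url}")
    return results


def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP download.
    if url.endswith("/broken"):
        raise ValueError("simulated failure")
    return url.rsplit("/", 1)[-1]


if __name__ == "__main__":
    download_all(
        ["https://example.org/a.csv", "https://example.org/broken"],
        fake_fetch,
    )
```

Threads suit this kind of work because downloads spend most of their time waiting on the network rather than on the CPU.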
Package Structure
- organizations(): Fetches organization information from the KSA Open Data API. Supports saving results to JSON or CSV.
- get_org_resources(org_id): Retrieves organization name, ID, and dataset IDs for a given organization.
- get_dataset_resource(dataset_id, ...): Downloads all resources for a single dataset matching allowed file extensions.
- get_dataset_resources(dataset_ids, ...): Concurrently downloads resources for multiple datasets.
- download_file(session, url, headers, file_path, resource_id, ...): Low-level utility to download a file with error handling.
- SSLAdapter and SingletonSession: Custom SSL adapter and singleton session for consistent HTTPS requests.
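To make "matching allowed file extensions" concrete, here is a minimal sketch of how such a filter could work. The helper `has_allowed_ext` is hypothetical and is not part of open-ksa; the package's own matching logic may differ. It only assumes that a resource is identified by a URL whose path ends in a file name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def has_allowed_ext(url, allowed_exts=("csv", "xlsx", "xls")):
    """Return True if the URL path ends in one of the allowed extensions.

    The comparison ignores case, a leading dot, and any query string.
    """
    suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    return suffix in {ext.lower().lstrip(".") for ext in allowed_exts}


if __name__ == "__main__":
    for url in (
        "https://example.org/data/report.CSV?download=1",
        "https://example.org/data/report.pdf",
    ):
        print(url, has_allowed_ext(url))
```

Parsing the URL first keeps query strings such as `?download=1` from being mistaken for part of the extension.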
Usage Examples
1. List Organizations
from open_ksa import organizations
orgs = organizations()
for org in orgs['content'][:10]:
    print(org['name'])
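If you prefer to post-process the result yourself, the sketch below writes selected fields to CSV text with the standard library. It assumes only the shape used in the examples in this README: a dict with a `content` list whose items carry `name` and `publisherID` keys. The helper `orgs_to_csv` and the sample record are hypothetical, and the sketch does not use the package's own JSON/CSV saving options.

```python
import csv
import io


def orgs_to_csv(orgs, fields=("name", "publisherID")):
    """Serialize selected fields of each organization to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for org in orgs["content"]:
        # Missing keys become empty cells instead of raising KeyError.
        writer.writerow({field: org.get(field, "") for field in fields})
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample data in the assumed response shape.
    sample = {"content": [{"name": "Example Org", "publisherID": "0000", "extra": 1}]}
    print(orgs_to_csv(sample))
```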
2. Download Resources for a Single Dataset
from open_ksa import get_dataset_resource
dataset_id = 'e63563d0-3312-48f3-8786-7c3e2af61fe7'
get_dataset_resource(dataset_id, verbose=True)
3. Download Resources for Multiple Datasets of an Organization
from open_ksa import get_org_resources, get_dataset_resources
org_id = 'a9e617ff-d918-4f4d-8be1-c42b733b1143' # King Saud University
resources = get_org_resources(org_id=org_id)
dataset_ids = resources['dataset_ids']
get_dataset_resources(
    dataset_ids=dataset_ids[:10],
    output_dir=f"opendata/{resources['organization_name'].strip().replace(' ', '_').lower()}",
    allowed_exts=['csv'],
    verbose=False,
    show_progress=True,
)
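The `output_dir` expression in the example above turns an organization name into a folder name. Pulled out into a helper, the same transformation looks like this. The function name `org_output_dir` and the `root` parameter are illustrative and not part of the package.

```python
def org_output_dir(organization_name, root="opendata"):
    """Build an output directory path from an organization name.

    Mirrors the inline expression: trim surrounding whitespace, replace
    spaces with underscores, and lowercase the result.
    """
    slug = organization_name.strip().replace(" ", "_").lower()
    return f"{root}/{slug}"


if __name__ == "__main__":
    print(org_output_dir("  King Saud University "))
```

Note that only spaces are replaced; other characters that may be awkward in file paths are left as they are.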
4. Full Workflow: Download All Datasets for an Organization
import open_ksa as ok
def main():
    orgs = ok.organizations()
    ks = orgs['content'][3]['publisherID']  # Select organization by index
    resources = ok.get_org_resources(org_id=ks)
    dataset_ids = resources['dataset_ids']
    ok.get_dataset_resources(
        dataset_ids=dataset_ids,
        output_dir=f"opendata/{resources['organization_name'].strip().replace(' ', '_').lower()}",
    )


if __name__ == "__main__":
    main()
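Selecting an organization by list index is brittle if the order of the returned list changes. A possible alternative is to look it up by name, sketched below. It assumes the response shape used in the examples above (`content` items with `name` and `publisherID` keys). `find_publisher_id` is a hypothetical helper, and the sample data is made up apart from the King Saud University ID quoted in example 3.

```python
def find_publisher_id(orgs, name):
    """Return the publisherID of the organization whose name matches.

    Matching ignores case and surrounding whitespace. Raises LookupError
    if no organization has that name.
    """
    wanted = name.strip().casefold()
    for org in orgs["content"]:
        if org.get("name", "").strip().casefold() == wanted:
            return org["publisherID"]
    raise LookupError(f"No organization named {name!r}")


if __name__ == "__main__":
    # Sample data in the assumed shape, not a real API response.
    sample = {
        "content": [
            {"name": "Example Org", "publisherID": "0000"},
            {
                "name": "King Saud University",
                "publisherID": "a9e617ff-d918-4f4d-8be1-c42b733b1143",
            },
        ]
    }
    print(find_publisher_id(sample, "king saud university"))
```

The returned ID could then be passed as `org_id` to `get_org_resources`, as in the workflow above.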
Process Flow Diagram
graph TD
A[Start] --> B[Create SSL Adapter and Session]
B --> C[Fetch Organization List]
C --> D[Select Organization ID]
D --> E[Fetch Dataset IDs for Organization]
E --> F[Download Dataset Resources]
F --> G[Save Files Locally]
G --> H[End]
Running Tests
Tests are located in the open_ksa/tests directory. Run tests using:
pytest open_ksa/tests
Examples
See the examples/scripts directory for practical usage scripts:
- 1_organizations.py: List organizations.
- 2_get_dataset_resource.py: Download resources for a single dataset.
- 3_get_dataset_resources.py: Download resources for multiple datasets.
- 4_org_and_resources.py: Full workflow to download all datasets for an organization.
Contribution
Please follow the contribution guidelines:
- Fork the repository and create a new branch.
- Make your changes following the coding style.
- Submit a pull request with a detailed description.
- Discuss changes via issues or email before implementation.
- Follow the Code of Conduct.
License
This project is licensed under the terms of the MIT License. See the LICENSE file for details.
Download files
File details
Details for the file open_ksa-0.1.6a0.tar.gz.
File metadata
- Download URL: open_ksa-0.1.6a0.tar.gz
- Upload date:
- Size: 48.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cd0ba14fbedeac5b7e20904b634efb5bc481b1bf763a22065273f909b07286e3 |
| MD5 | d3557410f710bc22ac7debe9ddde5daf |
| BLAKE2b-256 | f18c9a10dfcfb48efc26fcd7f8ce506b1e3edb15b1f4e704946d6909f7e833e3 |
Provenance
The following attestation bundles were made for open_ksa-0.1.6a0.tar.gz:
Publisher: publish-to-pypi.yml on Esturban/open_ksa
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: open_ksa-0.1.6a0.tar.gz
- Subject digest: cd0ba14fbedeac5b7e20904b634efb5bc481b1bf763a22065273f909b07286e3
- Sigstore transparency entry: 219725250
- Sigstore integration time:
- Permalink: Esturban/open_ksa@ac745ad6753d6d8b3a4f5ce3e236d9a06dde5a78
- Branch / Tag: refs/tags/0.1.6-a
- Owner: https://github.com/Esturban
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@ac745ad6753d6d8b3a4f5ce3e236d9a06dde5a78
- Trigger Event: push
File details
Details for the file open_ksa-0.1.6a0-py3-none-any.whl.
File metadata
- Download URL: open_ksa-0.1.6a0-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f28e9efa3d83b5bbd80a2a5e44e800d38c526269d3e4165f337c9ff1f9f02595 |
| MD5 | 604c32a5eb5f0a3de30b311b03eef7bb |
| BLAKE2b-256 | 1f7f55c4ecb08f27caa82236e3805adf73cab6230bfeec80482f4fb014b3035d |
Provenance
The following attestation bundles were made for open_ksa-0.1.6a0-py3-none-any.whl:
Publisher: publish-to-pypi.yml on Esturban/open_ksa
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: open_ksa-0.1.6a0-py3-none-any.whl
- Subject digest: f28e9efa3d83b5bbd80a2a5e44e800d38c526269d3e4165f337c9ff1f9f02595
- Sigstore transparency entry: 219725252
- Sigstore integration time:
- Permalink: Esturban/open_ksa@ac745ad6753d6d8b3a4f5ce3e236d9a06dde5a78
- Branch / Tag: refs/tags/0.1.6-a
- Owner: https://github.com/Esturban
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@ac745ad6753d6d8b3a4f5ce3e236d9a06dde5a78
- Trigger Event: push