Package helping to upload local resources to a CKAN server using the CKAN API. Metadata and the list of resources can be defined in an Excel spreadsheet. The package includes requests to download resources and checks metadata against formatting rules.
Project description
ckanapi_harvesters
Description
This package enables users to benefit from the CKAN API and provides functions which
realize complex API calls to achieve specific operations.
In this package, DataStores are returned/inputted as pandas DataFrames.
The underlying request mechanism uses the requests Session object, which improves performance with multiple requests.
This package is oriented toward the management of CKAN datasets and resources.
Only a selection of API calls has been implemented for this purpose.
To perform custom API calls, the function api_action_call is provided to the end user.
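As an illustration of what a custom action call involves, here is a minimal sketch of a CKAN Action API request built with the standard library only. It does not show the signature of api_action_call, which is not documented here; the helper names build_action_request and parse_action_response, the server URL and the dataset id are hypothetical. The sketch relies only on the public CKAN convention that actions live under /api/3/action/ and return a JSON envelope with success and result keys.

```python
import json
from urllib.request import Request


def build_action_request(base_url, action, params, api_key=None):
    """Prepare (but do not send) a POST request for one CKAN action."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # CKAN reads the API token from the Authorization header.
        headers["Authorization"] = api_key
    body = json.dumps(params).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


def parse_action_response(raw_body):
    """Unwrap the CKAN response envelope, raising if the action failed."""
    envelope = json.loads(raw_body)
    if not envelope.get("success"):
        raise RuntimeError(f"CKAN action failed: {envelope.get('error')}")
    return envelope["result"]


# Hypothetical server and dataset id, for illustration only.
request = build_action_request(
    "https://ckan.example.org", "package_show", {"id": "my-dataset"}
)
print(request.full_url)

# A canned response body stands in for the server reply.
sample_body = '{"success": true, "result": {"name": "my-dataset"}}'
print(parse_action_response(sample_body))
```

The request is only prepared, never sent, so the sketch runs without a CKAN server.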
This package was initially designed to harvest large DataStores from your local file system.
It also implements particular requests which can define a large DataStore.
Large datasets composed of multiple files can be uploaded/downloaded
through scripts into a single resource or multiple resources.
For a DataStore, large files are uploaded with a limited number of rows per request.
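The row limit per request can be pictured with the following sketch, which splits a list of records into fixed-size chunks and wraps each chunk in a payload shaped like the parameters of the CKAN datastore_upsert action. This is not the package's implementation: the function names, the resource id and the limit of 3 rows are assumptions chosen for illustration.

```python
def iter_row_chunks(rows, rows_per_request):
    """Yield consecutive slices of rows, each at most rows_per_request long."""
    if rows_per_request < 1:
        raise ValueError("rows_per_request must be at least 1")
    for start in range(0, len(rows), rows_per_request):
        yield rows[start:start + rows_per_request]


def build_upsert_payloads(resource_id, rows, rows_per_request):
    """Build one datastore_upsert-style payload per chunk of rows."""
    return [
        {"resource_id": resource_id, "records": chunk, "method": "upsert"}
        for chunk in iter_row_chunks(rows, rows_per_request)
    ]


# Seven example records and a hypothetical resource id.
rows = [{"id": i, "value": i * i} for i in range(7)]
payloads = build_upsert_payloads("hypothetical-resource-id", rows, 3)
print([len(payload["records"]) for payload in payloads])
```

With a pandas DataFrame, the same list of records could be obtained with df.to_dict("records") before chunking.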
The package is divided into the following sections:
ckan_api: functions interacting with the CKAN API. In addition to the base class which manages basic parameters and requests, API functions are divided as follows:
- functions to map the CKAN packages and resources. The remote data structures are mapped in a mirrored data structure. CKAN DataStore information, organizations, licenses and resource views are optionally tracked.
- functions to query a DataStore or to download file resources.
- functions to test a data format policy on a given package.
- functions to upsert data to a DataStore or to upload files to a resource.
- functions to manage CKAN objects (creating, patching, or removing packages, resources, and DataStores). These functions enable the user to change the metadata for these objects. The other objects are meant to be managed through the API.
policies: functions to check data format policies. A data format policy defines which attributes are mandatory for a package or resource. Specific rules can be implemented to restrict package tags to certain lists, grouped by vocabulary. Extra key-value pairs of packages can be enforced. Resource formats can be restricted to a certain list.
reports: functions to extract a report on the CKAN database in order to monitor package user access rights, resource memory occupation, modification dates and data format policy messages.
harvesters: this module implements ways to load data from your local machine.
- file_formats: The primary approach is to use files on your local file system. The CSV and SHP (shapefile) formats are currently supported.
- In addition to the file formats, harvesters have been implemented to transfer data from a database. This is particularly useful if the database cannot be accessed by CKAN harvester extensions because it is only available locally. MongoDB and PostgreSQL databases are currently supported.
builder: functions to automate package and resource metadata patching and data uploads or downloads. These parameters can be defined in an Excel workbook, and files from the local file system can be referred to as inputs for the data upload. The parameters can also be deduced from an online CKAN package through the API.
- Example scripts are given in this module, referring to an example Excel workbook. The Excel workbook is available in the package and at this link: builder_package_example.xlsx. See also the notebook example in the current documentation here: builder_example_notebook.ipynb.
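To make the idea of a data format policy more concrete, here is a minimal sketch of a check on mandatory package attributes and allowed resource formats. It does not reproduce the policies module: the policy layout (mandatory_package_attributes, allowed_resource_formats), the function name check_package and the sample package are all hypothetical. Only the package fields used (resources, name, format) follow the usual CKAN package dictionary.

```python
def check_package(package, policy):
    """Return a list of human-readable policy messages for one package."""
    messages = []
    # Mandatory attributes must be present and non-empty.
    for attribute in policy["mandatory_package_attributes"]:
        if not package.get(attribute):
            messages.append(f"package: missing mandatory attribute '{attribute}'")
    # Resource formats are compared case-insensitively against the allowed list.
    allowed_formats = {fmt.lower() for fmt in policy["allowed_resource_formats"]}
    for resource in package.get("resources", []):
        fmt = (resource.get("format") or "").lower()
        if fmt not in allowed_formats:
            messages.append(
                f"resource '{resource.get('name')}': format '{fmt}' is not allowed"
            )
    return messages


# Hypothetical policy and package, for illustration only.
policy = {
    "mandatory_package_attributes": ["title", "license_id"],
    "allowed_resource_formats": ["CSV", "SHP"],
}
package = {
    "title": "Example dataset",
    "license_id": "",
    "resources": [
        {"name": "measurements", "format": "csv"},
        {"name": "notes", "format": "PDF"},
    ],
}
for message in check_package(package, policy):
    print(message)
```

Returning messages instead of raising lets a report collect every violation for a package in one pass.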
Useful links
Links to resources and documentation:
Python Package Template Architecture
.
├── sphinx
│   ├── conf.py
│   │   └── Sphinx documentation configuration file
│   └── index.rst
│       └── Root file for Sphinx documentation, structuring and linking source documents into complete documentation.
├── src
│   └── ckanapi_harvesters
│       ├── __init__.py
│       ├── main.py
│       │   └── Main file of your package, it references what is usable in your package
│       └── module_name
│           ├── __init__.py
│           └── module.py
│               └── Module file, each module holds a logic of the package
├── tests
│   └── Directory for testing the package and verifying that everything works
├── .gitattributes
│   └── Ensures that all text files use LF as the line ending, improving consistency across different development environments.
├── .bumpversion.toml
│   └── Configuration file for bumping the package version
├── .gitignore
│   └── Files that Git is explicitly instructed to ignore
├── .github
│   └── workflows
│       └── GitHub CI/CD files
├── .pre-commit-config.yaml
│   └── Pre-commit configuration file
├── CONTRIBUTING.md
│   └── Contribution guidelines file
├── LICENSE
├── README.md
│   └── File with general information about the project
├── pyproject.toml
│   └── Package configuration file
└── tox.ini
    └── Configuration file for `tox`, used to automate testing and linting tasks across multiple Python environments. This file is configured to use Python 3.12 and runs commands for the linter `ruff` as well as for tests with `pytest`. The specified commands check code style, format files according to defined standards, and run unit tests to ensure the code works as expected. This file is also used to facilitate version management tasks with `bump-my-version`.
Getting Started
Prerequisites
This project requires Python 3.12. Python 3.12 introduces many new features and improvements that are essential for the proper functioning of this project. Ensure that Python is correctly installed on your system by running python --version.
About the pyproject.toml File
The pyproject.toml file is a central configuration file for the Python project. It contains TOML tables specifying the basic metadata of the project, the dependencies needed to build your project, and specific configurations for the tools used.
The [project] table is used to specify the basic metadata of your project, such as dependencies, your name, etc. The [tool] table contains sub-tables specific to each tool, such as [tool.setuptools] or [tool.ruff]. For more information on configuring your pyproject.toml, refer to the Python documentation.
Installing Dependencies
The pyproject.toml file is used to manage the dependencies of this project. To install these dependencies, follow these steps:
- Open a terminal and navigate to the project directory.
- Run the command pip install . to install the necessary dependencies for the project.
This process ensures that all required dependencies are correctly installed in your environment, allowing you to work on the project with all necessary resources.
To add or modify project dependencies, you must list them in your pyproject.toml file under the dependencies section.
dependencies = [
    "pytest == 8.0.1",
    # add necessary dependencies
]
Developing the Package
The CONTRIBUTING.md file is an essential guide for developing this Python package. It describes the steps to set up the development environment, the coding conventions to follow, and how to submit changes.
Once your changes are ready, push your contribution to the desired branch to trigger the integration pipeline, which will create the Python package and deploy it to the Python server.
For more details on contributing and best practices, please refer to the CONTRIBUTING.md file.
Using the Python Package
Installation
The package and its optional dependencies can be installed with the following command:
pip install ckanapi_harvesters[extras]
Example Usage of the Python Package in Your Code
After installation, you can import and use the package and its functions in your Python code:
from ckanapi_harvesters import CkanApi
ckan = CkanApi()
To use sub-modules defined in the package:
from ckanapi_harvesters.ckan_api import CkanApi
ckan = CkanApi()
These instructions will allow you to access the package and utilize its features effectively and in line with your development configuration.
License
This project is licensed under the MIT License, which means it is freely usable for personal and commercial purposes. The MIT License is one of the most permissive open source licenses. It allows you to do almost anything with the source code, as long as you retain the original license notice and copyright information when redistributing the software or substantial portions of it. This license comes without any warranties, so the software is provided "as is." For more details, please refer to the included LICENSE file.
File details
Details for the file ckanapi_harvesters-0.0.22.tar.gz.
File metadata
- Download URL: ckanapi_harvesters-0.0.22.tar.gz
- Upload date:
- Size: 267.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 133fe617451c4b9a5aa6f7055f8f7653119f406ca299bc8816b4bda5b8119432 |
| MD5 | 50ced7e7fa6026e39921dd83d881d514 |
| BLAKE2b-256 | c175328e3bfbcd9f1c88d5fc8d2831fdd9cbd43777d2dff3fbb542d0d6478112 |
File details
Details for the file ckanapi_harvesters-0.0.22-py3-none-any.whl.
File metadata
- Download URL: ckanapi_harvesters-0.0.22-py3-none-any.whl
- Upload date:
- Size: 335.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1da6130ff3c2ca197a084f36ad6f56edc75c5a9deb13ce742d8236560e5423de |
| MD5 | 79e725659fd62f33b679f7c6264ad8a5 |
| BLAKE2b-256 | 7691f498ba030c72cd429d524d2d58bd9c35970b796326a13b14b405ae3f6873 |