Dataset Manager
Library to pre-process CSV files from Research Portal into usable datasets.
Installation
pip install minder.dataset-manager
Example
import asyncio
import logging
import sys
from pathlib import Path
from typing import Optional

from minder.dataset_manager._utils import Dataset
from minder.dataset_manager.datasets import LabelledUtiDataset
from minder.research_portal_client import Configuration, JobManager

logging.basicConfig(level=logging.INFO)

Configuration.set_default(
    Configuration(
        access_token="---REDACTED---",
    )
)


async def example1():
    """Download the files for a job and save them as a new dataset."""
    job_ids = ["c25249e0-82ff-43d1-9676-f3cead0228b9"]
    async with JobManager() as job_manager:
        # Awaited here because example2 awaits the result of the same call.
        files = await Dataset.download(job_ids, job_manager)
        dataset = LabelledUtiDataset.create(job_ids, files)
        dataset.save("./my-dataset.npz")


async def example2():
    """Update a previously saved dataset, or create it if none exists."""
    job_ids = ["c25249e0-82ff-43d1-9676-f3cead0228b9"]
    # A Path rather than a str, so that .exists() is available.
    existing_dataset = Path("./my-dataset.npz")
    async with JobManager() as job_manager:
        download_task = Dataset.download(job_ids, job_manager)
        try:
            previous_dataset: Optional[Dataset] = None
            if existing_dataset.exists():
                previous_dataset = LabelledUtiDataset.load(existing_dataset)
        finally:
            # Always await the download, even if loading fails.
            files = await download_task
        new_dataset = LabelledUtiDataset.create(job_ids, files)
        dataset = (
            await previous_dataset.update(new_dataset, job_manager=job_manager)
            if previous_dataset is not None
            else new_dataset
        )
        dataset.save("./my-dataset.npz")


async def main():
    await example1()
    await example2()


if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

asyncio.run(main())
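The second example starts the download before loading the existing dataset and awaits it in a finally block. The following is a minimal, self-contained sketch of that pattern using only the standard library. It assumes nothing about this library's internals: fake_download, fake_load and refresh are hypothetical stand-ins, and whether Dataset.download returns a task or a plain coroutine is not stated here. If it is a plain coroutine, the work only overlaps once it is wrapped in a task, as shown.

```python
import asyncio
from typing import List, Optional


async def fake_download(job_ids: List[str]) -> List[str]:
    """Hypothetical stand-in for a slow network download."""
    await asyncio.sleep(0.05)
    return [f"{job_id}.csv" for job_id in job_ids]


def fake_load(path: str) -> Optional[str]:
    """Hypothetical stand-in for a blocking load of a saved dataset."""
    return None  # pretend that nothing has been saved yet


async def refresh(job_ids: List[str], path: str) -> dict:
    loop = asyncio.get_running_loop()
    # create_task schedules the coroutine on the event loop; a bare
    # coroutine object would not run at all until it is awaited.
    download_task = asyncio.create_task(fake_download(job_ids))
    try:
        # Run the blocking load in a worker thread so the event loop
        # stays free to make progress on the download meanwhile.
        previous = await loop.run_in_executor(None, fake_load, path)
    finally:
        # Await the task on every path so it is never left pending.
        files = await download_task
    return {"previous": previous, "files": files}


if __name__ == "__main__":
    print(asyncio.run(refresh(["job-1", "job-2"], "./my-dataset.npz")))
```

The finally block matters because a task that is created but never awaited can be destroyed while still pending when the event loop closes.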
Development
Useful commands
Setup
poetry install
Run tests
poetry run pytest
Code Coverage
This command consists of two parts:
- running tests with coverage collection
- formatting the report as one of:
  - report (text to stdout)
  - xml (GitLab compatible: cobertura)
  - html (visual)
poetry run coverage run -m pytest && poetry run coverage report -m
Linting
poetry run flake8
Formatting
poetry run black .
Type Checking
poetry run mypy .