Dataset Manager
Library to pre-process CSV files from Research Portal into usable datasets.
Installation
pip install minder.dataset-manager
Example
import asyncio
import logging
import sys
from pathlib import Path
from typing import Optional

from minder.dataset_manager._utils import Dataset
from minder.dataset_manager.datasets import LabelledUtiDataset
from minder.research_portal_client import Configuration, JobManager

logging.basicConfig(level=logging.INFO)

Configuration.set_default(
    Configuration(
        access_token="---REDACTED---",
    )
)


async def example1():
    # Download the files for the given Research Portal jobs and build a new dataset.
    job_ids = ["c25249e0-82ff-43d1-9676-f3cead0228b9"]
    async with JobManager() as job_manager:
        files = await Dataset.download(job_ids, job_manager)
        dataset = LabelledUtiDataset.create(job_ids, files)
        dataset.save("./my-dataset.npz")


async def example2():
    # Update a previously saved dataset, if one exists, with freshly downloaded data.
    job_ids = ["c25249e0-82ff-43d1-9676-f3cead0228b9"]
    existing_dataset = Path("./my-dataset.npz")
    async with JobManager() as job_manager:
        download_task = Dataset.download(job_ids, job_manager)
        try:
            previous_dataset: Optional[Dataset] = None
            if existing_dataset.exists():
                previous_dataset = LabelledUtiDataset.load(existing_dataset)
        finally:
            # Await the download even if loading the previous dataset fails.
            files = await download_task

        new_dataset = LabelledUtiDataset.create(job_ids, files)
        dataset = (
            await previous_dataset.update(new_dataset, job_manager=job_manager)
            if previous_dataset is not None
            else new_dataset
        )
        dataset.save("./my-dataset.npz")


async def main():
    await example1()
    await example2()


if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

asyncio.run(main())
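The second example starts the download, loads any previously saved dataset, and only then awaits the downloaded files. The sketch below illustrates that pattern with the standard library alone, so it runs without an access token. `fake_download`, `load_previous` and `refresh` are hypothetical stand-ins, not part of this library. Using `asyncio.create_task` and `asyncio.to_thread` (Python 3.9+) is an assumption about how to make the two steps actually overlap; the library's own `Dataset.download` may behave differently.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import List, Optional


async def fake_download(job_ids: List[str]) -> List[str]:
    """Hypothetical stand-in for Dataset.download: pretends to fetch one CSV per job."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [f"{job_id}.csv" for job_id in job_ids]


def load_previous(path: Path) -> Optional[str]:
    """Hypothetical stand-in for LabelledUtiDataset.load: None if no file exists."""
    return path.read_text() if path.exists() else None


async def refresh(job_ids: List[str], existing: Path) -> dict:
    # create_task schedules the download right away instead of on first await.
    download_task = asyncio.create_task(fake_download(job_ids))
    try:
        # Run the blocking load in a worker thread so the event loop stays free
        # to make progress on the download in the meantime.
        previous = await asyncio.to_thread(load_previous, existing)
    finally:
        # Always await the task, even if loading failed, so it is never left dangling.
        files = await download_task
    return {"previous": previous, "files": files}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = asyncio.run(refresh(["job-1", "job-2"], Path(tmp) / "my-dataset.npz"))
    print(result)
```

The `try`/`finally` mirrors the example above: the download is awaited even when loading the previous dataset raises, so the error surfaces without leaving a pending task behind.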
Development
Useful commands
Setup
poetry install
Run tests
poetry run pytest
Code Coverage
This command consists of 2 parts:
- running tests with coverage collection
- formatting the report: report (text to stdout), xml (GitLab compatible: cobertura), html (visual)
poetry run coverage run -m pytest && poetry run coverage report -m
Linting
poetry run flake8
Formatting
poetry run black .
Type Checking
poetry run mypy .