
Data processing library implemented by Acuzle.


acutils

Python library providing a robust pipeline for data processing tasks.

The acutils library is designed to simplify data processing, especially custom preprocessing of data before building machine learning models. It has been used in various domains, including pathology image processing, custom segmentation, and frame extraction from videos.

See the online documentation and the documentation files.

Key features

  1. You only need to code one function for a custom treatment, nothing else.
  2. Easy random distribution/split of the data into datasets.
  3. Remember and reproduce your distribution/split by saving it to JSON files.
  4. Built for multiprocessing, and makes it easier to use a GPU for computation.
  5. Works with any kind of data files.
  6. If some files are related (for example: two medical images from the same patient), you can define groups to ensure that those are in the same dataset (to avoid biases).
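
To make the split-related features concrete, here is a minimal standard-library sketch of a group-aware random split that is then saved as JSON. It is not acutils code: `group_split`, its parameters, and the sample filenames are hypothetical, and the library's own `split` may work differently.

```python
import json
import random
from collections import defaultdict


def group_split(labels, groups, train_percentage=0.7, seed=0):
    """Split {filename: label} into train/val dicts, keeping each group together.

    Hypothetical helper for illustration only, not part of acutils.
    """
    # Files without an explicit group form their own singleton group.
    by_group = defaultdict(list)
    for name in labels:
        by_group[groups.get(name, name)].append(name)

    # Sort before shuffling so the same seed always gives the same order.
    group_ids = sorted(by_group)
    random.Random(seed).shuffle(group_ids)

    target = train_percentage * len(labels)
    train, val = {}, {}
    for gid in group_ids:
        # Whole groups go to train until the target size is reached, then to val.
        dest = train if len(train) < target else val
        for name in by_group[gid]:
            dest[name] = labels[name]
    return train, val


labels = {f"img_{i}.png": i % 2 for i in range(10)}
groups = {"img_0.png": "patient_a", "img_1.png": "patient_a",
          "img_2.png": "patient_b", "img_3.png": "patient_b"}

train, val = group_split(labels, groups, train_percentage=0.7, seed=42)

# Saving the split as JSON makes it possible to reload the same datasets later.
saved = json.dumps({"train": train, "val": val}, indent=2)
restored = json.loads(saved)
```

Because groups are assigned as a whole, the training set can end up slightly larger than the requested percentage; that trade-off is what keeps related files from leaking across datasets.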

Brief example

import acutils as au

# Define the handler and link it to a source directory
handler = au.handler.DataHandler('./data', allowed_cpus=2)

# Load filenames and labels
handler.load_data_fromdatapath()
handler.load_labels_fromsheet('./labels.xlsx', idcol="id", labelcol="label")

# Even load relations between files through groups (optional for split)
handler.load_groups_fromsheet('./labels.xlsx', idcol="id", groupcol="patient")

# Randomly split into datasets (dict<filename,label>) and balance them
tdata, vdata = handler.split(train_percentage=0.70) # use groups if defined
bal_tdata, bal_vdata = handler.balance_datasets(tdata, vdata)

#TODO
def your_custom_treatment(self, src, dstdir, arg1):
    au.file.tmnt_copyfile_to_dir(src, dstdir) # example

# Process the data using your custom function and save the datasets:
handler.make_datasets('./train_bal', './val_bal', bal_tdata, bal_vdata, 
                      func=your_custom_treatment, # custom function
                      arg1="very useful argument") # its arguments
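
The "one function per file" idea in the example above can be pictured with the following standard-library sketch. It is only an illustration under stated assumptions: `copy_treatment`, `apply_to_dataset`, the `suffix` argument, and the one-sub-directory-per-label layout are all hypothetical and do not describe how `make_datasets` is implemented (which, among other things, distributes the work over several processes).

```python
import shutil
import tempfile
from pathlib import Path


def copy_treatment(src, dstdir, suffix=""):
    """Hypothetical per-file treatment: copy src into dstdir, adding a suffix."""
    src = Path(src)
    dst = Path(dstdir) / f"{src.stem}{suffix}{src.suffix}"
    shutil.copyfile(src, dst)
    return dst


def apply_to_dataset(srcdir, dstdir, dataset, func, **kwargs):
    """Apply func to every file of a {filename: label} dict.

    Assumed layout (not confirmed by acutils): one sub-directory per label.
    """
    outputs = []
    for filename, label in dataset.items():
        labeldir = Path(dstdir) / str(label)
        labeldir.mkdir(parents=True, exist_ok=True)
        # Extra keyword arguments are forwarded to the treatment unchanged.
        outputs.append(func(Path(srcdir) / filename, labeldir, **kwargs))
    return outputs


with tempfile.TemporaryDirectory() as tmp:
    srcdir, dstdir = Path(tmp) / "data", Path(tmp) / "train"
    srcdir.mkdir()
    dataset = {"a.txt": "cat", "b.txt": "dog"}
    for name in dataset:
        (srcdir / name).write_text("example")

    written = apply_to_dataset(srcdir, dstdir, dataset, copy_treatment, suffix="_v1")
    relative = sorted(p.relative_to(dstdir).as_posix() for p in written)
```

The point of the pattern is that the treatment only knows about one source file and one destination directory, so the surrounding loop can be swapped for a parallel one without touching it.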

Installation

  • It should work on any OS, but so far it has only been tested on Ubuntu 22.04.
  • It works with any Python version >= 3.8 (earlier versions may work too, but have not been tested yet).

From pip

pip install acutils-python

From this repository (still pip though :)

pip install --upgrade build
python3 -m build
pip install dist/acutils_python-0.1.1-py3-none-any.whl

Additional requirements

  • Pillow, scikit-image, pooch and openslide-python: pathology module.
  • opencv-python: image, pathology and video modules.
pip install opencv-python Pillow scikit-image pooch openslide-python

Finally, the pathology module requires OpenSlide itself to be installed; openslide-python is only a Python binding for it. You may need to reinstall openslide-python after installing OpenSlide, but it should not be necessary.
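
A quick way to see which of these optional packages are present is sketched below. It uses only the standard library; `OPTIONAL` and `missing_packages` are names made up for this example. Note that it only checks that each Python package can be found, not that the native OpenSlide library itself loads.

```python
import importlib.util

# pip package name -> import name (they differ for several of these).
OPTIONAL = {
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "scikit-image": "skimage",
    "pooch": "pooch",
    "openslide-python": "openslide",
}


def missing_packages(mapping=OPTIONAL):
    """Return the pip names whose import module cannot be found."""
    return [pip_name for pip_name, module in mapping.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All optional packages are importable.")
```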

GPU computation

For now, this is only used in the pathology module (because the process takes a while).

  • cupy: numpy but using GPU.
  • cucim: includes cucim.skimage, a GPU implementation of skimage (based on an older version).
pip install cupy cucim

Make your CUDA installation discoverable by cupy:

export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH

Choose at least one device (if multiple, the first is taken):

export CUDA_VISIBLE_DEVICES=0
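
To check that the GPU setup is usable from Python, a common pattern is to try cupy and fall back to numpy. The sketch below shows that pattern in general; it is not taken from the acutils gpu module, and the `xp` alias is just a convention used for this example.

```python
import numpy as np

try:
    import cupy as cp

    # getDeviceCount raises if the CUDA runtime or driver is unusable.
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device visible")
    xp = cp
except Exception:
    # cupy is missing or no GPU is usable: fall back to the CPU.
    xp = np

# The same array code then runs on either backend.
x = xp.arange(6, dtype=xp.float32).reshape(2, 3)
total = float(x.sum())
print(f"backend: {xp.__name__}, sum: {total}")
```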

Modules

Use the relevant modules from acutils based on your data processing needs:

  • handler: High-level classes to handle data processing.

  • file: About directories and files.

  • gpu: GPU computation (for now, only used in the pathology module).

  • image: Computer Vision tasks on images.

  • multiprocess: Multiprocessing and process management.

  • pathology: Pathology data processing for segmentation and tiling.

  • sheet: Handling pandas DataFrames.

  • video: Computer Vision tasks on videos.

Refer to the online documentation and code examples for detailed usage instructions.

TBD

  • Provide more code examples.
  • Define a test pipeline to check if all features work.
  • Ensure that acutils works on multiple OS and Python versions.

License

Apache-2.0, see the LICENSE.

Contributing

This library is maintained by Acuzle's development team, led by @ThomasPDM.

We welcome and appreciate contributions from the community to enhance acutils. If you have ideas, bug fixes, or new features that can benefit others, we encourage you to contribute to the project. Just fork the project, create a new branch, make your changes, and open a pull request.
