Data processing library implemented by Acuzle.
Project description
acutils
Python library providing a robust pipeline for data processing tasks.
The acutils library is designed to facilitate data processing tasks, especially for people who need custom data preprocessing before building machine learning algorithms. It has been used in various domains, including pathology image processing, custom segmentation, and frame extraction from videos.
See the online documentation and the documentation files.
Key features
- You only need to code one function for a custom treatment, nothing else.
- Easy random distribution/split of the data into datasets.
- Remember and reproduce your distribution/split by saving it to JSON files.
- Made for multiprocessing, and facilitates GPU usage for computation.
- Works with any kind of data files.
- If some files are related (for example: two medical images from the same patient), you can define groups to ensure that those are in the same dataset (to avoid biases).
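To illustrate the idea behind groups, here is a minimal standard-library sketch of a group-aware random split. It is not the acutils implementation: the function name `split_by_group`, its parameters, and the sample filenames are hypothetical, and the greedy rule used to fill the training set is an assumption made for the example.

```python
import random
from collections import defaultdict


def split_by_group(groups, train_percentage=0.7, seed=0):
    """Split filenames into train/validation lists without separating a group.

    `groups` maps each filename to a group id (for example a patient id).
    """
    # Collect the files belonging to each group.
    members = defaultdict(list)
    for filename, group_id in groups.items():
        members[group_id].append(filename)

    # Shuffle the group ids reproducibly; sorting first makes the result
    # independent of dict insertion order.
    group_ids = sorted(members)
    random.Random(seed).shuffle(group_ids)

    # A whole group goes to training only if it still fits under the target
    # size; otherwise the whole group goes to validation.
    target = train_percentage * len(groups)
    train, val = [], []
    for group_id in group_ids:
        files = members[group_id]
        if len(train) + len(files) <= target:
            train.extend(files)
        else:
            val.extend(files)
    return train, val


# Hypothetical data: two images for patients p1 and p3, one for p2 and p4.
groups = {
    "p1_a.png": "p1", "p1_b.png": "p1",
    "p2_a.png": "p2",
    "p3_a.png": "p3", "p3_b.png": "p3",
    "p4_a.png": "p4",
}
train, val = split_by_group(groups, train_percentage=0.7)
```

Because groups are assigned as whole units, the training share can end up slightly below the requested percentage, which is the price of keeping related files together.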
Brief example
import acutils as au
# Define the handler and link it to a source directory
handler = au.handler.DataHandler('./data', allowed_cpus=2)
# Load filenames and labels
handler.load_data_fromdatapath()
handler.load_labels_fromsheet('./labels.xlsx', idcol="id", labelcol="label")
# Optionally load relations between files through groups (used by split)
handler.load_groups_fromsheet('./labels.xlsx', idcol="id", groupcol="patient")
# Randomly split into datasets (dict<filename,label>) and balance them
tdata, vdata = handler.split(train_percentage=0.70)  # uses groups if defined
bal_tdata, bal_vdata = handler.balance_datasets(tdata, vdata)
# TODO: replace the body with your own treatment
def your_custom_treatment(self, src, dstdir, arg1):
    au.file.tmnt_copyfile_to_dir(src, dstdir)  # example
# Process the data using your custom function and save the datasets:
handler.make_datasets('./train_bal', './val_bal', bal_tdata, bal_vdata,
                      func=your_custom_treatment,   # custom function
                      arg1="very useful argument")  # its arguments
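Since the split datasets are described as plain dict<filename,label> mappings, the idea of remembering a split in a JSON file can be sketched with the standard library alone. This is only an illustration: `save_split`, `load_split`, the `"train"`/`"val"` keys, and the file layout are hypothetical and do not describe the JSON format acutils itself writes.

```python
import json
import os
import tempfile


def save_split(path, tdata, vdata):
    """Write both datasets to one JSON file (hypothetical layout)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"train": tdata, "val": vdata}, f, indent=2, sort_keys=True)


def load_split(path):
    """Read back the datasets written by save_split."""
    with open(path, encoding="utf-8") as f:
        split = json.load(f)
    return split["train"], split["val"]


# Hypothetical datasets in the dict<filename,label> shape.
tdata = {"a.png": 0, "b.png": 1}
vdata = {"c.png": 1}

path = os.path.join(tempfile.mkdtemp(), "split.json")
save_split(path, tdata, vdata)
restored_tdata, restored_vdata = load_split(path)
```

Reloading the file instead of splitting again gives the same datasets on every run, which is what makes an experiment reproducible.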
Installation
- It should work on any OS, but so far it has only been tested on Ubuntu 22.04.
- It works with any Python version >= 3.8 (earlier versions may work too, but have not been tested yet).
From pip
pip install acutils-python
From this repository (still pip though :)
pip install --upgrade build
python3 -m build
pip install dist/acutils_python-0.1.1-py3-none-any.whl
Additional requirements
- Pillow, scikit-image, pooch and openslide-python: pathology module.
- opencv-python: image, pathology and video modules.
pip install opencv-python Pillow scikit-image pooch openslide-python
Finally, the pathology module requires you to install Openslide itself; openslide-python is only a Python binding for it. You may need to reinstall openslide-python after installing Openslide, but it should not be necessary.
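A quick way to see which of these optional packages are missing is to look up their import names, which differ from the pip names. The sketch below uses only the standard library; the helper `missing_packages` is hypothetical and not part of acutils. It only checks that the Python packages can be found, so it cannot tell whether the native Openslide library is installed.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
IMPORT_NAMES = {
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "scikit-image": "skimage",
    "pooch": "pooch",
    "openslide-python": "openslide",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for package, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing optional packages:", missing_packages())
```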
GPU computation
For now, this is only used in the pathology module (because the process takes a while).
- cupy: numpy but using GPU.
- cucim: includes cucim.skimage, which is skimage (older version) using GPU.
pip install cupy cucim
Make your CUDA install locatable from cupy:
export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH
Choose at least one device (if multiple, the first is taken):
export CUDA_VISIBLE_DEVICES=0
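Before starting a long pathology run, it can help to check this setup from Python. The following is a minimal sketch using only the standard library; `first_visible_device` and `gpu_ready` are hypothetical helpers, not acutils functions. It follows the instruction above that a device must be chosen explicitly, whereas CUDA itself exposes all devices when the variable is unset.

```python
import importlib.util
import os


def first_visible_device(environ=os.environ):
    """Return the first id listed in CUDA_VISIBLE_DEVICES, or None if unset/empty."""
    value = environ.get("CUDA_VISIBLE_DEVICES", "")
    devices = [d.strip() for d in value.split(",") if d.strip()]
    return devices[0] if devices else None


def gpu_ready(environ=os.environ):
    """True when a device is selected and both cupy and cucim can be found."""
    packages_found = all(
        importlib.util.find_spec(name) is not None for name in ("cupy", "cucim")
    )
    return packages_found and first_visible_device(environ) is not None


print("Selected device:", first_visible_device())
print("GPU setup looks complete:", gpu_ready())
```

Finding the packages does not prove that the CUDA libraries load correctly, so a failed import of cupy at runtime still points back to the LD_LIBRARY_PATH setting.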
Modules
Use the relevant modules from acutils based on your data processing needs:
- handler: High-level classes to handle data processing.
- file: About directories and files.
- gpu: GPU computation (for now, only used in the pathology module).
- image: Computer Vision tasks on images.
- multiprocess: Multiprocessing and process management.
- pathology: Pathology data processing for segmentation and tiling.
- sheet: Handling pandas DataFrames.
- video: Computer Vision tasks on videos.
Refer to the online documentation and code examples for detailed usage instructions.
TBD
- Provide more code examples.
- Define a test pipeline to check if all features work.
- Ensure that acutils works on multiple OS and Python versions.
License
Apache-2.0, see the LICENSE.
Contributing
This library is maintained by Acuzle's development team, led by @ThomasPDM.
We welcome and appreciate contributions from the community to enhance acutils. If you have ideas, bug fixes, or new features that can benefit others, we encourage you to contribute to the project. Just fork the project, create a new branch, make your changes and open a pull request.
File details
Details for the file acutils-python-0.1.1.tar.gz.
File metadata
- Download URL: acutils-python-0.1.1.tar.gz
- Upload date:
- Size: 24.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | aa6ef489640b705ac5dcabcae66014d7595df1bde315e35584d7d21f7d15e1ba
MD5 | 1bfaf2e09c4cafc85dbaa84598f3e711
BLAKE2b-256 | 30945f20776c119f4815939f830692447f0d132ca5874d81ca8e9c25a9d54a4c
File details
Details for the file acutils_python-0.1.1-py3-none-any.whl.
File metadata
- Download URL: acutils_python-0.1.1-py3-none-any.whl
- Upload date:
- Size: 25.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | b61362f6eb539872005de3cbf23d74c298458b1182aa37b8ea7c244064311060
MD5 | 9dd1fccb70565e5319ec82e7c42d51aa
BLAKE2b-256 | 758707226cdfc2009cfa1b262f73e53ead5a2d49080ff5fa8698ef0be39e5488