Python library for cleaning data in large datasets of X-rays

Project description

cleanX

CleanX is an open source Python library for exploring, cleaning and augmenting large datasets of X-rays, or certain other types of radiological images. JPEG files can be extracted from DICOM files or used directly. License: GPL-3.

The latest official release is available on PyPI.

Primary author: Candace Makeda H. Moore

Other authors and contributors: Oleg Sivokon, Andrew Murphy

Requirements

  • Python 3.7, 3.8 or 3.9
  • the ability to create virtual environments (recommended, not strictly necessary)
  • tesseract-ocr, matplotlib, pandas, pillow and opencv
  • SimpleITK or pydicom (optional, recommended) for DICOM (.dcm) to JPEG conversion
  • Anaconda is supported, but not required
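
Before installing, it can help to see which of the requirements above are already present. The following is a minimal sketch using only the Python standard library, not part of cleanX. The import names are assumptions (pillow is normally imported as PIL and opencv as cv2), and tesseract-ocr is treated as a system program that should be on PATH.

```python
import importlib.util
import shutil
import sys

# Assumed import names: pillow -> PIL, opencv -> cv2.
REQUIRED_MODULES = ["matplotlib", "pandas", "PIL", "cv2"]
OPTIONAL_MODULES = ["SimpleITK", "pydicom"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_report():
    """Collect a small dictionary describing the current environment."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "missing_required": missing_modules(REQUIRED_MODULES),
        "missing_optional": missing_modules(OPTIONAL_MODULES),
        # tesseract-ocr is a system program, so look for it on PATH.
        "tesseract_on_path": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key, "->", value)
```

An empty `missing_required` list only means the modules can be located; it does not check their versions.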

Documentation

Online documentation at https://drcandacemakedamoore.github.io/cleanX/

We encourage you to build up-to-date documentation yourself. It can be generated with the following commands:

python setup.py apidoc
python setup.py build_sphinx

The documentation will be generated in the ./build/sphinx/html directory. It is generated automatically as new functions are added.

Installation

  • setting up a virtual environment is recommended, but not strictly necessary

  • if you use one, activate it before installing

Anaconda Installation

  • use the conda command below

      conda install -c doctormakeda -c conda-forge cleanx

You need to specify both channels because some cleanX dependencies exist in both the Anaconda main channel and conda-forge.

pip installation

  • use pip as below

      pip install cleanX
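
After installing, one way to confirm that the package is visible to the active environment is to query its installed version. This is a small sketch using the standard library's importlib.metadata (available from Python 3.8), not a cleanX feature. The helper name installed_version is hypothetical, and the distribution name "cleanX" is taken from the pip command above.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "cleanX" is the distribution name used in the pip command above.
    print("cleanX version:", installed_version("cleanX"))
```

A result of None suggests the install went into a different environment from the one currently active.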
    

About using this library

If you use the library, please credit me and my collaborators. You are free to use this library only in accordance with the license. We hope that if you use the library you will open source your entire code base and send us your modifications. If you have a legitimate reason to use the library without open-sourcing your code base or following the other license conditions, get in touch with me by email (doctormakeda@gmail.com) and I can arrange a different license specifically for you.

We are adding new functions and classes all the time. Many unit tests are available in the test folder, although test coverage is currently partial. Some newly added functions allow for rapid automated data augmentation (in ways that are realistic for radiological data). Other classes and functions are for cleaning datasets, including ones that:

  • get images and metadata out of dcm (DICOM) files into JPEG and CSV files
  • process datasets from CSV, JSON or other formats to generate reports
  • run on dataframes to make sure there is no image leakage
  • run on a dataframe to look for demographic or other biases in patients
  • crop off excessive black frames (run on single images, one at a time)
  • run on a list to make a prototype tiny X-ray that others can be compared to
  • run on image files inside a folder to check whether they are "clean"
  • take a dataframe with image names and return plotted (visualized) images
  • run to make a dataframe of the pictures in a folder (assuming they all have the same 'label'/diagnosis)
  • normalize images in terms of pixel values (multiple methods)
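
To illustrate what an image leakage check in the list above is looking for, here is a minimal standard-library sketch. It is not the cleanX implementation and does not use dataframes: the function names file_digest and find_leakage are hypothetical, and it assumes leakage means byte-identical files appearing in both a training set and a test set. Re-encoded or resized copies of the same image would not be caught by this approach.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path):
    """Hash a file's bytes so identical files get identical digests."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_leakage(train_paths, test_paths):
    """Return {digest: (train files, test files)} for files found in both sets."""
    train_by_digest = defaultdict(list)
    for path in train_paths:
        train_by_digest[file_digest(path)].append(str(path))

    leaks = {}
    for path in test_paths:
        digest = file_digest(path)
        if digest in train_by_digest:
            # First hit for this digest records the matching training files.
            entry = leaks.setdefault(digest, (train_by_digest[digest], []))
            entry[1].append(str(path))
    return leaks
```

An empty result means no byte-identical file was found in both lists; it says nothing about the same patient appearing under different images.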

All important functions are documented in the online documentation.
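
For readers who want a feel for two of the cleaning steps listed above (cropping black frames and normalizing pixel values), the following is a rough NumPy sketch. It is not the cleanX API: the function names, the darkness threshold of 10 and the choice of min-max scaling are all assumptions made for illustration, and it handles only 2D grayscale arrays.

```python
import numpy as np


def crop_black_frame(image, threshold=10):
    """Crop outer rows and columns whose pixels are all at or below `threshold`."""
    mask = image > threshold
    if not mask.any():
        return image  # entirely dark: nothing sensible to crop to
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def min_max_normalize(image):
    """Rescale pixel values linearly to the range [0, 1]."""
    image = image.astype(np.float64)
    low, high = image.min(), image.max()
    if high == low:
        return np.zeros_like(image)  # flat image: avoid dividing by zero
    return (image - low) / (high - low)


if __name__ == "__main__":
    # Synthetic 6x6 grayscale "image": a bright 2x3 patch inside a black frame.
    xray = np.zeros((6, 6), dtype=np.uint8)
    xray[2:4, 1:4] = [[50, 100, 150], [200, 250, 100]]
    cropped = crop_black_frame(xray)
    print(cropped.shape)
    print(min_max_normalize(cropped).round(2))
```

Min-max scaling is only one of several possible normalization methods; the online documentation describes the ones the library actually provides.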

Release files

  • source distributions: none available for this release
  • built distributions: cleanX-0.1.9-py3.8.egg (77.2 kB) and cleanX-0.1.9-py3-none-any.whl (46.4 kB, Python 3)