Sample Handling and Analysis Kit for Experiments

FAIRshake

FAIRshake (Sample Handling and Analysis Kit for Experiments) is a data processing pipeline for benchmarking and processing datasets, particularly diffraction data. It includes modules for benchmarking, data loading, preprocessing, integration, and exporting.

Features

  • Benchmarking Modules: Assess the performance of data processing workflows.
  • Data Loading: Efficient handling of large-scale datasets.
  • Preprocessing: Data cleaning, normalization, and noise reduction.
  • Integration: Combine data from various formats and sources seamlessly.
  • Exporting: Output processed data in multiple formats for further analysis.

Installation

Requirements

  • Python 3.11 or higher

From Source

Clone the repository and install FAIRshake locally:

git clone https://github.com/FinleyHolt/SHAKE.git
cd SHAKE
pip install .

Usage

FAIRshake provides command-line tools and modules for data processing, benchmarking, and integration of diffraction data.

Command-Line Interface

After installation, you can use the fairshake command. Use fairshake --help to see available commands:

fairshake --help

Data Processing Pipeline

To run the data processing pipeline on your dataset:

fairshake process --config <config-file> --data-dir <data-directory> --output-dir <output-directory>

Example Configuration File

Create a configuration file (e.g., config.json) specifying parameters for preprocessing, integration, and exporting:

{
  "preprocessing": {
    "dark_field_path": "path/to/dark_field.ge2",
    "mask_file_path": "path/to/mask.edf",
    "invert_mask": true,
    "min_intensity": 0.0,
    "max_intensity": null
  },
  "integration": {
    "poni_file_path": "calibration_files/det0.poni",
    "npt_radial": 500,
    "unit": "2th_deg",
    "do_solid_angle": false,
    "error_model": "poisson",
    "radial_range": [3, 13],
    "azimuth_range": [-180, 180],
    "polarization_factor": 0.99,
    "method": ["full", "histogram", "cython"]
  },
  "exporting": {
    "output_directory": "path/to/output",
    "naming_convention": "{GE_filenumber}_{iter}",
    "options": {
      "do_remove_nan": true,
      "unit": "2th_deg"
    },
    "file_format": "fxye"
  }
}
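
Before passing a configuration file to fairshake process, it can help to confirm that the file parses and contains the three sections shown above. The following is a minimal sketch using only the Python standard library. The helper name load_config and the rule that all three sections must be present are assumptions made for illustration; they are not part of FAIRshake.

```python
import json
from pathlib import Path

# Section names taken from the example configuration above.
EXPECTED_SECTIONS = ("preprocessing", "integration", "exporting")


def load_config(path):
    """Hypothetical helper: parse a JSON config and check its top-level sections."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # Collect any expected section that the file does not define.
    missing = [name for name in EXPECTED_SECTIONS if name not in config]
    if missing:
        raise KeyError(f"config is missing sections: {', '.join(missing)}")
    return config
```

Note that JSON true, false, and null are parsed into Python True, False, and None, which is why the same settings are spelled differently in config.json and in a Python dictionary.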

Benchmarking

To benchmark the performance of the data processing pipeline:

fairshake benchmark --data-dir <data-directory> \
                    --iterations <iterations> \
                    --batch-size <batch-size> \
                    --files-per-dataset <files-per-dataset>

Example:

fairshake benchmark --data-dir data/benchmark_files \
                    --iterations 1 \
                    --batch-size 5 \
                    --files-per-dataset 10
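
If you want to launch the benchmark from a script, for example to sweep over batch sizes, the command can be assembled as an argument list. This is a rough sketch that uses only the four flags shown above; the function name benchmark_command is hypothetical, and the actual run is left commented out because it requires FAIRshake to be installed.

```python
import shlex


def benchmark_command(data_dir, iterations, batch_size, files_per_dataset):
    """Build the argv list for `fairshake benchmark` from the flags shown above."""
    return [
        "fairshake", "benchmark",
        "--data-dir", str(data_dir),
        "--iterations", str(iterations),
        "--batch-size", str(batch_size),
        "--files-per-dataset", str(files_per_dataset),
    ]


cmd = benchmark_command(
    "data/benchmark_files", iterations=1, batch_size=5, files_per_dataset=10
)
# Print the equivalent shell command for inspection.
print(shlex.join(cmd))

# To actually run it (requires FAIRshake to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```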

Programmatic Usage

You can use FAIRshake modules directly in your Python scripts:

from FAIRshake.execution_pipeline.pipeline import ExecutionPipeline

# Configuration Parameters
input_base_dir = 'path/to/input'
output_base_dir = 'path/to/output'

# Preprocessing configuration
preprocessing_config = {
    "dark_field_path": "path/to/dark_field.ge2",
    "mask_file_path": "path/to/mask.edf",
    "invert_mask": True,
    "min_intensity": 0.0,
    "max_intensity": None,
}

# Integration configuration
integration_config = {
    "poni_file_path": "calibration_files/det0.poni",
    "npt_radial": 500,
    "unit": "2th_deg",
    "do_solid_angle": False,
    "error_model": "poisson",
    "radial_range": (3, 13),
    "azimuth_range": [-180, 180],
    "polarization_factor": 0.99,
    "method": ["full", "histogram", "cython"]
}

# Exporting configuration
exporting_config = {
    "output_directory": output_base_dir,
    "naming_convention": "{GE_filenumber}_{iter}",
    "options": {
        "do_remove_nan": True,
        "unit": "2th_deg"
    },
    "file_format": "fxye"
}

# Pipeline parameters
pipeline_params = {
    "input_base_dir": input_base_dir,
    "output_base_dir": output_base_dir,
    "batch_size": 10,
    "data_file_types": ['.ge2', '.tif', '.edf', '.cbf', '.mar3450', '.h5', '.png'],
    "metadata_file_types": ['.json', '.poni', '.instprm', '.geom', '.spline'],
    "require_metadata": True,
    "load_metadata_files": True,
    "load_detector_metadata": False,
    "require_all_formats": False,
    "average_frames": False,
    "enable_profiling": True,
    "tf_data_debug_mode": False,
    "pattern": '*/*/*',
    "preprocessing_config": preprocessing_config,
    "enable_preprocessing": True,
    "enable_integration": True,
    "integration_config": integration_config,
    "enable_exporting": True,
    "exporting_config": exporting_config,
    "log_level": "ERROR"
}

# Initialize the Execution Pipeline
pipeline = ExecutionPipeline(**pipeline_params)

# Run the Pipeline
pipeline.run()

Ensure that you define preprocessing_config, integration_config, and exporting_config according to your requirements.
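
If you already keep a config.json for the command-line interface, the same sections can be reused to fill in the configuration arguments above. The sketch below assumes the JSON layout from the example configuration file; the helper name pipeline_params_from_config is hypothetical. It returns only the configuration-related keys, so merge the result with the remaining keys from the example (batch_size, pattern, and so on), since the source does not state which arguments ExecutionPipeline treats as optional.

```python
import json


def pipeline_params_from_config(config_text, input_base_dir, output_base_dir):
    """Hypothetical helper: map a JSON config onto ExecutionPipeline keyword arguments."""
    config = json.loads(config_text)

    # Copy the exporting section so the parsed config is not modified in place.
    exporting = dict(config.get("exporting", {}))
    # Mirror the example above, where the export directory is the output base directory.
    exporting["output_directory"] = output_base_dir

    return {
        "input_base_dir": input_base_dir,
        "output_base_dir": output_base_dir,
        # Each stage is enabled only if its section is present in the file.
        "enable_preprocessing": "preprocessing" in config,
        "preprocessing_config": config.get("preprocessing", {}),
        "enable_integration": "integration" in config,
        "integration_config": config.get("integration", {}),
        "enable_exporting": "exporting" in config,
        "exporting_config": exporting,
    }
```

Usage would then look like pipeline_params.update(pipeline_params_from_config(text, input_base_dir, output_base_dir)) before constructing ExecutionPipeline(**pipeline_params).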

Help and Support

For detailed usage and options, use the help command:

fairshake process --help
fairshake benchmark --help

Contributing

Contributions are welcome. Please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

Steps to Contribute

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes.
  4. Commit your changes (git commit -m 'Add some feature').
  5. Push to the branch (git push origin feature-branch).
  6. Open a pull request.

License

This project is licensed under the BSD 3-Clause License. See the LICENSE.txt file for details.

Contact Information

For support or inquiries:
