Signal correction module for Ariel Data Challenge 2025.

Maintainers

gperdrizet

Project description

Ariel Data Preprocessing

This module contains the complete FGS1 and AIRS-CH0 signal data preprocessing pipeline for the Ariel Data Challenge.

Overview

The DataProcessor class provides an integrated pipeline that combines:

- Signal correction: complete six-step calibration pipeline for raw telescope data
- Signal extraction: intelligent data reduction and spectral signal extraction

This unified approach transforms raw Ariel telescope data directly into extracted, science-ready spectral time series in a single processing step.

1. Signal correction

Implements the complete six-step signal correction pipeline outlined in the Calibrating and Binning Ariel Data notebook shared by the contest organizers. This module transforms raw Ariel telescope data into science-ready corrected signals.

Processing Pipeline:

1. ADC Conversion: convert raw counts to physical units
2. Hot/Dead Pixel Masking: remove problematic detector pixels
3. Linearity Correction: account for non-linear detector response
4. Dark Current Subtraction: remove thermal background noise
5. Correlated Double Sampling (CDS): reduce read noise via paired exposures
6. Flat Field Correction: normalize pixel-to-pixel sensitivity variations
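
As a rough illustration of how the six steps listed above chain together, here is a minimal NumPy sketch on a toy detector cube. It is not the package's implementation: the function name `correct_signal`, the calibration values (`gain`, `offset`, `linearity`, `dark`, `exposure`, `flat`), and the exact formulas are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy detector cube with shape (n_frames, n_rows, n_cols).
n_frames, n_rows, n_cols = 8, 4, 5
raw = rng.integers(100, 200, size=(n_frames, n_rows, n_cols)).astype(np.float64)

# Hypothetical calibration inputs; every value here is invented.
gain, offset = 0.5, -10.0                      # ADC conversion constants
bad_pixels = np.zeros((n_rows, n_cols), dtype=bool)
bad_pixels[0, 0] = True                        # one hot/dead pixel
linearity = np.array([1e-5, 1.0, 0.0])         # polynomial, highest power first
dark = np.full((n_rows, n_cols), 0.1)          # dark current per unit time
exposure = np.tile([0.1, 4.5], n_frames // 2)  # alternating short/long reads
flat = rng.uniform(0.9, 1.1, size=(n_rows, n_cols))


def correct_signal(raw):
    """Apply the six correction steps in the order listed above."""
    signal = raw / gain + offset                      # 1. ADC conversion
    signal[:, bad_pixels] = np.nan                    # 2. hot/dead pixel masking
    signal = np.polyval(linearity, signal)            # 3. linearity correction
    signal = signal - dark * exposure[:, None, None]  # 4. dark current subtraction
    signal = signal[1::2] - signal[0::2]              # 5. CDS: long read minus short read
    return signal / flat                              # 6. flat field correction


corrected = correct_signal(raw)
print(corrected.shape)  # (4, 4, 5)
```

Note that the CDS step halves the number of frames, because each output frame is the difference of a read pair. Masked pixels are carried through as NaN in this sketch; the package stores a separate mask instead.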

Key Features:

- Multiprocessing support for parallel planet processing
- Optional FGS1 downsampling to match AIRS-CH0 cadence (83% data reduction)
- Configurable processing steps (enable/disable individual corrections)
- HDF5 output for efficient large dataset storage
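
To make the downsampling figure concrete, the sketch below keeps one frame out of every `keep_every` frames and reports the resulting reduction. The helper `downsample_frames` is hypothetical, and the factor of 6 is an assumption chosen only because 1 - 1/6 is about 83%, which matches the figure quoted above. The package's actual frame-selection rule may differ (for example, it may keep read pairs together for CDS).

```python
import numpy as np


def downsample_frames(frames, keep_every):
    """Keep one frame out of every `keep_every` along the time axis."""
    return frames[::keep_every]


# Toy FGS1 series: 120 frames of 10 pixels each.
fgs1 = np.arange(1200, dtype=np.float64).reshape(120, 10)

reduced = downsample_frames(fgs1, keep_every=6)  # assumed factor, see lead-in
reduction = 1 - reduced.shape[0] / fgs1.shape[0]
print(f"{fgs1.shape[0]} -> {reduced.shape[0]} frames ({reduction:.0%} reduction)")
# 120 -> 20 frames (83% reduction)
```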

See the project notebooks for implementation details and performance analysis.

Example use:

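    from ariel_data_preprocessing.data_preprocessing import DataProcessor

    data_processor = DataProcessor(
        input_data_path='data/raw',            # directory containing the raw data
        output_data_path='data/corrected',     # directory where train.h5 is written
        n_cpus=4,                              # worker processes for parallel planet processing
        downsample_fgs=True,                   # downsample FGS1 to match AIRS-CH0 cadence
        n_planets=100
    )

    data_processor.run()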

The signal correction pipeline will write the corrected frames and hot/dead pixel masks as an HDF5 archive called train.h5 by default with the following structure:

    train.h5
    │
    ├── planet_id_1/
    │   ├── signal  # Combined corrected/extracted spectral time series
    │   └── mask    # Dead/hot pixel mask for spectra
    │
    ├── planet_id_2/
    │   ├── signal
    │   └── mask
    │
    └── planet_id_n/

2. Signal extraction

Complete extraction pipeline integrated with DataProcessor

The signal extraction functionality is integrated within the DataProcessor class, which handles both signal correction and extraction in a unified pipeline. This approach transforms 3D detector arrays into focused time series suitable for exoplanet atmospheric analysis.

Processing Features:

- AIRS-CH0 Extraction: selects the brightest detector rows containing spectral traces and sums them to create one 1D spectrum per frame
- FGS1 Extraction: uses 2D block extraction to identify signal regions and collapses them to a single brightness value per frame
- Combined Output: merges FGS1 and AIRS-CH0 signals (FGS1 as the first column, for transit detection)
- Adaptive Thresholding: automatically selects signal-bearing pixels based on configurable intensity thresholds
- Optional Smoothing: applies moving average filtering across wavelengths to reduce noise
- Massive Data Reduction: achieves ~97-98% volume reduction while preserving transit signals
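
The extraction features above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the package's code: the helpers `extract_airs` and `extract_fgs` are hypothetical, the threshold is interpreted here as a fraction of the peak brightness, and a thresholded pixel region stands in for the 2D block extraction.

```python
import numpy as np


def extract_airs(frames, inclusion_threshold=0.75):
    """Sum the brightest detector rows into one spectrum per frame.

    frames has shape (n_frames, n_rows, n_wavelengths).
    """
    row_brightness = frames.sum(axis=(0, 2))       # total flux per detector row
    rows = row_brightness >= inclusion_threshold * row_brightness.max()
    return frames[:, rows, :].sum(axis=1)          # (n_frames, n_wavelengths)


def extract_fgs(frames, inclusion_threshold=0.75):
    """Collapse the brightest pixel region to one value per frame.

    frames has shape (n_frames, n_rows, n_cols).
    """
    pixel_brightness = frames.sum(axis=0)          # total flux per pixel
    region = pixel_brightness >= inclusion_threshold * pixel_brightness.max()
    return frames[:, region].sum(axis=1)           # (n_frames,)


# Toy data: a faint background with a bright trace (AIRS) and a bright spot (FGS1).
airs_frames = np.ones((6, 8, 12))
airs_frames[:, 3:5, :] += 10.0
fgs_frames = np.ones((6, 8, 8))
fgs_frames[:, 3:5, 3:5] += 10.0

airs = extract_airs(airs_frames)
fgs = extract_fgs(fgs_frames)

# FGS1 becomes the first column, followed by the AIRS-CH0 wavelength channels.
combined = np.column_stack([fgs, airs])
print(combined.shape)  # (6, 13)
```

In this toy case the output holds 6 x 13 values instead of the 6 x 8 x 12 + 6 x 8 x 8 values in the input cubes, which shows where the large volume reduction comes from.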

Key Benefits:

- Dramatically faster downstream processing due to reduced data volume
- Improved signal-to-noise ratio by focusing on high-signal detector regions
- Preserved exoplanet transit signatures with cleaner temporal structure
- Unified processing for both instrument types

See the project notebooks for implementation details and analysis.

Example usage:

The signal extraction is performed as part of the integrated DataProcessor pipeline:

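    from ariel_data_preprocessing.data_preprocessing import DataProcessor

    data_processor = DataProcessor(
        input_data_path='data/raw',            # directory containing the raw data
        output_data_path='data/processed',     # directory where train.h5 is written
        inclusion_threshold=0.75,              # intensity threshold for selecting signal pixels
        smooth=True,                           # enable moving average smoothing
        smoothing_window=200                   # moving average window size
    )

    data_processor.run()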
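
To give a feel for what the `smooth` and `smoothing_window` options might correspond to, here is a minimal moving-average sketch using a flat kernel. The helper `moving_average` is hypothetical and the choice of `np.convolve` with `mode="same"` is an assumption; the package may handle edges or the smoothing axis differently.

```python
import numpy as np


def moving_average(signal, window, axis=1):
    """Smooth `signal` along `axis` with a flat (boxcar) kernel.

    `window` must not exceed the length of the smoothed axis, otherwise
    np.convolve(mode="same") would return a longer array.
    """
    if window > signal.shape[axis]:
        raise ValueError("window is longer than the axis being smoothed")
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda values: np.convolve(values, kernel, mode="same"), axis, signal
    )


# Toy spectra: 4 frames of 20 wavelength channels with added noise.
rng = np.random.default_rng(0)
spectra = 3.0 + rng.normal(scale=0.1, size=(4, 20))

smoothed = moving_average(spectra, window=5)
print(spectra.shape, smoothed.shape)
```

Because `mode="same"` pads with zeros, the first and last `window // 2` samples along the smoothed axis are biased low and are best trimmed or ignored.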

Output data will be written to train.h5 by default in the directory passed to output_data_path. The structure of the HDF5 archive combines both AIRS-CH0 and FGS1 signals:

    train.h5
    │
    ├── planet_1/
    │   ├── signal  # Shape: (n_frames, n_wavelengths + 1) - FGS1 + AIRS-CH0
    │   └── mask    # Shape: (n_wavelengths + 1,) - combined mask
    │
    ├── planet_2/
    │   ├── signal  # Shape: (n_frames, n_wavelengths + 1) - FGS1 + AIRS-CH0
    │   └── mask    # Shape: (n_wavelengths + 1,) - combined mask
    │
    └── planet_n/

Note: The first column of the signal dataset contains the extracted FGS1 brightness time series, followed by the AIRS-CH0 spectral channels. This structure facilitates easy transit detection using FGS1 data while providing wavelength-dependent atmospheric information from AIRS-CH0.
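
The column layout described in the note can be unpacked as follows. This is a minimal sketch on synthetic arrays: the helper `split_signal`, the toy 1% dip, and the 0.5% detection threshold are assumptions for illustration only.

```python
import numpy as np


def split_signal(signal, mask):
    """Split a combined (n_frames, n_wavelengths + 1) array by instrument."""
    fgs1 = signal[:, 0]      # first column: FGS1 brightness time series
    airs = signal[:, 1:]     # remaining columns: AIRS-CH0 spectral channels
    airs_mask = mask[1:]     # mask entries for the AIRS-CH0 channels
    return fgs1, airs, airs_mask


# Toy planet: 100 frames, 5 wavelength channels, a 1% dip in frames 40-59.
n_frames, n_wavelengths = 100, 5
signal = np.ones((n_frames, n_wavelengths + 1))
signal[40:60, :] -= 0.01
mask = np.zeros(n_wavelengths + 1, dtype=bool)

fgs1, airs, airs_mask = split_signal(signal, mask)

# Crude transit flag: frames where FGS1 drops more than 0.5% below its median.
relative_flux = fgs1 / np.median(fgs1)
in_transit = relative_flux < 1 - 0.005
print(np.flatnonzero(in_transit)[[0, -1]])  # [40 59]
```

With h5py, the two arrays for one planet could be loaded from an open `h5py.File` object `f` as `f['planet_1']['signal'][:]` and `f['planet_1']['mask'][:]` before being passed to a helper like this.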

Release history

- 1.5 (Sep 24, 2025)
- 1.4 (Sep 22, 2025)
- 1.4a3 pre-release (Sep 22, 2025)
- 1.4a2 pre-release (Sep 22, 2025)
- 1.4a1 pre-release (Sep 21, 2025), this version
- 1.3 (Sep 21, 2025)
- 1.3a2 pre-release (Sep 21, 2025)
- 1.2 (Sep 17, 2025)
- 1.2a4 pre-release (Sep 16, 2025)
- 1.2a1 pre-release (Sep 13, 2025)
- 1.1 (Sep 9, 2025)
- 1.1a3 pre-release (Sep 8, 2025)
- 1.1a2 pre-release (Sep 8, 2025)
- 1.1a1 pre-release (Sep 8, 2025)
- 1.0 (Sep 7, 2025)
- 1.0a2 pre-release (Sep 7, 2025)
- 1.0a1 pre-release (Sep 7, 2025)
- 1.0a0 pre-release (Sep 7, 2025)

Download files

Source Distribution

ariel_data_preprocessing-1.4a1.tar.gz (12.5 MB)

Uploaded Sep 21, 2025 Source

Built Distribution

ariel_data_preprocessing-1.4a1-py3-none-any.whl (20.8 kB)

Uploaded Sep 21, 2025 Python 3

File details

Details for the file ariel_data_preprocessing-1.4a1.tar.gz.

File metadata

Download URL: ariel_data_preprocessing-1.4a1.tar.gz
Upload date: Sep 21, 2025
Size: 12.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ariel_data_preprocessing-1.4a1.tar.gz
Algorithm	Hash digest
SHA256	`2fe09efd2833078193136567b2d89e5eeb47a56c6e162d020de72f22d4a50ae6`
MD5	`106e42b107b276a94fc5bf360c8c07dc`
BLAKE2b-256	`8216fc71ffdd23371f93a579f3bbb0aa55b3b623c368f51c805a353c2fbef7a7`

Provenance

The following attestation bundles were made for ariel_data_preprocessing-1.4a1.tar.gz:

Publisher: pypi_release.yml on gperdrizet/ariel-data-challenge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ariel_data_preprocessing-1.4a1.tar.gz
- Subject digest: 2fe09efd2833078193136567b2d89e5eeb47a56c6e162d020de72f22d4a50ae6
- Sigstore transparency entry: 544196788
- Sigstore integration time: Sep 21, 2025
Source repository:
- Permalink: gperdrizet/ariel-data-challenge@e4da2a5b968fae139a32d9cc7e4120f21fec7be5
- Branch / Tag: refs/tags/1.4a1
- Owner: https://github.com/gperdrizet
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_release.yml@e4da2a5b968fae139a32d9cc7e4120f21fec7be5
- Trigger Event: release

File details

Details for the file ariel_data_preprocessing-1.4a1-py3-none-any.whl.

File metadata

Download URL: ariel_data_preprocessing-1.4a1-py3-none-any.whl
Upload date: Sep 21, 2025
Size: 20.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ariel_data_preprocessing-1.4a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b45bd7bff516989098c01a97ef9d441862b4155992711d53bdec57d177351b7e`
MD5	`17f0a369f674882342625365f5e9ee47`
BLAKE2b-256	`cd0d375e60b4cbc1707950fb039b2f7ab04c4745bf1f07b8d13c626f3f8ce584`

Provenance

The following attestation bundles were made for ariel_data_preprocessing-1.4a1-py3-none-any.whl:

Publisher: pypi_release.yml on gperdrizet/ariel-data-challenge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ariel_data_preprocessing-1.4a1-py3-none-any.whl
- Subject digest: b45bd7bff516989098c01a97ef9d441862b4155992711d53bdec57d177351b7e
- Sigstore transparency entry: 544196789
- Sigstore integration time: Sep 21, 2025
Source repository:
- Permalink: gperdrizet/ariel-data-challenge@e4da2a5b968fae139a32d9cc7e4120f21fec7be5
- Branch / Tag: refs/tags/1.4a1
- Owner: https://github.com/gperdrizet
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi_release.yml@e4da2a5b968fae139a32d9cc7e4120f21fec7be5
- Trigger Event: release

ariel-data-preprocessing 1.4a1
