
Signal correction module for Ariel Data Challenge 2025.

Project description

Ariel Data Preprocessing


This module contains the complete FGS1 and AIRS-CH0 signal data preprocessing pipeline for the Ariel Data Challenge.

Submodules

  1. Signal correction - Complete six-step calibration pipeline for raw telescope data
  2. Signal extraction - Threshold-based pixel selection, data reduction, and spectral signal extraction

1. Signal correction

Implements the complete six-step signal correction pipeline outlined in the Calibrating and Binning Ariel Data notebook shared by the contest organizers. This module transforms raw Ariel telescope data into science-ready corrected signals.

Processing Pipeline:

  1. ADC Conversion - Convert raw counts to physical units
  2. Hot/Dead Pixel Masking - Remove problematic detector pixels
  3. Linearity Correction - Account for non-linear detector response
  4. Dark Current Subtraction - Remove the thermally generated background signal
  5. Correlated Double Sampling (CDS) - Reduce read noise by differencing paired start-of-exposure and end-of-exposure reads
  6. Flat Field Correction - Normalize pixel-to-pixel sensitivity variations
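
The six steps above can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not the package's implementation: the calibration values (`gain`, `offset`, `lin_coeffs`, `dark`, `flat`, `read_times`), the array shapes, and the assumption that reads alternate between start and end of exposure are all hypothetical stand-ins.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

# Synthetic stand-in for raw data: 8 reads (4 start/end pairs) of a 4x5 detector.
n_reads, n_rows, n_cols = 8, 4, 5
raw = rng.integers(100, 200, size=(n_reads, n_rows, n_cols)).astype(float)
raw[1::2] += 200.0  # end-of-exposure reads have accumulated more signal

# Hypothetical calibration products (assumed for illustration only).
gain, offset = 0.5, -10.0                       # ADC conversion constants
dead = np.zeros((n_rows, n_cols), dtype=bool)   # hot/dead pixel map
dead[0, 0] = True
lin_coeffs = np.array([0.0, 1.0, 1e-5])         # polynomial, lowest order first
dark = np.full((n_rows, n_cols), 0.01)          # dark current per unit time
read_times = np.tile([0.1, 4.5], n_reads // 2)  # elapsed time at each read
flat = np.ones((n_rows, n_cols))                # relative pixel sensitivity
flat[1, 1] = 0.9

# 1. ADC conversion: raw counts to physical units.
signal = raw * gain + offset
# 2. Hot/dead pixel masking: flag bad pixels as NaN in every read.
signal[:, dead] = np.nan
# 3. Linearity correction: apply a polynomial to the measured signal.
signal = P.polyval(signal, lin_coeffs)
# 4. Dark current subtraction, scaled by the time elapsed at each read.
signal = signal - dark * read_times[:, None, None]
# 5. Correlated double sampling: end-of-exposure minus start-of-exposure read.
cds = signal[1::2] - signal[0::2]
# 6. Flat field correction: divide out pixel-to-pixel sensitivity.
corrected = cds / flat

print(corrected.shape)
```

Note that CDS halves the number of frames, since each pair of reads collapses into one difference image.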

Key Features:

  • Multiprocessing support for parallel planet processing
  • Optional FGS1 downsampling to match AIRS-CH0 cadence (83% data reduction)
  • Configurable processing steps (enable/disable individual corrections)
  • HDF5 output for efficient large dataset storage

See the following notebooks for implementation details and performance analysis:

  1. Signal correction
  2. Signal correction optimization

Example usage:

from ariel_data_preprocessing.signal_correction import SignalCorrection

signal_correction = SignalCorrection(
    input_data_path='data/raw',
    output_data_path='data/corrected',
    n_cpus=4,
    downsample_fgs=True,
    n_planets=100
)

signal_correction.run()

By default, the signal correction pipeline writes the corrected frames and hot/dead pixel masks to an HDF5 archive named train.h5 with the following structure:

    train.h5
    │
    ├── planet_id_1/
    │   ├── AIRS-CH0_signal  # Corrected spectrometer data
    │   ├── AIRS-CH0_mask    # Mask for spectrometer data
    │   ├── FGS1_signal      # Corrected guidance camera data
    │   └── FGS1_mask        # Mask for guidance camera data
    │
    ├── planet_id_2/
    │   ├── AIRS-CH0_signal  # Corrected spectrometer data
    │   ├── AIRS-CH0_mask    # Mask for spectrometer data
    │   ├── FGS1_signal      # Corrected guidance camera data
    │   └── FGS1_mask        # Mask for guidance camera data
    │
    └── ...
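
An archive with this layout can be read back with h5py. The sketch below first writes a tiny stand-in file so that it runs on its own; the array shapes, the boolean mask dtype, and the temporary path are assumptions for illustration, and only the group and dataset names come from the structure shown above.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "train.h5")

# Write a small stand-in archive using the documented group/dataset names.
# Shapes are hypothetical: (frames, rows, columns) for signals.
with h5py.File(path, "w") as f:
    for planet_id in ("planet_id_1", "planet_id_2"):
        group = f.create_group(planet_id)
        group.create_dataset("AIRS-CH0_signal", data=np.ones((6, 4, 8)))
        group.create_dataset("AIRS-CH0_mask", data=np.zeros((4, 8), dtype=bool))
        group.create_dataset("FGS1_signal", data=np.ones((6, 4, 4)))
        group.create_dataset("FGS1_mask", data=np.zeros((4, 4), dtype=bool))

# Read it back: one group per planet, four datasets per group.
shapes = {}
with h5py.File(path, "r") as f:
    for planet_id, group in f.items():
        airs = group["AIRS-CH0_signal"][:]  # load the dataset into memory
        fgs1 = group["FGS1_signal"][:]
        shapes[planet_id] = (airs.shape, fgs1.shape)

print(shapes)
```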

2. Signal extraction

Complete extraction pipeline for both AIRS-CH0 and FGS1 data.

Takes signal-corrected data from SignalCorrection and extracts spectral signals through threshold-based pixel selection and data reduction. This module reduces 3D detector arrays to time series suitable for exoplanet atmospheric analysis.

Processing Features:

  • AIRS-CH0 Extraction: Selects brightest detector rows containing spectral traces, sums to create 1D spectra per frame
  • FGS1 Extraction: Uses 2D block extraction to identify signal regions, collapses to single brightness value per frame
  • Combined Output: Merges FGS1 and AIRS-CH0 signals (FGS1 as first column for transit detection)
  • Adaptive Thresholding: Automatically selects signal-bearing pixels based on configurable intensity thresholds
  • Optional Smoothing: Applies moving average filtering across wavelengths to reduce noise
  • Massive Data Reduction: Achieves ~97-98% volume reduction while preserving transit signals
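
The row selection, block extraction, and merging described above might look roughly like the sketch below. It is not the package's code: the helper names `extract_airs` and `extract_fgs1` are hypothetical, and it assumes the threshold is a fraction of the brightest row or column profile, which the description does not confirm.

```python
import numpy as np


def extract_airs(frames, inclusion_threshold=0.75):
    """Reduce (n_frames, n_rows, n_wavelengths) to (n_frames, n_wavelengths)."""
    # Total brightness of each detector row over all frames and wavelengths.
    row_brightness = frames.sum(axis=(0, 2))
    # Keep rows at least as bright as a fraction of the brightest row.
    keep = row_brightness >= inclusion_threshold * row_brightness.max()
    # Sum the selected rows to get one 1D spectrum per frame.
    return frames[:, keep, :].sum(axis=1), keep


def extract_fgs1(frames, inclusion_threshold=0.75):
    """Reduce (n_frames, n_rows, n_cols) to one brightness value per frame."""
    mean_image = frames.mean(axis=0)
    row_profile = mean_image.sum(axis=1)
    col_profile = mean_image.sum(axis=0)
    rows = row_profile >= inclusion_threshold * row_profile.max()
    cols = col_profile >= inclusion_threshold * col_profile.max()
    # Select the 2D block of bright rows and columns, then collapse it.
    block = frames[:, rows][:, :, cols]
    return block.sum(axis=(1, 2))


# Synthetic frames: uniform background of 1 with a bright region of 10.
airs_frames = np.ones((5, 6, 10))
airs_frames[:, 2:4, :] = 10.0
fgs_frames = np.ones((5, 6, 6))
fgs_frames[:, 2:4, 2:4] = 10.0

spectra, kept_rows = extract_airs(airs_frames)
fgs_series = extract_fgs1(fgs_frames)

# Combined output: FGS1 as the first column, AIRS-CH0 channels after it.
combined = np.column_stack([fgs_series, spectra])
print(combined.shape)
```

Here only rows 2 and 3 of the synthetic AIRS-CH0 frames pass the threshold, so each 6-row frame collapses to a single spectrum.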

Key Benefits:

  • Dramatically faster downstream processing due to reduced data volume
  • Improved signal-to-noise ratio by focusing on high-signal detector regions
  • Preserved exoplanet transit signatures with cleaner temporal structure
  • Unified processing for both instrument types

See the following notebooks for implementation details and analysis:

  1. AIRS-CH0 signal extraction
  2. FGS1 signal extraction
  3. Wavelength smoothing

Example usage:

from ariel_data_preprocessing.signal_extraction import SignalExtraction

signal_extraction = SignalExtraction(
    input_data_path='data/corrected',
    output_data_path='data/extracted',
    input_filename='train.h5',
    inclusion_threshold=0.75,
    smooth=True,
    smoothing_window=200
)

output_file = signal_extraction.run()
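
For intuition about the smooth and smoothing_window arguments, here is a minimal moving-average sketch along the wavelength axis. The function `moving_average` is hypothetical, and the choice of a uniform kernel with `mode="valid"` (which shortens the axis by `window - 1` samples) is an assumption; the package may handle edges differently.

```python
import numpy as np


def moving_average(spectra, window):
    """Smooth each row of `spectra` along its last (wavelength) axis."""
    kernel = np.ones(window) / window  # uniform weights summing to 1
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), -1, spectra
    )


# Three identical frames, each with a linear ramp across 10 wavelengths.
spectra = np.tile(np.arange(10.0), (3, 1))
smoothed = moving_average(spectra, window=3)
print(smoothed.shape)
```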

By default, output is written to train.h5 in the directory passed to output_data_path. The HDF5 archive combines the AIRS-CH0 and FGS1 signals in a single dataset per planet:

    train.h5
    │
    ├── planet_1/
    │   ├── signal  # Shape: (n_frames, n_wavelengths + 1) - FGS1 + AIRS-CH0
    │   └── mask    # Shape: (n_wavelengths + 1,) - combined mask
    │
    ├── planet_2/
    │   ├── signal  # Shape: (n_frames, n_wavelengths + 1) - FGS1 + AIRS-CH0
    │   └── mask    # Shape: (n_wavelengths + 1,) - combined mask
    │
    └── ...

Note: The first column of the signal dataset contains the extracted FGS1 brightness time series, followed by the AIRS-CH0 spectral channels. This structure facilitates easy transit detection using FGS1 data while providing wavelength-dependent atmospheric information from AIRS-CH0.
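
As an illustration of how the combined layout can be used, the sketch below splits a synthetic signal array into its FGS1 and AIRS-CH0 parts, flags in-transit frames from the FGS1 column, and estimates a per-wavelength transit depth. The midpoint threshold is a deliberately crude, assumed detection rule and the data are synthetic; none of this is taken from the package.

```python
import numpy as np

# Synthetic combined signal: 100 frames, FGS1 column plus 8 AIRS-CH0 channels,
# with a 1% dip in frames 40-59 standing in for a transit.
n_frames, n_wavelengths = 100, 8
signal = np.ones((n_frames, n_wavelengths + 1))
signal[40:60, :] -= 0.01

fgs1 = signal[:, 0]   # first column: FGS1 brightness time series
airs = signal[:, 1:]  # remaining columns: AIRS-CH0 spectral channels

# Crude transit flag: frames below the midpoint of the FGS1 range.
threshold = 0.5 * (fgs1.min() + fgs1.max())
in_transit = fgs1 < threshold

# Per-wavelength depth: fractional drop of in-transit vs out-of-transit flux.
depth = 1.0 - airs[in_transit].mean(axis=0) / airs[~in_transit].mean(axis=0)
print(in_transit.sum(), depth.shape)
```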



