Metadata extractor and converter for scientific instrument file formats (AFM, EM, X-ray).

MetaXtract

Overview

MetaXtract is a Python package that extracts metadata from scientific instrument file formats, including:

  • .h5 (HDF5)
  • .xrdml (XRDML, XML-based)
  • .dm4 (DigitalMicrograph)
  • .ibw (Igor Binary Wave)

Installation

From PyPI

# Basic installation (metadata extraction only)
pip install metaxtract

# With converters (xarray + netCDF4)
pip install "metaxtract[converters]"

# With every optional feature (converters, utils, checksums, viz)
pip install "metaxtract[all]"

From source

git clone https://gitlab.com/dataerai/MetaXtract.git
cd MetaXtract
pip install -e ".[dev]"     # editable install + tests, docs, lint, build tools

Optional Dependencies

  • xarray ([converters]): Required for file conversion to netCDF format
  • requests ([utils]): Required for the download_file() utility function
  • blake3 ([checksums]): Optional faster checksum algorithm (falls back to sha256 if not available)
  • plotly ([viz]): Required for interactive visualizations (metadata and data visualization)
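
The blake3-to-sha256 fallback described above can be illustrated with a minimal sketch. This is not MetaXtract's own implementation; the helper names new_hasher and file_checksum are hypothetical, and the only assumption about the optional package is that it exposes a blake3() hasher with update() and hexdigest().

```python
import hashlib

try:
    from blake3 import blake3  # optional dependency; only present if installed
    _HAVE_BLAKE3 = True
except ImportError:
    _HAVE_BLAKE3 = False


def new_hasher():
    """Return (algorithm_name, hasher), preferring blake3 when it is importable."""
    if _HAVE_BLAKE3:
        return "blake3", blake3()
    return "sha256", hashlib.sha256()


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large instrument files need not fit in memory."""
    name, hasher = new_hasher()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return name, hasher.hexdigest()
```

Returning the algorithm name together with the digest keeps stored checksums unambiguous, since the same file yields different digests depending on which package is installed.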

Documentation

Full documentation is available at: https://dataerai.gitlab.io/MetaXtract/

Usage

Using the High-Level API

from metaxtract import extract_metadata, get_supported_formats

# Get list of supported formats
formats = get_supported_formats()
print(formats)

# Extract metadata from a file
metadata = extract_metadata("path/to/file.h5")
print(metadata)
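
For a whole directory of files, one possible pattern is to walk the tree and call the extractor on each matching file. The sketch below is an assumption-laden illustration: collect_metadata is a hypothetical helper, not part of MetaXtract, and it takes the extractor as a plain callable so that it makes no assumptions about what extract_metadata returns or raises.

```python
from pathlib import Path


def collect_metadata(root, suffixes, extract):
    """Run `extract` on every file under `root` whose suffix is in `suffixes`.

    Returns (results, errors): two dicts keyed by the file path as a string.
    """
    wanted = {s.lower() for s in suffixes}
    results, errors = {}, {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        try:
            results[str(path)] = extract(str(path))
        except Exception as exc:  # record the failure and continue with the batch
            errors[str(path)] = repr(exc)
    return results, errors


# Hypothetical usage with the high-level API shown above:
# from metaxtract import extract_metadata
# results, errors = collect_metadata("data/", {".h5", ".ibw"}, extract_metadata)
```

Collecting errors separately means a single unreadable file does not abort the run.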

Data Type Detection

from metaxtract import extract_metadata, detect_data_type

# Extract metadata
metadata = extract_metadata("file.ibw")

# Automatically detect data type
result = detect_data_type(metadata)
if result:
    print(f"Data Type: {result['type']}")
    print(f"Confidence: {result['confidence']:.2%}")

Visualization

from metaxtract import convert_file, extract_metadata
from metaxtract.viz import visualize_data, visualize_metadata_tree

# Visualize metadata
metadata = extract_metadata("file.ibw")
fig = visualize_metadata_tree(metadata)
fig.show()

# Visualize data (automatically detects type and routes to appropriate visualization)
dataset = convert_file("file.ibw")
fig = visualize_data(dataset)
fig.show()

Utility Functions

from metaxtract.utils import download_file, benchmark_checksums

# Download a file
file_path = download_file("https://example.com/data.ibw")

# Benchmark checksum algorithms
results = benchmark_checksums("data.ibw", algorithms=['sha256', 'md5', 'blake3'])
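
To give a sense of what a checksum benchmark measures, here is a standard-library sketch that times one pass over a file per algorithm. It is not the implementation of benchmark_checksums, whose return format is not documented here; time_checksums is a hypothetical name, and only algorithms available through hashlib are covered.

```python
import hashlib
import time


def time_checksums(path, algorithms=("sha256", "md5"), chunk_size=1 << 20):
    """Return {algorithm: (hexdigest, seconds)}, reading the file once per algorithm."""
    timings = {}
    for name in algorithms:
        hasher = hashlib.new(name)
        start = time.perf_counter()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                hasher.update(chunk)
        timings[name] = (hasher.hexdigest(), time.perf_counter() - start)
    return timings
```

The first algorithm timed may look slower than the rest because later passes are often served from the operating system's file cache; repeating the run gives a fairer comparison.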

Running the extractor

For h5 file type

from metaxtract.instruments.AFM.bandexcitation.h5 import H5
import json
from pprint import pprint

process_file = H5("path/to/file")
metadata = process_file.extract()

print(metadata)
pprint(metadata)  # Same metadata, pretty-printed
print(json.dumps(metadata, indent=4))  # Print metadata in JSON format

For xrdml file type

from metaxtract.instruments.Xray.panalytical.xrdml import XRDML
import json
from pprint import pprint

process_file = XRDML("path/to/file")
metadata = process_file.extract()

print(metadata)
pprint(metadata)  # Same metadata, pretty-printed
print(json.dumps(metadata, indent=4))  # Print metadata in JSON format

For dm4 file type

from metaxtract.instruments.EM.dm.dm4 import DM4
import json
from pprint import pprint

process_file = DM4("path/to/file")
metadata = process_file.extract()

print(metadata)
pprint(metadata)  # Same metadata, pretty-printed
print(json.dumps(metadata, indent=4))  # Print metadata in JSON format

For ibw file type

from metaxtract.instruments.AFM.oxfordAFM.ibw import IBW
import json
from pprint import pprint

process_file = IBW("path/to/file")
metadata = process_file.extract()

print(metadata)
pprint(metadata)  # Same metadata, pretty-printed
print(json.dumps(metadata, indent=4))  # Print metadata in JSON format
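
The json.dumps calls above work only if every value in the metadata is JSON-serializable. Whether the extractors ever return other types (bytes, arrays, timestamps) is not stated here, so the following is a defensive sketch under that assumption; metadata_to_json is a hypothetical helper, not part of MetaXtract.

```python
import json


def metadata_to_json(metadata, indent=4):
    """Serialize metadata, decoding bytes and stringifying other unknown values."""
    def fallback(value):
        # Called by json only for values it cannot serialize natively.
        if isinstance(value, (bytes, bytearray)):
            return bytes(value).decode("utf-8", errors="replace")
        return str(value)

    return json.dumps(metadata, indent=indent, default=fallback)


# Illustrative input only; real metadata comes from an extractor's extract().
print(metadata_to_json({"title": b"scan 01", "points": 256}))
```

The fallback applies to values only; dictionary keys of unsupported types would still raise a TypeError.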

Contributing

Contributions are welcome! Please see CONTRIBUTING.rst for guidelines.

License

This project is licensed under the MIT License — see the LICENSE file for details.
