Metadata extractor and converter for scientific instrument file formats (AFM, EM, X-ray).

MetaXtract

Overview

MetaXtract is a Python package that extracts metadata from scientific instrument file formats, including:

  • .h5 (HDF5)
  • .xrdml (XRDML, XML-based)
  • .dm4 (DigitalMicrograph)
  • .ibw (Igor Binary Wave)

Installation

From PyPI

# Basic installation (metadata extraction only)
pip install metaxtract

# With converters (xarray + netCDF4)
pip install "metaxtract[converters]"

# With every optional feature (converters, utils, checksums, viz)
pip install "metaxtract[all]"

From source

git clone https://gitlab.com/dataerai/MetaXtract.git
cd MetaXtract
pip install -e ".[dev]"     # editable install + tests, docs, lint, build tools

Optional Dependencies

  • xarray ([converters]): Required for file conversion to netCDF format
  • requests ([utils]): Required for the download_file() utility function
  • blake3 ([checksums]): Optional faster checksum algorithm (falls back to sha256 if not available)
  • plotly ([viz]): Required for interactive visualization of metadata and data
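
The blake3-to-sha256 fallback described above can be illustrated with a short sketch. This is not MetaXtract's own code: the function name file_checksum and the chunk size are assumptions made for illustration, and only the standard hashlib module and the optional blake3 package are used.

```python
import hashlib
from pathlib import Path

try:
    import blake3  # optional; installed by the [checksums] extra
except ImportError:
    blake3 = None


def file_checksum(path, chunk_size=1 << 20):
    """Return (algorithm_name, hex_digest) for the file at `path`.

    Hypothetical helper: uses blake3 when it is importable,
    otherwise falls back to sha256 from the standard library.
    """
    if blake3 is not None:
        name, hasher = "blake3", blake3.blake3()
    else:
        name, hasher = "sha256", hashlib.sha256()
    # Read in chunks so large instrument files are never loaded whole.
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return name, hasher.hexdigest()
```

Returning the algorithm name alongside the digest keeps a stored checksum verifiable later, even on a machine where the optional package is missing.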

Documentation

Full documentation is available at: https://dataerai.gitlab.io/MetaXtract/

Usage

Using the High-Level API

from metaxtract import extract_metadata, get_supported_formats

# Get list of supported formats
formats = get_supported_formats()
print(formats)

# Extract metadata from a file
metadata = extract_metadata("path/to/file.h5")
print(metadata)
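
When many files need processing, the call above can be wrapped in a small batch helper. The sketch below is an illustration, not part of MetaXtract: extract_tree and SUFFIXES are hypothetical names, the suffix set is copied from the Overview list, and the extractor is passed in as a callable so the helper makes no assumptions about the library's internals.

```python
from pathlib import Path

# Extensions taken from the Overview list (an assumption for this sketch);
# adjust to whatever get_supported_formats() reports.
SUFFIXES = {".h5", ".xrdml", ".dm4", ".ibw"}


def extract_tree(root, extract, suffixes=SUFFIXES):
    """Apply `extract` to every matching file under `root`.

    Returns (results, errors), two dicts keyed by file path.
    """
    results, errors = {}, {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            results[str(path)] = extract(str(path))
        except Exception as exc:
            # Record the failure and continue with the remaining files.
            errors[str(path)] = repr(exc)
    return results, errors
```

With MetaXtract installed, it could be called as extract_tree("data/", extract_metadata), where "data/" is a placeholder directory.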

Data Type Detection

from metaxtract import extract_metadata, detect_data_type

# Extract metadata
metadata = extract_metadata("file.ibw")

# Automatically detect data type
result = detect_data_type(metadata)
if result:
    print(f"Data Type: {result['type']}")
    print(f"Confidence: {result['confidence']:.2%}")
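
Because detect_data_type() can return an empty result, and a detection may come back with low confidence, it can help to turn the result into a single label before logging it. The helper below is a sketch under assumptions: describe_detection and the 0.5 threshold are hypothetical, and the confidence is assumed to be a fraction between 0 and 1, as the percent formatting above suggests.

```python
def describe_detection(result, min_confidence=0.5):
    """Summarize a detection result as a short label.

    `result` is expected to be falsy or a dict with 'type' and
    'confidence' keys, matching the example above.
    """
    if not result:
        return "unknown"
    label = f"{result['type']} ({result['confidence']:.0%})"
    if result["confidence"] < min_confidence:
        # Flag weak detections instead of silently accepting them.
        return f"uncertain: {label}"
    return label
```

The threshold is a design choice to tune per dataset; nothing in the source specifies a recommended value.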

Visualization

from metaxtract import convert_file, extract_metadata
from metaxtract.viz import visualize_data, visualize_metadata_tree

# Visualize metadata
metadata = extract_metadata("file.ibw")
fig = visualize_metadata_tree(metadata)
fig.show()

# Visualize data (automatically detects type and routes to appropriate visualization)
dataset = convert_file("file.ibw")
fig = visualize_data(dataset)
fig.show()

Utility Functions

from metaxtract.utils import download_file, benchmark_checksums

# Download a file
file_path = download_file("https://example.com/data.ibw")

# Benchmark checksum algorithms
results = benchmark_checksums("data.ibw", algorithms=['sha256', 'md5', 'blake3'])

Running the extractor

For h5 file type

from metaxtract.instruments.AFM.bandexcitation.h5 import H5
import json
from pprint import pprint

process_file = H5("path/to/file")
metadata = process_file.extract()

print(metadata)
pprint(metadata)  # Pretty-print the same dictionary
print(json.dumps(metadata, indent=4))  # Print metadata in JSON format

For xrdml file type

from metaxtract.instruments.Xray.panalytical.xrdml import XRDML
import json
from pprint import pprint

process_file = XRDML("path/to/file")
metadata = process_file.extract()

print(metadata)
pprint(metadata)  # Pretty-print the same dictionary
print(json.dumps(metadata, indent=4))  # Print metadata in JSON format

For dm4 file type

from metaxtract.instruments.EM.dm.dm4 import DM4
import json
from pprint import pprint

process_file = DM4("path/to/file")
metadata = process_file.extract()

print(metadata)
pprint(metadata)  # Pretty-print the same dictionary
print(json.dumps(metadata, indent=4))  # Print metadata in JSON format

For ibw file type

from metaxtract.instruments.AFM.oxfordAFM.ibw import IBW
import json
from pprint import pprint

process_file = IBW("path/to/file")
metadata = process_file.extract()

print(metadata)
pprint(metadata)  # Pretty-print the same dictionary
print(json.dumps(metadata, indent=4))  # Print metadata in JSON format
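
The json.dumps() calls above work only when every value in the metadata is JSON-encodable. If an extractor returns bytes, sets, paths, or NumPy values (an assumption; the source does not state which types extract() produces), a fallback encoder avoids a TypeError. The names to_json and _fallback are hypothetical and use only the standard library.

```python
import json
from pathlib import Path


def _fallback(value):
    """Convert values the json module cannot encode on its own."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, (set, frozenset)):
        return sorted(value, key=str)
    if isinstance(value, Path):
        return str(value)
    if hasattr(value, "tolist"):  # e.g. NumPy scalars and arrays
        return value.tolist()
    return str(value)  # last resort: string representation


def to_json(metadata, indent=4):
    """Serialize a metadata dict, tolerating non-JSON-native values."""
    return json.dumps(metadata, indent=indent, default=_fallback)
```

json.dumps() calls the default hook only for values it cannot encode itself, so ordinary strings, numbers, lists, and dicts pass through unchanged.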

Contributing

Contributions are welcome! Please see CONTRIBUTING.rst for guidelines.

License

This project is licensed under the MIT License; see the LICENSE file for details.
