Metadata extractor and converter for scientific instrument file formats (AFM, EM, X-ray).
MetaXtract
Overview
MetaXtract is a Python package designed to extract metadata from various scientific file formats including:
- .h5 (HDF5)
- .xrdml (XRDML, XML-based)
- .dm4 (DigitalMicrograph)
- .ibw (Igor Binary Wave)
Installation
From PyPI
# Basic installation (metadata extraction only)
pip install metaxtract
# With converters (xarray + netCDF4)
pip install "metaxtract[converters]"
# With every optional feature (converters, utils, checksums, viz)
pip install "metaxtract[all]"
From source
git clone https://gitlab.com/dataerai/MetaXtract.git
cd MetaXtract
pip install -e ".[dev]" # editable install + tests, docs, lint, build tools
Optional Dependencies
- xarray ([converters]): Required for file conversion to netCDF format
- requests ([utils]): Required for the download_file() utility function
- blake3 ([checksums]): Optional faster checksum algorithm (falls back to sha256 if not available)
- plotly ([viz]): Required for interactive visualizations (metadata and data visualization)
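To check which extras are usable in the current environment, one option is to probe for the underlying packages. The following is a minimal sketch using only the standard library. The helper name `available_extras` is hypothetical and not part of MetaXtract, and the extra-to-module mapping is an assumption based on the list above (with `netCDF4` added for `[converters]`, as the installation comment suggests).

```python
from importlib.util import find_spec

# Extra name -> importable module names, taken from the README's lists.
# Assumption: an extra is usable once all of these modules can be found.
EXTRAS = {
    "converters": ["xarray", "netCDF4"],
    "utils": ["requests"],
    "checksums": ["blake3"],
    "viz": ["plotly"],
}


def available_extras(extras=EXTRAS):
    """Return {extra_name: bool}, True when every module of that extra is found."""
    return {
        name: all(find_spec(module) is not None for module in modules)
        for name, modules in extras.items()
    }


if __name__ == "__main__":
    for name, ok in available_extras().items():
        print(f"[{name}] {'available' if ok else 'missing'}")
```

An extra reported as missing can be added with the matching `pip install "metaxtract[...]"` command from the Installation section.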
Documentation
Full documentation is available at: https://dataerai.gitlab.io/MetaXtract/
Usage
Using the High-Level API
from metaxtract import extract_metadata, get_supported_formats
# Get list of supported formats
formats = get_supported_formats()
print(formats)
# Extract metadata from a file
metadata = extract_metadata("path/to/file.h5")
print(metadata)
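For a folder of instrument files, the same call can be applied in a loop. The sketch below is one possible way to do that, not part of MetaXtract: `extract_directory` and `save_as_json` are hypothetical helper names, the extractor is passed in as a plain callable so the helper does not depend on a particular API, and the suffix list is copied from the formats named in the Overview. It also assumes the returned metadata is JSON-like, with `default=str` as a fallback for values that `json` cannot encode.

```python
import json
from pathlib import Path


def extract_directory(root, extract, suffixes=(".h5", ".xrdml", ".dm4", ".ibw")):
    """Run `extract` on every file under `root` whose suffix is in `suffixes`.

    `extract` is any callable that takes a path string, for example
    metaxtract.extract_metadata. Failures are collected instead of raised.
    """
    results, errors = {}, {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            results[str(path)] = extract(str(path))
        except Exception as exc:  # one unreadable file should not stop the batch
            errors[str(path)] = repr(exc)
    return results, errors


def save_as_json(results, out_file):
    """Write the collected metadata to a single JSON file."""
    # default=str is a fallback for values json cannot encode natively
    # (assumption: metadata may contain such values, e.g. timestamps).
    Path(out_file).write_text(json.dumps(results, indent=4, default=str))
```

With MetaXtract installed, usage would look like `results, errors = extract_directory("data", extract_metadata)` followed by `save_as_json(results, "metadata.json")`.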
Data Type Detection
from metaxtract import extract_metadata, detect_data_type
# Extract metadata
metadata = extract_metadata("file.ibw")
# Automatically detect data type
result = detect_data_type(metadata)
if result:
    print(f"Data Type: {result['type']}")
    print(f"Confidence: {result['confidence']:.2%}")
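When the detection result feeds into further processing, it can help to treat low-confidence matches separately. The sketch below is an illustration only: `describe_detection` is a hypothetical helper, the 0.5 cutoff is an arbitrary value, and the code assumes the result is either falsy or a mapping with `type` and `confidence` keys, as in the snippet above.

```python
def describe_detection(result, min_confidence=0.5):
    """Turn a detection result into a one-line summary.

    Assumes `result` is falsy (nothing detected) or a mapping with
    'type' and 'confidence' keys. The default cutoff is illustrative.
    """
    if not result:
        return "Data type: unknown (no match)"
    label = result["type"]
    confidence = result["confidence"]
    if confidence < min_confidence:
        return f"Data type: {label} (low confidence: {confidence:.2%})"
    return f"Data type: {label} ({confidence:.2%})"
```

It would be called as `print(describe_detection(detect_data_type(metadata)))`.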
Visualization
from metaxtract import convert_file, extract_metadata
from metaxtract.viz import visualize_data, visualize_metadata_tree
# Visualize metadata
metadata = extract_metadata("file.ibw")
fig = visualize_metadata_tree(metadata)
fig.show()
# Visualize data (automatically detects type and routes to appropriate visualization)
dataset = convert_file("file.ibw")
fig = visualize_data(dataset)
fig.show()
Utility Functions
from metaxtract.utils import download_file, benchmark_checksums
# Download a file
file_path = download_file("https://example.com/data.ibw")
# Benchmark checksum algorithms
results = benchmark_checksums("data.ibw", algorithms=['sha256', 'md5', 'blake3'])
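To illustrate the idea behind checksum benchmarking and the blake3-to-sha256 fallback mentioned under Optional Dependencies, here is a standalone sketch built on `hashlib`. It is not the implementation of `benchmark_checksums`, whose return value is not documented here. `file_checksum` and `time_checksums` are hypothetical names, and the timing is a single pass per algorithm rather than a careful benchmark.

```python
import hashlib
import time


def file_checksum(path, algorithm="blake3", chunk_size=1 << 20):
    """Return (algorithm_used, hex_digest) for the file at `path`."""
    if algorithm == "blake3":
        try:
            from blake3 import blake3  # optional third-party package
            hasher = blake3()
        except ImportError:
            algorithm = "sha256"  # fall back when blake3 is not installed
            hasher = hashlib.sha256()
    else:
        hasher = hashlib.new(algorithm)
    # Read in chunks so large instrument files are never loaded whole.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return algorithm, hasher.hexdigest()


def time_checksums(path, algorithms=("sha256", "md5", "blake3")):
    """Return {requested_name: (algorithm_used, seconds)} for one pass each."""
    timings = {}
    for name in algorithms:
        start = time.perf_counter()
        used, _ = file_checksum(path, name)
        timings[name] = (used, time.perf_counter() - start)
    return timings
```

Keying the timings by the requested name keeps the blake3 entry visible even when the fallback to sha256 was used.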
Running the extractor
For h5 file type
from metaxtract.instruments.AFM.bandexcitation.h5 import H5
import json
from pprint import pprint
process_file = H5("path/to/file")
metadata = process_file.extract()
print(metadata)
pprint(metadata) # Same metadata, pretty-printed
print(json.dumps(metadata, indent=4)) # Print metadata in JSON format
For xrdml file type
from metaxtract.instruments.Xray.panalytical.xrdml import XRDML
import json
from pprint import pprint
process_file = XRDML("path/to/file")
metadata = process_file.extract()
print(metadata)
pprint(metadata) # Same metadata, pretty-printed
print(json.dumps(metadata, indent=4)) # Print metadata in JSON format
For dm4 file type
from metaxtract.instruments.EM.dm.dm4 import DM4
import json
from pprint import pprint
process_file = DM4("path/to/file")
metadata = process_file.extract()
print(metadata)
pprint(metadata) # Same metadata, pretty-printed
print(json.dumps(metadata, indent=4)) # Print metadata in JSON format
For ibw file type
from metaxtract.instruments.AFM.oxfordAFM.ibw import IBW
import json
from pprint import pprint
process_file = IBW("path/to/file")
metadata = process_file.extract()
print(metadata)
pprint(metadata) # Same metadata, pretty-printed
print(json.dumps(metadata, indent=4)) # Print metadata in JSON format
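The four examples above share one pattern: import a class, construct it with a path, call `extract()`. A small dispatch table can pick the class by file extension. This is a sketch under assumptions: the module paths and class names are copied from the examples above, `resolve_extractor` and `extract_with_class` are hypothetical helpers, and the mapping assumes one extractor per extension, which may not hold if, for instance, `.h5` files from other instruments need a different class.

```python
from importlib import import_module
from pathlib import Path

# Extension -> (module path, class name), copied from the four examples above.
EXTRACTORS = {
    ".h5": ("metaxtract.instruments.AFM.bandexcitation.h5", "H5"),
    ".xrdml": ("metaxtract.instruments.Xray.panalytical.xrdml", "XRDML"),
    ".dm4": ("metaxtract.instruments.EM.dm.dm4", "DM4"),
    ".ibw": ("metaxtract.instruments.AFM.oxfordAFM.ibw", "IBW"),
}


def resolve_extractor(path):
    """Return the (module path, class name) pair for a file, chosen by extension."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"No extractor registered for '{suffix}' files") from None


def extract_with_class(path):
    """Import the matching extractor class lazily and run its extract() method."""
    module_path, class_name = resolve_extractor(path)
    extractor_cls = getattr(import_module(module_path), class_name)
    return extractor_cls(str(path)).extract()
```

The lazy import means only the module for the requested format is loaded. For most uses, the high-level `extract_metadata()` shown earlier is the simpler entry point.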
Links
- Source Code: https://gitlab.com/dataerai/MetaXtract
- Documentation: https://dataerai.gitlab.io/MetaXtract/
- Package Registry: https://gitlab.com/dataerai/MetaXtract/-/packages
- Issues: https://gitlab.com/dataerai/MetaXtract/-/issues
Contributing
Contributions are welcome! Please see CONTRIBUTING.rst for guidelines.
License
This project is licensed under the MIT License; see the LICENSE file for details.
Download files
File details
Details for the file metaxtract-0.1.3.tar.gz.
File metadata
- Download URL: metaxtract-0.1.3.tar.gz
- Upload date:
- Size: 5.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f498d901bcdc25931c7f34f6b0152d1dab5b9c03a9b1b7a55599e72405e2d985 |
| MD5 | 8c76b30be5a5954c615a507fda09ad1c |
| BLAKE2b-256 | 215fcd1dbef03987e045733c1507e04d4538067681d760f735268360c8f7c64e |
File details
Details for the file metaxtract-0.1.3-py3-none-any.whl.
File metadata
- Download URL: metaxtract-0.1.3-py3-none-any.whl
- Upload date:
- Size: 72.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73a9e8ead436e08838e01c9b8e5106ababcf36a7104b1ec1c7a3b7103aee2a57 |
| MD5 | aafa64dece5e5e892ce7b346d6610d56 |
| BLAKE2b-256 | 109b302af0e24b8182c2390849285c4b1fe7c92f431fd3a629b4fdb17bd07dbd |