Tools for VOParquet table data and metadata handling
VOparquet
VOparquet is a Python package for working with Virtual Observatory (VO) metadata and Parquet-based tabular data. It lets you generate, manipulate, and read VO-compliant metadata for astronomical data stored in the Apache Parquet format.
📦 Installation
pip install VOparquet
🚀 Core Functionality
VOparquet’s core revolves around two components:
- A pandas.DataFrame holding the actual tabular data.
- An astropy.io.votable.tree.VOTableFile representing the VO metadata.
Because the metadata is a regular astropy object, you can construct any valid VOTable metadata (as defined in the VOParquet specification) and use astropy's existing VOTable utilities on it.
🛠 Building Metadata from Scratch
Using astropy gives you the full flexibility of the VOTable format: custom FIELDs, PARAMs, INFO elements, units, UCDs, and more.
For example, the following reproduces the metadata-only table from the VOParquet documentation:
from astropy.io.votable.tree import VOTableFile, Resource, Table, Field, Param
from astropy.io.votable import writeto
# Create VOTable structure
votable = VOTableFile(version="1.4")
resource = Resource()
votable.resources.append(resource)
# Create TABLE element
table = Table(votable)
table.name = "MessierObjects"
table.description = "Nebulae and clusters"
# Add PARAM
param = Param(votable, name="author", datatype="char", arraysize="*", value="Charles Messier")
table.params.append(param)
# Add FIELDs
field_id = Field(votable, name="ID", datatype="long")
field_id.description = "Source identifier"
field_ra = Field(votable, name="RA", datatype="double", unit="deg", ucd="pos.eq.ra")
field_ra.description = "ICRS Right Ascension"
field_dec = Field(votable, name="DEC", datatype="double", unit="deg", ucd="pos.eq.dec")
field_dec.description = "ICRS Declination"
table.fields.extend([field_id, field_ra, field_dec])
# Add table to resource
resource.tables.append(table)
# (Optional) Save to file
writeto(votable, "messier_metadata.vot")
You can then build the Parquet file using VOParquetTable:
from vo_parquet.vo_parquet_table import VOParquetTable
import pandas as pd
df = pd.DataFrame({
"ID": [1, 2, 3],
"RA": [10.684, 83.822, 201.365], # in degrees
"DEC": [41.269, -5.391, -47.479] # in degrees
})
vp = VOParquetTable(df, votable)
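This description does not say whether VOParquetTable verifies that the DataFrame columns and the FIELD names agree, so a small check before constructing it can catch mismatches early. A minimal sketch in plain Python follows; compare_columns is a hypothetical helper, not part of VOparquet.

```python
def compare_columns(column_names, field_names):
    """Compare data column names with metadata FIELD names.

    Hypothetical helper for illustration; not part of VOparquet.
    Returns (missing_fields, missing_columns).
    """
    columns = list(column_names)
    fields = list(field_names)
    # Columns present in the data but without a FIELD definition.
    missing_fields = [c for c in columns if c not in fields]
    # FIELDs declared in the metadata but absent from the data.
    missing_columns = [f for f in fields if f not in columns]
    return missing_fields, missing_columns


missing_fields, missing_columns = compare_columns(
    ["ID", "RA", "DEC"],  # e.g. list(df.columns)
    ["ID", "RA", "DEC"],  # e.g. [f.name for f in table.fields]
)
if missing_fields or missing_columns:
    raise ValueError(
        f"columns without FIELD: {missing_fields}; "
        f"FIELDs without column: {missing_columns}"
    )
```

With the example data and metadata above, both lists are empty and no error is raised.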
🛠 Helper Functions
When you only need basic metadata, manually building the full astropy structure can be verbose. The metadata module offers two helpers:
- get_names_and_datatypes(df): Creates a DataFrame with Name and Datatype columns from your data.
- ParquetMetaVO: A class to build or parse VOTable metadata more succinctly.
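To give an idea of what a helper like get_names_and_datatypes has to do, here is a rough sketch that maps pandas dtypes to VOTable datatype names. It is not the package's implementation: the function name sketch_names_and_datatypes, the mapping table, and the fallback to "char" for unrecognised dtypes are all assumptions made for illustration.

```python
import pandas as pd

# Assumed mapping from pandas/NumPy dtype names to VOTable datatypes.
_VOTABLE_DATATYPES = {
    "bool": "boolean",
    "uint8": "unsignedByte",
    "int16": "short",
    "int32": "int",
    "int64": "long",
    "float32": "float",
    "float64": "double",
}


def sketch_names_and_datatypes(df):
    """Build a DataFrame with Name and Datatype columns from df.

    Hypothetical stand-in for illustration; anything not in the
    mapping (strings, objects, ...) falls back to "char".
    """
    rows = [
        {"Name": name, "Datatype": _VOTABLE_DATATYPES.get(str(dtype), "char")}
        for name, dtype in df.dtypes.items()
    ]
    return pd.DataFrame(rows, columns=["Name", "Datatype"])


df = pd.DataFrame({
    "ID": pd.Series([1, 2, 3], dtype="int64"),
    "RA": pd.Series([10.684, 83.822, 201.365], dtype="float64"),
    "DEC": pd.Series([41.269, -5.391, -47.479], dtype="float64"),
})
field_df = sketch_names_and_datatypes(df)
print(field_df)
```

Keeping the result as a DataFrame is what makes it convenient to add further columns such as description, unit, or ucd afterwards.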
Creating from an existing VOTable
from vo_parquet.metadata import ParquetMetaVO
vpt = ParquetMetaVO.from_votable(votable)
Building from scratch
from vo_parquet.metadata import get_names_and_datatypes, ParquetMetaVO
# Generate fields DataFrame
field_df = get_names_and_datatypes(df)
field_df["description"] = ["Source identifier", "ICRS Right Ascension", "ICRS Declination"]
field_df["unit"] = ["", "deg", "deg"]
field_df["ucd"] = ["", "pos.eq.ra", "pos.eq.dec"]
# Define PARAMs (and optionally INFO)
params = [{"name": "author", "datatype": "char", "value": "Charles Messier"}]
vpt = ParquetMetaVO(field_df, params, description="Nebulae and clusters")
# Convert to VOTableFile and integrate
vo_table = vpt.to_votable()
vp = VOParquetTable(df, vo_table)
This approach is more compact and leverages DataFrame operations for customization. You can also include INFO metadata by passing a list of info dictionaries to ParquetMetaVO.
📖 Reading Parquet + VO Metadata
Load any Parquet file; if VO metadata is present, it’s parsed automatically:
from vo_parquet.vo_parquet_table import VOParquetTable
vp = VOParquetTable.from_parquet("test.parquet")
data = vp.data # pandas DataFrame
meta = vp.meta_data # astropy VOTableFile (or None)
💾 Writing Parquet + VO Metadata
vp.write_to_parquet("test.parquet")