Apache Spark data source for ASN.1-encoded files (BER, DER, PER, XER) — schema-driven, no code generation
pyspark-asn1
PySpark integration for spark-asn1 — a schema-driven Apache Spark data source for reading ASN.1-encoded files (BER, DER, Aligned PER, Unaligned PER, XER) without any code-generation step.
Installation
pip install pyspark-asn1
Quick start
import pyspark_asn1
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Register the spark-asn1 JAR with the active session
pyspark_asn1.register(spark)
# Read ASN.1-encoded files exactly like any other Spark format
df = (spark.read
      .format("asn1")
      .option("asn1.schema", "/path/to/schema.asn1")
      .option("asn1.type", "MyMessage")
      .option("asn1.encoding", "ber")  # ber | der | per-aligned | per-unaligned | xer
      .load("/data/messages/*.ber"))
df.printSchema()
df.show()
Options
| Option | Default | Description |
|---|---|---|
| asn1.schema | required | Path(s) to .asn1 schema files (comma-separated) |
| asn1.type | required | Root ASN.1 type name to decode |
| asn1.encoding | ber | ber, der, per-aligned, per-unaligned, xer |
| asn1.per.framing | length-prefixed | PER framing: length-prefixed, fixed-length, hex-lines |
| asn1.per.record.bytes | — | Record size for fixed-length PER framing |
| asn1.choice.tag.field | _tag | Discriminator field name for CHOICE types |
| asn1.enumerated.as.int | false | Return ENUMERATED as integer instead of name |
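As a rough illustration of how these options combine, the sketch below builds an option dictionary for a fixed-length PER read and checks the values against the table. The helper names `asn1_options` and `read_asn1` are hypothetical and not part of pyspark-asn1. The rule that `asn1.per.record.bytes` must be set for `fixed-length` framing is an assumption drawn from the table, which lists no default for it.

```python
# Keys and allowed values are copied from the options table;
# the helpers themselves are hypothetical, not part of pyspark-asn1.
ENCODINGS = {"ber", "der", "per-aligned", "per-unaligned", "xer"}
PER_FRAMINGS = {"length-prefixed", "fixed-length", "hex-lines"}


def asn1_options(schema_paths, root_type, encoding="ber",
                 per_framing=None, per_record_bytes=None):
    """Build a validated option dict for spark.read.format("asn1")."""
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported asn1.encoding: {encoding!r}")
    opts = {
        # Several schema files are passed as one comma-separated string.
        "asn1.schema": ",".join(schema_paths),
        "asn1.type": root_type,
        "asn1.encoding": encoding,
    }
    if per_framing is not None:
        if per_framing not in PER_FRAMINGS:
            raise ValueError(f"unsupported asn1.per.framing: {per_framing!r}")
        opts["asn1.per.framing"] = per_framing
    if per_framing == "fixed-length":
        # Assumption: fixed-length framing needs an explicit record size.
        if per_record_bytes is None:
            raise ValueError("fixed-length framing needs asn1.per.record.bytes")
        opts["asn1.per.record.bytes"] = str(per_record_bytes)
    return opts


def read_asn1(spark, path, opts):
    """Apply the options to a reader; needs a session with spark-asn1 registered."""
    return spark.read.format("asn1").options(**opts).load(path)


opts = asn1_options(["/path/to/schema.asn1"], "MyMessage",
                    encoding="per-aligned",
                    per_framing="fixed-length", per_record_bytes=64)
```

Passing the dictionary through `DataFrameReader.options(**opts)` is equivalent to chaining one `.option(...)` call per key, as in the quick start.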
CHOICE types
An ASN.1 CHOICE maps to a struct with a discriminator field (_tag by default, configurable via asn1.choice.tag.field) plus one nullable field per alternative:
# CHOICE { circle Circle, rectangle Rectangle }
# → schema: _tag STRING, circle STRUCT<…>, rectangle STRUCT<…>
from pyspark.sql.functions import col, when
df.filter(col("_tag") == "circle").select(col("circle.*")).show()
df.select(
    when(col("_tag") == "circle", col("circle.radius"))
    .when(col("_tag") == "rectangle", col("rectangle.width"))
    .alias("dimension")
).show()
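To make the row shape concrete without a Spark session, here is a plain-Python sketch of the mapping described above: the selected alternative is filled in and every other alternative is null. The functions `choice_row` and `active_value` are hypothetical illustrations, not the library's decoder.

```python
def choice_row(alternatives, tag, value, tag_field="_tag"):
    """Model one decoded CHOICE value as a dict shaped like the Spark struct."""
    if tag not in alternatives:
        raise ValueError(f"unknown alternative: {tag!r}")
    row = {tag_field: tag}
    for name in alternatives:
        # Only the selected alternative carries a value; the rest stay null.
        row[name] = value if name == tag else None
    return row


def active_value(row, tag_field="_tag"):
    """Return the value of whichever alternative the discriminator names."""
    return row[row[tag_field]]


# CHOICE { circle Circle, rectangle Rectangle }
row = choice_row(["circle", "rectangle"], "circle", {"radius": 2.5})
```

Because exactly one alternative is non-null per row, filtering on the discriminator column and then selecting that alternative's fields, as in the Spark snippet above, loses no data.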
Parallel reads (BER/DER)
Pre-scan a BER or DER file once to build an index; Spark can then split later reads of that file across parallel tasks:
# Run once — writes a sidecar .asn1idx file
from py4j.java_gateway import java_import
java_import(spark._jvm, "io.github.sparkasn1.spark.asn1.util.Asn1Indexer")
java_import(spark._jvm, "org.apache.hadoop.fs.Path")
Asn1Indexer = spark._jvm.io.github.sparkasn1.spark.asn1.util.Asn1Indexer
path = spark._jvm.org.apache.hadoop.fs.Path("/data/messages.ber")
Asn1Indexer.buildIndex(path, spark._jsc.hadoopConfiguration())
# Subsequent reads are fully parallel
df = (spark.read
      .format("asn1")
      .option("asn1.schema", "schema.asn1")
      .option("asn1.type", "MyMessage")
      .load("/data/messages.ber"))
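For intuition about why a pre-scan helps: BER and DER records are variable-length tag-length-value (TLV) units, so in a file of concatenated records the boundaries can only be found by walking from the start. The sketch below collects the byte offset of each top-level record in plain Python. It assumes the index is essentially a list of record offsets; the actual contents of the .asn1idx file and the behaviour of `Asn1Indexer` are not documented here. The function names are hypothetical, and indefinite-length encodings are not handled.

```python
def read_tlv_header(buf, pos):
    """Return (header_len, content_len) for the definite-length TLV at pos."""
    start = pos
    first = buf[pos]
    pos += 1
    if first & 0x1F == 0x1F:
        # High-tag-number form: tag octets continue while bit 8 is set.
        while buf[pos] & 0x80:
            pos += 1
        pos += 1
    length_octet = buf[pos]
    pos += 1
    if length_octet < 0x80:
        # Short form: the octet is the content length itself.
        content_len = length_octet
    elif length_octet == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    else:
        # Long form: low 7 bits give the number of length octets that follow.
        n = length_octet & 0x7F
        content_len = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return pos - start, content_len


def record_offsets(buf):
    """Walk concatenated top-level TLVs and collect each record's start offset."""
    offsets = []
    pos = 0
    while pos < len(buf):
        header_len, content_len = read_tlv_header(buf, pos)
        offsets.append(pos)
        pos += header_len + content_len
    return offsets


# Two SEQUENCE records back to back: 5 bytes, then 8 bytes.
rec_a = bytes([0x30, 0x03, 0x02, 0x01, 0x05])
rec_b = bytes([0x30, 0x06, 0x02, 0x01, 0x01, 0x02, 0x01, 0x02])
offsets = record_offsets(rec_a + rec_b)
```

With offsets like these known in advance, each task could seek directly to its own slice of records instead of scanning the whole file.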