
High-performance PHP serialize/unserialize parser written in Rust with Python bindings


phpserialize-rs



Features

  • Zero-copy parsing - Minimal memory allocations for maximum performance
  • Full PHP serialize support - All types including objects, references, and PHP 8.1 enums
  • UTF-8 aware - Proper handling of multi-byte characters (Korean, Chinese, etc.)
  • Auto-unescape - Automatic detection and handling of DB-escaped strings
  • Auto-fallback - Automatic recovery from encoding mismatches (e.g., EUC-KR to UTF-8)
  • Error recovery - Configurable error handling for malformed data
  • PySpark integration - Ready-to-use Arrow-optimized UDFs for Databricks/Spark workloads

Installation

Python

pip install phpserialize-rs

Rust

[dependencies]
php-deserialize-core = "0.1"

Quick Start

Python

from php_deserialize import loads, loads_json

# Basic usage
data = b'a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}'
result = loads(data)
print(result)  # {'name': 'Alice', 'age': 30}

# Direct JSON conversion (optimized for Databricks)
json_str = loads_json(data)
print(json_str)  # {"name":"Alice","age":30}

# Handle DB-escaped strings automatically
escaped = b'"a:1:{s:4:""key"";s:5:""value"";}"'
result = loads(escaped)  # Auto-unescapes
print(result)  # {'key': 'value'}

# Auto-fallback for encoding mismatches (no option needed!)
# Handles data serialized as EUC-KR and later converted to UTF-8:
# the declared length (4) is the EUC-KR byte count, but 6 UTF-8 bytes follow
mismatch = b's:4:"\xed\x95\x9c\xea\xb8\x80";'  # "한글"
result = loads(mismatch)  # Automatically recovers
print(result)  # '한글'

# Strict mode (disable auto-fallback)
result = loads(data, strict=True)  # Raises on a length mismatch instead of recovering

# Error handling options
result = loads(data, errors="replace")  # Replace invalid UTF-8
result = loads(data, errors="bytes")    # Return bytes for invalid UTF-8
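The auto-fallback case above comes down to byte counts. PHP's `s:<len>:` prefix counts bytes, not characters, so "한글" is 4 bytes in EUC-KR but 6 bytes in UTF-8. The sketch below shows that arithmetic. It also shows one naive way a parser *could* recover: when the declared length does not land on the closing `";`, it scans for that terminator instead. `read_php_string` is a hypothetical helper written for illustration. It is not part of `php_deserialize` and is not necessarily how the library implements auto-fallback.

```python
from typing import Tuple

# PHP's s:<len>: prefix counts bytes, not characters, so re-encoding
# the same text changes the length that a parser expects.
print(len("한글".encode("euc-kr")))  # 4: length recorded when serialized as EUC-KR
print(len("한글".encode("utf-8")))   # 6: bytes actually present after conversion


def read_php_string(buf: bytes, pos: int = 0) -> Tuple[str, int]:
    """Hypothetical helper: parse one s:<len>:"<data>"; token at `pos`.

    Returns (decoded text, offset just past the token). If the declared
    length does not land on the closing '";', fall back to scanning for
    that terminator instead (naive: fails if the payload contains '";').
    """
    if buf[pos:pos + 2] != b"s:":
        raise ValueError(f"expected a string token at offset {pos}")
    colon = buf.index(b":", pos + 2)
    declared = int(buf[pos + 2:colon])
    start = colon + 2  # skip the ':"' that precedes the payload
    end = start + declared
    if buf[end:end + 2] != b'";':
        # Declared length is wrong, e.g. an EUC-KR byte count over UTF-8 data.
        end = buf.index(b'";', start)
    return buf[start:end].decode("utf-8"), end + 2


mismatch = b's:4:"\xed\x95\x9c\xea\xb8\x80";'
print(read_php_string(mismatch))  # ('한글', 13)
```

Scanning for `";` is fragile. A payload that itself contains `";` would be cut short, so a robust recovery needs more context than this sketch uses.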

PySpark / Databricks

from php_deserialize.spark import php_to_json
from pyspark.sql.functions import get_json_object

# Convert PHP serialize to JSON (Arrow-optimized UDF)
df = spark.table("bronze.my_table")
df = df.withColumn("data_json", php_to_json("serialized_column"))

# Extract fields from JSON
df = df.withColumn("name", get_json_object("data_json", "$.name"))
df = df.withColumn("age", get_json_object("data_json", "$.age"))

df.display()  # Databricks notebooks; use df.show() in plain PySpark

For Databricks installation:

%pip install phpserialize-rs

Rust

use php_deserialize_core::{from_bytes, PhpValue};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = br#"a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}"#;
    let value = from_bytes(data)?;

    if let PhpValue::Array(items) = value {
        for (key, val) in items {
            println!("{:?} => {:?}", key, val);
        }
    }

    Ok(())
}

Supported PHP Types

| Type | PHP Format | Example |
|---|---|---|
| Null | N; | N; |
| Boolean | b:0; / b:1; | b:1; |
| Integer | i:<value>; | i:42; |
| Float | d:<value>; | d:3.14; |
| String | s:<len>:"<data>"; (len in bytes) | s:5:"hello"; |
| Array | a:<count>:{...} | a:1:{i:0;s:3:"foo";} |
| Object | O:<len>:"<class>":<count>:{...} | O:8:"stdClass":1:{s:1:"a";i:1;} |
| Reference | R:<index>; / r:<index>; | R:2; (circular references) |
| Enum (PHP 8.1+) | E:<len>:"<Class:Case>"; | E:13:"Status:Active"; |
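A small pure-Python encoder is handy for generating test inputs in the formats above. The sketch below is a hypothetical helper (`php_serialize`), not part of this library, which only parses. It covers the null, boolean, integer, float, string and array rows and skips objects, references and enums. It makes two points visible: `s:<len>` is a UTF-8 byte count, and array bodies alternate key and value.

```python
def php_serialize(value) -> bytes:
    """Hypothetical minimal encoder for the scalar and array rows of the table.

    Objects, references and enums are not handled.
    """
    if value is None:
        return b"N;"
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return b"b:1;" if value else b"b:0;"
    if isinstance(value, int):
        return b"i:%d;" % value
    if isinstance(value, float):
        # repr() gives a round-trippable literal; PHP's own output can differ
        # for exponents, INF and NAN.
        return b"d:" + repr(value).encode("ascii") + b";"
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b's:%d:"' % len(data) + data + b'";'  # length is in bytes
    if isinstance(value, (list, dict)):
        # Arrays are a flat sequence of key, value, key, value, ...
        pairs = value.items() if isinstance(value, dict) else enumerate(value)
        body = b"".join(php_serialize(k) + php_serialize(v) for k, v in pairs)
        return b"a:%d:{" % len(value) + body + b"}"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

For example, `php_serialize({"name": "Alice", "age": 30})` produces the same bytes as the Quick Start input. `php_serialize("한글")` yields `s:6:...` because the two syllables take 6 UTF-8 bytes.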

Performance

Benchmarked on Apple M1 Pro:

| Operation | Throughput |
|---|---|
| Simple array | ~1.5 GB/s |
| Nested structure | ~800 MB/s |
| Large string | ~2.0 GB/s |

Compared to php2json (Python):

  • 10-50x faster for typical workloads
  • 100x faster for large arrays
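The benchmark methodology behind these figures is not described, so treat them as indicative. To check throughput on your own payloads, a minimal timing sketch follows. `measure_throughput` is a hypothetical helper, and you can pass it any callable that takes bytes.

```python
import time
from typing import Callable


def measure_throughput(
    parse: Callable[[bytes], object], payload: bytes, repeat: int = 1000
) -> float:
    """Return approximate throughput in MB/s (10**6 bytes per second).

    Hypothetical helper: parses `payload` `repeat` times with `parse` and
    divides the total bytes processed by the wall-clock time taken.
    """
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    start = time.perf_counter()
    for _ in range(repeat):
        parse(payload)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(payload) * repeat / elapsed / 1e6
```

For example, call `measure_throughput(loads, data)` with `loads` imported from `php_deserialize`. The result includes Python call overhead, so small payloads will understate the raw parser throughput.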

Error Handling

The library provides detailed error messages for debugging:

from php_deserialize import loads, PhpDeserializeError

try:
    loads(b"invalid data")
except PhpDeserializeError as e:
    print(f"Parse error at position {e.position}: {e.message}")

DB Escape Handling

When data is exported from databases (MySQL, PostgreSQL), for example to CSV, the whole value may be wrapped in double quotes, with every embedded double quote doubled:

Original: a:1:{s:4:"key";s:5:"value";}
DB Export: "a:1:{s:4:""key"";s:5:""value"";}"

The library automatically detects and handles this format:

# Both work identically
loads(b'a:1:{s:4:"key";s:5:"value";}')
loads(b'"a:1:{s:4:""key"";s:5:""value"";}"')
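The export format above is ordinary CSV-style quoting: the value is wrapped in double quotes and each embedded `"` is doubled. The sketch below shows a hypothetical pre-processing step that undoes it. It is an illustration, not the library's detection logic. A PHP serialized value always begins with a type letter (`a`, `s`, `i`, ...), so a leading `"` is a reasonable signal that the value was quoted.

```python
def unescape_db_export(raw: bytes) -> bytes:
    """Undo CSV-style quoting: strip outer quotes and collapse doubled quotes.

    Hypothetical helper for illustration; values that are not wrapped in
    quotes are returned unchanged.
    """
    if len(raw) >= 2 and raw.startswith(b'"') and raw.endswith(b'"'):
        return raw[1:-1].replace(b'""', b'"')
    return raw
```

Empty strings survive this too: `s:0:"""";` in the export collapses back to `s:0:"";`. For whole CSV files, Python's `csv` module performs the same unquoting while reading.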

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

Licensed under either of:

at your option.



Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


phpserialize_rs-0.1.1-cp311-cp311-win_amd64.whl (154.6 kB)

Uploaded CPython 3.11, Windows x86-64

phpserialize_rs-0.1.1-cp311-cp311-macosx_11_0_arm64.whl (248.0 kB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

phpserialize_rs-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl (256.1 kB)

Uploaded CPython 3.11, macOS 10.12+ x86-64

phpserialize_rs-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (266.7 kB)

Uploaded CPython 3.8, manylinux: glibc 2.17+ x86-64

File details

Details for the file phpserialize_rs-0.1.1-cp311-cp311-win_amd64.whl.


File hashes

Hashes for phpserialize_rs-0.1.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 3cb77c6c35c65252ed3d1219123da54350b5f4becce76df4fa1ffaba28f17e62
MD5 4d55d523889c0f143c462c1d232b4865
BLAKE2b-256 9098a18b4008f47c686d2a30b37f00c63bc7c93c2cba1d985fdd930b1d68b337


Provenance

The following attestation bundles were made for phpserialize_rs-0.1.1-cp311-cp311-win_amd64.whl:

Publisher: release.yml on sokojh/phpserialize-rs


File details

Details for the file phpserialize_rs-0.1.1-cp311-cp311-macosx_11_0_arm64.whl.


File hashes

Hashes for phpserialize_rs-0.1.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 996d7991f84ffd26d2a196f971cbd85cc97a206b4f07ef69a96990e55db7d43d
MD5 c23a60241079a23e5548cbdb3d91f379
BLAKE2b-256 c266c0e7d219373e2deb6572d740ca448cecf693ef421f74defa1361b4a147ef


Provenance

The following attestation bundles were made for phpserialize_rs-0.1.1-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: release.yml on sokojh/phpserialize-rs


File details

Details for the file phpserialize_rs-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl.


File hashes

Hashes for phpserialize_rs-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ae1655c7920cd7f321883073b5bc60aa7f526b2e56382a0a9cfd318278085d5c
MD5 51ead304dbfa525a59e6f564fa98b92c
BLAKE2b-256 3eba479d7fc79e46fd1efb03e5f183b4bdbd03a6a7ab9cee4be6ca25168b7155


Provenance

The following attestation bundles were made for phpserialize_rs-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl:

Publisher: release.yml on sokojh/phpserialize-rs


File details

Details for the file phpserialize_rs-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for phpserialize_rs-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 40e69f771338f56a9e9603d4a5b3a1cf4bb39543bb4909bf7ec04deccb254246
MD5 f5c2f76fbb5eed23018a2e2cc27efbf5
BLAKE2b-256 6deef7b0b6ae1b746ed0de6bdbd53d1b01410216fe4366c37a0deb359e3aa54f


Provenance

The following attestation bundles were made for phpserialize_rs-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on sokojh/phpserialize-rs

