phpserialize-rs

High-performance PHP serialize/unserialize parser written in Rust with Python bindings.

Features

  • Zero-copy parsing - Minimal memory allocations for maximum performance
  • Full PHP serialize support - All types including objects, references, and PHP 8.1 enums
  • UTF-8 aware - Proper handling of multi-byte characters (Korean, Chinese, etc.)
  • Auto-unescape - Automatic detection and handling of DB-escaped strings
  • Auto-fallback - Automatic recovery from encoding mismatches (e.g., EUC-KR to UTF-8)
  • Error recovery - Configurable error handling for malformed data
  • PySpark integration - Ready-to-use Arrow-optimized UDFs for Databricks/Spark workloads

Installation

Python

pip install phpserialize-rs

Rust

[dependencies]
php-deserialize-core = "0.1"

Quick Start

Python

from php_deserialize import loads, loads_json

# Basic usage
data = b'a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}'
result = loads(data)
print(result)  # {'name': 'Alice', 'age': 30}

# Direct JSON conversion (optimized for Databricks)
json_str = loads_json(data)
print(json_str)  # {"name":"Alice","age":30}

# Handle DB-escaped strings automatically
escaped = b'"a:1:{s:3:""key"";s:5:""value"";}"'
result = loads(escaped)  # Auto-unescapes
print(result)  # {'key': 'value'}

# Auto-fallback for encoding mismatches (no option needed!)
# Handles data serialized with EUC-KR but stored as UTF-8
mismatch = b's:4:"\xed\x95\x9c\xea\xb8\x80";'  # "한글": header declares 4 bytes, UTF-8 body is 6
result = loads(mismatch)  # Automatically recovers
print(result)  # '한글'

# Strict mode (disable auto-fallback)
result = loads(data, strict=True)  # A length mismatch now fails instead of being recovered

# Error handling options
result = loads(data, errors="replace")  # Replace invalid UTF-8
result = loads(data, errors="bytes")    # Return bytes for invalid UTF-8
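
The length mismatch in the auto-fallback example arises because the `s:<len>` header counts bytes, not characters. The following is a minimal standard-library sketch, independent of this library, showing why the same two-character string needs different lengths under EUC-KR and UTF-8. The variable names are illustrative only.

```python
text = "한글"

euc_kr = text.encode("euc-kr")  # 2 bytes per Hangul syllable
utf8 = text.encode("utf-8")     # 3 bytes per Hangul syllable

print(len(euc_kr))  # 4
print(len(utf8))    # 6

# A value serialized under EUC-KR declares length 4. If the bytes are later
# transcoded to UTF-8, the header still says 4 while the body is 6 bytes long.
stale = b's:%d:"%s";' % (len(euc_kr), utf8)
correct = b's:%d:"%s";' % (len(utf8), utf8)

print(stale)    # header and body disagree
print(correct)  # header matches the UTF-8 body
```

The `stale` value is byte-for-byte the `mismatch` input used above, which is the case the auto-fallback is described as recovering from.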

PySpark / Databricks

from php_deserialize.spark import php_to_json
from pyspark.sql.functions import get_json_object

# Convert PHP serialize to JSON (Arrow-optimized UDF)
df = spark.table("bronze.my_table")
df = df.withColumn("data_json", php_to_json("serialized_column"))

# Extract fields from JSON
df = df.withColumn("name", get_json_object("data_json", "$.name"))
df = df.withColumn("age", get_json_object("data_json", "$.age"))

df.display()

For Databricks installation:

%pip install phpserialize-rs

Rust

use php_deserialize_core::{from_bytes, PhpValue};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = br#"a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}"#;
    let value = from_bytes(data)?;

    if let PhpValue::Array(items) = value {
        for (key, val) in items {
            println!("{:?} => {:?}", key, val);
        }
    }

    Ok(())
}

Supported PHP Types

| Type | PHP Format | Example |
|---|---|---|
| Null | N; | N; |
| Boolean | b:0; / b:1; | b:1; |
| Integer | i:<value>; | i:42; |
| Float | d:<value>; | d:3.14; |
| String | s:<len>:"<data>"; | s:5:"hello"; |
| Array | a:<count>:{...} | a:1:{i:0;s:3:"foo";} |
| Object | O:<len>:"<class>":<count>:{...} | Object with properties |
| Reference | R:<index>; / r:<index>; | Circular references |
| Enum (PHP 8.1+) | E:<len>:"<Class:Case>"; | E:13:"Status:Active"; |
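
To make the formats in the table concrete, here is a small sketch that produces them from Python values. `php_serialize` is a hypothetical helper written for illustration; it is not part of this library, covers only null, booleans, integers, floats, strings, and arrays, and assumes strings are encoded as UTF-8.

```python
def php_serialize(value) -> bytes:
    """Hypothetical helper: encode a small subset of Python values."""
    if value is None:
        return b"N;"
    if isinstance(value, bool):  # must come before int: bool is an int subclass
        return b"b:%d;" % int(value)
    if isinstance(value, int):
        return b"i:%d;" % value
    if isinstance(value, float):
        return b"d:" + repr(value).encode("ascii") + b";"
    if isinstance(value, str):
        raw = value.encode("utf-8")
        # The declared length is the byte length, not the character count.
        return b's:%d:"%s";' % (len(raw), raw)
    if isinstance(value, dict):
        body = b"".join(php_serialize(k) + php_serialize(v) for k, v in value.items())
        return b"a:%d:{%s}" % (len(value), body)
    if isinstance(value, list):
        # A list becomes an array with integer keys 0..n-1.
        return php_serialize(dict(enumerate(value)))
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(php_serialize({"name": "Alice", "age": 30}))
print(php_serialize(["foo"]))
```

The first call reproduces the Quick Start input, and the second reproduces the Array example from the table.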

Performance

Benchmarked on Apple M1 Pro:

| Operation | Throughput |
|---|---|
| Simple array | ~1.5 GB/s |
| Nested structure | ~800 MB/s |
| Large string | ~2.0 GB/s |

Compared to php2json (Python):

  • 10-50x faster for typical workloads
  • 100x faster for large arrays
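
Throughput depends heavily on hardware and payload shape, so it can be worth measuring on your own data. The harness below is a rough sketch, not the benchmark used for the figures above. `measure_throughput` is a hypothetical helper, and `parse` stands for any callable that accepts bytes, such as `loads`.

```python
import time


def measure_throughput(parse, payload: bytes, repeats: int = 1000) -> float:
    """Return approximate throughput in MB/s for repeated parse(payload) calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        parse(payload)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(payload) * repeats / elapsed / 1e6


payload = b'a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}'

# `len` is only a stand-in so the sketch runs without the library installed;
# pass the parser you want to measure instead.
mb_per_s = measure_throughput(len, payload)
print(f"{mb_per_s:.1f} MB/s")
```

A single run like this is noisy; repeating the measurement and using larger payloads gives steadier numbers.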

Error Handling

The library provides detailed error messages for debugging:

from php_deserialize import loads, PhpDeserializeError

try:
    loads(b"invalid data")
except PhpDeserializeError as e:
    print(f"Parse error at position {e.position}: {e.message}")

DB Escape Handling

When serialized data is exported from a database (MySQL, PostgreSQL), the value is often wrapped in double quotes, with each embedded double quote doubled:

Original:  a:1:{s:3:"key";s:5:"value";}
DB Export: "a:1:{s:3:""key"";s:5:""value"";}"

The library automatically detects and handles this format:

# Both work identically
loads(b'a:1:{s:3:"key";s:5:"value";}')
loads(b'"a:1:{s:3:""key"";s:5:""value"";}"')
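
The transformation itself is simple to express. The sketch below is one plausible way to detect and undo this quoting; it is an assumption for illustration, not the library's actual detection logic, and `unescape_db_export` is a hypothetical name.

```python
def unescape_db_export(raw: bytes) -> bytes:
    """Hypothetical helper: undo doubled-quote escaping if it is present."""
    if len(raw) >= 2 and raw.startswith(b'"') and raw.endswith(b'"'):
        inner = raw[1:-1]
        # Treat the input as escaped only if every inner quote is doubled.
        if b'"' not in inner.replace(b'""', b""):
            return inner.replace(b'""', b'"')
    # Anything else is returned unchanged.
    return raw


exported = b'"a:1:{s:3:""key"";s:5:""value"";}"'
print(unescape_db_export(exported))
print(unescape_db_export(b'a:1:{s:3:"key";s:5:"value";}'))
```

Both calls print the same unescaped payload, which matches the behavior described for `loads` above.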

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

Licensed under either of:

at your option.
