phpserialize-rs
High-performance PHP serialize/unserialize parser written in Rust with Python bindings.
Features
- Zero-copy parsing - Minimal memory allocations for maximum performance
- Full PHP serialize support - All types including objects, references, and PHP 8.1 enums
- UTF-8 aware - Proper handling of multi-byte characters (Korean, Chinese, etc.)
- Auto-unescape - Automatic detection and handling of DB-escaped strings
- Auto-fallback - Automatic recovery from encoding mismatches (e.g., EUC-KR to UTF-8)
- Error recovery - Configurable error handling for malformed data
- PySpark integration - Ready-to-use Arrow-optimized UDFs for Databricks/Spark workloads
Installation
Python
pip install phpserialize-rs
Rust
[dependencies]
php-deserialize-core = "0.1"
Quick Start
Python
from php_deserialize import loads, loads_json
# Basic usage
data = b'a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}'
result = loads(data)
print(result) # {'name': 'Alice', 'age': 30}
# Direct JSON conversion (optimized for Databricks)
json_str = loads_json(data)
print(json_str) # {"name":"Alice","age":30}
# Handle DB-escaped strings automatically
escaped = b'"a:1:{s:4:""key"";s:5:""value"";}"'
result = loads(escaped) # Auto-unescapes
print(result) # {'key': 'value'}
# Auto-fallback for encoding mismatches (no option needed!)
# Handles data serialized with EUC-KR but stored as UTF-8
mismatch = b's:4:"\xed\x95\x9c\xea\xb8\x80";' # "한글": 6 bytes in UTF-8, but declared length 4 (its EUC-KR size)
result = loads(mismatch) # Automatically recovers
print(result) # '한글'
# Strict mode (disable auto-fallback)
result = loads(data, strict=True) # With strict=True, a length mismatch fails instead of being auto-recovered
# Error handling options
result = loads(data, errors="replace") # Replace invalid UTF-8
result = loads(data, errors="bytes") # Return bytes for invalid UTF-8
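The length mismatch in the auto-fallback example arises because the `s:<len>` prefix counts bytes, not characters. The sketch below uses only the Python standard library, not this package, to show why a string serialized under EUC-KR declares length 4 while its UTF-8 form occupies 6 bytes. The helper name `php_string` is hypothetical and exists only for illustration.

```python
# Standard library only; `php_string` is a hypothetical helper, not part of phpserialize-rs.
def php_string(text: str, encoding: str) -> bytes:
    """Serialize `text` as a PHP string whose length prefix is its byte count in `encoding`."""
    raw = text.encode(encoding)
    return b's:%d:"%s";' % (len(raw), raw)

euc_kr = php_string("한글", "euc_kr")  # 2 bytes per syllable, so the prefix is 4
utf_8 = php_string("한글", "utf-8")    # 3 bytes per syllable, so the prefix is 6

# If the prefix was computed under EUC-KR but the body was later transcoded
# to UTF-8, the declared length is shorter than the actual body.
declared = len("한글".encode("euc_kr"))
mismatched = b's:%d:"%s";' % (declared, "한글".encode("utf-8"))
print(mismatched)
```

The final payload is the same byte string as the `mismatch` value in the Quick Start, which is the kind of input the auto-fallback is described as recovering.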
PySpark / Databricks
from php_deserialize.spark import php_to_json
from pyspark.sql.functions import get_json_object
# Convert PHP serialize to JSON (Arrow-optimized UDF)
df = spark.table("bronze.my_table")
df = df.withColumn("data_json", php_to_json("serialized_column"))
# Extract fields from JSON
df = df.withColumn("name", get_json_object("data_json", "$.name"))
df = df.withColumn("age", get_json_object("data_json", "$.age"))
df.display()
For Databricks installation:
%pip install phpserialize-rs
Rust
use php_deserialize_core::{from_bytes, PhpValue};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = br#"a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}"#;
    let value = from_bytes(data)?;
    if let PhpValue::Array(items) = value {
        for (key, val) in items {
            println!("{:?} => {:?}", key, val);
        }
    }
    Ok(())
}
Supported PHP Types
| Type | PHP Format | Example |
|---|---|---|
| Null | N; | N; |
| Boolean | b:0; / b:1; | b:1; |
| Integer | i:<value>; | i:42; |
| Float | d:<value>; | d:3.14; |
| String | s:<len>:"<data>"; | s:5:"hello"; |
| Array | a:<count>:{...} | a:1:{i:0;s:3:"foo";} |
| Object | O:<len>:"<class>":<count>:{...} | Object with properties |
| Reference | R:<index>; / r:<index>; | Circular references |
| Enum (PHP 8.1+) | E:<len>:"<Class:Case>"; | E:13:"Status:Active"; |
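To make the wire format in the table concrete, here is a minimal sketch of an encoder for the scalar and array rows. It is written in plain Python for illustration only; `php_dumps` is a hypothetical name and is not provided by phpserialize-rs, which is a parser. Objects, references, and enums are left out.

```python
# Illustrative only: encodes a subset of the PHP serialize format.
# `php_dumps` is a hypothetical name, not part of phpserialize-rs.
def php_dumps(value) -> bytes:
    if value is None:
        return b"N;"
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return b"b:%d;" % int(value)
    if isinstance(value, int):
        return b"i:%d;" % value
    if isinstance(value, float):
        return b"d:" + repr(value).encode("ascii") + b";"
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b's:%d:"%s";' % (len(raw), raw)  # length is a byte count
    if isinstance(value, (list, tuple)):
        value = dict(enumerate(value))  # lists become 0-indexed arrays
    if isinstance(value, dict):
        body = b"".join(php_dumps(k) + php_dumps(v) for k, v in value.items())
        return b"a:%d:{%s}" % (len(value), body)
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(php_dumps({"name": "Alice", "age": 30}))
print(php_dumps(["foo"]))
```

The first call reproduces the payload used in the Quick Start, and the second reproduces the Array example from the table.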
Performance
Benchmarked on Apple M1 Pro:
| Operation | Throughput |
|---|---|
| Simple array | ~1.5 GB/s |
| Nested structure | ~800 MB/s |
| Large string | ~2.0 GB/s |
Compared to php2json (Python):
- 10-50x faster for typical workloads
- 100x faster for large arrays
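The benchmark methodology is not described here, so the figures above cannot be reproduced exactly. As a rough sketch of how a comparable throughput number could be measured on your own data, the hypothetical helper `throughput_mb_s` below times any parse callable. The stand-in parser exists only so the block runs without the package installed.

```python
import time
from typing import Callable

def throughput_mb_s(parse: Callable[[bytes], object], payload: bytes, repeats: int = 1000) -> float:
    """Return megabytes of input handled per second across `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        parse(payload)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return len(payload) * repeats / elapsed / 1e6

# Stand-in parser (an assumption for this demo); pass the real parse function instead.
def stand_in_parse(data: bytes):
    return data.split(b";")

payload = b'a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}'
rate = throughput_mb_s(stand_in_parse, payload, repeats=100)
print(f"{rate:.1f} MB/s")
```

Results from a Python-level loop include call overhead, so small payloads will report lower throughput than a native benchmark would.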
Error Handling
The library provides detailed error messages for debugging:
from php_deserialize import loads, PhpDeserializeError
try:
    loads(b"invalid data")
except PhpDeserializeError as e:
    print(f"Parse error at position {e.position}: {e.message}")
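When parsing many rows, it can be handy to collect failures rather than stop at the first one. The helper below is a sketch of that pattern; `parse_rows` is a hypothetical name, and the parser and exception type are passed in so the block runs without the package installed. With this library you would presumably pass `loads` and `PhpDeserializeError`.

```python
from typing import Callable, Iterable, List, Tuple, Type

def parse_rows(
    rows: Iterable[bytes],
    parse: Callable[[bytes], object],
    error_type: Type[Exception] = Exception,
) -> Tuple[List[object], List[Tuple[int, str]]]:
    """Parse every row; failed rows yield None plus a (row_index, message) entry."""
    parsed: List[object] = []
    failures: List[Tuple[int, str]] = []
    for index, row in enumerate(rows):
        try:
            parsed.append(parse(row))
        except error_type as exc:
            parsed.append(None)
            failures.append((index, str(exc)))
    return parsed, failures

# Stand-in parser for the demo (an assumption, not the library's behavior).
def stand_in_parse(row: bytes) -> bytes:
    if not row.endswith((b";", b"}")):
        raise ValueError("unexpected end of input")
    return row

parsed, failures = parse_rows([b"i:1;", b"invalid data"], stand_in_parse, ValueError)
print(parsed, failures)
```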
DB Escape Handling
When data is exported from databases (MySQL, PostgreSQL), a serialized value may arrive wrapped in double quotes, with every inner double quote doubled:
Original: a:1:{s:4:"key";s:5:"value";}
DB Export: "a:1:{s:4:""key"";s:5:""value"";}"
The library automatically detects and handles this format:
# Both work identically
loads(b'a:1:{s:4:"key";s:5:"value";}')
loads(b'"a:1:{s:4:""key"";s:5:""value"";}"')
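For readers who want to see what the unescaping amounts to, the following is a minimal sketch of the transformation in plain Python. It is a guess at the general idea, not the library's actual detection logic, and `unescape_db_export` is a hypothetical name.

```python
# A sketch of the transformation; not the library's actual implementation.
def unescape_db_export(data: bytes) -> bytes:
    """Strip one outer pair of double quotes and collapse each doubled quote."""
    if len(data) >= 2 and data.startswith(b'"') and data.endswith(b'"'):
        inner = data[1:-1]
        # Heuristic: treat the input as escaped only if every inner quote is doubled.
        if inner.count(b'"') == 2 * inner.count(b'""'):
            return inner.replace(b'""', b'"')
    return data  # not in the escaped form: return unchanged

escaped = b'"a:1:{s:4:""key"";s:5:""value"";}"'
print(unescape_db_export(escaped))
```

Input that is not wrapped in quotes passes through untouched, so the helper is safe to apply to a column that mixes both forms.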
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE or http://opensource.org/licenses/MIT)
at your option.