
SAS → Parquet Hybrid Converter & Validator


sas2parquet


A SAS (.sas7bdat) to Parquet converter built to handle files that fail with standard tools.

sas2parquet automatically detects encodings, repairs schemas, infers correct data types, and validates the Parquet output against the SAS source down to individual values.
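To make the encoding-detection idea concrete, here is a minimal sketch of a "declared encoding first, then fallbacks" strategy. It is not sas2parquet's actual implementation: the function name `decode_with_fallback`, its parameters, and the fallback order are assumptions chosen for illustration.

```python
from typing import Optional, Sequence, Tuple


def decode_with_fallback(
    raw: bytes,
    declared: Optional[str] = None,
    fallbacks: Sequence[str] = ("utf-8", "cp1252", "latin-1"),
) -> Tuple[str, str]:
    """Return (text, encoding_used).

    Hypothetical helper: tries the encoding declared in file metadata
    first (if any), then each fallback in order.
    """
    candidates = ([declared] if declared else []) + [
        enc for enc in fallbacks if enc != declared
    ]
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: move on to the next candidate.
            continue
    # latin-1 maps every byte, so this is only reached when it was
    # removed from the fallback list.
    return raw.decode("latin-1", errors="replace"), "latin-1"
```

Order matters here: UTF-8 is strict and rejects most non-UTF-8 byte sequences, while Latin-1 accepts any byte, so it only makes sense as the last resort.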


✨ Features

| Feature | Description |
|---|---|
| 🔄 Auto Encoding | Detects UTF-8, Latin1, CP1252 from metadata or fallback |
| 🧠 Smart Types | Infers datetime, numeric, string with 20+ retry strategies |
| Validation | Chunk-by-chunk comparison (metadata, counts, values) |
| 📊 Memory Safe | Chunked processing (optimized for 96 GB RAM, configurable) |
| 💾 ZSTD Compression | Level-6 ZSTD for efficient Parquet storage |
| 📝 Detailed Logs | Full conversion trace + mismatch reports |
| 🎯 Two Modes | Single file or recursive directory processing |
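
The chunk-by-chunk validation listed above can be pictured with a small standard-library sketch. This is an illustration only, not the package's code: `compare_in_chunks`, its return shape, and the default chunk size are assumptions, rows are plain tuples rather than SAS or Parquet records, and the metadata comparison is left out.

```python
import math
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Sequence[object]
Mismatch = Tuple[int, int, object, object]  # (row, column, source, converted)


def _chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows without loading everything."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def _same(a: object, b: object) -> bool:
    """Equality that treats two NaN values as equal (missing numerics)."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def compare_in_chunks(
    source: Iterable[Row], converted: Iterable[Row], chunk_size: int = 50_000
) -> Tuple[int, int, List[Mismatch]]:
    """Return (source_row_count, converted_row_count, value_mismatches)."""
    n_src = n_dst = 0
    mismatches: List[Mismatch] = []
    pairs = zip_longest(
        _chunks(source, chunk_size), _chunks(converted, chunk_size), fillvalue=[]
    )
    for src_chunk, dst_chunk in pairs:
        # Both streams use the same chunk size, so paired rows share an index.
        for row_idx, (src_row, dst_row) in enumerate(
            zip(src_chunk, dst_chunk), start=n_src
        ):
            for col_idx, (a, b) in enumerate(zip(src_row, dst_row)):
                if not _same(a, b):
                    mismatches.append((row_idx, col_idx, a, b))
        n_src += len(src_chunk)
        n_dst += len(dst_chunk)
    return n_src, n_dst, mismatches
```

A differing row count shows up in the first two return values, and each differing cell is reported with its row and column position. Only one chunk per side is held in memory at a time.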

🚀 Quick Start

Install

pip install sas2parquet

✅ Usage

Convert a directory (recommended)

sas2parquet path/to/sasdata/
  • Converts all .sas7bdat files recursively
  • Creates parquetdata/ and logging/ next to sasdata/

Convert a single file

sas2parquet path/to/file.sas7bdat

Output (default):

path/to/file.parquet

Specify output location

Directory mode — custom output directory

sas2parquet path/to/sasdata/ --out path/to/parquetdata/

File mode — custom output file

sas2parquet path/to/file.sas7bdat --out path/to/output.parquet

Custom log directory (directory mode)

sas2parquet path/to/sasdata/ --log-dir path/to/logs/
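
When the converter is driven from a script, the commands above can be assembled programmatically. The sketch below uses only the `--out` and `--log-dir` flags shown in this section; the helper names `build_command` and `run_conversion` are hypothetical, and it is an assumption that both flags can be combined in one call and that the CLI signals failure through a non-zero exit code.

```python
import subprocess
from typing import List, Optional


def build_command(
    source: str, out: Optional[str] = None, log_dir: Optional[str] = None
) -> List[str]:
    """Build the argument list for one sas2parquet invocation."""
    cmd = ["sas2parquet", source]
    if out is not None:
        cmd += ["--out", out]
    if log_dir is not None:
        # Documented above for directory mode only.
        cmd += ["--log-dir", log_dir]
    return cmd


def run_conversion(
    source: str, out: Optional[str] = None, log_dir: Optional[str] = None
) -> int:
    """Run the CLI and return its exit code (assumes sas2parquet is on PATH)."""
    return subprocess.run(build_command(source, out, log_dir), check=False).returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.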

📁 Directory Mode Behavior

your-project/
├── sasdata/
│   ├── file1.sas7bdat
│   └── subfolder/
│       └── nested.sas7bdat
├── parquetdata/
│   ├── file1.parquet
│   └── subfolder_parquet/
│       └── nested.parquet
└── logging/
    └── conversion_20260205_1145.log
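
The layout above implies a mapping from each input file to its default output path. The sketch below reproduces that mapping as read from the tree; it is inferred from this example, not taken from the package source. The function names are hypothetical, and applying the `_parquet` suffix to every nested folder level is an assumption, since the tree shows only one level.

```python
from datetime import datetime
from pathlib import PurePosixPath


def default_output_path(sas_file: str, sas_root: str) -> PurePosixPath:
    """Map sasdata/<sub>/<name>.sas7bdat to parquetdata/<sub>_parquet/<name>.parquet."""
    root = PurePosixPath(sas_root)
    rel = PurePosixPath(sas_file).relative_to(root)
    # Assumption: every intermediate folder gets the "_parquet" suffix.
    subfolders = [part + "_parquet" for part in rel.parent.parts]
    return (root.parent / "parquetdata").joinpath(
        *subfolders, rel.with_suffix(".parquet").name
    )


def default_log_path(sas_root: str, started: datetime) -> PurePosixPath:
    """Build logging/conversion_YYYYMMDD_HHMM.log next to the input folder."""
    root = PurePosixPath(sas_root)
    return root.parent / "logging" / f"conversion_{started:%Y%m%d_%H%M}.log"
```

For example, `default_output_path("your-project/sasdata/subfolder/nested.sas7bdat", "your-project/sasdata")` gives `your-project/parquetdata/subfolder_parquet/nested.parquet`, which matches the tree.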

🛠️ CLI Reference

sas2parquet --help

⚙️ Configuration (Advanced)

Edit constants in:

src/sas2parquet/convert.py

📄 License

MIT License
