
SAS → Parquet Hybrid Converter & Validator


sas2parquet


The ultimate SAS (.sas7bdat) to Parquet converter — built to handle files that fail with standard tools.

sas2parquet automatically detects encodings, repairs schemas, infers correct data types, and validates each Parquet output against its SAS source by comparing metadata, row counts, and values.


✨ Features

Feature | Description
🔄 Auto Encoding | Detects UTF-8, Latin1, CP1252 from metadata or fallback
🧠 Smart Types | Infers datetime, numeric, string with 20+ retry strategies
Validation | Chunk-by-chunk comparison (metadata, counts, values)
📊 Memory Safe | Chunked processing (96GB RAM optimized, configurable)
💾 ZSTD Compression | Level-6 ZSTD for efficient Parquet storage
📝 Detailed Logs | Full conversion trace + mismatch reports
🎯 Two Modes | Single file or recursive directory processing
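
To illustrate the idea behind the Auto Encoding feature, here is a minimal sketch of an encoding fallback chain. It is not the package's implementation: the function name `pick_encoding`, the `declared` parameter (standing in for an encoding read from file metadata), and the order of the fallbacks are all assumptions made for this example.

```python
from typing import Optional

# The three encodings named in the feature table. Latin1 goes last because
# it decodes any byte sequence and would otherwise mask the other candidates.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def pick_encoding(raw: bytes, declared: Optional[str] = None) -> str:
    """Return the first candidate encoding that decodes `raw` without error.

    `declared` is a hypothetical stand-in for an encoding name taken from
    file metadata; when given, it is tried before the fallbacks.
    """
    candidates = ([declared] if declared else []) + list(FALLBACK_ENCODINGS)
    for name in candidates:
        try:
            raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers a declared name Python does not recognize.
            continue
        return name
    # Defensive only: latin-1 accepts every byte, so this is not reached
    # while it remains in the fallback list.
    raise ValueError("no candidate encoding could decode the data")
```

Trying strict decoders first and the most permissive one last means a successful early match is meaningful, while the final fallback guarantees that some result is returned.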

🚀 Quick Start

Install

pip install sas2parquet

✅ Usage

Convert a directory (recommended)

sas2parquet path/to/sasdata/
  • Converts all .sas7bdat files recursively
  • Creates parquetdata/ and logging/ next to sasdata/

Convert a single file

sas2parquet path/to/file.sas7bdat

Output (default):

path/to/file.parquet

Specify output location

Directory mode — custom output directory

sas2parquet path/to/sasdata/ --out path/to/parquetdata/

File mode — custom output file

sas2parquet path/to/file.sas7bdat --out path/to/output.parquet

Custom log directory (directory mode)

sas2parquet path/to/sasdata/ --log-dir path/to/logs/
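
If the conversion needs to be triggered from a Python script, one option is to call the CLI through `subprocess`. The sketch below uses only the `--out` and `--log-dir` flags shown above; the helper names `build_command` and `run_conversion` are made up for this example, and the meaning of the exit code is not documented here, so it is returned unchanged.

```python
import subprocess
from typing import List, Optional


def build_command(source: str, out: Optional[str] = None,
                  log_dir: Optional[str] = None) -> List[str]:
    """Assemble the argument list for the sas2parquet CLI."""
    cmd = ["sas2parquet", source]
    if out is not None:
        cmd += ["--out", out]
    if log_dir is not None:
        # Documented above for directory mode.
        cmd += ["--log-dir", log_dir]
    return cmd


def run_conversion(source: str, out: Optional[str] = None,
                   log_dir: Optional[str] = None) -> int:
    """Run the CLI and return its exit code (needs sas2parquet on PATH)."""
    return subprocess.run(build_command(source, out, log_dir)).returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.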

📁 Directory Mode Behavior

your-project/
├── sasdata/
│   ├── file1.sas7bdat
│   └── subfolder/
│       └── nested.sas7bdat
├── parquetdata/
│   ├── file1.parquet
│   └── subfolder_parquet/
│       └── nested.parquet
└── logging/
    └── conversion_20260205_1145.log
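
The mapping from input to output paths in the tree above can be sketched with `pathlib`. This is inferred from the example layout only: the function name `parquet_target` is hypothetical, and the rule that every intermediate folder gains a `_parquet` suffix is an assumption generalized from the single `subfolder` shown.

```python
from pathlib import PurePosixPath


def parquet_target(sas_file: str, sas_root: str, parquet_root: str) -> PurePosixPath:
    """Map a .sas7bdat path under sas_root to its assumed output path."""
    rel = PurePosixPath(sas_file).relative_to(sas_root)
    # Assumption: each intermediate folder is renamed with a "_parquet"
    # suffix, as "subfolder" is in the tree above.
    folders = [part + "_parquet" for part in rel.parts[:-1]]
    # Swap the .sas7bdat extension for .parquet on the file name itself.
    filename = rel.with_suffix(".parquet").name
    return PurePosixPath(parquet_root).joinpath(*folders, filename)


print(parquet_target("sasdata/subfolder/nested.sas7bdat", "sasdata", "parquetdata"))
# prints parquetdata/subfolder_parquet/nested.parquet
```

`PurePosixPath` keeps the example independent of the host operating system; a real script would more likely use `Path`.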

🛠️ CLI Reference

sas2parquet --help

⚙️ Configuration (Advanced)

Edit constants in:

src/sas2parquet/convert.py

📄 License

MIT License
