SAS → Parquet Hybrid Converter & Validator

sas2parquet

A SAS (.sas7bdat) to Parquet converter built to handle files that fail with standard tools.

sas2parquet automatically detects encodings, repairs schemas, infers data types, and validates each Parquet output against its SAS source, comparing metadata, counts, and values chunk by chunk.
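
To illustrate what encoding detection with a fallback can look like, here is a minimal sketch using only the Python standard library. It is not taken from the sas2parquet source: the function name `decode_with_fallback`, the `declared` parameter, and the candidate order are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# Assumed candidate order: strict codecs first, latin1 last,
# because latin1 maps every byte and therefore never fails.
FALLBACK_ENCODINGS: Tuple[str, ...] = ("utf-8", "cp1252", "latin1")


def decode_with_fallback(
    raw: bytes,
    declared: Optional[str] = None,
    candidates: Sequence[str] = FALLBACK_ENCODINGS,
) -> Tuple[str, str]:
    """Return (text, encoding_used), trying the declared encoding first.

    Hypothetical helper for illustration; not part of sas2parquet.
    """
    ordered = ([declared] if declared else []) + [
        c for c in candidates if c != declared
    ]
    for encoding in ordered:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown codec: move on to the next candidate.
            continue
    raise ValueError("no candidate encoding could decode the input")
```

With the default candidates the final `ValueError` is never reached, since latin1 accepts any byte sequence. That also means a latin1 result is only a last resort and not proof that the text was really encoded that way.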


✨ Features

| Feature | Description |
|---|---|
| 🔄 Auto Encoding | Detects UTF-8, Latin1, CP1252 from metadata or fallback |
| 🧠 Smart Types | Infers datetime, numeric, string with 20+ retry strategies |
| Validation | Chunk-by-chunk comparison (metadata, counts, values) |
| 📊 Memory Safe | Chunked processing (96GB RAM optimized, configurable) |
| 💾 ZSTD Compression | Level-6 ZSTD for efficient Parquet storage |
| 📝 Detailed Logs | Full conversion trace + mismatch reports |
| 🎯 Two Modes | Single file or recursive directory processing |
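
The chunk-by-chunk comparison listed above can be pictured with a small standard-library sketch. This is an illustration of the general technique, not the package's validator: `iter_chunks`, `compare_in_chunks`, and the default `chunk_size` are hypothetical names and values, and rows are modelled as plain tuples.

```python
from itertools import islice, zip_longest
from typing import Any, Iterable, Iterator, List, Optional, Sequence, Tuple

Row = Sequence[Any]
_MISSING = object()  # marks the side that ran out of rows


def iter_chunks(rows: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def compare_in_chunks(
    source: Iterable[Row],
    converted: Iterable[Row],
    chunk_size: int = 1000,  # assumed default, for illustration only
) -> List[Tuple[int, Optional[Row], Optional[Row]]]:
    """Return (row_index, source_row, converted_row) for every mismatch.

    A row missing on one side is reported with None on that side.
    """
    mismatches = []
    row_index = 0
    chunk_pairs = zip_longest(
        iter_chunks(source, chunk_size),
        iter_chunks(converted, chunk_size),
        fillvalue=[],
    )
    for left_chunk, right_chunk in chunk_pairs:
        # Only one pair of chunks is held in memory at a time.
        for left, right in zip_longest(left_chunk, right_chunk, fillvalue=_MISSING):
            if left is _MISSING or right is _MISSING or tuple(left) != tuple(right):
                mismatches.append((
                    row_index,
                    None if left is _MISSING else left,
                    None if right is _MISSING else right,
                ))
            row_index += 1
    return mismatches
```

Plain `!=` treats two NaN values as different, so a real validator would need explicit handling for missing numeric values.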

🚀 Quick Start

Install

pip install sas2parquet

✅ Usage

Convert a directory (recommended)

sas2parquet path/to/sasdata/
  • Converts all .sas7bdat files recursively
  • Creates parquetdata/ and logging/ next to sasdata/

Convert a single file

sas2parquet path/to/file.sas7bdat

Output (default):

path/to/file.parquet

Specify output location

Directory mode — custom output directory

sas2parquet path/to/sasdata/ --out path/to/parquetdata/

File mode — custom output file

sas2parquet path/to/file.sas7bdat --out path/to/output.parquet

Custom log directory (directory mode)

sas2parquet path/to/sasdata/ --log-dir path/to/logs/
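
When the converter has to be called from a Python script, one option is to assemble the same command line and hand it to `subprocess`. The sketch below uses only the positional path and the `--out` and `--log-dir` flags shown above; the helper names `build_command` and `run_conversion` are hypothetical, and passing both flags in one call is an assumption, since the examples above show them separately.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def build_command(
    source: PathLike,
    out: Optional[PathLike] = None,
    log_dir: Optional[PathLike] = None,
) -> List[str]:
    """Assemble the sas2parquet argument list (hypothetical helper)."""
    command = ["sas2parquet", str(source)]
    if out is not None:
        command += ["--out", str(out)]
    if log_dir is not None:
        command += ["--log-dir", str(log_dir)]
    return command


def run_conversion(
    source: PathLike,
    out: Optional[PathLike] = None,
    log_dir: Optional[PathLike] = None,
) -> int:
    """Run the CLI and return its exit code.

    Requires sas2parquet to be installed and on PATH.
    """
    command = build_command(source, out, log_dir)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.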

📁 Directory Mode Behavior

your-project/
├── sasdata/
│   ├── file1.sas7bdat
│   └── subfolder/
│       └── nested.sas7bdat
├── parquetdata/
│   ├── file1.parquet
│   └── subfolder_parquet/
│       └── nested.parquet
└── logging/
    └── conversion_20260205_1145.log
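
The naming pattern in the tree above can be written as a small path-mapping function, which is handy for predicting where a given file will land. The rule is inferred from the tree alone, not from the package source: `planned_output_path` is a hypothetical helper, and applying the `_parquet` suffix to every nested directory level is an assumption.

```python
from pathlib import PurePosixPath


def planned_output_path(
    sas_file: str, sas_root: str, parquet_root: str
) -> PurePosixPath:
    """Map a .sas7bdat path under sas_root to its expected Parquet path."""
    relative = PurePosixPath(sas_file).relative_to(PurePosixPath(sas_root))
    # Assumption: every intermediate directory gets the "_parquet" suffix,
    # as "subfolder" does in the tree above.
    directories = [f"{part}_parquet" for part in relative.parts[:-1]]
    return PurePosixPath(parquet_root, *directories, relative.stem + ".parquet")


print(planned_output_path(
    "your-project/sasdata/subfolder/nested.sas7bdat",
    "your-project/sasdata",
    "your-project/parquetdata",
))
# your-project/parquetdata/subfolder_parquet/nested.parquet
```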

🛠️ CLI Reference

sas2parquet --help

⚙️ Configuration (Advanced)

Edit constants in:

src/sas2parquet/convert.py

👥 Authors

This package was built collaboratively.

  • Zaman Ziabakhshganji — creator and maintainer
  • Anete Kristiana Zonnenberga — co-author and contributor
  • Farshad Radman — co-author and contributor
  • Jos van Dongen — co-author and contributor

📄 License

MIT License
