DataWarp is a powerful Python library that simplifies working with data files across various formats and storage locations and orchestrates data workflows across multiple backends or services.

Project description

Data-Warp

Data-Warp is a powerful Python library that simplifies working with data files across various formats and storage locations. At its core is the FileConnector module, a universal connector designed to streamline data ingestion from multiple sources with minimal configuration.

It aims to be a one-stop shop for data engineers, covering data operations, connectors, orchestration, transformation, ELT, monitoring, dashboards, and reporting.

Key Features

Multiple File Formats: Native support for CSV, JSON, Parquet, and Excel, with extensibility to other formats
Diverse Data Sources: Connect to files from local storage, HTTP endpoints, AWS S3, and more
Flexible Reading Engines: Choose between pandas, Python built-ins, or PyArrow for optimal performance
Efficient Data Handling: Streaming capability for memory-efficient processing of large files; batch processing to handle data in manageable chunks; parallel data fetching for improved performance
Error Handling: Built-in retry mechanisms and comprehensive error reporting
User-Friendly API: Simple, consistent interface regardless of underlying data source or format
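
To make the batch-processing idea in the feature list concrete, here is a minimal standard-library sketch of reading a CSV file in fixed-size batches. It is not DataWarp's implementation; the helper name `iter_batches` and the sample data are assumptions made purely for illustration.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_batches(handle: TextIO, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most batch_size rows from an open CSV file.

    Hypothetical helper for illustration only; not part of DataWarp.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    reader = csv.DictReader(handle)
    while True:
        # islice pulls only the next batch_size rows, so the whole file
        # is never held in memory at once.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# In-memory stand-in for a large file on disk.
sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
batch_sizes = [len(batch) for batch in iter_batches(sample, batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
```

Because the function is a generator, each batch can be processed and discarded before the next one is read, which is what keeps memory use bounded for large files.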

Use Cases

Data engineering pipelines requiring connection to various data sources
ETL processes working with multiple file formats
Data science workflows needing efficient data loading
Applications requiring streaming capabilities for large datasets
Cross-platform data access with consistent API

Example

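# Import path is an assumption; adjust it to match the installed package layout
from data_warp import FileConnector

# Connect to a local CSV file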
connector = FileConnector(file_path="data.csv", source="local")
data = connector.fetch()

# Stream a large JSON file from S3
s3_connector = FileConnector(
    file_path="s3://bucket/large_data.json",
    file_type="json",
    source="s3",
    streaming=True
)
for chunk in s3_connector.stream(chunk_size=10000):
    process_data(chunk)

# Fetch multiple files in parallel
connector = FileConnector(file_path="data.parquet", source="local")
results = connector.fetch_parallel(["file1.parquet", "file2.parquet", "file3.parquet"]) 

# Fetch a file in batches using the built-in reader
FileConnector("huge_csv_file.csv", reader="builtin").fetch_batch()


# For JSON, the batch result offers additional methods for handling large files and ad-hoc filtering

# Search: collect the records that match a predicate
list(FileConnector("huge_json_file.json", reader="builtin").fetch_batch().search(lambda rec: rec.get("hash_id") == "6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b"))

# Filter: keep the batches whose first record satisfies the condition
filtered = FileConnector("huge_json_file.json", reader="builtin").fetch_batch().filter_batches(lambda batch: batch[0].get("int_field") < 8516)
print("filtered", filtered.next())

# Map batches: apply a function to every batch
mapped = FileConnector("huge_json_file.json", reader="builtin").fetch_batch().map_batches(
    lambda batch: [rec for rec in batch if rec.get("date") < "2002-07-06"]
)
for batch in mapped:
    print("Mapped batch:", batch)

# Convert the batches to a DataFrame and preview the first rows
FileConnector("huge_json_file.json", reader="builtin").fetch_batch().to_dataframe().head()
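
The search, filter, and map calls in the example follow a common pattern for working with batched records. The sketch below shows that pattern in plain Python, assuming each batch is a list of dictionaries. The function names `search_records`, `keep_batches`, and `transform_batches` are hypothetical and are not DataWarp's API; the library's own behavior may differ.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Batch = List[Record]


def search_records(batches: Iterable[Batch], predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Yield every record, from any batch, that satisfies the predicate."""
    for batch in batches:
        for record in batch:
            if predicate(record):
                yield record


def keep_batches(batches: Iterable[Batch], predicate: Callable[[Batch], bool]) -> Iterator[Batch]:
    """Yield only the batches for which the predicate is true."""
    return (batch for batch in batches if predicate(batch))


def transform_batches(batches: Iterable[Batch], transform: Callable[[Batch], Batch]) -> Iterator[Batch]:
    """Lazily apply a transform to each batch."""
    return (transform(batch) for batch in batches)


# Made-up sample data standing in for batches read from a large JSON file.
batches = [
    [{"id": 1, "date": "2001-01-15"}, {"id": 2, "date": "2003-03-01"}],
    [{"id": 3, "date": "2002-07-05"}, {"id": 4, "date": "2002-07-06"}],
]

matches = list(search_records(batches, lambda rec: rec["id"] == 3))
kept = list(keep_batches(batches, lambda batch: batch[0]["id"] < 3))
# ISO-8601 date strings sort chronologically, so string comparison works here.
trimmed = list(transform_batches(
    batches, lambda batch: [rec for rec in batch if rec["date"] < "2002-07-06"]
))
print(matches)
print(len(kept))
print(trimmed)
```

All three helpers are lazy, so nothing is read or computed until the result is iterated; wrapping the result in `list(...)` forces evaluation.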

Installation

Basic Installation

pip install data-warp
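
After installing, one way to confirm which release is present is to query the package metadata with the standard library. This is a small sketch that uses only `importlib.metadata`; the helper name `installed_version` is an assumption for illustration, and the distribution name `data-warp` is taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("data-warp")
print(found if found is not None else "data-warp is not installed")
```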

Release history

0.1.6: Mar 4, 2025
0.1.5: Mar 4, 2025
0.1.4: Mar 3, 2025
0.1.2: Mar 3, 2025 (this version)
0.1.1: Mar 3, 2025

Download files

Source Distribution

data_warp-0.1.2.tar.gz (10.6 kB), uploaded Mar 3, 2025, Source

Built Distribution

data_warp-0.1.2-py3-none-any.whl (11.1 kB), uploaded Mar 3, 2025, Python 3

File details

Details for the file data_warp-0.1.2.tar.gz.

File metadata

Download URL: data_warp-0.1.2.tar.gz
Upload date: Mar 3, 2025
Size: 10.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.1 CPython/3.12.9 Linux/6.8.0-1021-azure

File hashes

Hashes for data_warp-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`e137f3ff7a0d41bd9cb7480b755789043f26532f40c768bc71b4df19a93673d7`
MD5	`e75aefcbafd2a1bf749c2f855e18c642`
BLAKE2b-256	`7c90cf5c22c8672386ed43efc0f7574b81ea0b768947e3a8ee8348e54d299cbb`

File details

Details for the file data_warp-0.1.2-py3-none-any.whl.

File metadata

Download URL: data_warp-0.1.2-py3-none-any.whl
Upload date: Mar 3, 2025
Size: 11.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.1 CPython/3.12.9 Linux/6.8.0-1021-azure

File hashes

Hashes for data_warp-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a9730ab7adf2f7062c9afcd2689cf267a1ee956428c7b51a2d2ab332c9e3ed4e`
MD5	`70eaf79c3002b38fb268ab27ede21a98`
BLAKE2b-256	`a59e9c7aab56bfe226606ac5346bbb2a46ca0e404f6a02a2ba169bc88c8d4a19`
