
DataWarp is a powerful Python library that simplifies working with data files across various formats and storage locations, and orchestrates data workflows across multiple backends or services.



Data-Warp 🌀

Data-Warp is a powerful Python library that simplifies working with data files across various formats and storage locations. At its core is the FileConnector module, a universal connector designed to streamline data ingestion from multiple sources with minimal configuration.

A one-stop shop for data engineers, covering connectors, orchestration, transformation, ELT, monitoring, dashboards, and reporting.


Key Features

  • Multiple File Formats: Native support for CSV, JSON, Parquet, and Excel, extensible to other formats
  • Diverse Data Sources: Connect to files from local storage, HTTP endpoints, AWS S3, and more
  • Flexible Reading Engines: Choose between pandas, Python built-ins, or PyArrow for optimal performance
  • Efficient Data Handling:
      - Streaming for memory-efficient processing of large files
      - Batch processing to handle data in manageable chunks
      - Parallel data fetching for improved performance
  • Error Handling: Built-in retry mechanisms and comprehensive error reporting
  • User-Friendly API: Simple, consistent interface regardless of underlying data source or format
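To illustrate the parallel-fetching and retry features listed above, here is a minimal standard-library sketch. It is not DataWarp's implementation: `fetch_all`, `with_retry`, `flaky_loader`, and their parameters are hypothetical names, and the exponential backoff policy is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def with_retry(func: Callable[[], T], retries: int = 3, base_delay: float = 0.5) -> T:
    """Call func; on an exception, retry up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # waits 0.5 s, 1 s, 2 s, ... by default
    raise ValueError("retries must be >= 0")


def fetch_all(sources: Iterable[str], loader: Callable[[str], T],
              max_workers: int = 4, retries: int = 3) -> List[T]:
    """Load every source on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda src: with_retry(lambda: loader(src), retries), sources))


# Usage: a hypothetical loader that fails once per path, then succeeds
calls: dict = {}


def flaky_loader(path: str) -> str:
    calls[path] = calls.get(path, 0) + 1
    if calls[path] < 2:
        raise OSError(f"transient error reading {path}")
    return f"contents of {path}"


print(fetch_all(["file1.parquet", "file2.parquet"], flaky_loader))
# ['contents of file1.parquet', 'contents of file2.parquet']
```

A thread pool suits this case because file and network reads are I/O-bound, and `pool.map` returns results in input order even when sources finish out of order.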

Use Cases

  • Data engineering pipelines requiring connection to various data sources
  • ETL processes working with multiple file formats
  • Data science workflows needing efficient data loading
  • Applications requiring streaming capabilities for large datasets
  • Cross-platform data access with consistent API

Example

from data_warp import FileConnector  # assumed import path; check the package if this fails

# Connect to a local CSV file
connector = FileConnector(file_path="data.csv", source="local")
data = connector.fetch()

# Stream a large JSON file from S3
s3_connector = FileConnector(
    file_path="s3://bucket/large_data.json",
    file_type="json",
    source="s3",
    streaming=True
)
for chunk in s3_connector.stream(chunk_size=10000):
    process_data(chunk)  # process_data is a placeholder for your own chunk handler

# Fetch multiple files in parallel
connector = FileConnector(file_path="data.parquet", source="local")
results = connector.fetch_parallel(["file1.parquet", "file2.parquet", "file3.parquet"]) 

# Read a large file in batches with the built-in reader, which adds helper methods

# fetch_batch with the built-in reader (CSV example)
FileConnector("huge_csv_file.csv", reader="builtin").fetch_batch()


# For JSON, the batch object offers extra helpers for large files and ad-hoc filtering

# search: find records matching a per-record predicate
list(FileConnector("huge_json_file.json", reader="builtin").fetch_batch().search(lambda rec: rec.get("hash_id") == "6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b"))

# filter_batches: keep the batches whose first record matches a predicate
filtered = FileConnector("huge_json_file.json", reader="builtin").fetch_batch().filter_batches(lambda batch: batch[0].get("int_field") < 8516)
print("filtered", next(iter(filtered)))  # first batch that passed the filter

# map_batches: transform each batch (here, keep records dated before 2002-07-06)
mapped = FileConnector("huge_json_file.json", reader="builtin").fetch_batch().map_batches(
    lambda batch: [rec for rec in batch if rec.get("date") < "2002-07-06"]
)
for batch in mapped:
    print("Mapped batch:", batch)

# to_dataframe: load the batches into a DataFrame
FileConnector("huge_json_file.json", reader="builtin").fetch_batch().to_dataframe().head()
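The batch helpers above (`fetch_batch`, `search`, `filter_batches`, `map_batches`) suggest a lazy, generator-based design. The following standard-library sketch shows that pattern, assuming the large file is newline-delimited JSON (one object per line). It is an illustration of the technique, not DataWarp's code; `BatchReader` and its method bodies are hypothetical.

```python
import json
from itertools import islice
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


class BatchReader:
    """Lazily yield lists of records from a JSON Lines file (hypothetical)."""

    def __init__(self, path: str, batch_size: int = 1000) -> None:
        self.path = path
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[Record]]:
        # The file is opened only when iteration starts and read one batch at a time
        with open(self.path, encoding="utf-8") as fh:
            records = (json.loads(line) for line in fh if line.strip())
            while True:
                batch = list(islice(records, self.batch_size))
                if not batch:
                    return
                yield batch

    def search(self, pred: Callable[[Record], bool]) -> Iterator[Record]:
        # Record-level predicate, scanned batch by batch
        for batch in self:
            yield from (rec for rec in batch if pred(rec))

    def filter_batches(self, pred: Callable[[List[Record]], bool]) -> Iterator[List[Record]]:
        # Batch-level predicate: keep or drop whole batches
        return (batch for batch in self if pred(batch))

    def map_batches(self, func: Callable[[List[Record]], List[Record]]) -> Iterator[List[Record]]:
        # Apply func to each batch as it is read
        return (func(batch) for batch in self)
```

For example, `list(BatchReader("data.jsonl", batch_size=2).search(lambda rec: rec["id"] == 3))` scans the file two records at a time, so memory use depends on the batch size rather than the file size.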
     

Installation

Basic Installation

pip install --upgrade data-warp
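To confirm which release is installed, you can query package metadata with the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch; `installed_version` is a hypothetical helper, and the distribution name `data-warp` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "data-warp") -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())  # the version string, or None if data-warp is not installed
```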


Download files


Source Distribution

data_warp-0.1.6.tar.gz (10.8 kB)

Uploaded Source

Built Distribution


data_warp-0.1.6-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file data_warp-0.1.6.tar.gz.

File metadata

  • Download URL: data_warp-0.1.6.tar.gz
  • Upload date:
  • Size: 10.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.9 Linux/6.8.0-1021-azure

File hashes

Hashes for data_warp-0.1.6.tar.gz
Algorithm     Hash digest
SHA256        6aa14c15c8e2fe3dc33e7d83eac99e7547b9e9d3ff7d316e2bc70c8047fb7f3d
MD5           b75263b3996050e105326657c83ebeac
BLAKE2b-256   cb9315ef4b3bd926b6de11b2483a2db5412515c187c93f6c0f135d56c48e5d29

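To check a downloaded file against the published SHA256 digest, you can hash it with Python's standard `hashlib`. This is a minimal sketch; `sha256_of` is a hypothetical helper, and `EXPECTED` is the SHA256 digest listed above for `data_warp-0.1.6.tar.gz`.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "6aa14c15c8e2fe3dc33e7d83eac99e7547b9e9d3ff7d316e2bc70c8047fb7f3d"
# After downloading the sdist into the current directory:
# assert sha256_of("data_warp-0.1.6.tar.gz") == EXPECTED
```

pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file) performs the same comparison at install time.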

File details

Details for the file data_warp-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: data_warp-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.9 Linux/6.8.0-1021-azure

File hashes

Hashes for data_warp-0.1.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        1f40d15a69be60438fca2084583605753762b74d944bbab7653b56deb346239d
MD5           a72fb0a8c325d73c489d583b5cc090b8
BLAKE2b-256   6349c7f3d46482e27ec00e42c5f6d0ee8145b1cb97ec8525b3a1e515f0f66dc8

