
stateful-data-processor

stateful-data-processor is a utility designed to handle large amounts of data incrementally. It allows you to process data step-by-step, saving progress to avoid data loss in case of interruptions or errors. The processor can be subclassed to implement custom data processing logic.

Features

  • Incrementally process large datasets.

  • Save the processing state to a file.

  • Resume processing from the saved state and automatically skip already processed items.

  • Handle SIGINT and SIGTERM signals for graceful shutdown and state saving.

  • Easily subclass to implement custom data processing.

Problem

You have a large amount of data that you want to loop through and process incrementally. Processing takes time, and if an error occurs you do not want to lose all of your progress. You want to save intermediate results to a file and continue processing from where you left off, to be able to interrupt the run with a SIGINT signal and have the state saved, and to subclass the processor and implement your own process_data and process_item methods, iterating through items and processing them one by one.
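The desired workflow can be sketched as a plain checkpointing loop. This is a minimal illustration using only the standard library; the function name, the squaring step, and the file layout are hypothetical, not part of the package:

```python
import json
import os


def process_all(items, state_file):
    """Process items one by one, saving results after each step so an
    interrupted run can resume where it left off."""
    data = {}
    if os.path.exists(state_file):
        # Resume: load results from a previous, interrupted run.
        with open(state_file) as f:
            data = json.load(f)
    for item in items:
        key = str(item)  # JSON object keys are strings
        if key in data:
            continue  # already processed in an earlier run
        data[key] = item ** 2  # placeholder for the real, slow processing
        with open(state_file, "w") as f:
            json.dump(data, f)  # checkpoint after every item
    return data
```

Re-running process_all with the same state file skips the items already recorded in it.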

Solution

The StatefulDataProcessor class processes data incrementally:

  • Incremental Processing: Process large amounts of data in a JSON file incrementally.

  • Data Storage: The data is stored in a dictionary, and the processor keeps track of the current step being processed.

  • Graceful Interruption: The processor can be interrupted with a SIGINT or SIGTERM signal, and the data will be saved to the file.

  • Subclassing: The processor is meant to be subclassed, and the process_item method should be implemented.

  • Item Processing: process_item is called with all arguments forwarded from run, except for items, which is unpacked and iterated item by item.

  • Unique Labels: Data is stored in a dictionary keyed by unique labels corresponding to the items, so each item must be unique.

  • Customization: The process_data method can be overridden for more customized processing of the items.
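Taken together, the bullets above describe roughly the following pattern. This is a stripped-down sketch, not the library's actual implementation; the class name and method bodies are illustrative only:

```python
import json
import os


class CheckpointProcessor:
    """Minimal sketch of the incremental-processing pattern."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Resume from a previous run.
            with open(path) as f:
                self.data = json.load(f)

    def run(self, items, *args, **kwargs):
        # Everything except `items` is forwarded to process_item.
        for index, item in enumerate(items):
            if str(item) in self.data:
                continue  # unique label already processed: skip it
            self.process_item(item, index, *args, **kwargs)
            self.save()  # checkpoint after each item

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.data, f)

    def process_item(self, item, iteration_index, *args, **kwargs):
        raise NotImplementedError  # subclasses implement the real work
```

A second instance created with the same path picks up the saved dictionary and skips the labels already present in it.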

Usage

import time
from stateful_data_processor.file_rw import FileRW
from stateful_data_processor.processor import StatefulDataProcessor

class MyDataProcessor(StatefulDataProcessor):

    def process_item(self, item, iteration_index: int, delay: float):
        """item and iteration_index are supplied automatically by the
        framework; iteration_index may or may not be used."""
        self.data[item] = item ** 2  # Example processing: square the item
        time.sleep(delay)

# Example usage
file_rw = FileRW('data.json')
processor = MyDataProcessor(file_rw)

items_to_process = [1, 2, 3, 4, 5]
processor.run(items=items_to_process, delay=1.5)

The processor will handle SIGINT and SIGTERM signals to save the current state before exiting. Run your processor, and use Ctrl+C to send a SIGINT signal. When you run again, the processing will pick up from where you left off. A logger is automatically created if you do not inject it into the constructor.
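The graceful-shutdown behaviour can be sketched with the standard signal module. This is an illustration of the idea, not the package's actual code; the class and method names are hypothetical:

```python
import json
import signal
import sys


class GracefulSaver:
    """Save in-memory state to disk when SIGINT or SIGTERM arrives."""

    def __init__(self, path):
        self.path = path
        self.data = {}

    def install_handlers(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGINT, self._handle)
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self.save()  # persist progress before exiting
        sys.exit(0)

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.data, f)
```

Because the handler saves before calling sys.exit, pressing Ctrl+C leaves the state file consistent, which is what makes resuming on the next run possible.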

Example usage in larger projects:

  • alphaspread analysis of nasdaq symbols

  • filter ranging stocks

  • xtb to yfinance symbol conversion

Installation

You can install stateful-data-processor using pip:

pip install stateful-data-processor

Releasing

git tag x.y
tox
tox -e docs
tox -e build
tox -e publish -- --repository pypi --verbose

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.

