A Python client library for interacting with the sciDX POP and creating streams.

scidx Streaming

A Python library for managing streaming data using the sciDX platform and a Point of Presence (POP). It provides methods for creating, consuming, and managing Kafka streams and related resources.

Installation

Ensure you have Python 3.7 or higher installed. Using a virtual environment is recommended.

Option 1: Install from GitHub

  1. Clone the repository:

    git clone https://github.com/sci-ndp/streaming-py.git
    cd streaming-py
    
  2. Create and activate a virtual environment:

    python3 -m venv .venv
    source .venv/bin/activate
    
  3. Install the package in editable mode:

    pip install -e .
    
  4. Install development dependencies (optional, for testing):

    pip install -r requirements.txt
    

Option 2: Install via pip

The package is published on PyPI as scidx-streaming, so you can install it directly using pip:

pip install scidx-streaming
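
After either install route, a quick import check can save debugging later. The sketch below assumes the import names `streaming` and `pointofpresence` that appear in the usage examples of this README; `missing_modules` is a hypothetical helper written for illustration, not part of the library.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names taken from the usage examples; adjust if your install differs.
    missing = missing_modules(["streaming", "pointofpresence"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All modules importable.")
```

If a name is reported as not importable, check that the virtual environment is active before reinstalling.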

Tutorial

For a step-by-step guide on how to use the streaming library, check out our comprehensive tutorial: 10 Minutes for Streaming POP Data.

Running Tests

To run the tests, navigate to the project root and execute:

pytest

Usage examples

Below is an example showcasing how to set up the library, register a data object, create a filtered stream, and consume its data.

1. Set up the POP and Streaming libraries

Initialize the APIClient, then use it to initialize the StreamingClient:

from streaming import StreamingClient
from pointofpresence import APIClient

API_URL = "http://your-api-url.com"
USERNAME = "your_username"
PASSWORD = "your_password"

client = APIClient(base_url=API_URL, username=USERNAME, password=PASSWORD)
streaming = StreamingClient(client)
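
Hardcoding credentials is fine for a demo, but in shared code it may be preferable to read them from the environment. The following is a minimal sketch under that assumption. The variable names `SCIDX_API_URL`, `SCIDX_USERNAME`, and `SCIDX_PASSWORD` and the helper `load_credentials` are hypothetical and not defined by the library; only the keyword names `base_url`, `username`, and `password` come from the example above.

```python
import os


def load_credentials(env=os.environ):
    """Read connection settings from environment variables.

    The variable names used here are illustrative assumptions,
    not names defined by the library.
    """
    required = ("SCIDX_API_URL", "SCIDX_USERNAME", "SCIDX_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "base_url": env["SCIDX_API_URL"],
        "username": env["SCIDX_USERNAME"],
        "password": env["SCIDX_PASSWORD"],
    }
```

The returned dictionary can then be unpacked into the client constructor, for example `APIClient(**load_credentials())`.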

2. Register a data object

data_object_metadata = {
    "name": "sample_data_object",
    "type": "url",
    "url": "http://example.com/data.csv",
    "description": "Sample data object for streaming demo"
}
client.register_url(data_object_metadata)

3. Create a filtered data stream

# Define filters
filters = [
    "column_name > 100",
    "IF column_name < 50 THEN alert = 'low' ELSE alert = 'high'"
]

# Create a Kafka stream with filters
stream = await streaming.create_kafka_stream(
    keywords=["sample_data_object"],
    match_all=True,
    filter_semantics=filters
)
print(f"Stream created with topic: {stream.data_stream_id}")
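
To make the two filter strings concrete, the sketch below applies one plausible reading of them to plain Python dictionaries: the comparison keeps only matching rows, and the IF/THEN/ELSE rule then adds an `alert` field. It illustrates the intended effect under the assumption that filters are applied in order. It is not the library's filter engine, and `apply_example_filters` and the sample rows are hypothetical.

```python
def apply_example_filters(rows):
    """Apply a hand-written equivalent of the two example filters, in order."""
    # "column_name > 100": keep only rows whose value exceeds 100.
    kept = [row for row in rows if row["column_name"] > 100]
    # "IF column_name < 50 THEN alert = 'low' ELSE alert = 'high'":
    # add an alert field to each remaining row.
    return [
        {**row, "alert": "low" if row["column_name"] < 50 else "high"}
        for row in kept
    ]


rows = [{"column_name": 20}, {"column_name": 120}, {"column_name": 300}]
print(apply_example_filters(rows))
# [{'column_name': 120, 'alert': 'high'}, {'column_name': 300, 'alert': 'high'}]
```

Under this sequential reading, every surviving row is labeled 'high', because rows below 50 were already dropped by the first filter. If both alert values are needed, the comparison filter would have to be loosened or removed.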

4. Consume the filtered data stream

# Consume stream data
consumer = streaming.consume_kafka_messages(stream.data_stream_id)
print(consumer.dataframe.head())

5. Clean up

consumer.stop()
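# Delete the stream and the data object
await streaming.delete_stream(stream)
# search_results is not defined in this example; it is assumed to hold the
# result of an earlier search for "sample_data_object" made with the POP client.
client.delete_resource_by_id(search_results[0]["id"])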
print("Cleanup completed.")
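
The `await` calls in steps 3 and 5 only work inside a coroutine or in an environment with top-level `await`, such as a Jupyter notebook. In a plain script, one option is to wrap the steps in an `async def` and run it with `asyncio.run`, using `try`/`finally` so the stream is deleted even if consumption fails. The sketch below uses a hypothetical `FakeStreamingClient` stand-in so that it runs without a server; only the method names `create_kafka_stream` and `delete_stream` and the attribute `data_stream_id` come from the examples above.

```python
import asyncio
from types import SimpleNamespace


class FakeStreamingClient:
    """Offline stand-in for StreamingClient (hypothetical, for illustration only)."""

    def __init__(self):
        self.deleted = []

    async def create_kafka_stream(self, keywords, match_all=True, filter_semantics=None):
        return SimpleNamespace(data_stream_id="data_stream_demo")

    async def delete_stream(self, stream):
        self.deleted.append(stream.data_stream_id)


async def run_pipeline(streaming):
    stream = await streaming.create_kafka_stream(
        keywords=["sample_data_object"], match_all=True, filter_semantics=[]
    )
    try:
        # Consume and process messages here.
        return stream.data_stream_id
    finally:
        # Runs even if consumption raises, so the stream is not left behind.
        await streaming.delete_stream(stream)


fake = FakeStreamingClient()
topic = asyncio.run(run_pipeline(fake))
print(topic, fake.deleted)
```

With the real library, the same structure would apply by passing the `streaming` object from step 1 to `run_pipeline` instead of the stand-in.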

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/new-feature)
  3. Make your changes and commit (git commit -m 'Add new feature')
  4. Push to the branch (git push origin feature/new-feature)
  5. Open a Pull Request

Publishing to PyPI

To publish the library to PyPI, follow these steps:

  1. Ensure setup.py is correctly configured.

  2. Build the distribution files:

    python setup.py sdist bdist_wheel

  3. Upload to PyPI using twine:

    twine upload dist/*

  4. Verify the package on PyPI by visiting https://pypi.org/ and checking your package listing.

If you need to update the library on PyPI:

  • Make your changes and update the version in setup.py.
  • Run the above steps to rebuild and upload the new version.

License

This project is licensed under the MIT License. See LICENSE.md for more details.

Contact

For any questions or suggestions, please open an issue on GitHub.
