Data Science Toolkit

sibi-dst

A data science toolkit built with Python, Pandas, Dask, OpenStreetMap, Scikit-Learn, XGBoost, Django ORM, SQLAlchemy, Django REST Framework, and FastAPI.

Major Functionality

  1. Build DataCubes, DataSets, and DataObjects from diverse data sources, including relational databases, Parquet files, Excel (.xlsx), delimited tables (.csv, .tsv), JSON, and RESTful JSON APIs.
  2. Manage DataFrames with utilities for data handling, transformation, and optimization using Pandas and Dask.
  3. Share data with client applications by writing to data warehouses, local filesystems, and cloud storage platforms such as Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage.
  4. Build microservices for data access: scalable API-driven services using RESTful APIs (Django REST Framework, FastAPI) and gRPC for high-performance data exchange.

Supported Technologies

  • Data Processing: Pandas, Dask
  • Machine Learning: Scikit-Learn, XGBoost
  • Databases & Storage: SQLAlchemy, Django ORM, Parquet, Amazon S3, GCS, Azure Blob Storage
  • Mapping & Geospatial Analysis: OpenStreetMap, OSMnx, Geopy
  • API Development: Django REST Framework, gRPC, FastAPI

Installation

pip install sibi-dst

Usage

Loading Data from SQLAlchemy

from sibi_dst.df_helper import DfHelper

# The conf.* modules are application-specific and not part of sibi-dst.
from conf.transforms.fields.crm import customer_fields
from conf.credentials import replica_db_conf
from conf.storage import get_fs_instance

config = {
    'backend': 'sqlalchemy',
    'connection_url': replica_db_conf.get('db_url'),
    'table': 'crm_clientes_archivo',
    'field_map': customer_fields,
    'legacy_filters': True,
    'fs': get_fs_instance()
}

df_helper = DfHelper(**config)
# Keyword arguments are Django-style filters; this one selects rows with id >= 1.
result = df_helper.load(id__gte=1)

Saving Data to ClickHouse

clk_creds = {
    'host': '192.168.3.171',
    'port': 18123,
    'user': 'username',
    'database': 'xxxxxxx',
    'table': 'customer_file',
    'order_by': 'id'
}

df_helper.save_to_clickhouse(**clk_creds)

Saving Data to Parquet

df_helper.save_to_parquet(
    parquet_filename='filename.parquet',
    parquet_storage_path='/path/to/my/files/'
)

Backends Supported

Backend      Description
sqlalchemy   Load data from SQL databases using SQLAlchemy.
django_db    Load data from Django ORM models.
parquet      Load and save data from Parquet files.
http         Fetch data from HTTP endpoints.
osmnx        Geospatial mapping and routing using OpenStreetMap.
geopy        Geolocation services for address lookup and reverse geocoding.

Geospatial Utilities

OSMnx Helper (sibi_dst.osmnx_helper)

Provides OpenStreetMap-based mapping utilities using osmnx and folium.

🔹 Key Features

  • BaseOsmMap: Manages interactive Folium-based maps.
  • PBFHandler: Loads .pbf (Protocolbuffer Binary Format) files for network graphs.

Example: Generating an OSM Map

from sibi_dst.osmnx_helper import BaseOsmMap
osm_map = BaseOsmMap(osmnx_graph=my_graph, df=my_dataframe)
osm_map.generate_map()

Geopy Helper (sibi_dst.geopy_helper)

Provides geolocation services using Geopy for forward and reverse geocoding.

🔹 Key Features

  • GeolocationService: Interfaces with Nominatim API for geocoding.
  • Error Handling: Manages GeocoderTimedOut and GeocoderServiceError gracefully.
  • Singleton Geolocator: Efficiently reuses a global geolocator instance.

Example: Reverse Geocoding

from sibi_dst.geopy_helper import GeolocationService

gs = GeolocationService()
# Coordinates are passed as a (latitude, longitude) tuple.
location = gs.reverse((9.935, -84.091))
print(location)
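
The error handling and singleton reuse listed under Key Features can be illustrated with a minimal, standard-library sketch. This is not the code inside sibi_dst.geopy_helper: `reverse_with_retry`, `get_geolocator`, and `FlakyGeolocator` are hypothetical names, and a simulated client stands in for a real Nominatim client so the example runs offline.

```python
import time
from functools import lru_cache


class FlakyGeolocator:
    """Hypothetical stand-in for a geocoding client: times out once, then answers."""

    def __init__(self):
        self.calls = 0

    def reverse(self, coords):
        self.calls += 1
        if self.calls == 1:
            raise TimeoutError("simulated timeout")
        lat, lon = coords
        return f"address near ({lat}, {lon})"


@lru_cache(maxsize=1)
def get_geolocator():
    """Build the client once; later calls return the same cached instance."""
    return FlakyGeolocator()


def reverse_with_retry(reverse_fn, coords, retry_on=(TimeoutError,), retries=2, delay=0.0):
    """Call reverse_fn(coords), retrying on the given exceptions.

    Returns None when every attempt fails, so callers get a value
    instead of an unhandled exception.
    """
    for attempt in range(retries + 1):
        try:
            return reverse_fn(coords)
        except retry_on:
            if attempt < retries:
                time.sleep(delay)  # pause before the next attempt
    return None


geolocator = get_geolocator()
print(get_geolocator() is geolocator)  # True
print(reverse_with_retry(geolocator.reverse, (9.935, -84.091)))
```

With Geopy itself, `reverse_fn` would be the `reverse` method of a `geopy.geocoders.Nominatim` instance, and `retry_on` would be `(GeocoderTimedOut, GeocoderServiceError)` imported from `geopy.exc`.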

Advanced Features

Querying with Custom Filters

Filters can be applied dynamically using Django-style syntax:

result = df_helper.load(date__gte='2023-01-01', status='active')
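
To make the lookup syntax concrete, here is a minimal sketch of how Django-style keyword filters can be parsed and applied to plain Python records. It is not sibi-dst's implementation: `parse_lookup`, `apply_filters`, and the set of supported suffixes are hypothetical and chosen only for illustration.

```python
import operator

# Hypothetical subset of Django-style lookup suffixes mapped to comparisons.
LOOKUPS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, options: value in options,
}


def parse_lookup(key):
    """Split 'date__gte' into ('date', 'gte'); a bare 'status' means 'exact'."""
    field, sep, suffix = key.rpartition("__")
    if sep and suffix in LOOKUPS:
        return field, suffix
    return key, "exact"


def apply_filters(rows, **filters):
    """Return the rows (dicts) that satisfy every keyword filter."""
    parsed = [(*parse_lookup(key), expected) for key, expected in filters.items()]
    return [
        row
        for row in rows
        if all(LOOKUPS[op](row[field], expected) for field, op, expected in parsed)
    ]


rows = [
    {"id": 1, "date": "2022-12-31", "status": "active"},
    {"id": 2, "date": "2023-03-15", "status": "active"},
    {"id": 3, "date": "2023-06-01", "status": "inactive"},
]

# ISO-8601 date strings compare correctly as plain strings.
matches = apply_filters(rows, date__gte="2023-01-01", status="active")
print([row["id"] for row in matches])  # [2]
```

A keyword without a recognized suffix is treated as an equality test, which is why `status='active'` needs no `__exact`.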

Parallel Processing

Leverage Dask for parallel execution:

result = df_helper.load_parallel(status='active')
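
The general idea behind partitioned parallel loading is to split the data, process each partition independently, and concatenate the results. The sketch below shows that pattern with a standard-library thread pool as a stand-in for Dask; it does not describe how `load_parallel` works internally, and `split`, `filter_partition`, and `load_in_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split(rows, n_partitions):
    """Split rows into at most n_partitions contiguous chunks."""
    size = max(1, -(-len(rows) // n_partitions))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def filter_partition(partition, status):
    """Work done independently on each partition."""
    return [row for row in partition if row["status"] == status]


def load_in_parallel(rows, status, n_partitions=4):
    """Filter partitions concurrently and concatenate the results in order."""
    partitions = split(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # pool.map yields results in the same order as the input partitions.
        results = pool.map(lambda part: filter_partition(part, status), partitions)
        return [row for part in results for row in part]


rows = [{"id": i, "status": "active" if i % 2 else "inactive"} for i in range(10)]
active = load_in_parallel(rows, status="active")
print([row["id"] for row in active])  # [1, 3, 5, 7, 9]
```

Threads mainly help when each partition waits on I/O, such as a database or file read; Dask additionally offers process-based and distributed schedulers for CPU-bound work.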

Testing

To run unit tests, use:

pytest tests/

Contributing

Contributions are welcome! Please submit pull requests or open issues for discussions.

License

sibi-dst is licensed under the MIT License.
