Data Science Toolkit
sibi-dst
Data Science Toolkit built with Python, Pandas, Dask, OpenStreetMap, Scikit-Learn, XGBoost, Django ORM, SQLAlchemy, Django REST Framework, and FastAPI.
Major Functionality
- Build DataCubes, DataSets, and DataObjects from diverse data sources, including relational databases, Parquet files, Excel (`.xlsx`), delimited tables (`.csv`, `.tsv`), JSON, and RESTful JSON APIs.
- Comprehensive DataFrame management utilities for efficient data handling, transformation, and optimization using Pandas and Dask.
- Flexible Data Sharing with client applications by writing to Data Warehouses, local filesystems, and cloud storage platforms such as Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage.
- Microservices for Data Access: Build scalable API-driven services using RESTful APIs (Django REST Framework, FastAPI) and gRPC for high-performance data exchange.
Supported Technologies
- Data Processing: Pandas, Dask
- Machine Learning: Scikit-Learn, XGBoost
- Databases & Storage: SQLAlchemy, Django ORM, Parquet, Amazon S3, GCS, Azure Blob Storage
- Mapping & Geospatial Analysis: OpenStreetMap, OSMnx, Geopy
- API Development: Django REST Framework, FastAPI, gRPC
Installation
pip install sibi-dst
Usage
Loading Data from SQLAlchemy
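from sibi_dst.df_helper import DfHelper

# Application-specific configuration modules (not part of sibi-dst):
# a field map, database credentials, and a filesystem factory.
from conf.transforms.fields.crm import customer_fields
from conf.credentials import replica_db_conf
from conf.storage import get_fs_instance

config = {
    'backend': 'sqlalchemy',                          # load through SQLAlchemy
    'connection_url': replica_db_conf.get('db_url'),  # SQLAlchemy database URL
    'table': 'crm_clientes_archivo',                  # source table
    'field_map': customer_fields,                     # field name mapping
    'legacy_filters': True,
    'fs': get_fs_instance(),                          # filesystem handle
}

df_helper = DfHelper(**config)
# Django-style filter: rows whose id is greater than or equal to 1
result = df_helper.load(id__gte=1)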
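The `connection_url` above comes from the application's own configuration (`replica_db_conf.get('db_url')`). SQLAlchemy expects URLs of the form `dialect+driver://user:password@host:port/database`, and special characters in credentials must be URL-encoded. A minimal sketch of building such a string with the standard library; the helper name `build_db_url`, the `mysql+pymysql` dialect, and all example values are hypothetical, since the source does not say which database holds `crm_clientes_archivo`:

```python
from urllib.parse import quote_plus


def build_db_url(
    dialect: str,
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
) -> str:
    """Build a SQLAlchemy-style URL (hypothetical helper, not part of sibi-dst)."""
    # Percent-encode credentials so characters such as '@' or ':' in the
    # password do not break URL parsing.
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Example values only; substitute your own database settings.
db_url = build_db_url("mysql+pymysql", "reader", "p@ss:word", "db.example.com", 3306, "crm")
print(db_url)  # mysql+pymysql://reader:p%40ss%3Aword@db.example.com:3306/crm
```

A string built this way is the kind of value a `db_url` configuration entry would hold.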
Saving Data to ClickHouse
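# Connection settings for the target ClickHouse server (example values)
clk_creds = {
    'host': '192.168.3.171',
    'port': 18123,
    'user': 'username',
    'database': 'xxxxxxx',      # placeholder: target database name
    'table': 'customer_file',   # destination table
    'order_by': 'id',           # ordering column
}
df_helper.save_to_clickhouse(**clk_creds)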
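The `order_by` key presumably names the sorting key that ClickHouse's `MergeTree` table engine requires when a destination table is created. A minimal sketch of the kind of `CREATE TABLE` statement that implies, assuming a simple mapping from pandas dtype names to ClickHouse types; the function name, the type mapping, and the choice of `MergeTree` are assumptions, and sibi-dst may generate different DDL:

```python
from typing import Dict

# Assumed mapping from pandas dtype names to ClickHouse column types.
PANDAS_TO_CLICKHOUSE: Dict[str, str] = {
    "int64": "Int64",
    "int32": "Int32",
    "float64": "Float64",
    "bool": "UInt8",
    "datetime64[ns]": "DateTime",
    "object": "String",
}


def clickhouse_create_table_sql(
    database: str, table: str, dtypes: Dict[str, str], order_by: str
) -> str:
    """Build a CREATE TABLE statement for a MergeTree table (illustrative only)."""
    if order_by not in dtypes:
        raise ValueError(f"order_by column {order_by!r} is not among the columns")
    # Unknown dtypes fall back to String so the statement stays valid.
    columns = ", ".join(
        f"`{name}` {PANDAS_TO_CLICKHOUSE.get(dtype, 'String')}"
        for name, dtype in dtypes.items()
    )
    return (
        f"CREATE TABLE IF NOT EXISTS `{database}`.`{table}` ({columns}) "
        f"ENGINE = MergeTree ORDER BY `{order_by}`"
    )


# Example column dtypes, e.g. as reported by str(df[col].dtype).
print(clickhouse_create_table_sql(
    "analytics", "customer_file", {"id": "int64", "name": "object"}, "id"
))
```

Choosing a sorting key such as `id` matters because MergeTree stores rows ordered by it, which is what makes range filters on that column efficient.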
Saving Data to Parquet
df_helper.save_to_parquet(
    parquet_filename='filename.parquet',
    parquet_storage_path='/path/to/my/files/'
)
Backends Supported
| Backend | Description |
|---|---|
| `sqlalchemy` | Load data from SQL databases using SQLAlchemy. |
| `django_db` | Load data from Django ORM models. |
| `parquet` | Load and save data from Parquet files. |
| `http` | Fetch data from HTTP endpoints. |
| `osmnx` | Geospatial mapping and routing using OpenStreetMap. |
| `geopy` | Geolocation services for address lookup and reverse geocoding. |
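Selecting a backend by name, as the `'backend': 'sqlalchemy'` setting does, is commonly implemented with a registry that maps each name to a loader function. The sketch below illustrates only that dispatch pattern; the `register_backend` decorator, the loader signatures, and their return values are hypothetical stubs and do not reflect sibi-dst's internals:

```python
from typing import Any, Callable, Dict

Loader = Callable[..., Any]

# Registry of backend name -> loader function (hypothetical).
BACKENDS: Dict[str, Loader] = {}


def register_backend(name: str) -> Callable[[Loader], Loader]:
    """Decorator that records a loader under a backend name."""
    def decorator(func: Loader) -> Loader:
        BACKENDS[name] = func
        return func
    return decorator


@register_backend("sqlalchemy")
def load_sqlalchemy(connection_url: str, table: str, **filters: Any) -> str:
    # A real loader would query the database; this stub only describes the call.
    return f"query {table} at {connection_url} with {filters}"


@register_backend("parquet")
def load_parquet(path: str, **filters: Any) -> str:
    return f"read {path} with {filters}"


def load(backend: str, **kwargs: Any) -> Any:
    """Dispatch to the loader registered for `backend`."""
    try:
        loader = BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; choose from {sorted(BACKENDS)}"
        ) from None
    return loader(**kwargs)


print(load("parquet", path="/path/to/my/files/filename.parquet", status="active"))
```

With a registry, adding a backend means registering one more function rather than editing a chain of `if` statements.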
Geospatial Utilities
**OSMnx Helper (`sibi_dst.osmnx_helper`)**
Provides OpenStreetMap-based mapping utilities using OSMnx and Folium.
🔹 Key Features
- `BaseOsmMap`: Manages interactive Folium-based maps.
- `PBFHandler`: Loads `.pbf` (Protocolbuffer Binary Format) files for network graphs.
Example: Generating an OSM Map
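from sibi_dst.osmnx_helper import BaseOsmMap

# my_graph: an OSMnx street-network graph (for example, built with osmnx.graph_from_place)
# my_dataframe: a DataFrame of locations to plot on the map
osm_map = BaseOsmMap(osmnx_graph=my_graph, df=my_dataframe)
osm_map.generate_map()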
**Geopy Helper (`sibi_dst.geopy_helper`)**
Provides geolocation services using Geopy for forward and reverse geocoding.
🔹 Key Features
- `GeolocationService`: Interfaces with the Nominatim API for geocoding.
- Error Handling: Manages `GeocoderTimedOut` and `GeocoderServiceError` gracefully.
- Singleton Geolocator: Efficiently reuses a global geolocator instance.
Example: Reverse Geocoding
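from sibi_dst.geopy_helper import GeolocationService

gs = GeolocationService()
# Look up the address for a latitude/longitude pair in San José, Costa Rica;
# this queries the online Nominatim service, so network access is required.
location = gs.reverse_geocode(lat=9.935, lon=-84.091)
print(location)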
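The "Singleton Geolocator" and "Error Handling" features listed above describe two common patterns: reuse one geocoder client, and treat transient failures as "no result" instead of crashing. A minimal, dependency-free sketch of both; `StubGeolocator`, `get_geolocator`, and `call_gracefully` are hypothetical names, and the built-in `TimeoutError` stands in for geopy's exceptions so the code runs without network access:

```python
import functools
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class StubGeolocator:
    """Hypothetical stand-in for a real geocoder client such as geopy's Nominatim."""

    def __init__(self, user_agent: str) -> None:
        self.user_agent = user_agent


@functools.lru_cache(maxsize=None)
def get_geolocator(user_agent: str = "sibi-dst-example") -> StubGeolocator:
    # lru_cache turns this factory into a singleton per argument set:
    # repeated identical calls return the same instance instead of a new client.
    return StubGeolocator(user_agent)


def call_gracefully(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    attempts: int = 3,
) -> Optional[T]:
    """Call fn, retrying on the listed transient errors; return None if all attempts fail."""
    for _ in range(attempts):
        try:
            return fn()
        except retry_on:
            continue  # transient failure: try again
    return None
```

With geopy installed, the stub could be replaced by `geopy.geocoders.Nominatim(user_agent=...)` and a lookup wrapped as `call_gracefully(lambda: geolocator.reverse((9.935, -84.091)), retry_on=(GeocoderTimedOut, GeocoderServiceError))`, importing both exception classes from `geopy.exc`. Errors not listed in `retry_on` still propagate, so genuine bugs are not silently swallowed.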
Advanced Features
Querying with Custom Filters
Filters are passed to `load` as keyword arguments using Django-style field lookups:
result = df_helper.load(date__gte='2023-01-01', status='active')
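In Django-style syntax, a keyword such as `date__gte` is split on the double underscore into a field name (`date`) and a lookup (`gte`, "greater than or equal"); a bare keyword such as `status` means an exact match. A minimal sketch of that translation applied to plain Python records; the `matches` helper and the chosen subset of lookups are illustrative assumptions, not sibi-dst's implementation:

```python
import operator
from typing import Any, Callable, Dict, List

# Supported lookups (a subset of Django's), mapped to comparison functions.
LOOKUPS: Dict[str, Callable[[Any, Any], bool]] = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, options: value in options,
}


def matches(record: Dict[str, Any], **filters: Any) -> bool:
    """Return True if the record satisfies every Django-style filter."""
    for key, expected in filters.items():
        # "date__gte" -> ("date", "gte"); "status" -> ("status", "exact")
        field, _, lookup = key.partition("__")
        compare = LOOKUPS[lookup or "exact"]
        if not compare(record[field], expected):
            return False
    return True


rows: List[Dict[str, Any]] = [
    {"id": 1, "date": "2022-12-31", "status": "active"},
    {"id": 2, "date": "2023-02-15", "status": "active"},
    {"id": 3, "date": "2023-03-01", "status": "closed"},
]
# ISO-8601 date strings compare correctly as plain strings.
selected = [r["id"] for r in rows if matches(r, date__gte="2023-01-01", status="active")]
print(selected)  # [2]
```

All filters are combined with AND, which matches how multiple keyword arguments behave in a Django `filter()` call.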
Parallel Processing
Use `load_parallel` for Dask-based parallel execution:
results = df_helper.load_parallel(status='active')
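The internals of `load_parallel` are not documented here. As a rough illustration of the general idea, splitting a load into independent partitions and running them concurrently, the sketch below partitions an id range into chunks and processes them with the standard library's `ThreadPoolExecutor` in place of Dask. `fetch_chunk` is a hypothetical stand-in for a per-partition query:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def id_ranges(start: int, stop: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [start, stop) into half-open ranges of at most chunk_size ids."""
    return [(lo, min(lo + chunk_size, stop)) for lo in range(start, stop, chunk_size)]


def fetch_chunk(bounds: Tuple[int, int]) -> List[int]:
    # Hypothetical per-partition load; a real version might run a query
    # filtered with id__gte=lo, id__lt=hi. Here it just returns the ids.
    lo, hi = bounds
    return list(range(lo, hi))


def load_in_parallel(start: int, stop: int, chunk_size: int, workers: int = 4) -> List[int]:
    ranges = id_ranges(start, stop, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the combined output stays sorted.
        chunks = list(pool.map(fetch_chunk, ranges))
    return [row for chunk in chunks for row in chunk]


print(id_ranges(1, 11, 4))  # [(1, 5), (5, 9), (9, 11)]
```

Threads suit I/O-bound work such as database queries; Dask additionally offers lazy, partitioned DataFrames and schedulers that can spread work across processes or a cluster.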
Testing
To run unit tests, use:
pytest tests/
Contributing
Contributions are welcome! Please submit pull requests or open issues for discussion.
License
sibi-dst is licensed under the MIT License.
Download files
Source Distribution: sibi_dst-0.3.46.tar.gz
Built Distribution: sibi_dst-0.3.46-py3-none-any.whl
File details
Details for the file sibi_dst-0.3.46.tar.gz.
File metadata
- Download URL: sibi_dst-0.3.46.tar.gz
- Upload date:
- Size: 119.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.11.2 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 807b1fc071f0bdc17d686042befb3cff41ab7091dc08d49d076eec2669656cb3 |
| MD5 | 3dac0334b0ebd03f51a22ff4c9e831ed |
| BLAKE2b-256 | b75e225baf52af70f6a9b389459146cba2d07d76d9c30b19b21381a8facd212b |
File details
Details for the file sibi_dst-0.3.46-py3-none-any.whl.
File metadata
- Download URL: sibi_dst-0.3.46-py3-none-any.whl
- Upload date:
- Size: 148.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.11.2 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 753f0abd403ace0a39301169b58646d380cd69572392acdd92f4652cdb3b125f |
| MD5 | 8e921b668ad7de292b4452c16b58c72c |
| BLAKE2b-256 | 1f0d7a3922f9a589c9490d6fc3f245b3f18522705f2a69cc365540e46ed07383 |