Data Science tools to ease access and use of data and models
Data Science Manager 👨💻
Install
The easiest way to install dsmanager is using pip:
pip install dsmanager
or poetry
poetry add dsmanager
or conda
conda install dsmanager
Several optional extras are available, depending on your needs:
pip install dsmanager[sharepoint] # Add Sharepoint source handling
pip install dsmanager[salesforce] # Add SalesForce source handling
pip install dsmanager[kaggle] # Add Kaggle source handling
pip install dsmanager[snowflake] # Add Snowflake source handling
pip install dsmanager[mysql] # Add MySQL source handling
pip install dsmanager[pgsql] # Add PostgreSQL source handling
pip install dsmanager[all_sources] # All the supported sources
Usage
The DS Manager has 3 main components:
- A DataManager component
- A Controller component
- A Model component
DataManager
The DataManager manages different types of data sources, including:
- File (File locally or online)
- Http (Http requests)
- Ftp (Ftp hosted files)
- Sql (Sql database tables)
- Sharepoint (Microsoft OneDrive files)
- SalesForce (SalesForce classes)
- Kaggle (Kaggle datasets)
The first step in using the DataManager is to instantiate it with a metadata path.
from dsmanager import DataManager
dm = DataManager("data/metadata.json")
The metadata file is generated if it does not exist. It consists of a dict of sources following this schema:
{
"SOURCE_NAME": {
"source_type": "name_of_the_source",
"args": {}
}
}
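As a minimal sketch of the schema above, a metadata file can also be prepared by hand with the standard library alone. The helper `write_metadata`, the source name `my_csv`, and the file path are hypothetical placeholders. The extra keys (`path`, `file_type`, `encoding`) follow the `file` read schema shown later in this README. Nothing here calls dsmanager itself.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(metadata_path, sources):
    """Write a dict of sources to a JSON metadata file and return its path."""
    path = Path(metadata_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sources, indent=4), encoding="utf-8")
    return path


# Hypothetical source entry: the name "my_csv" and the path are placeholders.
sources = {
    "my_csv": {
        "source_type": "file",
        "path": "data/my_file.csv",
        "file_type": "csv",
        "encoding": "utf-8",
        "args": {},
    }
}

# Written to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    metadata_file = write_metadata(Path(tmp_dir) / "data" / "metadata.json", sources)
    loaded = json.loads(metadata_file.read_text(encoding="utf-8"))

print(loaded["my_csv"]["source_type"])
```

In a real project the file would be written to the path later passed to `DataManager`, such as `data/metadata.json`.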
Each source has a source_type corresponding to the name of the source. You can list the available source types with this command:
DataManager().datasources
Each of these data sources has its own read and write schemas, because each has its own parameter requirements. Additional arguments that are not required can be passed through the args parameter.
You can obtain the schemas for a specific datasource with the following commands:
source_name = "file"
DataManager().datasources[source_name].read_schema  # use write_schema for output sources
Output:
{
"source_type": "file",
"path": "local_path | online_uri",
"file_type": "csv | excel | text | json | ...",
"encoding": "utf-8",
"args": {
"pandas_read_file_argument_keyword": "value_for_this_argument"
}
}
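The schema above can double as a checklist when writing a metadata entry. The sketch below is not part of dsmanager: `missing_keys` is a hypothetical helper that reports which schema keys an entry lacks. It treats `args` as optional, as described earlier. Whether dsmanager itself requires every other key is an assumption.

```python
def missing_keys(entry, schema, optional=("args",)):
    """Return the schema keys absent from a metadata entry, ignoring optional ones."""
    return sorted(key for key in schema if key not in entry and key not in optional)


# Read schema for the "file" source, copied from the output above.
file_read_schema = {
    "source_type": "file",
    "path": "local_path | online_uri",
    "file_type": "csv | excel | text | json | ...",
    "encoding": "utf-8",
    "args": {"pandas_read_file_argument_keyword": "value_for_this_argument"},
}

# Hypothetical entry that does not declare the file type or the encoding.
entry = {"source_type": "file", "path": "data/my_file.csv"}

print(missing_keys(entry, file_read_schema))
```

An empty list means the entry names every non-optional key in the schema. The check compares key names only, not values.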
Development
Source code
You can check the latest sources with the command:
git clone https://gitlab.com/bigrayou/dsmanager
Testing
After installation, you can launch the test suite from outside the dsmanager directory (you will need to have pytest >= 7.1.3 installed):
pytest -v
Dependencies
The DSManager requires:
- aiohttp >=3.8.3
- cryptography 38.0.4
- dash >=2.7.1,<3.0.0
- llvmlite >=0.39.1,<0.40.0
- nest-asyncio >=1.5.6,<2.0.0
- numba >=0.56.4,<0.57.0
- numexpr >=2.8.4,<3.0.0
- numpy >=1.23.3,<2.0.0
- openpyxl >=3.0.10,<4.0.0
- optuna >=3.0.5,<4.0.0
- pandas >=1.5.0,<2.0.0
- paramiko >=2.12.0,<3.0.0
- pickle-mixin >=1.0.2,<2.0.0
- python-dotenv >=0.21.0,<0.22.0
- requests >=2.28.1,<3.0.0
- scikit-learn >=1.2.0,<2.0.0
- setuptools >=65.6.3,<66.0.0
- shap >=0.41.0,<0.42.0
- sqlalchemy >=1.4.45,<2.0.0
- sweetviz >=2.1.4,<3.0.0
- tqdm >=4.64.1,<5.0.0
Optionally, the DSManager may require:
- azure-common >=1.1.28,<2.0.0
- azure-storage-blob >=12.14.1,<13.0.0
- azure-storage-common >=2.1.0,<3.0.0
- kaggle >=1.5.12,<2.0.0
- mysqlclient >=2.1.1,<3.0.0
- psycopg2-binary >=2.9.5,<3.0.0
- shareplum >=0.5.1,<0.6.0
- simple-salesforce >=1.12.2,<2.0.0
- snowflake-connector-python >=2.9.0,<3.0.0
- snowflake-sqlalchemy >=1.4.4,<2.0.0
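To see which of the optional packages listed above are present in the current environment, one option is to query the installed distribution metadata. This is a sketch using only the standard library `importlib.metadata` module (Python 3.8+). `installed_versions` is a hypothetical helper, and it does not check the version ranges.

```python
from importlib import metadata

# Distribution names taken from the optional dependency list above.
OPTIONAL_PACKAGES = [
    "azure-common",
    "azure-storage-blob",
    "azure-storage-common",
    "kaggle",
    "mysqlclient",
    "psycopg2-binary",
    "shareplum",
    "simple-salesforce",
    "snowflake-connector-python",
    "snowflake-sqlalchemy",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(OPTIONAL_PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```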
Author
👤 Rayane Amrouche
Hashes for dsmanager-1.2.9.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f25d24abc2e527969a4b44238d3ec470140463001c1a17176f4c412944d1a529
MD5 | 874e76f818dc471d0e2f1408495b1784
BLAKE2b-256 | 8d6fe5cb6322d47f18c6d567f72bc1d6e64fc860fa845fe24e604caf358e8e9a