Data Science tools to ease access and use of data and models
Data Science Manager 👨‍💻
Install
The easiest way to install dsmanager is using pip:
pip install dsmanager
or poetry
poetry add dsmanager
or conda
conda install dsmanager
Several optional extras are available depending on your needs:
pip install dsmanager[sharepoint] # Add Sharepoint source handling
pip install dsmanager[salesforce] # Add SalesForce source handling
pip install dsmanager[kaggle] # Add Kaggle source handling
pip install dsmanager[snowflake] # Add Snowflake source handling
pip install dsmanager[mysql] # Add MySQL source handling
pip install dsmanager[pgsql] # Add PostgreSQL source handling
Usage
The DS Manager has 3 main components:
- A DataManager component
- A Controller component
- A Model component
DataManager
The DataManager manages several types of data sources, including:
- File (File locally or online)
- Http (Http requests)
- Ftp (Ftp hosted files)
- Sql (Sql database tables)
- Sharepoint (Microsoft OneDrive files)
- SalesForce (SalesForce classes)
- Kaggle (Kaggle datasets)
The first step in using the DataManager is to instantiate it with a metadata path.
from dsmanager import DataManager
dm = DataManager("data/metadata.json")
The metadata file is generated if it does not exist. It consists of a dict of sources following this schema:
{
"SOURCE_NAME": {
"source_type": "name_of_the_source",
"args": {}
}
}
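The generic schema above can be checked before a file is handed to the DataManager. The following is a minimal sketch and not part of dsmanager: `validate_metadata` is a hypothetical helper, and it assumes only what the schema shows, namely that each source is a dict with a string `source_type` and an `args` dict (treated as optional here).

```python
import json


def validate_metadata(metadata):
    """Return a list of problems found in a metadata dict (empty if none).

    Hypothetical helper, not part of dsmanager. It only checks the generic
    shape shown above: {"SOURCE_NAME": {"source_type": str, "args": dict}}.
    """
    if not isinstance(metadata, dict):
        return ["metadata must be a dict of sources"]
    problems = []
    for name, source in metadata.items():
        if not isinstance(source, dict):
            problems.append(f"{name}: source must be a dict")
            continue
        if not isinstance(source.get("source_type"), str):
            problems.append(f"{name}: missing or non-string 'source_type'")
        # Assumption: "args" may be omitted; if present it must be a dict.
        if not isinstance(source.get("args", {}), dict):
            problems.append(f"{name}: 'args' must be a dict")
    return problems


example = json.loads(
    '{"SOURCE_NAME": {"source_type": "name_of_the_source", "args": {}}}'
)
print(validate_metadata(example))  # prints []
```

Returning a list of messages rather than raising on the first problem makes it easier to report every malformed source in one pass.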
Each source has a source_type corresponding to the name of one of the available data sources. You can list them with this command:
DataManager().datasources
Each data source has its own schema because it requires its own parameters. Additional, non-required arguments can be passed through the args parameter.
You can obtain the schema for a specific datasource with the following command:
source_name = "file"
DataManager().datasources[source_name].schema
Output:
{
"source_type": "file",
"path": "local_path | online_uri",
"file_type": "csv | excel | text | json | ...",
"encoding": "utf-8",
"args": {
"pandas_read_file_argument_keyword": "value_for_this_argument"
}
}
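Based on the file schema above, a metadata file containing a single file source could be generated as follows. This is a sketch under assumptions: the source name `my_csv`, the path `data/example.csv`, and the `sep` argument (a pandas `read_csv` keyword) are illustrative, and placing `path`, `file_type`, and `encoding` next to `source_type` mirrors the schema output rather than confirmed behavior.

```python
import json
import os
import tempfile

# Illustrative entry following the "file" schema shown above.
# "my_csv", the path, and the "sep" argument are assumptions for this example.
metadata = {
    "my_csv": {
        "source_type": "file",
        "path": "data/example.csv",
        "file_type": "csv",
        "encoding": "utf-8",
        "args": {"sep": ";"},
    }
}

# Write to a temporary directory so the example does not touch project files.
metadata_path = os.path.join(tempfile.mkdtemp(), "metadata.json")
with open(metadata_path, "w", encoding="utf-8") as handle:
    json.dump(metadata, handle, indent=4)

# Read the file back to confirm it round-trips as valid JSON.
with open(metadata_path, encoding="utf-8") as handle:
    loaded = json.load(handle)
```

The resulting path could then be passed to `DataManager(metadata_path)`, in the same way as the `"data/metadata.json"` path used earlier.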
Development
Source code
You can check out the latest sources with the command:
git clone https://gitlab.com/bigrayou/dsmanager
Testing
After installation, you can launch the test suite from outside the dsmanager directory (you will need to have pytest >= 7.1.3 installed):
pytest -v
Dependencies
The DSManager requires:
- dash >=2.7.1,<3.0.0
- explainerdashboard >=0.4.0,<0.5.0
- llvmlite >=0.39.1,<0.40.0
- numba >=0.56.4,<0.57.0
- numexpr >=2.8.4,<3.0.0
- numpy >=1.23.3,<2.0.0
- openpyxl >=3.0.10,<4.0.0
- optuna >=3.0.5,<4.0.0
- pandas >=1.5.0,<2.0.0
- paramiko >=2.12.0,<3.0.0
- pickle-mixin >=1.0.2,<2.0.0
- python-dotenv >=0.21.0,<0.22.0
- requests >=2.28.1,<3.0.0
- scikit-learn >=1.2.0,<2.0.0
- setuptools >=65.6.3,<66.0.0
- sqlalchemy >=1.4.45,<2.0.0
- sweetviz >=2.1.4,<3.0.0
- tqdm >=4.64.1,<5.0.0
Optionally, the DSManager may require:
- azure-common >=1.1.28,<2.0.0
- azure-storage-blob >=12.14.1,<13.0.0
- azure-storage-common >=2.1.0,<3.0.0
- kaggle >=1.5.12,<2.0.0
- mysqlclient >=2.1.1,<3.0.0
- psycopg2-binary >=2.9.5,<3.0.0
- shareplum >=0.5.1,<0.6.0
- simple-salesforce >=1.12.2,<2.0.0
- snowflake-sqlalchemy >=1.4.4,<2.0.0
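To see which of the optional requirements above are present in the current environment, a short check with the standard library can help. This is a sketch, not a dsmanager feature: `installed_optional_versions` is a hypothetical helper, and the distribution names are copied from the list above. It reports installed versions only and does not verify the version ranges.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the optional requirements listed above.
OPTIONAL_DISTRIBUTIONS = [
    "azure-common",
    "azure-storage-blob",
    "azure-storage-common",
    "kaggle",
    "mysqlclient",
    "psycopg2-binary",
    "shareplum",
    "simple-salesforce",
    "snowflake-sqlalchemy",
]


def installed_optional_versions(names=OPTIONAL_DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_optional_versions().items():
    print(f"{name}: {installed or 'not installed'}")
```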
Author
👤 Rayane Amrouche