Data Science tools to ease access and use of data and models

Data Science Manager 👨‍💻

Install

The easiest way to install dsmanager is using pip:

pip install dsmanager

or poetry

poetry add dsmanager

or conda

conda install dsmanager

Optional extras are available depending on your needs:

pip install dsmanager[sharepoint] # Add Sharepoint source handling
pip install dsmanager[salesforce] # Add SalesForce source handling
pip install dsmanager[kaggle] # Add Kaggle source handling
pip install dsmanager[snowflake] # Add Snowflake source handling
pip install dsmanager[mysql] # Add MySQL source handling
pip install dsmanager[pgsql] # Add PostgreSQL source handling
pip install dsmanager[all_sources] # All the supported sources

Usage

The DS Manager has 3 main components:

  • A DataManager component
  • A Controller component
  • A Model component

DataManager

The DataManager manages several types of data sources, including:

  • File (File locally or online)
  • Http (Http requests)
  • Ftp (Ftp hosted files)
  • Sql (Sql database tables)
  • Sharepoint (Microsoft OneDrive files)
  • SalesForce (SalesForce classes)
  • Kaggle (Kaggle datasets)

The first step in using the DataManager is to instantiate it with a metadata path.

from dsmanager import DataManager
dm = DataManager("data/metadata.json")

The metadata file is generated if it does not exist. It consists of a dict of sources following this schema:

{
  "SOURCE_NAME": {
    "source_type": "name_of_the_source",
    "args": {}
  }
}
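As an illustration, a metadata file following this schema could be written by hand with the standard json module. This is only a sketch: the source name "sales_csv" and the path "data/sales.csv" are hypothetical, the entry borrows the keys of the file read schema shown further down, and the file is written to a temporary directory rather than to a real project.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical source entry: "sales_csv" and "data/sales.csv" are
# illustrative names, not values defined by dsmanager.
metadata = {
    "sales_csv": {
        "source_type": "file",
        "path": "data/sales.csv",
        "file_type": "csv",
        "encoding": "utf-8",
        "args": {},
    }
}

# Write the dict to a metadata.json file inside a temporary directory.
metadata_path = Path(tempfile.mkdtemp()) / "metadata.json"
metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")

# Read it back to confirm the file holds the same dict of sources.
loaded = json.loads(metadata_path.read_text(encoding="utf-8"))
print(sorted(loaded))
```

The resulting path could then presumably be passed to DataManager in place of "data/metadata.json".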

Each source has a source_type corresponding to the name of the source. You can list the available source types with this command:

DataManager().datasources

Each of these data sources has its own read and write schemas because each has its own parameter requirements. You can also pass additional, optional arguments through the args parameter.

You can obtain the schemas for a specific datasource with the following commands:

source_name = "file"
DataManager().datasources[source_name].read_schema  # use write_schema for output sources

Output:

{
    "source_type": "file",
    "path": "local_path | online_uri",
    "file_type": "csv | excel | text | json | ...",
    "encoding": "utf-8",
    "args": {
        "pandas_read_file_argument_keyword": "value_for_this_argument"
    }
}
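A schema like this one can serve as a checklist when writing a metadata entry by hand. The sketch below is not dsmanager's own validation: missing_keys is a hypothetical helper, and it assumes that every schema key except args is required. The entry and its "sep" argument (a pandas.read_csv keyword) are illustrative.

```python
# Copy of the read schema shown above for the "file" source.
READ_SCHEMA = {
    "source_type": "file",
    "path": "local_path | online_uri",
    "file_type": "csv | excel | text | json | ...",
    "encoding": "utf-8",
    "args": {"pandas_read_file_argument_keyword": "value_for_this_argument"},
}


def missing_keys(entry, schema, optional=("args",)):
    """Return the schema keys absent from entry, ignoring optional ones.

    Hypothetical helper: treating everything except "args" as required
    is an assumption, not documented dsmanager behaviour.
    """
    return sorted(k for k in schema if k not in entry and k not in optional)


# Illustrative entry that forgets the "encoding" key.
entry = {
    "source_type": "file",
    "path": "data/sales.csv",
    "file_type": "csv",
    "args": {"sep": ";"},
}

print(missing_keys(entry, READ_SCHEMA))  # ['encoding']
```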

Development

Source code

You can check out the latest sources with the command:

git clone https://gitlab.com/bigrayou/dsmanager

Testing

After installation, you can launch the test suite from outside the dsmanager directory (you will need to have pytest >= 7.1.3 installed):

pytest -v

Dependencies

The DSManager requires:

  • aiohttp >=3.8.3
  • cryptography 38.0.4
  • dash >=2.7.1,<3.0.0
  • llvmlite >=0.39.1,<0.40.0
  • nest-asyncio >=1.5.6,<2.0.0
  • numba >=0.56.4,<0.57.0
  • numexpr >=2.8.4,<3.0.0
  • numpy >=1.23.3,<2.0.0
  • openpyxl >=3.0.10,<4.0.0
  • optuna >=3.0.5,<4.0.0
  • pandas >=1.5.0,<2.0.0
  • paramiko >=2.12.0,<3.0.0
  • pickle-mixin >=1.0.2,<2.0.0
  • python-dotenv >=0.21.0,<0.22.0
  • requests >=2.28.1,<3.0.0
  • scikit-learn >=1.2.0,<2.0.0
  • setuptools >=65.6.3,<66.0.0
  • shap >=0.41.0,<0.42.0
  • sqlalchemy >=1.4.45,<2.0.0
  • sweetviz >=2.1.4,<3.0.0
  • tqdm >=4.64.1,<5.0.0

Optionally, the DSManager may require:

  • azure-common >=1.1.28,<2.0.0
  • azure-storage-blob >=12.14.1,<13.0.0
  • azure-storage-common >=2.1.0,<3.0.0
  • kaggle >=1.5.12,<2.0.0
  • mysqlclient >=2.1.1,<3.0.0
  • psycopg2-binary >=2.9.5,<3.0.0
  • shareplum >=0.5.1,<0.6.0
  • simple-salesforce >=1.12.2,<2.0.0
  • snowflake-connector-python >=2.9.0,<3.0.0
  • snowflake-sqlalchemy >=1.4.4,<2.0.0
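To see which of these optional packages are present in the current environment, the standard library's importlib.metadata (Python 3.8+) can be queried. This is a minimal sketch: installed_versions is a hypothetical helper, and it only reports installed versions without checking them against the ranges listed above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# A few of the optional dependencies listed above.
optional = ["kaggle", "mysqlclient", "psycopg2-binary", "shareplum"]

for name, ver in installed_versions(optional).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```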

Author

👤 Rayane Amrouche

