Data Science tools to ease access and use of data and models

DS Template 👨‍💻

Version Documentation License: Adel Rayane Amrouche

A template to kickstart Data Science projects. It follows scikit-learn (sklearn) conventions and provides useful tools to handle data and models quickly and easily.

Install

pip install poetry
poetry install

Run tests

pytest src/
pylint src/
mypy src/

Documentation

pdoc src/

Usage

The DS Template has 3 main components:

  • A DataManager component
  • A Controller component
  • A Model component

The Model component depends on the project type and can be left empty if no model is needed.

That leaves the controller and the data manager. The controller is meant to be the center of the project: it interacts with all the components to produce the output requested by the user. The data manager is built around a JSON metadata file, which follows a specific schema for both input sources and output sources.
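As a rough illustration of how the three components could fit together, here is a minimal sketch. The class names, method names and the in-memory `sources` dictionary are hypothetical and are not taken from the package; the real data manager reads its sources from the JSON metadata file.

```python
class DataManager:
    """Stand-in data manager: serves datasets registered under a name."""

    def __init__(self, sources):
        # Hypothetical: the real project would build this from the JSON metadata file.
        self.sources = sources

    def get_data(self, name):
        return self.sources[name]


class MeanModel:
    """Toy model exposing scikit-learn style fit/predict methods."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the training mean for every row.
        return [self.mean_ for _ in X]


class Controller:
    """Coordinates the data manager and the optional model."""

    def __init__(self, data_manager, model=None):
        self.data_manager = data_manager
        self.model = model

    def run(self, name):
        X, y = self.data_manager.get_data(name)
        if self.model is None:
            return X  # no model needed: hand the data back unchanged
        return self.model.fit(X, y).predict(X)
```

The controller receives the other components in its constructor, so a project without a model can simply leave `model` as `None`.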

Input Sources

Every source accepts two fields, whatever its kind and whether it is an input or an output source:

  • A type, which can be csv, excel or sql.
  • A list of args that is passed to pandas to access the source and prepare the dataset correctly.
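To make the two fields concrete, here is a minimal sketch of reading one metadata entry. The key names (`type`, `args`, `path`), the source name `sales` and the shape of `args` (a list of small keyword dictionaries) are assumptions made for illustration; the actual schema is the one used in data/metadata.json.

```python
import json

# Hypothetical metadata entry; the real schema may use different keys.
METADATA = """
{
  "sales": {
    "type": "csv",
    "path": "data/sales.csv",
    "args": [{"sep": ";"}, {"encoding": "utf-8"}]
  }
}
"""

SUPPORTED_TYPES = {"csv", "excel", "sql"}


def load_source(name, raw=METADATA):
    """Return the source type and the merged pandas keyword arguments."""
    entry = json.loads(raw)[name]
    if entry["type"] not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported source type: {entry['type']!r}")
    kwargs = {}
    for arg in entry.get("args", []):
        kwargs.update(arg)  # later entries override earlier ones
    return entry["type"], kwargs
```

With a result such as `("csv", {"sep": ";", "encoding": "utf-8"})`, a data manager could then pick the matching pandas reader (for example `pandas.read_csv`) and forward the keyword arguments to it.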

Three kinds of input sources are handled:

  • Local files, which only take a path to the source file
  • SharePoint files, which take:
    • The SharePoint address and the site where the source is located
    • The folder that contains the source file
    • The name of the source file
  • SQL sources, which take:
    • The names of the environment variables that hold the username and the password
    • The address of the database and the SQL dialect of the database
    • An optional query. Without a query, the SQL source is handled as a database rather than a single dataset, and you can then query it as you would with SQLAlchemy
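The SQL fields above map naturally onto a SQLAlchemy-style connection URL (`dialect://username:password@address`). The sketch below shows one way to assemble it; the key names `username_env_name`, `password_env_name`, `dialect` and `address` are hypothetical and may differ from the real metadata schema.

```python
import os
from urllib.parse import quote_plus


def build_sql_url(source, env=os.environ):
    """Build a connection URL from a hypothetical SQL source entry.

    Credentials are looked up through environment variable *names*,
    so the metadata file itself never contains secrets.
    """
    user = env[source["username_env_name"]]
    password = env[source["password_env_name"]]
    # Escape characters such as "@" or ":" that would break the URL.
    return (
        f"{source['dialect']}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{source['address']}"
    )
```

For example, with `DB_USER=alice` and `DB_PASSWORD=p@ss` in the environment, an entry using the `postgresql` dialect and the address `localhost:5432/shop` yields `postgresql://alice:p%40ss@localhost:5432/shop`.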

Sample entries are available in the file data/metadata.json.

Two kinds of output sources are handled:

  • Local files, which only take a path to the file
  • SQL sources, which take:
    • The names of the environment variables that hold the username and the password
    • The address of the database and the SQL dialect of the database
    • A table_name, the table to which the dataset will be written
    • You might also need to include the schema and the if_exists parameter (either "replace" or "append", depending on how you want to update the table) in the list of args
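The `schema` and `if_exists` options correspond to parameters of pandas `DataFrame.to_sql`. As a sketch of how an output entry could be turned into keyword arguments for that call, assuming a hypothetical entry layout where `args` is a list of small keyword dictionaries:

```python
# Values accepted by pandas DataFrame.to_sql for if_exists ("fail" is the pandas default).
ALLOWED_IF_EXISTS = {"fail", "replace", "append"}


def to_sql_kwargs(source):
    """Translate a hypothetical SQL output entry into DataFrame.to_sql kwargs."""
    kwargs = {"name": source["table_name"]}
    for arg in source.get("args", []):
        kwargs.update(arg)
    if_exists = kwargs.get("if_exists", "fail")
    if if_exists not in ALLOWED_IF_EXISTS:
        raise ValueError(f"invalid if_exists value: {if_exists!r}")
    return kwargs
```

A data manager could then call `dataframe.to_sql(con=engine, **to_sql_kwargs(entry))`, where `engine` is a SQLAlchemy engine created for the target database.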

Author

👤 Rayane Amrouche

Download files

Source Distribution: dsmanager-1.0.11.3.tar.gz (32.8 kB)

Built Distribution: dsmanager-1.0.11.3-py3-none-any.whl (42.2 kB, Python 3)
