Skip to main content

Sermos - Machine Learning for the Real World

Project description

Sermos

Note: This represents Sermos' developer README. Public documentation generated using Sphinx and located at Sermos.

Quickstart

  1. Add sermos as a dependency to your Python application

  2. Install extras depending on what you are building:

  3. flask - Convenient interface for Flask applications

  4. flask_auth - Utilize Sermos' authentication system (Sermos Cloud only)

  5. web - Some standard web server dependencies we like

  6. workers - Installs Celery and networkx, which are required if using Sermos Cloud for your deployed pipelines and scheduled tasks

  7. deploy - Only required for deploying to Sermos Cloud

Overview

Sermos provides a set of tools and design roadmap for developers to quickly and effectively integrate Data Science into real-world applications. The core design decisions and recommended technologies are based on nearly a decade of putting data science to work in demanding applications such as real-time motorsports strategy at Pit Rho, custom implementations across industries as diverse as energy, finance, and healthcare at Rho AI, and climate impact assesment tools at CRANE. With that said, this is built by developers and data scientists and strives to strike an appropriate balance between opinionated decisions and personal choice.

This is open source software and we look forward to seeing what you can build on top of Sermos. For those looking for quick, scalable, enterprise-grade deployments, make sure to check out Sermos Cloud, which is purpose-built for running complex, scalable, highly available Machine Learning (ML) workloads, including Natural Language Processing (NLP), Computer Vision (CV), Decision Modeling, Internet of Things (IoT), and more.

A standard Sermos application comprises two key libraries:

  1. Sermos
  • This is the base package with optional scaffolding, architecture components (e.g. DAG implementation on top of Celery), and some useful utilities.
  1. Sermos Tools
  • Tool catalog for use in Sermos (or other!) Machine Learning applications.

Sermos

  • Celery Configuration
  • Pipelines
  • APIs
  • Utilities

Sermos Tools

  • NLP Tools
    • Document Classification
    • Document Similarity Searching
    • Date Finding
    • etc.
  • IoT Tols
  • General Tools
    • Slack Notifications

Your Application

aka "Sermos Client" or "Client Codebase"

This is where all of your code lives and only has a few requirements:

  1. It is a base application written in Python (additional language support slated for the future).
  2. Scheduled tasks and Pipeline nodes must be Python Methods that accept at least one positional argument: event
  3. [Optional] A sermos.yaml file, which is a configuration file if you choose to deploy using Sermos Cloud

Celery [optional]

Sermos provides sensical default configurations for the use of Celery. The default deployment uses RabbitMQ, and is recommended. This library can be implemented in any other workflow (e.g. Kafka) as desired.

There are two core aspects of Celery that Sermos handles and differ from a standard Celery deployment.

ChainedTask

In celery.py when imported it will configure Celery and also run GenerateCeleryTasks().generate(), which will use the sermos.yaml config to turn customer methods into decorated Celery tasks.

Part of this process includes adding ChainedTask as the base for all of these dynamically generated tasks.

ChainedTask is a Celery Task that injects tools and event into the signature of all dynamically generated tasks.

SermosScheduler

We allow users to set new scheduled / recurring tasks on-the-fly. Celery's default beat_scheduler does not support this behavior and would require the Beat process be killed/restarted upon every change. Instead, we set our custom sermos.celery_beat:SermosScheduler as the beat_scheduler, which takes care of watching the database for new/modified entries and reloads dynamically.

Workers / Tasks / Pipeline Nodes

If running in Sermos Cloud, Sermos runs all work (whether scheduled tasks or pipeline nodes) as Celery workers. Sermos handles decorating the tasks, generating the correct Celery chains, etc.

Customer code has one requirement: write a python method that accepts one positional argument: event

e.g.

def demo_pipeline_node_a(event):
    logger.info(f"RUNNING demo_pipeline_node_a: {event}")
    return

Sermos Tools [optional]

Sermos provides much of the scaffolding and design guidance for running machine learning workloads and has a companion project Sermos Tools, which provides a set of useful (and growing) tools intended to streamline development of machine learning workflows and tasks.

Generators

TODO: This needs to be updated both in code and documentation. Leaving here because it's valuable to update in the future.

A common task associated with processing batches of documents is generating the list of files to process. sermos.generators contains two helpful classes to generate lists of files from S3 and from a local file system.

S3KeyGenerator will produce a list of object keys in S3. Example:

gen = S3KeyGenerator('access-key', 'secret-key')
files = gen.list_files(
    'bucket-name',
    'folder/path/',
    offset=0,
    limit=4,
    return_full_path=False
)

LocalKeyGenerator will produce a list of file names on a local file system. Example:

gen = LocalKeyGenerator()
files = gen.list_files('/path/to/list/')

Testing

If you are developing Sermos core and want to test this package, install the test dependencies:

$ pip install -e .[test]

Now, run the tests:

$ tox

Contributors

Thank you to everyone who has helped in our quest to put machine learning to work in the real-world!

  • Kevin Lyons
  • Alejandro Mesa
  • Cassie Borish
  • Vickram Premakumar
  • Aral Tasher
  • Akshay Pakhle
  • Gilman Callsen
  • Your Name Here!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sermos-3.0.0.tar.gz (253.4 kB view hashes)

Uploaded Source

Built Distribution

sermos-3.0.0-py2.py3-none-any.whl (263.2 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page