
Data Science for Software Engineering (ds4se) is an academic initiative to perform exploratory analysis on software engineering artifacts and metadata. Data Management, Analysis, and Benchmarking for DL and Traceability.

Project description

ds4se


A Data Science for Software Engineering Library (DS4SE-API)

Project Leads: Nathan, @danaderp

Description: Software data comprises any type of artifact, such as source code, requirements, user stories, screens, and binaries. Automating software engineering tasks with machine learning requires substantial effort to adapt algorithms and deep learning approaches to software data. SEMERU Lab is working on a solution for processing any type of data produced during the software lifecycle. The DS4SE library was created to manage, describe, explore, infer, visualize, represent, and mine software data by relying on statistical theory and machine learning libraries. The DS4SE architecture follows the “exploratory programming” paradigm to enhance the development process. However, most of the modules that make up the library are incomplete, disconnected, or undocumented. In this project, we need a motivated team to help us connect, refactor, and implement several data science components that are critical for future research in SEMERU Lab. You will be working on the back end. The team will be divided into three domains:

  • Back-End Development and Refactoring,
  • Interface and Facade Implementation (or API), and
  • Testing.

Project Description for CSCI 435/535

Project Goals:

  • Implement the Initial Data Analysis module based on SE metrics theory
  • Refactor the Exploratory Data Analysis module based on information science theory
  • Integrate data science components, such as causal inference and data representation, from other repositories (i.e., COMET)
  • Expose the API for consumption by other teams (the Project #1 team should consume your services)
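The goals above name three pieces: an initial analysis based on SE metrics, an exploratory analysis based on information theory, and an API that other teams can consume. The sketch below is a hypothetical illustration of how such pieces could fit together behind a single facade. The names `shannon_entropy`, `size_metrics`, and `ArtifactAnalysisFacade` are assumptions made for this example and are not part of the published ds4se package; the whitespace tokenizer and line counts stand in for whatever metrics the real modules compute.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the token distribution of one artifact.

    Hypothetical stand-in for an information-theoretic EDA measure.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty artifact carries no information
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def size_metrics(text):
    """Simple size metrics for a text artifact: total, blank, and non-blank lines.

    Hypothetical stand-in for an SE-metrics-based initial data analysis.
    """
    lines = text.splitlines()
    blank = sum(1 for line in lines if not line.strip())
    return {"lines": len(lines), "blank": blank, "non_blank": len(lines) - blank}


class ArtifactAnalysisFacade:
    """Single entry point that other teams could call instead of the individual modules."""

    def analyze(self, text):
        tokens = text.split()  # naive whitespace tokenizer, for illustration only
        report = size_metrics(text)
        report["vocabulary"] = len(set(tokens))  # number of distinct tokens
        report["entropy_bits"] = shannon_entropy(tokens)
        return report


if __name__ == "__main__":
    # Analyze a tiny source-code artifact through the facade.
    report = ArtifactAnalysisFacade().analyze("def f(x):\n    return x\n\nf(1)\n")
    print(report)
```

Keeping the facade thin means the back-end functions can be refactored or replaced without changing what consuming teams call.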

Requirements:

  • Required Knowledge Prerequisites: Python and Git
  • Preferred Knowledge Prerequisites: Machine Learning, Statistical Computing

Recommended Readings:

  • Exploratory Programming with Nbdev
  • Manage your Data Science Project Structure in Early Stage (blog post)

Install

pip install ds4se

How to use

Usage documentation and code examples have not been written yet.



Download files


Source Distribution

ds4se-0.0.2.tar.gz (28.8 kB)

Built Distribution

ds4se-0.0.2-py3-none-any.whl (52.1 kB, Python 3)
