Computational Models for Understanding Scientific Software

Scientific Software (Predictive) Models

Computational predictive models to assist in the identification, classification, and study of scientific software.

Models

Developer-Author Entity Matching

This model is a binary classifier that predicts whether a developer account and an author refer to the same person. It was trained on a dataset of 3,000 developer-author pairs, each annotated as either matching or not matching.

Usage

Given a set of developers and a set of authors, the model is applied to every possible developer-author pair to predict whether the two are the same person. It returns only the matches it finds, as a list of MatchedDevAuthor objects, each containing the developer, the author, and the confidence of the prediction.

from sci_soft_models import dev_author_em

devs = [
    dev_author_em.DeveloperDetails(
        username="evamaxfield",
        name="Eva Maxfield Brown",
    ),
    dev_author_em.DeveloperDetails(
        username="nniiicc",
    ),
]

authors = [
    "Eva Brown",
    "Nicholas Weber",
]

matches = dev_author_em.match_devs_and_authors(devs=devs, authors=authors)
print(matches)
# [
#   MatchedDevAuthor(
#       dev=DeveloperDetails(
#           username='evamaxfield',
#           name='Eva Maxfield Brown',
#           email=None,
#       ),
#       author='Eva Brown',
#       confidence=0.9851127862930298
#   )
# ]
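To make the "every possible pair" behavior concrete, the following is a minimal sketch of pairwise matching, not the library's implementation. The two dataclasses are stand-ins that only mirror the fields visible in the printed output above, and `toy_score`, `match_all_pairs`, and the `threshold` of 0.5 are hypothetical names and values introduced purely for illustration; the real package uses a trained classifier rather than token overlap.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DeveloperDetails:
    # Stand-in mirroring the fields shown in the printed output above.
    username: str
    name: Optional[str] = None
    email: Optional[str] = None


@dataclass(frozen=True)
class MatchedDevAuthor:
    # Stand-in mirroring the fields shown in the printed output above.
    dev: DeveloperDetails
    author: str
    confidence: float


def toy_score(dev: DeveloperDetails, author: str) -> float:
    # Placeholder scorer, NOT the trained model: the fraction of the
    # author's name tokens found in the developer's username or name.
    dev_text = f"{dev.username} {dev.name or ''}".lower()
    tokens = author.lower().split()
    if not tokens:
        return 0.0
    return sum(token in dev_text for token in tokens) / len(tokens)


def match_all_pairs(
    devs: List[DeveloperDetails],
    authors: List[str],
    score: Callable[[DeveloperDetails, str], float] = toy_score,
    threshold: float = 0.5,  # hypothetical cut-off, chosen for illustration
) -> List[MatchedDevAuthor]:
    # Score every (developer, author) combination and keep only the
    # pairs whose score reaches the threshold.
    matches = []
    for dev, author in product(devs, authors):
        confidence = score(dev, author)
        if confidence >= threshold:
            matches.append(MatchedDevAuthor(dev, author, confidence))
    return matches


devs = [
    DeveloperDetails(username="evamaxfield", name="Eva Maxfield Brown"),
    DeveloperDetails(username="nniiicc"),
]
authors = ["Eva Brown", "Nicholas Weber"]

matches = match_all_pairs(devs, authors)
for match in matches:
    print(match.dev.username, "->", match.author, match.confidence)
```

Because every developer is compared against every author, the number of comparisons grows as the product of the two list sizes, so it may be worth narrowing the inputs for large repositories or long author lists.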

Extra Notes

Developer-Author-EM Dataset

This model was originally created and managed as part of rs-graph. To regenerate the dataset for annotation, run the following steps:

git clone https://github.com/evamaxfield/rs-graph.git
cd rs-graph
git checkout c1d8ec89
pip install -e .
rs-graph-modeling create-developer-author-em-dataset-for-annotation

Link to annotation set creation function.
