Computational Models for Understanding Scientific Software

Predictive Models for Researching Scientific Software

Computational predictive models to assist in the identification, classification, and study of scientific software.

Models

Developer-Author Entity Matching

This model is a binary classifier that predicts whether a developer and an author are the same person. It is trained on a dataset of 3,000 developer-author pairs, each annotated as either a match or a non-match.

Usage

Given a set of developers and a set of authors, the model is applied to every possible developer-author pair to predict whether the two are the same person. Only the pairs predicted to match are returned, as a list of MatchedDevAuthor objects, each containing the developer, the author, and the confidence of the prediction.

from sci_soft_models import dev_author_em

devs = [
    dev_author_em.DeveloperDetails(
        username="evamaxfield",
        name="Eva Maxfield Brown",
    ),
    dev_author_em.DeveloperDetails(
        username="nniiicc",
    ),
]

authors = [
    "Eva Brown",
    "Nicholas Weber",
]

matches = dev_author_em.match_devs_and_authors(devs=devs, authors=authors)
print(matches)
# [
#   MatchedDevAuthor(
#       dev=DeveloperDetails(
#           username='evamaxfield',
#           name='Eva Maxfield Brown',
#           email=None,
#       ),
#       author='Eva Brown',
#       confidence=0.9851127862930298
#   )
# ]
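
To make the "every possible pair" behavior concrete, the following is a minimal sketch of the pairwise pattern, not the library's implementation. The dataclasses are stand-ins that only mirror the fields visible in the printed output above, and `toy_score`, `match_all_pairs`, and the `threshold` parameter are hypothetical names: a plain string-similarity ratio takes the place of the trained classifier.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import product
from typing import List, Optional


@dataclass(frozen=True)
class DeveloperDetails:
    # Stand-in with the fields shown in the example output above.
    username: str
    name: Optional[str] = None
    email: Optional[str] = None


@dataclass(frozen=True)
class MatchedDevAuthor:
    # Stand-in for the result object: one developer, one author, one score.
    dev: DeveloperDetails
    author: str
    confidence: float


def toy_score(dev: DeveloperDetails, author: str) -> float:
    """Hypothetical placeholder for the classifier: best string similarity in [0, 1]."""
    candidates = [c for c in (dev.username, dev.name) if c]
    return max(
        SequenceMatcher(None, c.lower(), author.lower()).ratio() for c in candidates
    )


def match_all_pairs(
    devs: List[DeveloperDetails],
    authors: List[str],
    threshold: float = 0.5,  # assumed cutoff, chosen only for this sketch
) -> List[MatchedDevAuthor]:
    """Score every (developer, author) pair and keep only those at or above the cutoff."""
    matches = []
    for dev, author in product(devs, authors):
        confidence = toy_score(dev, author)
        if confidence >= threshold:
            matches.append(MatchedDevAuthor(dev=dev, author=author, confidence=confidence))
    return matches


devs = [
    DeveloperDetails(username="evamaxfield", name="Eva Maxfield Brown"),
    DeveloperDetails(username="nniiicc"),
]
authors = ["Eva Brown", "Nicholas Weber"]

matches = match_all_pairs(devs, authors)
for match in matches:
    print(match.dev.username, "->", match.author, round(match.confidence, 3))
```

Because every developer is compared with every author, the number of scored pairs grows as the product of the two list sizes, which is worth keeping in mind for large inputs. The toy similarity score cannot link "nniiicc" to "Nicholas Weber"; cases like this, where the username and name share little text, are presumably where a trained classifier matters most.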

Extra Notes

Developer-Author-EM Dataset

This model was originally created and managed as part of rs-graph. To regenerate the dataset for annotation, run the following steps:

git clone https://github.com/evamaxfield/rs-graph.git
cd rs-graph
git checkout c1d8ec89
pip install -e .
rs-graph-modeling create-developer-author-em-dataset-for-annotation

Link to annotation set creation function.
