Computational Models for Understanding Scientific Software
Predictive Models for Researching Scientific Software
Computational predictive models to assist in the identification, classification, and study of scientific software.
Models
Developer-Author Entity Matching
This model is a binary classifier that predicts whether a developer and an author are the same person. It is trained on a dataset of 3000 developer-author pairs that have been annotated as either matching or not matching.
Usage
Given a set of developers and a set of authors, the model is applied to every possible developer-author pair to predict whether the two are the same person. It returns only the pairs predicted to match, as a list of MatchedDevAuthor objects, each containing the developer, the author, and the confidence of the prediction.
from sci_soft_models import dev_author_em

devs = [
    dev_author_em.DeveloperDetails(
        username="evamaxfield",
        name="Eva Maxfield Brown",
    ),
    dev_author_em.DeveloperDetails(
        username="nniiicc",
    ),
]

authors = [
    "Eva Brown",
    "Nicholas Weber",
]

matches = dev_author_em.match_devs_and_authors(devs=devs, authors=authors)
print(matches)
# [
#     MatchedDevAuthor(
#         dev=DeveloperDetails(
#             username='evamaxfield',
#             name='Eva Maxfield Brown',
#             email=None,
#         ),
#         author='Eva Brown',
#         confidence=0.9851127862930298
#     )
# ]
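The pairwise evaluation described above can be sketched in plain Python. This is an illustration only, not the package's implementation: `Dev`, `Match`, `score_pair`, and `match_all` are hypothetical stand-ins, the token-overlap scorer is a toy replacement for the trained classifier, and the 0.5 threshold is an assumption.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional


@dataclass(frozen=True)
class Dev:
    """Hypothetical stand-in for a developer record (not the library's class)."""

    username: str
    name: Optional[str] = None


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for a single match result."""

    dev: Dev
    author: str
    confidence: float


def score_pair(dev: Dev, author: str) -> float:
    """Toy scorer: Jaccard overlap of lowercase name tokens.

    A placeholder for the trained binary classifier, used only to make
    the sketch runnable.
    """
    dev_tokens = set((dev.name or dev.username).lower().split())
    author_tokens = set(author.lower().split())
    union = dev_tokens | author_tokens
    return len(dev_tokens & author_tokens) / len(union) if union else 0.0


def match_all(devs: List[Dev], authors: List[str], threshold: float = 0.5) -> List[Match]:
    """Score every (developer, author) pair and keep only the likely matches."""
    matches = []
    for dev, author in product(devs, authors):
        confidence = score_pair(dev, author)
        if confidence >= threshold:  # assumed cutoff, not the library's
            matches.append(Match(dev=dev, author=author, confidence=confidence))
    return matches


devs = [Dev("evamaxfield", "Eva Maxfield Brown"), Dev("nniiicc")]
authors = ["Eva Brown", "Nicholas Weber"]
found = match_all(devs, authors)
print(found)
```

Because every developer is compared with every author, the number of comparisons grows as the product of the two list sizes, which is worth keeping in mind for large inputs. The toy scorer also shows why a learned model is useful here: a username such as "nniiicc" shares no tokens with "Nicholas Weber", so simple string overlap cannot recover that kind of pairing.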
Extra Notes
Developer-Author-EM Dataset
This model was originally created and managed as part of rs-graph. To regenerate the dataset for annotation, run the following steps:
git clone https://github.com/evamaxfield/rs-graph.git
cd rs-graph
git checkout c1d8ec89
pip install -e .
rs-graph-modeling create-developer-author-em-dataset-for-annotation