Computational Models for Understanding Scientific Software
Scientific Software (Predictive) Models
Computational predictive models to assist in the identification, classification, and study of scientific software.
Models
Developer-Author Entity Matching
This model is a binary classifier that predicts whether a developer and an author are the same person. It is trained on a dataset of 3000 developer-author pairs that have been annotated as either matching or not matching.
Usage
Given a set of developers and a set of authors, the model is applied to every possible developer-author pair to predict whether the two are the same person. It returns only the matches it finds, as a list of MatchedDevAuthor objects, each containing the developer, the author, and the confidence of the prediction.
from sci_soft_models import dev_author_em

devs = [
    dev_author_em.DeveloperDetails(
        username="evamaxfield",
        name="Eva Maxfield Brown",
    ),
    dev_author_em.DeveloperDetails(
        username="nniiicc",
    ),
]

authors = [
    "Eva Brown",
    "Nicholas Weber",
]

matches = dev_author_em.match_devs_and_authors(devs=devs, authors=authors)
print(matches)
# [
#     MatchedDevAuthor(
#         dev=DeveloperDetails(
#             username='evamaxfield',
#             name='Eva Maxfield Brown',
#             email=None,
#         ),
#         author='Eva Brown',
#         confidence=0.9851127862930298
#     )
# ]
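To make the "every possible pair" behaviour described above concrete, here is a minimal, self-contained sketch of the pairing and filtering logic. It does not use the trained classifier: `Dev`, `Match`, `toy_score`, and `match_all_pairs` are hypothetical stand-ins, the scorer is plain string similarity from the standard library `difflib`, and the 0.5 threshold is an assumption rather than the library's actual cutoff.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import product
from typing import List, Optional


@dataclass
class Dev:
    # Hypothetical stand-in for DeveloperDetails; not the library class.
    username: str
    name: Optional[str] = None


@dataclass
class Match:
    # Hypothetical stand-in for MatchedDevAuthor; not the library class.
    dev: Dev
    author: str
    confidence: float


def toy_score(dev: Dev, author: str) -> float:
    """Placeholder scorer, NOT the trained model.

    Returns the best string-similarity ratio (0.0 to 1.0) between the author
    name and either the developer's username or display name.
    """
    candidates = [dev.username] + ([dev.name] if dev.name else [])
    return max(
        SequenceMatcher(None, c.lower(), author.lower()).ratio()
        for c in candidates
    )


def match_all_pairs(
    devs: List[Dev], authors: List[str], threshold: float = 0.5
) -> List[Match]:
    """Score every (developer, author) pair and keep only likely matches."""
    matches = []
    for dev, author in product(devs, authors):  # len(devs) * len(authors) pairs
        score = toy_score(dev, author)
        if score >= threshold:  # assumed cutoff, for illustration only
            matches.append(Match(dev=dev, author=author, confidence=score))
    return matches


devs = [
    Dev(username="evamaxfield", name="Eva Maxfield Brown"),
    Dev(username="nniiicc"),
]
authors = ["Eva Brown", "Nicholas Weber"]

for m in match_all_pairs(devs, authors):
    print(m.dev.username, m.author, round(m.confidence, 2))
```

Because every developer is compared with every author, the number of predictions grows as the product of the two list sizes, which is worth keeping in mind for large inputs.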
Extra Notes
Developer-Author-EM Dataset
This model was originally created and managed as part of rs-graph. To regenerate the dataset for annotation, run the following steps:
git clone https://github.com/evamaxfield/rs-graph.git
cd rs-graph
git checkout c1d8ec89
pip install -e .
rs-graph-modeling create-developer-author-em-dataset-for-annotation
File details
Details for the file sci_soft_models-0.1.1.tar.gz.
File metadata
- Download URL: sci_soft_models-0.1.1.tar.gz
- Upload date:
- Size: 9.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 39b5027ec66572f15bda96614ebf97b7ab37aa5d613f241fad453719795c1c5e
MD5 | 2d5a671a21fa617a2a37147965ec63a9
BLAKE2b-256 | 14d624efa77a1aa98bc9aea0f916af31a3c1d45a3ef79be6097a89c98edd4113
File details
Details for the file sci_soft_models-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sci_soft_models-0.1.1-py3-none-any.whl
- Upload date:
- Size: 9.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 55e063cd1f91c8df399a595266f815ffb77e7719c597ad320d6b6adb0539654f
MD5 | a0b310192b9bc3411bdde73ec983ec8d
BLAKE2b-256 | 0fbd86826bcba23c421ad3601692960998e33583a7114222864d63ff32abe563