Did I Find It?

difi

Did I Find It?

About

difi is a simple package that takes pre-formatted linkage information from software such as MOPS, pytrax, or THOR and analyzes which objects have been found given a set of known labels (or truths). A key performance criterion is speed: difi avoids Python for loops and relies on pandas.DataFrame manipulation instead.

Installation

The following installation paths are available:
Anaconda
PyPI
Docker
Source

Anaconda

difi can be downloaded directly from anaconda:
conda install -c moeyensj difi

Or, if preferred, installed into its own environment via:
conda create -n difi_py38 -c moeyensj difi python=3.8

PyPI

difi is also available from the Python Package Index:
pip install difi

Docker

A Docker container with the latest version of the code can be pulled using:
docker pull moeyensj/difi:latest

To run the container:
docker run -it moeyensj/difi:latest

The difi code is installed in the /projects directory and is, by default, also installed in the container's Python installation.

Source

Clone this repository using either ssh or https. Once cloned and downloaded, cd into the repository.

To install difi in its own conda environment, do the following:
conda create -n difi_py38 -c defaults -c conda-forge --file requirements.txt python=3.8

Or, to install difi in a pre-existing conda environment called difi_py38:
conda activate difi_py38
conda install -c defaults -c conda-forge --file requirements.txt

Or, to install pre-requisite software using pip:
pip install -r requirements.txt

Once the pre-requisites have been installed using any one of the three options above, run:
python setup.py install

Or, if you would like to make an editable install then:
python setup.py develop

You should now be able to start Python and import difi.
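
A quick way to confirm the install, sketched here with only the Python standard library, is to ask the import system whether it can locate the package. The snippet makes no assumptions about difi's own API; it only checks that a module named difi is importable in the current environment.

```python
import importlib.util

# Look up the package without importing it; returns None if it is not installed.
spec = importlib.util.find_spec("difi")

if spec is None:
    print("difi is not importable in this environment")
else:
    print(f"difi found at {spec.origin}")
```

If the first message is printed, check that the environment you installed into (for example difi_py38) is the one currently active.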

Example

The example below can be found in greater detail in this Jupyter Notebook.

Assumed Inputs

difi is designed to analyze a set of linkages made by external software where some of the underlying true linkages are known. It needs just two DataFrames of data:

    1. a DataFrame containing observations, with a column for observation ID and a column for the underlying truth (don't worry! -- difi can handle false positives and unknown truths as well)

observations

    2. a DataFrame describing the linkages that were found in the observations by the external software. This DataFrame needs just two columns, one with the linkage ID and the other with the observation IDs that form that linkage

linkage_members
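
As a rough illustration of the two inputs described above, the sketch below builds toy versions of both DataFrames with pandas. The column names ("obs_id", "truth", "linkage_id") and all values are assumptions made for this example; check the tutorial notebook for the names difi actually expects.

```python
import pandas as pd

# Toy observations table: one row per observation, with its underlying truth.
# Column names and values are hypothetical, chosen for illustration only.
observations = pd.DataFrame({
    "obs_id": ["obs_01", "obs_02", "obs_03", "obs_04", "obs_05", "obs_06"],
    "truth": ["obj_A", "obj_A", "obj_A", "obj_B", "obj_B", "unknown"],
})

# Toy linkage table: one row per (linkage, observation) pair.
linkage_members = pd.DataFrame({
    "linkage_id": ["link_1", "link_1", "link_1", "link_2", "link_2"],
    "obs_id": ["obs_01", "obs_02", "obs_03", "obs_04", "obs_06"],
})

# Joining on the observation ID attaches a truth to every linkage member.
members_with_truth = linkage_members.merge(observations, on="obs_id", how="left")
print(members_with_truth)
```

The join at the end is not a difi call; it only shows why two columns per table are enough to relate linkages to truths.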

What Can I Find?

In most cases the user can determine which known truths in their observations DataFrame can be found by their linking algorithm. difi has two simple findability metrics:

The 'min_obs' metric: any object with this many or more observations is considered findable.
analyzeObservations

The 'nightly_linkages' metric: any object with this many or more observations is considered findable.
analyzeObservations
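
To make the 'min_obs' idea concrete, here is a minimal pandas sketch of that criterion. It does not call difi; the column names, the threshold value, and the toy data are assumptions for illustration, and difi's own implementation may differ.

```python
import pandas as pd

# Hypothetical observations table (column names are assumptions).
observations = pd.DataFrame({
    "obs_id": [f"obs_{i:02d}" for i in range(1, 9)],
    "truth": ["obj_A"] * 5 + ["obj_B"] * 2 + ["obj_C"],
})

min_obs = 3  # hypothetical threshold

# Count distinct observations per truth with a single groupby, no Python loop.
num_obs = observations.groupby("truth")["obs_id"].nunique()

# A truth is findable when it has at least min_obs observations.
findable = num_obs[num_obs >= min_obs].index.tolist()

# The observations that belong to findable truths.
findable_obs = observations[observations["truth"].isin(findable)]
print(findable)
```

With this toy data only obj_A has enough observations to pass the threshold.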

Which objects are findable?
all_truths

What observations made each object findable?
findable_observations

A summary of what kinds of objects are findable might be useful.
summary

Did I Find It?

Now let's see what the external linking software did find.

analyzeLinkages

difi assumes there to be three different types of linkages:

  • 'pure': all observations in a linkage belong to a unique truth
  • 'partial': up to a certain percentage of non-unique truths are allowed so long as one truth has at least the minimum required number of unique observations
  • 'mixed': a linkage containing different observations belonging to different truths. We avoid using the word 'false' for these linkages as they may contain unknown truths depending on the use case. We leave interpretation up to the user.
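
One way to read the three definitions above is sketched below in plain pandas. This is an interpretation, not difi's code: the column names, the min_obs and contamination_percentage parameters, and the rule that contamination is measured against the truth with the most observations in the linkage are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical linkage members with their truths already attached.
members = pd.DataFrame({
    "linkage_id": ["l1"] * 4 + ["l2"] * 5 + ["l3"] * 4,
    "truth": ["a", "a", "a", "a",
              "b", "b", "b", "b", "c",
              "a", "b", "c", "d"],
})

min_obs = 4                      # assumed minimum observations for one truth
contamination_percentage = 20.0  # assumed maximum share of other truths

# Observations per (linkage, truth) pair.
per_truth = (
    members.groupby(["linkage_id", "truth"]).size().rename("num_obs").reset_index()
)

# Per linkage: total observations, size of the largest truth, number of truths.
summary = per_truth.groupby("linkage_id")["num_obs"].agg(["sum", "max", "count"])
summary.columns = ["total_obs", "best_truth_obs", "num_truths"]

# Percentage of observations that do not belong to the largest truth.
contamination = (
    100.0 * (summary["total_obs"] - summary["best_truth_obs"]) / summary["total_obs"]
)

summary["pure"] = summary["num_truths"] == 1
summary["partial"] = (
    ~summary["pure"]
    & (summary["best_truth_obs"] >= min_obs)
    & (contamination <= contamination_percentage)
)
summary["mixed"] = ~summary["pure"] & ~summary["partial"]
print(summary)
```

Under these assumptions l1 is pure, l2 is partial (one contaminating observation out of five), and l3 is mixed. Every linkage falls into exactly one class.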

Thanks to the power of pandas it can be super easy to isolate the different linkage types and analyze them separately. Selecting 'pure' linkages:

all_linkages_pure

Selecting 'partial' linkages:

all_linkages_partial

Selecting 'mixed' linkages:

all_linkages_mixed
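
The selections above amount to boolean-mask filtering in pandas. The sketch below shows the pattern on a hypothetical table; the all_linkages name comes from the examples above, but the boolean "pure", "partial", and "mixed" columns and the toy values are assumptions, so check the columns of the DataFrame difi returns before relying on them.

```python
import pandas as pd

# Hypothetical per-linkage table with one boolean column per linkage type.
all_linkages = pd.DataFrame({
    "linkage_id": ["l1", "l2", "l3", "l4"],
    "pure": [True, False, False, True],
    "partial": [False, True, False, False],
    "mixed": [False, False, True, False],
})

# A boolean column can be used directly as a row mask.
pure_linkages = all_linkages[all_linkages["pure"]]
partial_linkages = all_linkages[all_linkages["partial"]]
mixed_linkages = all_linkages[all_linkages["mixed"]]

print(pure_linkages["linkage_id"].tolist())
```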

Understanding the specifics behind each linkage is one thing, but how did the linking algorithm perform on an object-by-object basis?
allTruths

Tutorial

A detailed tutorial on difi functionality can be found here.
