
Detects potentially corrupt entries in a dataframe containing latitude, longitude, and country-tagged data.

Project description

ITALLIC: A tool for automatically identifying and correcting errors in location-based plant breeding data

Errors in location data are one of the challenges of integrating plant breeding data with other sources of data, such as genotype, environment, management, and socioeconomic data, for collective analysis. Once integrated, this data could be used to inform genetic predictive models for maize, wheat, and other crops. Typical errors in plant breeding location data include flipped latitude and longitude values, missing negative signs, and, in some cases, missing data. This tool, an Integrated Tool for Automatic Lat Long Imputation and Cleaning (ITALLIC), automatically detects and corrects errors in location data and imputes missing values for location-dependent data, such as region name.
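
To make these error types concrete, here is a minimal Python sketch of one way such errors could be flagged. It is not ITALLIC's implementation or API: the `BOUNDS` table, the `suggest_fix` function, and the rough bounding boxes are all assumptions made for illustration, and the sketch uses simple rectangles rather than real country boundary data.

```python
from typing import Optional, Tuple

# Hypothetical, very rough bounding boxes: (min_lat, max_lat, min_lng, max_lng).
# These rectangles are illustrative assumptions, not real boundary data.
BOUNDS = {
    "Kenya": (-5.0, 5.5, 33.5, 42.0),
    "Mexico": (14.0, 33.0, -119.0, -86.0),
}


def in_box(lat: float, lng: float, box: Tuple[float, float, float, float]) -> bool:
    """Return True if the point lies inside the bounding box."""
    min_lat, max_lat, min_lng, max_lng = box
    return min_lat <= lat <= max_lat and min_lng <= lng <= max_lng


def suggest_fix(
    lat: Optional[float], lng: Optional[float], country: str
) -> Tuple[str, Optional[float], Optional[float]]:
    """Label a record and return the first candidate that fits the country."""
    if lat is None or lng is None:
        return ("missing", lat, lng)
    box = BOUNDS[country]
    # Try the record as given first, then each typical error in turn.
    candidates = [
        ("ok", lat, lng),
        ("flipped", lng, lat),        # latitude and longitude swapped
        ("lat_sign", -lat, lng),      # missing negative sign on latitude
        ("lng_sign", lat, -lng),      # missing negative sign on longitude
        ("both_signs", -lat, -lng),
    ]
    for label, c_lat, c_lng in candidates:
        if in_box(c_lat, c_lng, box):
            return (label, c_lat, c_lng)
    return ("unresolved", lat, lng)


print(suggest_fix(36.82, -1.29, "Kenya"))   # ('flipped', -1.29, 36.82)
print(suggest_fix(19.43, 99.13, "Mexico"))  # ('lng_sign', 19.43, -99.13)
```

A rectangle check like this cannot catch every error. For a country that straddles the equator, for example, a latitude with a missing negative sign may still fall inside the box.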

This page contains instructions for installing and using ITALLIC. These instructions assume familiarity with working in a terminal.

Pre-Installation

ITALLIC is a Python 3 application. In addition to Python 3, we highly recommend installing Conda. See the Conda documentation for more information on installing Conda.

You do not need Conda to use ITALLIC, but using it has some advantages. It simplifies the installation of ITALLIC and other Python packages, and it enables the use of conda environments. Environments are a good way to prevent conflicts that might arise when working on different projects that require different versions of the same software package. This blog nicely summarizes some advantages of using environments.

Prepare working environment

Create a conda environment for data cleaning and install ITALLIC in that environment. The command below uses "DataCleaning" as the environment name and Python 3.8 as the Python version. You can use a different name for your conda environment. Any Python 3 version should work, but because ITALLIC was tested on Python 3.8, we recommend using that version.

  • Create conda environment.
$ conda create --name DataCleaning python=3.8 -y
  • Activate the environment.
$ conda activate DataCleaning
  • Install Jupyter Notebook. ITALLIC has a visualization tool that works well with Jupyter Notebook. Use conda to install Jupyter.
$ conda install -c conda-forge jupyter -y
  • Install the dependencies needed to use Jupyter.
$ conda install -c conda-forge ipykernel -y
  • Create a kernel for this environment to use with Jupyter Notebook. We recommend using the same name for the kernel that was used for the environment.
$ ipython kernel install --user --name=DataCleaning
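
Once the kernel exists, a quick way to confirm that a notebook is really running on the environment's interpreter is to inspect `sys` from a notebook cell. The sketch below is only a convenience check, and `kernel_info` is a hypothetical helper name. It assumes the Python 3.8 environment created above.

```python
import sys


def kernel_info(expected=(3, 8)):
    """Report the running interpreter and whether its version matches `expected`."""
    version = tuple(sys.version_info[:2])
    return {
        "executable": sys.executable,  # path should point into the conda environment
        "version": version,
        "matches": version == tuple(expected),
    }


info = kernel_info()
print(info["executable"])
print("Python %d.%d" % info["version"])
print("Matches expected version:", info["matches"])
```

If the printed path does not point into the DataCleaning environment, the notebook is probably using a different kernel.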

Installation

Now that you have set up the environment and installed Jupyter, you are ready to install ITALLIC.

  • Install ITALLIC.
$ conda install -c conda-forge itallic -y
  • You can now deactivate the conda environment and switch to using Jupyter Notebook to get started.
$ conda deactivate
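
To check that the installation succeeded, you can query the installed package metadata from a notebook that uses the DataCleaning kernel. This is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). `installed_version` is a hypothetical helper name, and the distribution name "itallic" is taken from the package files listed on this page.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("itallic")
if version is None:
    print("itallic is not installed in this environment")
else:
    print("itallic version:", version)
```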

Getting Started

  • Create a working directory
$ mkdir DataCleaningDir
  • Navigate into the directory
$ cd DataCleaningDir
  • Get compressed folder with country boundary data and a sample dataset to use for testing
$ wget https://github.com/getiria-onsongo/itallic/raw/main/resources/data.tar.gz

If your platform does not have wget, you can install it using conda:
$ conda install -c conda-forge wget

  • Uncompress data folder
$ tar -xvf data.tar.gz 

You can also download the compressed folder by clicking on this link and then clicking the "Download" button.
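
If neither wget nor a browser download is convenient, the download and uncompress steps above can also be sketched in Python using only the standard library. The helper names `download` and `extract` are hypothetical. The URL is the one given above, and the code assumes the archive is an ordinary gzip-compressed tar file.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the wget command above.
DATA_URL = "https://github.com/getiria-onsongo/itallic/raw/main/resources/data.tar.gz"


def download(url, dest):
    """Download `url` to the local path `dest` and return that path."""
    dest = Path(dest)
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive, target_dir="."):
    """Extract a tar archive into `target_dir` and return the member names."""
    target = Path(target_dir).resolve()
    with tarfile.open(str(archive)) as tar:
        for member in tar.getmembers():
            # Refuse members that would be written outside the target directory.
            member_path = (target / member.name).resolve()
            if member_path != target and target not in member_path.parents:
                raise ValueError("unsafe path in archive: " + member.name)
        tar.extractall(str(target))
        return tar.getnames()
```

Run from inside the working directory, `extract(download(DATA_URL, "data.tar.gz"))` would fetch the archive and unpack it in place, returning the list of extracted names.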

  • Download a Getting Started Python Notebook with basic commands on how to get started.
$ wget https://github.com/getiria-onsongo/itallic/raw/main/resources/GettingStarted.ipynb

More instructions coming soon.

Download files

Source Distribution

itallic-0.0.7.tar.gz (3.6 MB)

Built Distribution

itallic-0.0.7-py3.8.egg (44.0 kB)
