
Detects potentially corrupt entries in a dataframe whose rows are tagged with latitude, longitude, and country.

Project description

ITALLIC: A tool for automatically identifying and correcting errors in location based plant breeding data

Errors in location data are a major obstacle to integrating plant breeding data with other data sources, such as genotype, environment, management, and socioeconomic data. Once integrated, these data could be used to inform genetic predictive models for maize, wheat, and other crops. Typical errors in plant breeding location data include flipped latitude and longitude values, missing negative signs, and, in some cases, missing values. The Integrated Tool for Automatic Lat Long Imputation and Cleaning (ITALLIC) automatically detects and corrects errors in location data and imputes missing values for location-dependent fields, such as region name.
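To make the error types above concrete, here is a minimal Python sketch of how a flipped pair or a missing negative sign might be detected. It is not ITALLIC's implementation or API: the function names are hypothetical, and it checks points against a rough rectangular bounding box rather than real country boundary data. The Brazil bounding box values are approximate and for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical bounding box: (min_lat, max_lat, min_lng, max_lng).
BBox = Tuple[float, float, float, float]

# Approximate extent of mainland Brazil, for illustration only.
BRAZIL_BBOX: BBox = (-33.8, 5.3, -74.0, -34.8)


def in_bbox(lat: float, lng: float, bbox: BBox) -> bool:
    """Return True if (lat, lng) lies inside the bounding box."""
    min_lat, max_lat, min_lng, max_lng = bbox
    return min_lat <= lat <= max_lat and min_lng <= lng <= max_lng


def candidate_fixes(lat: float, lng: float) -> List[Tuple[str, float, float]]:
    """List the original point followed by simple candidate corrections."""
    return [
        ("unchanged", lat, lng),
        ("negate_lat", -lat, lng),    # missing negative sign on latitude
        ("negate_lng", lat, -lng),    # missing negative sign on longitude
        ("negate_both", -lat, -lng),  # both signs missing
        ("swap", lng, lat),           # latitude and longitude flipped
    ]


def suggest_fix(
    lat: Optional[float], lng: Optional[float], bbox: BBox
) -> Optional[Tuple[str, float, float]]:
    """Return the first candidate that falls inside the country's box.

    Returns None when a value is missing or no candidate fits, so the
    row can be flagged for manual review instead of silently changed.
    """
    if lat is None or lng is None:
        return None
    for label, c_lat, c_lng in candidate_fixes(lat, lng):
        if in_bbox(c_lat, c_lng, bbox):
            return label, c_lat, c_lng
    return None


# A point near Brasilia recorded with latitude and longitude flipped.
print(suggest_fix(-47.9, -15.8, BRAZIL_BBOX))
```

Checking the unchanged point first means valid coordinates are never altered. A rectangular box is a coarse test; points near a border can pass it while lying in a neighbouring country.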

This page contains instructions for installing and using ITALLIC. These instructions assume familiarity with working in a terminal.

Pre-Installation

ITALLIC is a Python 3 application. In addition to Python 3, we highly recommend installing Conda. See the Conda documentation for installation instructions.

You do not need Conda to use ITALLIC, but it has two advantages: it simplifies installing ITALLIC and other Python packages, and it lets you use conda environments. Environments are a good way to prevent the conflicts that can arise when different projects require different versions of the same software package. This blog nicely summarizes some advantages of using environments.

Prepare working environment

Create a conda environment for data cleaning and install ITALLIC in that environment. The commands below use "DataCleaning" as the environment name and Python 3.8 as the Python version. You can choose a different name for your conda environment, but we recommend staying with Python 3.8: any Python 3 version should work, but ITALLIC was tested on Python 3.8.

  • Create conda environment.
$ conda create --name DataCleaning python=3.8 -y
  • Activate the environment.
$ conda activate DataCleaning
  • Install Jupyter Notebook. ITALLIC has a visualization tool that works well with Jupyter Notebook. Use conda to install Jupyter.
$ conda install -c conda-forge jupyter -y
  • Install the dependencies needed to use Jupyter.
$ conda install -c conda-forge ipykernel -y
  • Create a kernel for this environment to use with Jupyter Notebook. We recommend giving the kernel the same name as the environment.
$ ipython kernel install --user --name=DataCleaning
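As a quick sanity check after these steps, the following Python sketch reports whether the running interpreter and the active conda environment match what the instructions above set up. The function name and its parameters are hypothetical; the check relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated, and it assumes you kept the "DataCleaning" name and Python 3.8.

```python
import os
import sys


def check_environment(expected_env="DataCleaning", expected_python=(3, 8),
                      version_info=None, environ=None):
    """Return a list of human-readable problems (empty if all is well).

    version_info and environ default to the running interpreter and
    process environment; they are parameters only to ease testing.
    """
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ

    problems = []
    major_minor = tuple(version_info[:2])
    if major_minor != tuple(expected_python):
        problems.append(
            "Python %d.%d is running, expected %d.%d"
            % (major_minor + tuple(expected_python))
        )

    # conda sets CONDA_DEFAULT_ENV to the active environment's name.
    active_env = environ.get("CONDA_DEFAULT_ENV")
    if active_env != expected_env:
        problems.append(
            "active conda environment is %r, expected %r"
            % (active_env, expected_env)
        )
    return problems


for problem in check_environment():
    print("warning:", problem)
```

If you chose a different environment name or Python version, pass them as `expected_env` and `expected_python`.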

Installation

Now that you have set up the environment and installed Jupyter, you are ready to install ITALLIC.

  • Install ITALLIC.
$ conda install -c conda-forge itallic -y
  • You can now deactivate the conda environment and switch to using Jupyter Notebook to get started.
$ conda deactivate
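To confirm that the installation succeeded, you can ask Python for the installed version of the package. The sketch below uses only the standard library (importlib.metadata, available from Python 3.8) and assumes the distribution is registered under the name "itallic", as the package file names suggest. Run it in a notebook that uses the DataCleaning kernel, or in the environment before deactivating it. The helper name is hypothetical.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(distribution):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("itallic")
if version is None:
    print("itallic is not installed in this environment")
else:
    print("itallic version:", version)
```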

Getting Started

  • Create a working directory
$ mkdir DataCleaningDir
  • Navigate into the directory
$ cd DataCleaningDir
  • Get compressed folder with country boundary data and a sample dataset to use for testing
$ wget https://github.com/getiria-onsongo/itallic/raw/main/resources/data.tar.gz

If your platform does not have wget, you can install it using conda:
$ conda install -c conda-forge wget

  • Uncompress data folder
$ tar -xvf data.tar.gz 

You can also download the compressed folder by clicking on this link and then clicking the "Download" button.

  • Download a Getting Started Python Notebook with basic commands on how to get started.
$ wget https://github.com/getiria-onsongo/itallic/raw/main/resources/GettingStarted.ipynb
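If wget and tar are not available, the download and extraction steps above can also be done from Python with the standard library alone. This is a sketch rather than part of ITALLIC: the function names are hypothetical, and the URLs are the ones used in the commands above. Nothing is downloaded until you call fetch_getting_started_files().

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL taken from the wget commands in this section.
BASE_URL = "https://github.com/getiria-onsongo/itallic/raw/main/resources/"


def download(url, dest_dir="."):
    """Download url into dest_dir unless the file is already there."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive, dest_dir="."):
    """Extract a tar archive and return the names of its members."""
    # Default mode "r" detects gzip compression automatically.
    with tarfile.open(str(archive)) as tar:
        names = tar.getnames()
        tar.extractall(str(dest_dir))
    return names


def fetch_getting_started_files(dest_dir="DataCleaningDir"):
    """Mirror the mkdir, wget, and tar steps listed above."""
    Path(dest_dir).mkdir(exist_ok=True)
    archive = download(BASE_URL + "data.tar.gz", dest_dir)
    extract(archive, dest_dir)
    download(BASE_URL + "GettingStarted.ipynb", dest_dir)
```

Calling `fetch_getting_started_files()` from the directory where you want DataCleaningDir created should leave the same files in place as the shell commands. Only extract archives from sources you trust.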

More instructions coming soon.

Download files


Source Distribution

itallic-0.0.8.tar.gz (3.6 MB)

Uploaded Source

Built Distribution


itallic-0.0.8-py3.8.egg (44.0 kB)

Uploaded Egg

File details

Details for the file itallic-0.0.8.tar.gz.

File metadata

  • Download URL: itallic-0.0.8.tar.gz
  • Upload date:
  • Size: 3.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/56.0.0 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.1

File hashes

Hashes for itallic-0.0.8.tar.gz
Algorithm     Hash digest
SHA256        b9779e9f7e5b604397f569dbf311f6eab15e44198fbc3aa41420dec4ab2176b8
MD5           3fbf7e92d1a157d3563abaab33f12581
BLAKE2b-256   f18323519aa337576af168e22e6e02db1262a40c9b5f53e874fbb957916614de


File details

Details for the file itallic-0.0.8-py3.8.egg.

File metadata

  • Download URL: itallic-0.0.8-py3.8.egg
  • Upload date:
  • Size: 44.0 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/56.0.0 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.1

File hashes

Hashes for itallic-0.0.8-py3.8.egg
Algorithm     Hash digest
SHA256        225d92f17d72e3372b4db64cf52b73ee42c169e2b3fb1b5519d9f6821ebc6823
MD5           2c9625106b52fcd22257cb2af33ad14f
BLAKE2b-256   e8485a8d0ac8184d1672b34e37b7119a6ee39d4157bca57cb84fec9052eb199d
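The digests listed for each file can be used to confirm that a download is intact. The following Python sketch computes a file's SHA256 digest with the standard library and compares it with the published value. The function names are hypothetical, and the sketch assumes the file has been downloaded under its published name into the current directory.

```python
import hashlib

# SHA256 digests as published in the file details for each file.
EXPECTED_SHA256 = {
    "itallic-0.0.8.tar.gz":
        "b9779e9f7e5b604397f569dbf311f6eab15e44198fbc3aa41420dec4ab2176b8",
    "itallic-0.0.8-py3.8.egg":
        "225d92f17d72e3372b4db64cf52b73ee42c169e2b3fb1b5519d9f6821ebc6823",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches expected_hex."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("itallic-0.0.8.tar.gz", EXPECTED_SHA256["itallic-0.0.8.tar.gz"])` should return True for an uncorrupted download. SHA256 is used here in preference to MD5, which is no longer considered collision resistant.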

