Skip to main content

Tiger_Assessment is a python package for developed as a part of interview process.

Project description

Bird Collision vs Artificial Light

The below project is a part of Tiger Machine Learning Engineer position.

Winger et al, 2019 examined nocturnal flight-calling behavior and vulnerability to artificial light in migratory birds.

"Understanding interactions between biota and the built environment is increasingly important as human modification of the landscape expands in extent and intensity. For migratory birds, collisions with lighted structures are a major cause of mortality, but the mechanisms behind these collisions are poorly understood. Using 40 years of collision records of passerine birds, we investigated the importance of species' behavioural ecologies in predicting rates of building collisions during nocturnal migration through Chicago, IL and Cleveland, OH, USA. "

"One of the few means to examine species-specific dynamics of social biology during nocturnal bird migration is through the study of short vocalizations made in flight by migrating birds. Many species of birds, especially passerines (order Passeriformes), produce such vocal signals during their nocturnal migrations. These calls (hereafter, ‘flight calls’) are hypothesized to function as important social cues for migrating birds that may aid in orientation, navigation and other decision-making behaviours.not all nocturnally migratory species make flight calls, raising the possibility that different lineages of migratory birds vary in the degree to which social cues and collective decisions are important for accomplishing migration. "

As per the researcher the bird collision with buildings in the metropolitan cities are mainly caused by artificial light. Birds react to this artificial light and initiate a flight call which attracts other birds for flight and results in collision with the building structure.

Citations

When using this data, please cite the original publication:

Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Proceedings of the Royal Society B 286(1900): 20190364. https://doi.org/10.1098/rspb.2019.0364

If using the data alone, please cite the Dryad data package:

Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Data from: Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Dryad Digital Repository. https://doi.org/10.5061/dryad.8rr0498

Program Execution

The goal of the project is to generate a summary table of all given 3 data files. Below is the step by step process of executing this program. The user runs bird_collision.py file. This opens up a GUI in which the user have to provide

1. Input Path and File Name 2. Output Folder Path

Data Files

As a part of this project, there are 3 JSON files ( Collision, Flight_Call, Light_Levels). These 3 files are placed in the landing zone as zip file. The path and file name of the landing zone is given by the user and the output file directory is also given by the user.

Libraries

Below are the libraries used as a part of this project.

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • logging
  • os
  • datetime
  • zipfile
  • json
  • subprocess
  • sys

Project Files & Folders

  • Data
  • This folder will have the data files required to for the program. Below is the file structure and description.

    • Input
    • This folder will have the input files required to for the program.

    • Ouput
    • This folder will have the output files required to for the program.

  • Tiger Assessment
  • This folder just has the init.py file required to initiate the package

  • Logs
  • This folder has all the logs generated by program

  • config.py
  • This file initial configuration setting like paths etc.

  • requrirements
  • This file has all the required packages

  • LICENSE
  • This is an MIT license

  • setup.py
  • This is a setup file required by python to package and distribute the code. This file has all the indetail description and specifications.

  • my_functions.py
  • This file has all the classes and functions required for the project

  • bird_collision.py
  • This is the main file of the project. The user runs this file which will take input path and file and generate the summary table in given output path.

    Data Dictionary

    chicago_collision_data

    Variable Class Description
    genus factor Bird Genus
    species factor Bird species
    date date Date of collision death (ymd)
    locality factor MP or CHI - recording at either McCormick Place or greater Chicago area

    flight_call

    Variable Class Description
    genus factor Bird Genus
    species factor Bird species
    family factor Bird Family
    flight_call factor Does the bird use a flight call - yes or no
    habitat factor Open, Forest, Edge - their habitat affinity
    stratum factor Typical occupied stratum - ground/low or canopy/upper
    collisions integer The total number of collision in the last 40 year

    light_levels

    Variable Class Description
    date date Date of light level observed
    light source integer Number of windows lit at the McCormick Place, Chicago - higher = more light

    Data Cleaning and Preprocessing

    Below are the following steps used to clean and preprocess the data.

    1. UnZipping

    The data files are extracted from zip, they are placed in ./Data/Input/ directory and the originial zip file is deleted.

    2. JSON to Data Frame

    The files received are in JSON format. These files have to be processed and converted into data frame for analysis purposes. I have created my_func.json_df that take path of json files from config file and returns a data frame

    3. Cleaning the Data Files

    Data_Process class has all the necessary functions required to clean the data.

    Below are the steps used to clean the data file.

    1. Cleaning and Mapping Columns

      We see that flight_call has its column order wrong. I have used a dictionary to map the columns correctly. I also stripped the white space in columns which helps in standardizing column names.

    2. Trimming the Leading and Trailing whitespaces

      As a best practice, it is always recommended to clean the leading and trailing whitespaces. data_process.trim fuction trim the leading an trailing whitespace for non-numeric columns.

    3. Standardizing the Dates

      As a best practice, it is always recommended to standardize Dates columns.

    4. Sort by Dates

      As a best practice, it is always recommended to sort data by Dates columns.

    5. Dropping rows with missing Date in light_levels data

      Since the light_source is recorded by date, we can drop rows with empty dates.

    6. Capitalize the non-numeric columns

      As a best practice, it is always recommended to Capitalize the factor (non-numeric) columns, especially identifiers like names, places, etc, so that some of the Data Entry errors (like John Doe entered as john Doe) can be fixed.

    7. Dropping duplicate

      As a best practice, it is always recommended to drop duplicate records if, it has any unique or key values. In our case, flight_call and light_level have data column as unique value. The function data_process.drop_dup take in data frame and drop duplicate records.

    8. Interpolating

      As a best practice, it is always recommended to interpolate missing values where ever deemed necessary, but with extreme caution. In our case, light_level has missing records. We can use a simple linear interpolate method because the light_source levels, on an average don't change drastically from the previous few days.

    9. In depth cleaning

      It is observed that flight_call column of flight_call data has an extra Rare factor. Since it can only have Yes/No, we can assume that Rare is Yes.

    Merging the Data Frames

    After doing the data preprocessing and clean, we obtain clean files that we can merge. my_func.file_merge takes in 3 data frames and output 1 final data frame on which we can do our analysis.

    Generating Summary File and Plots

    The final step is generate the result and plots. my_func.summary_stats.summarize generates the summary file as a csv because it is very easy to interpret and do custom analysis on csv. my_func.summary_stats.count_plot generate bar plot of different features like family, genus, Locality etc.

    Insights

    Collision by Flight Call of Birds

    We can see that birds that employ flight calls have significantly (almost 35000 times) more collision than the birds that don't employ flight calls. Flight Call_bar_plot

    We can see here that the flight call is a significant factor but we may need futher testing to be sure.

    Collision by Family of Birds

    We can see that Passerellidae Family of birds have the highest collisions followed by Parulidae and Turdidae. Family Bar Plot

    Here we observe that Passerellidae is most common family that have collisions. But its is highly possible that Passerellidae may be majority in the observed cities. If we an estimate of percentage distribution of birds by family we can take a weighted proportion and then check for most common family of birds.

    Collision by Genus of Birds

    We can see that Melospiza Genus of birds have the highest collisions followed by Zonotrichia and Catharus. Genus_bar_plot

    Here we observe that Melospiza is most common genus that have collisions. But its is highly possible that Melospiza may be majority in the observed cities. If we an estimate of percentage distribution of genus by family we can take a weighted proportion and then check for most common genus of birds.

    Collision by Species of Birds

    We can see that Albicollis species of birds have the highest collisions followed by Hyemalis and Melodia. Species_bar_plot

    Here we observe that Albicollis is most common species that have collisions. But its is highly possible that Albicollis may be majority in the observed cities. If we an estimate of percentage distribution of species by family we can take a weighted proportion and then check for most common species of birds.

    Collision by Habitat of Birds

    We can see that birds that usually dwell in Forest are twice as much as birds that live on edge and almost 7 times as much as birds who live in the open. Habitat_bar_plot

    Here we can see that birds that are not used to artificial lighting are more susuptible for collisions. This could be a compelling feature.

    Collision by Locality of Birds

    We can see that there is an equal distribution of birds in both localities. Locality_bar_plot

    Collision by Stratum of Birds

    We can see that the lower stratum birds have twice as much as collision than upper stratum birds.Stratum_bar_plot

    Here we can see that birds who live on lower stratum are more susuptible for collisions. This could be a compelling feature.

    Summary & Findings

    1. Passerellidae Family of birds have the highest collisions followed by Parulidae and Turdidae

    2. Melospiza Genus of birds have the highest collisions followed by Zonotrichia and Catharus

    3. Albicollis species of birds have the highest collisions followed by Hyemalis and Melodia

    4. Birds that usually dwell in Forest are twice as much as birds that live on edge and almost 7 times as much as birds who live in the open

    5. The lower stratum birds have twice as much as collision than upper stratum birds

    6. Birds who employ flight calls have twice as much as collision than the birds who don't employ flight calls

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Tiger_Assessment-0.0.9.tar.gz (10.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

Tiger_Assessment-0.0.9-py3-none-any.whl (14.7 kB view details)

Uploaded Python 3

File details

Details for the file Tiger_Assessment-0.0.9.tar.gz.

File metadata

  • Download URL: Tiger_Assessment-0.0.9.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.8

File hashes

Hashes for Tiger_Assessment-0.0.9.tar.gz
Algorithm Hash digest
SHA256 de9e340b3aa64e310b2c90f8a28a52739c4939338127ee3fe0c1e01600b7ad48
MD5 18dbed2b1266a729ab0bad757cf3f198
BLAKE2b-256 5b2f8631a0e945f70521b0b6e2306a0baec184faed8fb8403da991223d871fa5

See more details on using hashes here.

File details

Details for the file Tiger_Assessment-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: Tiger_Assessment-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.8

File hashes

Hashes for Tiger_Assessment-0.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 1f9df1e8f40d43583998758f80c292064b53ab16bf29f95d23566fbc783c4a89
MD5 c8befac537cea32798790407b4165088
BLAKE2b-256 c1ee3ce6c16b65179fa2f9856123139c8e5409e18e422c497b38adff944e5a0e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page