Tiger_Assessment is a Python package developed as part of an interview process.
Bird Collision vs Artificial Light
This project is part of the assessment for the Tiger Machine Learning Engineer position.
Winger et al. (2019) examined nocturnal flight-calling behavior and vulnerability to artificial light in migratory birds.
"Understanding interactions between biota and the built environment is increasingly important as human modification of the landscape expands in extent and intensity. For migratory birds, collisions with lighted structures are a major cause of mortality, but the mechanisms behind these collisions are poorly understood. Using 40 years of collision records of passerine birds, we investigated the importance of species' behavioural ecologies in predicting rates of building collisions during nocturnal migration through Chicago, IL and Cleveland, OH, USA. "
"One of the few means to examine species-specific dynamics of social biology during nocturnal bird migration is through the study of short vocalizations made in flight by migrating birds. Many species of birds, especially passerines (order Passeriformes), produce such vocal signals during their nocturnal migrations. These calls (hereafter, ‘flight calls’) are hypothesized to function as important social cues for migrating birds that may aid in orientation, navigation and other decision-making behaviours. Not all nocturnally migratory species make flight calls, raising the possibility that different lineages of migratory birds vary in the degree to which social cues and collective decisions are important for accomplishing migration."
According to the researchers, bird collisions with buildings in metropolitan areas are strongly linked to artificial light. The proposed mechanism is that birds drawn to artificial light produce flight calls, which attract other migrating birds and lead to collisions with buildings.
Citations
When using this data, please cite the original publication:
Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Proceedings of the Royal Society B 286(1900): 20190364. https://doi.org/10.1098/rspb.2019.0364
If using the data alone, please cite the Dryad data package:
Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Data from: Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Dryad Digital Repository. https://doi.org/10.5061/dryad.8rr0498
Program Execution
The goal of the project is to generate a summary table from the three given data files. The steps for running the program are as follows. First, install the Tiger_Assessment library from PyPI by running `pip install Tiger-Assessment`. Running the main program (Tiger_Assessment.py) then opens a GUI in which the user has to provide:

1. Input Path and File Name
2. Output Folder Path
Data Files
This project uses three JSON files (Collision, Flight_Call, Light_Levels), which are placed in a landing zone. The user supplies both the path of the landing zone and the output directory.
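A minimal sketch of how the two user-supplied locations could be turned into the paths the program reads and writes. The file names (`Collision.json` and so on, `summary.csv`) and the `build_paths` helper are hypothetical; the README names the datasets but not the exact files.

```python
import os
from typing import Dict

# Hypothetical file names: the README names the three datasets but not the
# exact file names inside the landing zone.
DATASETS = ("Collision", "Flight_Call", "Light_Levels")


def build_paths(landing_zone: str, output_dir: str) -> Dict[str, str]:
    """Map each dataset to its JSON path in the landing zone and add the
    path of the summary CSV inside the output directory."""
    paths = {name: os.path.join(landing_zone, f"{name}.json") for name in DATASETS}
    # Hypothetical name for the summary table written by the program.
    paths["summary"] = os.path.join(output_dir, "summary.csv")
    return paths
```

Keeping path construction in one place (in the package, config.py plays this role) means the rest of the code never has to know where the landing zone is.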
Libraries
The project uses the following libraries:
- pandas
- numpy
- matplotlib
- seaborn
- logging
- os
- datetime
- zipfile
- json
- subprocess
- sys
Project Files & Folders
| File / Folder | Description |
|---|---|
| Data | Holds the data files required by the program. The Input and Output folders below sit inside it. |
| Input | Holds the input files required by the program. |
| Output | Holds the output files generated by the program. |
| Tiger Assessment | Contains only the `__init__.py` file required to initialize the package. |
| Logs | Holds all logs generated by the program. |
| config.py | Initial configuration settings, such as paths. |
| requirements | Lists all required packages. |
| LICENSE | MIT license. |
| setup.py | Setup file used by Python to package and distribute the code; contains the detailed description and specifications. |
| my_functions.py | Contains all classes and functions required for the project. |
| Tiger_Assessment.py | Main file of the project. The user runs this file, which takes the input path and file name and generates the summary table in the given output path. |
Data Dictionary
chicago_collision_data
| Variable | Class | Description |
|---|---|---|
| genus | factor | Bird Genus |
| species | factor | Bird species |
| date | date | Date of collision death (ymd) |
| locality | factor | MP or CHI - recording at either McCormick Place or greater Chicago area |
flight_call
| Variable | Class | Description |
|---|---|---|
| genus | factor | Bird Genus |
| species | factor | Bird species |
| family | factor | Bird Family |
| flight_call | factor | Does the bird use a flight call - yes or no |
| habitat | factor | Open, Forest, Edge - their habitat affinity |
| stratum | factor | Typical occupied stratum - ground/low or canopy/upper |
| collisions | integer | Total number of collisions over the 40-year record |
light_levels
| Variable | Class | Description |
|---|---|---|
| date | date | Date the light level was observed |
| light_source | integer | Number of windows lit at McCormick Place, Chicago (higher = more light) |
Data Cleaning and Preprocessing
The following steps are used to clean and preprocess the data.
1. Unzipping
The data files are extracted from the zip archive and placed in the ./Data/Input/ directory, and the original zip file is deleted.
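A minimal sketch of this step using the standard-library `zipfile` and `os` modules. The function name `unzip_to_input` and the assumption that every archive member is extracted straight into the input directory are illustrative, not taken from the package.

```python
import os
import zipfile


def unzip_to_input(zip_path: str, input_dir: str) -> list:
    """Extract every member of zip_path into input_dir, then delete the zip.

    Returns the sorted list of extracted member names."""
    os.makedirs(input_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        names = sorted(archive.namelist())
        archive.extractall(input_dir)
    # Remove the original archive only after extraction has succeeded.
    os.remove(zip_path)
    return names
```

Deleting the archive after the `with` block closes it avoids removing a file that is still open, and a failed extraction raises before the archive is lost.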
2. JSON to Data Frame
The files are received in JSON format and have to be converted into data frames for analysis. I created my_func.json_df, which takes the path of a JSON file from the config file and returns a data frame.
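A minimal sketch of a JSON-to-data-frame loader, assuming each file holds either a list of records (`[{"genus": ..., "species": ...}, ...]`) or a dict of columns. `load_json_df` is a hypothetical stand-in, not the package's `my_func.json_df`.

```python
import json

import pandas as pd


def load_json_df(path: str) -> pd.DataFrame:
    """Load a JSON file and return its contents as a DataFrame.

    Assumes the file holds a list of records or a dict of columns;
    both shapes are accepted by the DataFrame constructor."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return pd.DataFrame(data)
```

Parsing with `json.load` first keeps the loader independent of how a given pandas version treats `read_json` input, at the cost of holding the parsed object in memory, which is fine for files of this size.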
3. Cleaning the Data Files
The Data_Process class contains all the functions required to clean the data.
The cleaning steps are:
- **Cleaning and Mapping Columns:** The flight_call data has its columns in the wrong order, so I used a dictionary to map them to the correct names. I also stripped whitespace from the column names to standardize them.
- **Trimming Leading and Trailing Whitespace:** As a best practice, leading and trailing whitespace should always be removed. The data_process.trim function trims leading and trailing whitespace in non-numeric columns.
- **Standardizing the Dates:** Date columns are converted to a single, consistent date format so they can be sorted and merged reliably.
- **Sorting by Date:** The data is sorted by its date column.
- **Dropping Rows with a Missing Date in light_levels:** Since light_source is recorded by date, rows with an empty date carry no usable information and can be dropped.
- **Capitalizing the Non-numeric Columns:** As a best practice, factor (non-numeric) columns, especially identifiers such as names and places, should be capitalized consistently so that some data entry errors (for example, John Doe entered as john Doe) are fixed.
- **Dropping Duplicates:** As a best practice, duplicate records should be dropped when the data has a unique or key column. In our case, the flight_call and light_levels data each have a key column whose values should be unique. The function data_process.drop_dup takes a data frame and drops duplicate records.
- **Interpolating:** Missing values can be interpolated where necessary, but with extreme caution. In our case, light_levels has missing records. A simple linear interpolation is reasonable because, on average, light_source levels do not change drastically over a few days.
- **In-depth Cleaning:** The flight_call column of the flight_call data contains an extra factor level, Rare. Since the column should only contain Yes/No, we assume that Rare means Yes.
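The cleaning steps above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the package's Data_Process class: all function names are hypothetical, the column names come from the data dictionary, title-casing is assumed as the "capitalize" rule, and the column-mapping dictionary is left as a parameter because the README does not say which columns were swapped.

```python
import pandas as pd


def clean_column_names(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Strip whitespace from column names, then rename them with `mapping`
    (e.g. a dict that fixes columns loaded in the wrong order)."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    return out.rename(columns=mapping)


def trim_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Strip leading/trailing whitespace from string values; other values
    (numbers, dates, NaN) pass through unchanged."""
    return df.apply(lambda s: s.map(lambda v: v.strip() if isinstance(v, str) else v))


def title_case(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Title-case string values in the given factor columns,
    so 'john Doe' and 'John Doe' become the same value."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].map(lambda v: v.title() if isinstance(v, str) else v)
    return out


def clean_light_levels(df: pd.DataFrame, date_col: str = "date",
                       value_col: str = "light_source") -> pd.DataFrame:
    """Standardize dates, drop rows without a date, de-duplicate on the date
    key, sort by date, then linearly interpolate missing light values."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], errors="coerce")  # bad dates -> NaT
    out = out.dropna(subset=[date_col])
    out = out.drop_duplicates(subset=[date_col], keep="first")
    out = out.sort_values(date_col).reset_index(drop=True)
    # 'linear' treats consecutive rows as equally spaced, regardless of date gaps.
    out[value_col] = pd.to_numeric(out[value_col], errors="coerce").interpolate(method="linear")
    return out


def clean_flight_call(df: pd.DataFrame) -> pd.DataFrame:
    """Trim strings, title-case flight_call and treat the extra 'Rare' level as 'Yes'."""
    out = title_case(trim_strings(df), ["flight_call"])
    out["flight_call"] = out["flight_call"].replace({"Rare": "Yes"})
    return out
```

Sorting before interpolating matters: linear interpolation fills a gap from its neighbours by row position, so the rows must already be in date order.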
Merging the Data Frames
After preprocessing and cleaning, we obtain clean data frames that can be merged. my_func.file_merge takes the three data frames and outputs one final data frame on which we can run the analysis.
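A minimal sketch of one way the three frames could be combined, assuming genus + species link collisions to flight_call and date links collisions to light_levels (these keys match the data dictionary, but the package's actual join logic is not documented). `merge_frames` is hypothetical.

```python
import pandas as pd


def merge_frames(collision: pd.DataFrame, flight_call: pd.DataFrame,
                 light_levels: pd.DataFrame) -> pd.DataFrame:
    """Attach species traits to every collision record, then the light level
    recorded on the collision date (NaN where no light level exists)."""
    merged = collision.merge(
        flight_call, on=["genus", "species"], how="left",
        validate="many_to_one",  # raises if a genus/species pair is duplicated
    )
    return merged.merge(light_levels, on="date", how="left", validate="many_to_one")
```

Left joins keep every collision record even when no trait or light value matches. Because light levels are measured only at McCormick Place, a stricter variant might attach them only to rows whose locality is MP.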
Generating Summary File and Plots
The final step is to generate the results and plots. my_func.summary_stats.summarize writes the summary as a CSV file, because CSV is easy to interpret and convenient for custom analysis. my_func.summary_stats.count_plot generates bar plots of features such as family, genus, and locality.
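A minimal sketch of a long-format summary CSV and a count bar plot, assuming the merged frame has one row per collision record. `write_summary` and `save_count_plot` are hypothetical stand-ins for the package's functions, and matplotlib is used directly; seaborn's `countplot`, which the project lists as a dependency, would give a similar chart.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without a display
import matplotlib.pyplot as plt
import pandas as pd


def write_summary(df: pd.DataFrame, by: list, out_csv: str) -> pd.DataFrame:
    """Count records per level of each column in `by` and write one
    long-format CSV with columns: feature, level, collisions."""
    frames = []
    for col in by:
        counts = df[col].value_counts().rename_axis("level").reset_index(name="collisions")
        counts.insert(0, "feature", col)
        frames.append(counts)
    summary = pd.concat(frames, ignore_index=True)
    summary.to_csv(out_csv, index=False)
    return summary


def save_count_plot(df: pd.DataFrame, col: str, out_png: str) -> None:
    """Save a bar plot of record counts for one feature, largest first."""
    counts = df[col].value_counts()
    fig, ax = plt.subplots(figsize=(8, 4))
    counts.plot.bar(ax=ax)
    ax.set_xlabel(col)
    ax.set_ylabel("collisions")
    ax.set_title(f"Collisions by {col}")
    fig.tight_layout()
    fig.savefig(out_png)
    plt.close(fig)  # free the figure when plotting many features in a loop
```

A single long table (feature, level, count) keeps every feature in one CSV that can be filtered or pivoted later, which fits the goal of easy custom analysis.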
Insights
Collision by Flight Call of Birds
We can see that birds that employ flight calls have substantially more collisions (almost 35,000 more) than birds that don't employ flight calls.
This suggests that flight calling is an important factor, but further testing is needed to confirm it.
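Statements such as "twice as many" (a ratio) and "35,000 more" (a difference) describe different quantities, so it helps to compute both from the same counts. A minimal sketch, assuming one row per collision record; `compare_groups` is a hypothetical helper.

```python
import pandas as pd


def compare_groups(df: pd.DataFrame, col: str, a: str, b: str) -> tuple:
    """Return (ratio, difference) of record counts for levels `a` and `b` of `col`."""
    counts = df[col].value_counts()
    n_a, n_b = int(counts.get(a, 0)), int(counts.get(b, 0))
    ratio = n_a / n_b if n_b else float("inf")  # avoid division by zero
    return ratio, n_a - n_b
```

For example, `compare_groups(merged, "flight_call", "Yes", "No")` would report both how many times more and how many more collisions flight-calling birds have.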
Collision by Family of Birds
We can see that birds in the family Passerellidae have the most collisions, followed by Parulidae and Turdidae.
Here we observe that Passerellidae is the family with the most collisions. However, it is quite possible that Passerellidae is simply the most abundant family in the observed cities. With an estimate of the percentage distribution of birds by family, we could compute a weighted proportion and then check which family is most affected.
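A minimal sketch of the weighted proportion described above: each group's share of collisions divided by its share of the local bird population. The population estimates are an assumed input that the data set does not provide, and `weighted_collision_index` is a hypothetical helper; the same function applies to genus or species.

```python
import pandas as pd


def weighted_collision_index(collisions: pd.Series,
                             population: pd.Series) -> pd.Series:
    """Share of collisions divided by share of the local population, per group.

    Both Series are indexed by group (e.g. family). A value above 1 means the
    group collides more often than its abundance alone would predict; groups
    missing from `population` come out as NaN."""
    collision_share = collisions / collisions.sum()
    population_share = population / population.sum()  # accepts counts or percentages
    return (collision_share / population_share).sort_values(ascending=False)
```

For example, a family with 60% of collisions but 75% of the population scores 0.8, while one with 40% of collisions and 25% of the population scores 1.6, so the second family is the more collision-prone despite having fewer collisions.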
Collision by Genus of Birds
We can see that birds in the genus Melospiza have the most collisions, followed by Zonotrichia and Catharus.
Here we observe that Melospiza is the genus with the most collisions. However, it is quite possible that Melospiza is simply the most abundant genus in the observed cities. With an estimate of the percentage distribution of birds by genus, we could compute a weighted proportion and then check which genus is most affected.
Collision by Species of Birds
We can see that birds of the species Albicollis have the most collisions, followed by Hyemalis and Melodia.
Here we observe that Albicollis is the species with the most collisions. However, it is quite possible that Albicollis is simply the most abundant species in the observed cities. With an estimate of the percentage distribution of birds by species, we could compute a weighted proportion and then check which species is most affected.
Collision by Habitat of Birds
We can see that birds that usually dwell in forests have about twice as many collisions as birds that live on the edge and almost 7 times as many as birds that live in the open.
This suggests that birds that are not used to artificial lighting may be more susceptible to collisions, so habitat could be a compelling feature.
Collision by Locality of Birds
We can see that collisions are roughly evenly distributed between the two localities (McCormick Place and greater Chicago).
Collision by Stratum of Birds
We can see that lower-stratum birds have about twice as many collisions as upper-stratum birds.
This suggests that birds living in the lower stratum may be more susceptible to collisions, so stratum could also be a compelling feature.
Summary & Findings
1. The Passerellidae family has the most collisions, followed by Parulidae and Turdidae.
2. The Melospiza genus has the most collisions, followed by Zonotrichia and Catharus.
3. The Albicollis species has the most collisions, followed by Hyemalis and Melodia.
4. Birds that usually dwell in forests have about twice as many collisions as birds that live on the edge and almost 7 times as many as birds that live in the open.
5. Lower-stratum birds have about twice as many collisions as upper-stratum birds.
6. Birds that employ flight calls have about twice as many collisions as birds that don't.
File details
Details for the file Tiger_Assessment-0.0.11.tar.gz.
File metadata
- Download URL: Tiger_Assessment-0.0.11.tar.gz
- Upload date:
- Size: 10.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fdc18fc943fbfd9eda8db984ef974f3fd36fe97107a9bc978daff1537eab1a55 |
| MD5 | 3f7eb2517a9c44661eb2a092f1cf589b |
| BLAKE2b-256 | 131c0e0e7175876246018b7cf8f03dd4357bb3fa1761f802ad6c5e32e6fbde0f |
File details
Details for the file Tiger_Assessment-0.0.11-py3-none-any.whl.
File metadata
- Download URL: Tiger_Assessment-0.0.11-py3-none-any.whl
- Upload date:
- Size: 14.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f678007b8a0ee42224b8e5f663eb26ab31c5a8927b774cffae1cab1fc59f0a35 |
| MD5 | 631f328c4f14d419fb47f72fad3b56ec |
| BLAKE2b-256 | 9e4931fa0d1a031687656e38450ed691739438b047183d9a48cb2608c426d1dd |