Estimate and analyze fishing effort using geographical position of vessels (AIS or VMS)
Project description
fishingeffort
Estimate and analyze fishing effort of demersal mobile fishing gears (e.g., bottom trawlers) from global positioning
systems (GPS) such as AIS and VMS data.
Calculates vessel tracks, independent fishing hauls, and fishing effort in swept area ratio (SAR).
Additional functions are available to visualize the output (raw datapoints, hauls, or SAR).
Installation
You can install fishingeffort using pip:
pip install fishingeffort
Requirements
Currently, fishingeffort has the following dependencies:
- pandas
- numpy
- geopandas
- geocube
- scipy
- cartopy
- rasterio
- seaborn
Usage
A full description of the application and validation of this package will be provided in a peer-reviewed manuscript, currently under preparation.
Since this package relies on geographical data, knowledge on how to handle geographical data and different projections, is highly benefitial.
1. Read data and clean up
To be able to quantify fishing effort, the data needs to at least have columns that relate to date and time) (e.g., 2025-01-25 12:57:01), vessel identifier or unique vessel name (there can not be two different vessels with the same name), coordinates latitude and longitude in decimal degrees (e.g., 41.015, 3.007), vessel speed in a consistent unit (knots, km/h, miles/h).
Although optional, to properly quantify fishing effort, it is important to have additional information about the fishing fleet. For instance, what is the width of the fishing gear in contact with the seafloor? This metric is used to properly quantify the swept area ratio (SAR).
Also, you will have to validate the identification of hauls, so a general knowledge of the fishing grounds is important (if the algorithm identifies that fishing is happening in a extremely rocky bottom, you know that the trawlers are definitely not fishing there). In these situations, the inputted parameters such as minimum haul time, mininum duration of false negatives or maximum duration of false positives (see next section for more detail).
import fishingeffort.fishingeffort as fe
# The package currently supports reading data stored in .csv, .xls, or .xlsx files.
# If the file is not in the working directory, you can specify in which directory to find it.
df_csv = fe.read_df(file_dir='C:/Your/file/directory',
file='your_file.csv')
df_xlsx = fe.read_df(file_dir='C:/Your/file/directory',
file='your_file.xlsx')
If the frequency of data is too high (every few seconds), it will be very computationally intensive to run the algorithm. In those situations, we recommend to reduce the frequency to once every minute (Paradis et al., 2021).
The package fishingeffort has a function that does this.
import fishingeffort.fishingeffort as fe
# Function that extracts the first entry of each minute for each vessel to make the algorithm less computationally intensive
new_df = fe.data_reduction_min(df=df, # DataFrame with the data
datetime_column='Date', # Column with date and time. It will be converted to datetime format.
name_column='Vessel_Id', # Column with unique vessel identifier
additional_columns=['latitude', 'longitude', # List of additional columns to be retained
'vessel_speed', 'fishing_gear', 'LoA']
)
2. Define fishing speed
The first step in the quantification of fishing effort is the identification of bottom trawler. As observed by Natale et al. (2015), Oberle et al. (2016), and Paradis et al. (2021), bottom trawlers' speed follow a bimodal distribution corresponding to fishing (first gaussian distribution, low speed) and navigating (second gaussian distribution, high speed). In some exceptions, the speed distribution can present a trimodal distribution, associated to the drifting of vessels by strong currents (Arjona-Camas et al., 2022).
The range of fishing speed is extracted by fitting either a bimodal or trimodal gaussian distribution of the dataset.
import fishingeffort.fishingeffort as fe
# Identify fishing and navigation speed by fitting a bimodal gaussian distribution (default) to the dataset
speed_parameters = fe.define_fishing_speed(df=df, # DataFrame with all the data
speed_column='vessel_speed', # Column name of vessel speed
mean_trawl=2, # Estimated mean fishing speed
mean_nav=10 # Estimated mean navigating speed
)
# If the data has a trimodal distribution
speed_parameters = fe.define_fishing_speed(df=df,
speed_column='vessel_speed',
modal='trimodal', # If data has a trimodal distribution, it is defined here
mean_drift=0.5, # If a trimodal distribution needs to be fitted, provide mean speed of the first distribution
mean_trawl=2,
mean_nav=10)
# In the case of a bi-modal distribution, it returns a DataFrame like this
+------------+---------+-----------+---------------+---------------+
| | mean | std_dev | range_lower | range_upper |
+============+=========+===========+===============+===============
| trawl | 2.54851 | 0.780046 | 1.01962 | 4.0774 |
+------------+---------+-----------+---------------+---------------+
| navigating | 8.82649 | 2.02624 | 4.85507 | 12.7979 |
+------------+---------+-----------+---------------+---------------+
The package fishingeffort also has a helper functions to easily visualize different outputs.
For example, you can visualize the identification of fishing speed using the draw_histogram function in the show module as such:
import fishingeffort.show as fe_show
fe_show.draw_histogram(data=df['vessel_speed'], # Data Series with vessel speed
df_speed_params=speed_parameters, # Output DataFrame of the define_fishing_speed function
fig_name='speed_hist', # Name that the figure will be stored as (optional)
format_fig='jpg' # Format of the figure to be stored in
)
3. Identify demersal fishing activity based on fishing speed
Simply filtering datapoints based on speed can lead to false-positives, when fishing vessels are moving at fishing speeds, but they are actually not fishing (e.g., pulling up the net while drifting at the same fishing speed), or false-negatives when vessels are fishing at anomalous speeds for a few minutes due to specific conditions (e.g., trawlers reduce their speed when fishing downslope in order to keep the gear on the seafloor).
To account for this, a tolerance of the maximum duration of false-positives and false-negatives is set:
- If the vessel presents consecutive fishing speeds that last less than the maximum duration of false-positives, it is considered that the vessel was not fishing and the vessel was moving at fishing speed while not fishing (it was a false-positive).
- If the vessel presents consecutive fishing speeds that last less than the maximum duration of false-negatives, and this occurs before and after the vessel is actually moving at fishing speeds (the vessel is moving at fishing speeds and for a short period of time it stops moving a fishing speed), it is considered that the vessel is actually fishing at these anomalous speeds (it was a false-negative).
After accounting for false-positives and false-negatives, the identification of a fishing trip, or a fishing haul, is determined when a vessel has consecutive entries at fishing speeds for a minimum duration (min_haul). For example, many bottom trawlers maintain their nets on the seafloor for at least one hour. Hence, a vessel needs to have consecutive entries that last at least one hour for it to be considered a haul.
Finally, vessel positioning may be missing for a specific time, but this does not necessarily imply that the haul has ended. This is accounted for by establishing a tolerance time (turn_off_time) where this gap of data is simply ignored. For instance, there may be a gap of data that lasts 30 mins, but it should not be considered as the end of a haul.
Hauls are automatically separated based on day (the same haul can not occur in two different days) and vessel (consecutive entries are assessed for each vessel independently).
This leads to four important parameters:
- min_haul = Minimum haul duration in minutes
- max_duration_false_positive = Maximum duration in minutes that a vessel is not fishing but still moving at fishing speed
- max_duration_false_negative = Maximum duration in minutes that a vessel is fishing but not at the specified fishing speed.
- turn_off_time = Maximum time in minutes where a gap of data is ignored.
import fishingeffort.fishingeffort as fe
# Function that identifies if a vessel if fishing based on the criteria described above.
df_out, cnt = fe.identify_fishing(df=df, # DataFrame with the data
datetime_column='Date', # Column with date and time. It will be converted to datetime format.
name_column='Vessel_Id', # Column with unique vessel identifier.
speed_column='vessel_speed', # Column with vessel speed.
min_trawl_speed=1.02, # Minimum fishing speed, as defined from the previous criteria
max_trawl_speed=4.08, # Maximum fishing speed, as defined from the previous criteria
min_nav_speed=4.86, # Minimum navigating speed, as defined from the previous criteria
max_duration_false_positive=15, # Maximum duration (mins) before the data is considered to be false-positive
max_duration_false_negative=5, # Maximum duration (mins) before the data is considered to be false-negative
min_haul=60, # Minimum duration (mins) of a fishing haul
turn_off_time=30, # Maximum time (mins) where a gap of data is ignored
remove_no_hauls=False, # Boolean to determine whether the output DataFrame should remove entries that are
# not considered fishing (True) or if these entries should be kept (False)
date_format="%d-%m-%y %H:$M" # Date-time format. See datetime.strftime() codes
)
# This function returns a DataFrame (df) with an additional column "Haul id" which is an identifier of all the unique hauls
# detected in this dataset, as well as the total number of hauls detected (cnt).
Determining these parameters is crucial to properly identify fishing effort. General knowledge of the fishing activity is important to determine if these four parameters are correct. Luckily, the package fishingeffort also has helper functions to easily visualize the output.
import fishingeffort.show as fe_show
# Function that iteratively plots vessel positioning, randomly choosing a vessel and day.
fe_show.fishing_identification_check(df=df_out, # Output DataFrame of identify_fishing() function
name_column='Vessel_Id', # Column with unique vessel identifier.
datetime_column='Date', # Column with date and time. It will be converted to datetime format.
latitude='lat', # Column with latitude
longitude='lon', # Column with longitude
input_crs='epsg:4326', # Input coordinate system of coordinates. Defaults to WGS 84.
n_fig=9, # Number of subplots to generate.
fig_name='parameter_evaluation', # Name of output figure and figure format.
format_fig='jpg'
)
Note that for Vessel 224087460 on January 2 and on January 3, consecutive entries of True trawling events are interrupted by False trawling events. Since trawlers seldomly stop fishing for such a short period, this most probably indicates that it's a false-negative, the vessel was actually fishing but it was classified as non-fishing. In addition, no fishing trips were identified for Vessel 224566950 on January 2 and for Vessel 224137940 on January 2, which probably indicates that the parameters are too strict (e.g., minimum haul is too long).
4. Convert point data (geographical positioning) to line data (fishing hauls)
After identifying the vessel positions when fishing is occurring, the data can be transformed into line data by connecting the points of the same fishing trip (Haul id) identified during the previous step.
This function requires the following parameters:
- df = DataFrame with "Haul id" as a column. It should be the output of the previous function identify_fishing
- name_column = Column name of the unique vessel identifier
- latitude = Column name with latitude in decimal degrees (WGS84)
- longitude = Column name with longitude in decimal degrees (WGS84)
- output_crs (optional) = epsg code of the output coordinate reference system, ideally a UTM projection so that the units are in meters (needed to identify fishing effort, next step). If left blank, it will return a GeoDataFrame in WGS84 coordinate system.
import fishingeffort.fishingeffort as fe
df_hauls = fe.point_to_line(df=df,
name_column='Vessel_Id',
latitude='latitude', longitude='longitude',
output_crs='epsg: 32631')
# If you don't know the UTM zone, it can also be extracted using the following additional function
utm_zone = fe.utm_zone_epsg(df=df, latitude='latitude', longitude='longitude')
# And add it to the previous function
df_hauls = fe.point_to_line(df=df,
name_column='Vessel_Id',
latitude='latitude', longitude='longitude',
output_crs=utm_zone)
5. Calculate fishing effort as swept area ratio (SAR)
If you want to convert the hauls (line data) into swept area ratio (raster), this can be done with the swept_area_ratio function.
This function requires as an input a GeoDataFrame with lines as geometries, such as the output of the point_to_line function.
You also need to specify the grid size (the width or height of each grid cell). The units are the same as the input
Coordinate Reference System, which is why we recommend users to convert to UTM (meters).
Ideally, you should provide the gear width,
since this is needed to calculate the estimated area impacted by the fishing gear.
You can optionally also specify the bounds, or outer limits, of the raster if you want to ensure that the raster cells are
always positioned in the same location, or to ensure that the raster is aligned to another raster.
The data is then saved in a raster as a GeoTiff (.tif).
import fishingeffort.fishingeffort as fe
fe.swept_area_ratio(grid_size=100, # Since the input Coordinate Reference System is in UTM (meters),
# then the units of the grid size are also in meters. In this case, the output grid size is in 100 x 100 m
gdf=df_hauls, # GeoDataFrame of hauls in lines
file_name=f'SAR.tif', # Output file name, in .tif
gear_width=100, # Width of trawling gear, in the same units of the CRS, in this case meters
dir_out='output' # Like the other function, you can specify in which directory to save the data in
)
Additional functions
To facilitate handling of data, there are several additional functions in this package.
Since data of vessel positioning (GPS points) and hauls (lines) can be quite large, the package has a series of functions to save the data.
import fishingeffort.fishingeffort as fe
# If you want to save all the data in a specified format (.csv, .shp, .xlsx, or .xls)
fe.save_all_data(df=df,
file_name='VMS_positions',
output_type='csv',
dir_output='C:/My/Folder' # Specify which directory to save the data to, if different than the working directory
)
# You can also save your files into a shapefile.
# If the input DataFrame is not a GeoDataFrame, it will be created before saving it as a shapefile.
# In those situations, the columns for latitude and longitude, as well as the coordinate system, need to be specified
# if it is a DataFrame. If it is a GeoDataFrame, it will automatically save with its own geometry.
fe.save_all_data(df=df,
file_name='VMS_positions',
output_type='.shp',
latitude='Lat', longitude='Lon', # Column names of latitude and longitude
input_crs='epsg:4326' # Coordinate system EPSG code (e.g., WGS84)
)
# If you have a lot of data, it can also be separated and saved based on year or month
fe.save_all_data_months(df=df_hauls,
datetime_column='Date',
dir_output='C:/My/Folder/Monthly_data', # Since the data is not saved in one file, but is
# instead separated into smaller files per month, if an output directory is not specified,
# all the individual files will be saved in the current working directory.
# The output name of each individual file will be the year and month of the data (e.g., 2019_02.csv)
output_type='.csv')
fe.save_all_data_years(df=df_hauls,
datetime_column='Date',
dir_output='C:/My/Folder/Monthly_data', # Since the data is not saved in one file, but is
# instead separated into smaller files per year, if an output directory is not specified,
# all the individual files will be saved in the current working directory.
# The output name of each individual file will be the year of the data (e.g., 2019.csv)
output_type='.csv')
Likewise, there are functions that reads all the files in one directory and combines them all in one DataFrame. This function assumes that all the data to be opened in the directory is saved in the same format (they are all .csv, .xlsx, or .shp) and that they have the same columns, to be able to easily concatenate them.
import fishingeffort.fishingeffort as fe
df = fe.read_all_data(file_dir='C:/My/Folder/Monthly_data',
file_type='.csv')
There is a function that quickly filters out data based on speed or based on fishing gear. These functions are simple and are aimed to simplify the task of someone that is not familiar with Python and Pandas.
import fishingeffort.fishingeffort as fe
df = fe.filter_speeds(df=df,
speed_column='vessel_speed', # Column name where the vessel speed is stored
min_speed=0, max_speed=20 # Default minimum and maximum speed (in knots)
)
df = fe.filter_trawlers(df=df,
column_gear='Gear_type', # Column name where the gear type is stored
gear_name='OTB' # Name of the gear type to filter
)
About us
Author: Sarah Paradis
Cite us: Proper citation of this package will be coming soon.
Release History
- 0.1.0 August 2025
- New release of fishingeffort
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome!
Please open issues or pull requests on GitHub.
Send an email with questions or any other bugs you encounter.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fishingeffort-0.1.1.tar.gz.
File metadata
- Download URL: fishingeffort-0.1.1.tar.gz
- Upload date:
- Size: 20.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d6dfdd4dde64c00d49df161608725540c7c72618e1795a71df708a1d38eb5642
|
|
| MD5 |
14a0903c621dbd0670cc14d45a8d4d67
|
|
| BLAKE2b-256 |
2c226f6591d93f34306223735c8c7a20029f3a22f4a2793facc140f74c228076
|
File details
Details for the file fishingeffort-0.1.1-py3-none-any.whl.
File metadata
- Download URL: fishingeffort-0.1.1-py3-none-any.whl
- Upload date:
- Size: 23.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
32bd970b11cf1496cb2ff6bd423702147c4f4fee4d37e7cc9f1be96889d29b28
|
|
| MD5 |
06668af9d7034e151736fec095c16cd5
|
|
| BLAKE2b-256 |
5037defc822ab6ebc619757cdcbce1bc5ff2e880c6e4e9fbae0397d408efdb17
|