Georeferenced CSV data processing

PositionData package

PositionData is a Python package specifically tailored for surveyors and geophysicists engaged in aerial data collection. This user-friendly tool is designed to streamline the processing of positional CSV data generated by SkyHub, an advanced onboard computer system used in drones.

SkyHub logs data from a variety of drone-mounted sensors, including methane detectors, wind sensors, magnetometers, and echo sounders. Each sensor provides scalar georeferenced readings that can be used in a wide range of applications.

Our package simplifies the handling of this rich dataset, offering efficient ways to interpret, analyze, and visualize the collected geospatial data. Whether it's for environmental monitoring, resource exploration, or geographical mapping, PositionData enhances the capabilities of professionals in extracting meaningful insights from their aerial surveys.

Key features include trajectory analysis, data cleaning, spatial interpolation, and export of processed data to GeoJSON, CSV, and GeoTIFF. The package is designed with the needs of surveyors and geophysicists working on drone-assisted surveys in mind.

The package is maintained by SPH Engineering.

Classes/Features

  • PositionData - base methods for loading, filtering, clipping, exporting data
  • MethaneData - methane map generation
  • Trajectory - creating and exporting geographic trajectories
  • WindData - true wind vector processing, wind rose generation

Examples

Mapping methane leaks

from PositionData import PositionData
from PositionData import MethaneData

# Load the SkyHub CSV; it must contain the methane and status columns
position_data = PositionData('path/to/your/data.csv')

# Create a MethaneData instance
methane_data = MethaneData(position_data)

# Path where the GeoTIFF will be saved
output_map_path = 'path/to/save/methane_map.tif'

# EPSG code for the area's coordinate system
area_epsg = '32635'

# Generate the methane concentration map
methane_data.map_methane(map_path=output_map_path, 
                         area_epsg=area_epsg, 
                         grid_rows=100, 
                         grid_columns=100, 
                         environment_methane_perc=95, 
                         ignore_invalid=True)

Making the trajectory polyline from sensor readings

from PositionData import Trajectory

# Assuming you have a PositionData instance named 'position_data'
# which includes 'Date' and 'Time' columns
trajectory = Trajectory(position_data, 'Date', 'Time', tolerance=5.0, projection='EPSG:32635')

# Calculating duration
duration_in_minutes = trajectory.duration(unit='minutes')
print(f"Duration of the trajectory: {duration_in_minutes} minutes")

# Generating polyline and estimating length
polyline_gdf, length = trajectory.polyline()
print(f"Length of the simplified trajectory: {length} meters")

Wind data processing

This example demonstrates how to process wind data using the PositionData and WindData classes. The steps include loading data from a CSV file, clipping by a polygon, calculating the platform direction, and generating a windrose.

from PositionData import PositionData
from PositionData import WindData

# Assuming 'data.csv' is your CSV file with wind data
position_data = PositionData('data.csv')

# Assuming 'clip_polygon.geojson' is your GeoJSON file with the clipping polygon
clipped_data = position_data.clip_by_polygon('clip_polygon.geojson')

# Calculate platform direction relative to north as a Direction column
data_with_direction = clipped_data.calculate_direction('Direction')

# Initialize wind data and generate true wind as TrueWindSpeed and TrueWindDirection
wind_data = WindData(data_with_direction, 'Air:Speed', 'Air:Direction', 'Velocity', 'Direction', 'TrueWindSpeed', 'TrueWindDirection')

# Save the windrose plot as 'windrose.png'
wind_data.build_windrose('TrueWindSpeed', 'TrueWindDirection', 'windrose.png')

Reference

PositionData Class

The PositionData class is designed for handling and processing geospatial data from CSV or GeoJSON files. It provides methods for cleaning data, filtering, clipping, computing statistics, and more.

Initialization

PositionData(input_file, file_format='csv', latitude_prop='Latitude', longitude_prop='Longitude', crs="epsg:4326")

Initializes the PositionData object with data from a CSV or GeoJSON file.

Parameters:

  • input_file: Path to the CSV or GeoJSON file.
  • file_format: The format of the input file ('csv' or 'geojson').
  • latitude_prop: Name of the latitude column (default 'Latitude').
  • longitude_prop: Name of the longitude column (default 'Longitude').
  • crs: Coordinate reference system for the GeoDataFrame (default 'epsg:4326').

Example:

position_data = PositionData("data.csv")

Methods

clean_nan(columns)

Cleans the data by removing rows with NaN values in the specified columns. This method is useful for ensuring data quality and integrity.

Parameters:

  • columns: A list of column names to check for NaN values.

Example:

# Assuming position_data is an instance of PositionData
cleaned_data = position_data.clean_nan(['Latitude', 'Longitude'])
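
To illustrate what this kind of cleaning amounts to, here is a minimal standalone sketch that does not use the package. The helper name drop_nan_rows and the row-as-dictionary layout are assumptions made for the example, not part of PositionData.

```python
import math

def drop_nan_rows(rows, columns):
    """Keep only rows whose values in `columns` are present and not NaN."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))
    return [row for row in rows if not any(is_missing(row.get(c)) for c in columns)]

rows = [
    {"Latitude": 56.95, "Longitude": 24.10},
    {"Latitude": float("nan"), "Longitude": 24.11},  # dropped: NaN latitude
    {"Latitude": 56.97, "Longitude": None},          # dropped: missing longitude
]
cleaned = drop_nan_rows(rows, ["Latitude", "Longitude"])
print(len(cleaned))  # 1
```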

shape()

Returns the shape of the data, which includes the number of rows and columns in the GeoDataFrame. This method is essential for understanding the dimensions of your dataset.

Example:

# Assuming position_data is an instance of PositionData
data_shape = position_data.shape()
print("Number of rows and columns:", data_shape)

filter_range(column_name, min, max)

Filters the data by column value within a specified range. This method is particularly useful for narrowing down the dataset to a specific range of values in a given column, which can be essential for focused analysis or data visualization.

Parameters:

  • column_name: Name of the column to apply the filter on.
  • min: The minimum value of the range. If None, no lower limit is applied.
  • max: The maximum value of the range. If None, no upper limit is applied.

Example:

# Assuming position_data is an instance of PositionData
# Filter data where the values in 'Velocity' column are between 10 and 20
filtered_data = position_data.filter_range('Velocity', 10, 20)
print(filtered_data)
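
The behaviour of the optional bounds can be sketched in plain Python as follows. The helper filter_rows_by_range is hypothetical, and treating both bounds as inclusive is an assumption; the package may handle the boundaries differently.

```python
def filter_rows_by_range(rows, column_name, min_value=None, max_value=None):
    """Keep rows whose value lies in [min_value, max_value]; None disables a bound."""
    kept = []
    for row in rows:
        value = row[column_name]
        if min_value is not None and value < min_value:
            continue
        if max_value is not None and value > max_value:
            continue
        kept.append(row)
    return kept

rows = [{"Velocity": v} for v in (5, 10, 15, 20, 25)]
print(filter_rows_by_range(rows, "Velocity", 10, 20))    # 10, 15, 20
print(filter_rows_by_range(rows, "Velocity", None, 10))  # 5, 10 (no lower limit)
```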

clip_by_polygon(clip_polygon_geojson)

Clips the internal data to the boundaries of a provided polygon, as specified in a GeoJSON file. This method is useful for spatially subsetting the data to a specific geographic area, allowing for focused analysis within that area.

Parameters:

  • clip_polygon_geojson: The path to the GeoJSON file containing the polygon against which the data will be clipped.

Example:

# Assuming position_data is an instance of PositionData
# Clip the data using the boundaries defined in 'clip_polygon.geojson'
clipped_data = position_data.clip_by_polygon('clip_polygon.geojson')
print(clipped_data)
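
Conceptually, clipping keeps only the points that fall inside the polygon. The sketch below shows one common way to test this, the ray-casting rule, on plain (x, y) tuples. It is an illustration only; how the package performs the clip internally is not documented here, and the function name is an assumption.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# (longitude, latitude) pairs; the square spans 24..25 E and 56..57 N
square = [(24.0, 56.0), (25.0, 56.0), (25.0, 57.0), (24.0, 57.0)]
points = [(24.5, 56.5), (26.0, 56.5)]
clipped = [p for p in points if point_in_polygon(p[0], p[1], square)]
print(clipped)  # [(24.5, 56.5)]
```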

filter_noize(property_name, filter_type, window_size=3)

Applies a moving window filter to a specified property of the GeoDataFrame. This method is useful for smoothing or reducing noise in the data, particularly in cases where the data contains fluctuations or irregularities that can obscure underlying trends or patterns.

Parameters:

  • property_name: The name of the property (column) on which to apply the filter.
  • filter_type: The type of filter to apply ('average' or 'median').
  • window_size: The size of the moving window, defaulting to 3.

Example:

# Assuming position_data is an instance of PositionData
# Apply a moving average filter with a window size of 5 to the 'Velocity' property
filtered_data = position_data.filter_noize('Velocity', 'average', 5)
print(filtered_data)
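
The effect of the two filter types can be seen on a short series with a single spike. This is a standalone sketch: the helper moving_filter, the centred window, and the shrinking window at the edges are all assumptions, and the package may treat edges differently.

```python
import statistics

def moving_filter(values, filter_type="average", window_size=3):
    """Apply a centred moving average or median; windows shrink at the edges."""
    reducer = {"average": statistics.fmean, "median": statistics.median}[filter_type]
    half = window_size // 2
    result = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        result.append(reducer(window))
    return result

readings = [1.0, 1.0, 10.0, 1.0, 1.0]
print(moving_filter(readings, "median"))   # [1.0, 1.0, 1.0, 1.0, 1.0]
print(moving_filter(readings, "average"))  # [1.0, 4.0, 4.0, 4.0, 1.0]
```

The median removes the isolated spike entirely, while the average spreads it over the neighbouring samples.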

columns()

Retrieves an array of column names from the GeoDataFrame within the PositionData instance. This method provides a quick way to access and review the columns present in the geospatial dataset, aiding in data exploration and analysis.

Returns:

  • Array of Column Names: An array containing the names of all columns in the GeoDataFrame.

Example:

# Assuming position_data is an instance of PositionData
# Retrieve and print the column names
column_names = position_data.columns()
print("Column names:", column_names)

statistics(column, bins=10)

Calculates and returns key statistics and a probability distribution for a selected column in the GeoDataFrame. This method is instrumental for understanding the distribution and central tendencies of data in a particular column, which is crucial for data analysis and decision-making.

Parameters:

  • column: The name of the column for which statistics are to be calculated.
  • bins: The number of bins to use for the probability distribution histogram, with a default value of 10.

Example:

# Assuming position_data is an instance of PositionData
# Calculate statistics for the 'Velocity' column
velocity_stats = position_data.statistics('Velocity')
print(velocity_stats)
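
As a rough picture of what "statistics and a probability distribution" can mean, the sketch below computes a few summary values and a normalised histogram with the standard library. The helper name and the keys of the returned dictionary are assumptions; the actual structure returned by statistics() is not specified above.

```python
import statistics

def column_statistics(values, bins=10):
    """Summary values plus the share of samples falling into each of `bins` equal bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum goes into the last bin
        counts[index] += 1
    return {
        "min": lo,
        "max": hi,
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "probabilities": [c / len(values) for c in counts],
    }

stats = column_statistics([0.0, 1.0, 2.0, 3.0, 4.0], bins=2)
print(stats["mean"], stats["probabilities"])  # 2.0 [0.4, 0.6]
```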

calculate_direction(direction_property)

Calculates the direction between consecutive points in the GeoDataFrame and stores it in a specified property. This method is valuable for analyzing the directional trends in spatial data, such as determining the course of movement in tracking data or understanding directional patterns.

Parameters:

  • direction_property: The name of the property (column) where the calculated direction values will be stored.

Example:

# Assuming position_data is an instance of PositionData
# Calculate the direction between consecutive points and store in a new column 'Direction'
direction_data = position_data.calculate_direction('Direction')
print(direction_data)
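
One standard way to obtain a direction between two consecutive latitude/longitude points is the initial great-circle bearing, measured clockwise from north. The sketch below shows that formula; whether the package uses exactly this formula is an assumption, and bearing_deg is a hypothetical helper.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0

# (latitude, longitude): first leg heads north, second leg heads roughly east
track = [(56.0, 24.0), (56.1, 24.0), (56.1, 24.2)]
directions = [bearing_deg(*a, *b) for a, b in zip(track, track[1:])]
print(directions)  # first value is 0.0, second is slightly below 90
```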

export_as_geojson(output_path)

Exports the current state of the GeoDataFrame to a GeoJSON file. This method is useful for saving processed or analyzed geospatial data in a standardized format, which can then be used in various GIS applications or further data analysis tools.

Parameters:

  • output_path: The file path where the GeoJSON file will be saved.

Example:

# Assuming position_data is an instance of PositionData
# Export the data to 'exported_data.geojson'
position_data.export_as_geojson('exported_data.geojson')

export_as_csv(output_path)

Exports the current state of the GeoDataFrame to a CSV file. This method is useful for saving processed or analyzed geospatial data in a standardized format, which can then be used in various GIS applications or further data analysis tools.

Parameters:

  • output_path: The file path where the CSV file will be saved.

Example:

# Assuming position_data is an instance of PositionData
# Export the data to 'exported_data.csv'
position_data.export_as_csv('exported_data.csv')

deduplicate_skyhub_data()

Deduplicates the GeoDataFrame stored in the PositionData instance. The method compares rows on a predefined set of SkyHub columns ('GAS:Methane', 'GAS:Status', 'AIR:Speed', 'AIR:Direction', plus the latitude and longitude properties). Only those predefined columns that are actually present in the data are used for the comparison. Rows that repeat the same values in all of these columns are dropped, so only unique records are retained.

Returns:

  • A new instance of PositionData containing the deduplicated data.

Example:

# Assuming position_data is an instance of PositionData
# Deduplicate the data and store in a new instance
deduplicated_data = position_data.deduplicate_skyhub_data()
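
The idea of deduplicating on the intersection of predefined and present columns can be sketched without the package as follows. The helper name and the choice to keep the first occurrence of each duplicate are assumptions.

```python
SKYHUB_KEY_COLUMNS = ["GAS:Methane", "GAS:Status", "AIR:Speed", "AIR:Direction",
                      "Latitude", "Longitude"]

def deduplicate_rows(rows, key_columns=SKYHUB_KEY_COLUMNS):
    """Drop rows that repeat the same values in every key column present in the data."""
    present = [c for c in key_columns if rows and c in rows[0]]
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in present)
        if key not in seen:  # keep the first occurrence only
            seen.add(key)
            unique.append(row)
    return unique

rows = [
    {"Latitude": 56.95, "Longitude": 24.10, "GAS:Methane": 2.1, "Time": "10:00:00"},
    {"Latitude": 56.95, "Longitude": 24.10, "GAS:Methane": 2.1, "Time": "10:00:01"},
    {"Latitude": 56.96, "Longitude": 24.10, "GAS:Methane": 2.3, "Time": "10:00:02"},
]
print(len(deduplicate_rows(rows)))  # 2: 'Time' is not a key column, so row 2 is a duplicate
```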

cut_useless_skyhub_columns()

Streamlines the GeoDataFrame within the PositionData instance by retaining only a specified subset of columns. This method focuses on the columns listed in self.skyhub_columns and ensures that the essential 'geometry' column is also included. By filtering out unnecessary columns, this method helps in creating a more focused and relevant dataset, particularly useful in scenarios where only specific data points are of interest.

  • Keeps only the columns that are both listed in self.skyhub_columns and present in the GeoDataFrame.
  • Ensures the inclusion of the 'geometry' column, crucial for maintaining geospatial data integrity.
  • Excludes all other columns not specified in self.skyhub_columns or absent from the DataFrame.

Returns:

  • A new instance of PositionData containing the streamlined GeoDataFrame.

Example:

# Assuming position_data is an instance of PositionData
# Streamline the data to include only specified columns
streamlined_data = position_data.cut_useless_skyhub_columns()

MethaneData Class

Class Overview

MethaneData is a Python class designed for processing and visualizing methane concentration data. It generates a GeoTIFF map based on methane readings, taking into account the location, status, and environmental thresholds of methane concentration.

Initialization

MethaneData(position_data, methane_column='GAS:Methane', status_column='GAS:Status')

Initializes the MethaneData object.

Parameters:

  • position_data (PositionData): An instance of PositionData containing methane readings along with location data.
  • methane_column (str): The name of the column in PositionData that contains methane readings. Default is 'GAS:Methane'.
  • status_column (str): The name of the column in PositionData that indicates the status of methane readings. Default is 'GAS:Status'.

Description:

The constructor initializes the MethaneData instance, cleaning the data in position_data by removing NaN values from the specified methane and status columns. It also sets the NO_DATA_MAX_LEVEL and NO_DATA_VALUE for handling missing data in the interpolation process.


Methods

map_methane(map_path, area_epsg, grid_rows=100, grid_columns=100, environment_methane_perc=95, ignore_invalid=True)

Generates a GeoTIFF map representing methane concentration levels.

Parameters:

  • map_path (str): File path where the GeoTIFF file will be saved.
  • area_epsg (str): The EPSG code of the area for handling coordinate reference system conversions.
  • grid_rows (int): Number of rows in the interpolation grid. Default is 100.
  • grid_columns (int): Number of columns in the interpolation grid. Default is 100.
  • environment_methane_perc (int): The percentage used to determine the environmental methane threshold. Default is 95.
  • ignore_invalid (bool): If set to True, invalid readings (based on status_column) will be ignored. Default is True.

Description:

This method processes the methane data and generates a GeoTIFF map. It first filters out invalid readings if ignore_invalid is True. It then calculates an adjusted methane concentration by subtracting an environmental methane threshold (determined by environment_methane_perc) from the actual readings. The method interpolates these adjusted values over a specified grid and saves the result as a GeoTIFF file at map_path.

Notes:

  • The method checks if the coordinate reference system (CRS) of position_data is geographic (EPSG:4326). If it is not, CRS conversion is performed based on area_epsg.
  • Zero values in the interpolated grid are replaced with NO_DATA_VALUE to represent areas with no data.
  • The method handles NaN values and ensures that the output GeoTIFF correctly represents the methane concentration across the given area.
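
The background subtraction step described above can be sketched as follows. This example assumes that environment_methane_perc is interpreted as a percentile of the readings and that negative differences are clipped to zero; neither is confirmed by the description, and both helper functions are hypothetical.

```python
def percentile(values, perc):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * perc / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def subtract_background(readings, environment_methane_perc=95):
    """Subtract the environmental threshold; values at or below it become 0."""
    threshold = percentile(readings, environment_methane_perc)
    return [max(r - threshold, 0.0) for r in readings]

readings = [2.0, 2.0, 2.0, 2.0, 10.0]
adjusted = subtract_background(readings, environment_methane_perc=50)
print(adjusted)  # [0.0, 0.0, 0.0, 0.0, 8.0]
```

With this reading, a higher percentile treats more of the dataset as background, so only the strongest peaks remain in the map.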

Trajectory Class

Class Overview

The Trajectory class provides functionalities for creating, managing, and exporting geographic trajectories. It inherits from PositionBase and utilizes geographical data to generate simplified trajectory polylines and calculate durations.

Initialization

Trajectory(position_data, date_column, time_column, tolerance, projection)

Description

The __init__ method initializes the Trajectory object, setting up essential parameters and generating the trajectory polyline. This method processes positional data with specified columns for date and time, creating a simplified trajectory representation.

Parameters

  • position_data (PositionData): An instance of PositionData containing the positional information for the trajectory.
  • date_column (str): The name of the column in position_data that contains the date information.
  • time_column (str): The name of the column in position_data that contains the time information.
  • tolerance (float): The tolerance distance in meters for simplifying the trajectory. Determines how much deviation from the original path is allowed.
  • projection (str): The EPSG code of the projected coordinate system for distance calculations. A projected CRS is crucial for accurate distance measurements.

Example Usage

from PositionData import Trajectory

# Example: Creating a Trajectory instance
# Assume 'position_data' is an instance of PositionData with date and time columns
trajectory = Trajectory(position_data, 'DateColumn', 'TimeColumn', tolerance=5.0, projection='EPSG:32635')

Methods

duration(unit='seconds')

Description

The duration method of the Trajectory class calculates the total duration between the first and last record in the trajectory data. This duration is useful for understanding the time span covered by the trajectory, which can be important for analyses like calculating average speeds, understanding usage patterns, or synchronizing with other time-dependent data.

Parameters

  • unit (str, optional): Specifies the unit of time for the duration. The available options are 'seconds', 'minutes', and 'hours'. The default is 'seconds'.

Returns

  • float: The duration between the first and last record in the specified unit of time.

Example Usage

# Assuming 'trajectory' is an instance of the Trajectory class
duration_in_seconds = trajectory.duration(unit='seconds')
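
The calculation itself is a timestamp difference converted to the requested unit. The standalone sketch below assumes that the date and time columns hold strings such as '2024-05-01' and '10:00:00'; the actual SkyHub column format may differ, and duration_between is a hypothetical helper.

```python
from datetime import datetime

def duration_between(first, last, unit="seconds", fmt="%Y-%m-%d %H:%M:%S"):
    """Time between two (date, time) string pairs in the requested unit."""
    divisors = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0}
    start = datetime.strptime(f"{first[0]} {first[1]}", fmt)
    end = datetime.strptime(f"{last[0]} {last[1]}", fmt)
    return (end - start).total_seconds() / divisors[unit]

first_record = ("2024-05-01", "10:00:00")
last_record = ("2024-05-01", "10:30:00")
print(duration_between(first_record, last_record, unit="minutes"))  # 30.0
```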

polyline()

Description

The polyline method generates a simplified representation of the trajectory as a polyline. This method simplifies the trajectory data to a LineString geometry based on a specified tolerance, which can be helpful for visualizing or analyzing the path in a more concise form.

Parameters

  • output_path (str): The file path where the simplified trajectory will be saved in GeoJSON format.
  • tolerance (float): The tolerance distance for simplification in meters. Smaller values will result in a polyline closer to the original trajectory, while larger values will produce a more simplified representation.
  • projection (str): The projection system to use for distance calculation during simplification. This should be a string representation of an EPSG code for a projected coordinate system.

Returns

  • A tuple containing:
    • GeoDataFrame: A GeoDataFrame object containing the simplified polyline.
    • float: The length of the simplified polyline in the units of the specified projection system.

Example Usage

# Assuming 'trajectory' is an instance of the Trajectory class
polyline_gdf, polyline_length = trajectory.polyline(output_path='simplified_trajectory.geojson', tolerance=5.0, projection='EPSG:32635')
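
To show how a tolerance controls simplification, here is a small sketch of the Douglas-Peucker algorithm on projected (x, y) coordinates in meters. Naming this algorithm is an assumption: the description above does not state which simplification method the package uses, and the helpers below are hypothetical.

```python
import math

def _distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Drop points that deviate from the simplified line by at most `tolerance`."""
    if len(points) < 3:
        return list(points)
    distances = [_distance_to_segment(p, points[0], points[-1]) for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        return [points[0], points[-1]]
    split = worst + 1  # index of the farthest point in `points`
    return simplify(points[: split + 1], tolerance)[:-1] + simplify(points[split:], tolerance)

def polyline_length(points):
    """Sum of segment lengths, in the units of the coordinates."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

track = [(0.0, 0.0), (50.0, 2.0), (100.0, 0.0), (100.0, 100.0)]
simplified = simplify(track, tolerance=5.0)
print(simplified)                   # [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(polyline_length(simplified))  # 200.0
```

The point (50, 2) lies only 2 m off the straight line and is removed at a 5 m tolerance, while the corner at (100, 0) is kept. This is also why a projected CRS matters: the tolerance and the length are only meaningful when coordinates are in linear units.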

export_as_geojson(output_path)

Description

The export_as_geojson method exports the trajectory's simplified polyline as a GeoJSON file. This method is useful for creating a standard GeoJSON representation of the trajectory, which can be used in various GIS applications or for further geographic analyses. The method ensures that the exported GeoJSON is in the WGS 84 coordinate reference system (EPSG:4326), which is the standard for GeoJSON files.

Parameters

  • output_path (str): The file path where the GeoJSON file will be saved.

Returns

This method does not return a value. It creates a GeoJSON file at the specified output_path.

Example Usage

# Assuming 'trajectory' is an instance of the Trajectory class
trajectory.export_as_geojson('trajectory.geojson')
print("Trajectory exported as GeoJSON to 'trajectory.geojson'")

WindData Class

The WindData class is designed for processing and analyzing wind data in a geospatial context. It includes methods for calculating true wind speed and direction, gridding measurements, and building windrose plots.

Initialization

WindData(position_data, air_speed_prop, air_dir_prop, platform_speed_prop, platform_dir_prop, true_speed_prop, true_dir_prop, sensor_cw_rot=0, sensor_to_north=False)

Initializes the WindData object with an instance of PositionData and properties related to wind and platform motion. It automatically calculates true wind vectors.

Parameters:

  • position_data: An instance of PositionData.
  • air_speed_prop: Property name for air speed.
  • air_dir_prop: Property name for air direction.
  • platform_speed_prop: Property name for platform speed.
  • platform_dir_prop: Property name for platform direction.
  • true_speed_prop: Property name for true wind speed.
  • true_dir_prop: Property name for true wind direction.
  • sensor_cw_rot: Clockwise (CW) rotation of the sensor relative to the platform nose.
  • sensor_to_north: If True, sensor readings are relative to north; otherwise, they are relative to the platform nose.

Example:

wind_data = WindData(position_data, 'Air:Speed', 'Air:Direction', 'Velocity', 'Direction', 'TrueWindSpeed', 'TrueWindDirection')
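
The relationship between the measured (apparent) wind, the platform motion, and the true wind can be sketched with simple vector arithmetic. This is a minimal illustration under stated assumptions, not the package's implementation: directions are taken as the direction the wind comes from, in degrees clockwise from north, and sensor_cw_rot is added to the sensor reading in both modes. The function true_wind is hypothetical.

```python
import math

def true_wind(air_speed, air_dir, platform_speed, platform_dir,
              sensor_cw_rot=0.0, sensor_to_north=False):
    """Return (true wind speed, direction the true wind comes from in degrees)."""
    # Direction the apparent wind comes from, clockwise from north
    apparent_from = air_dir + sensor_cw_rot
    if not sensor_to_north:
        apparent_from += platform_dir  # readings are relative to the platform nose
    a = math.radians(apparent_from)
    p = math.radians(platform_dir)
    # (east, north) velocities: air moves opposite to its "from" direction,
    # and true wind = apparent wind + platform velocity
    east = -air_speed * math.sin(a) + platform_speed * math.sin(p)
    north = -air_speed * math.cos(a) + platform_speed * math.cos(p)
    speed = math.hypot(east, north)
    direction = math.degrees(math.atan2(-east, -north)) % 360.0
    return speed, direction

# Flying north at 5 m/s in calm air: the sensor sees a 5 m/s headwind
calm_speed, _ = true_wind(5.0, 0.0, 5.0, 0.0)
# Hovering with the nose to north: a 3 m/s reading from the right is a 3 m/s east wind
hover_speed, hover_dir = true_wind(3.0, 90.0, 0.0, 0.0)
print(round(calm_speed, 6), round(hover_speed, 6), round(hover_dir, 6))
```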

Methods

build_windrose(speed_col, direction_col, output_path, bins=[0,2,4,6,8,10], nsector=16, title="Windrose")

Builds and saves a windrose plot. This method is valuable for visually representing the distribution of wind speeds and directions, which is crucial in meteorological studies and applications such as sailing, aviation, and architecture.

Parameters:

  • speed_col: Name of the wind speed column.
  • direction_col: Name of the wind direction column.
  • output_path: Path to save the generated windrose image.
  • bins: Binning for wind speed (default is [0,2,4,6,8,10]).
  • nsector: Number of sectors for the windrose (default is 16).
  • title: Title of the windrose plot (default is "Windrose").

Example:

# Assuming wind_data is an instance of WindData
wind_data.build_windrose('TrueWindSpeed', 'TrueWindDirection', 'windrose.png', bins=[0,2,4,6,8,10], nsector=16, title="Windrose")
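
A windrose is essentially a two-dimensional count: each sample is assigned to a direction sector and a speed bin. The sketch below builds that count table in plain Python to show how bins and nsector interact. It assumes that the first sector is centred on north and that the last speed bin is open-ended; the plotting itself is left out, and windrose_table is a hypothetical helper.

```python
def windrose_table(speeds, directions, bins=(0, 2, 4, 6, 8, 10), nsector=16):
    """Count samples per (direction sector, speed bin)."""
    sector_width = 360.0 / nsector
    table = [[0] * len(bins) for _ in range(nsector)]
    for speed, direction in zip(speeds, directions):
        # Shift by half a sector so that sector 0 is centred on north
        sector = int(((direction + sector_width / 2) % 360.0) // sector_width)
        # Index of the last bin edge not exceeding the speed (speeds are non-negative)
        speed_bin = max(i for i, edge in enumerate(bins) if speed >= edge)
        table[sector][speed_bin] += 1
    return table

speeds = [1.0, 3.0, 12.0]
directions = [350.0, 10.0, 90.0]
table = windrose_table(speeds, directions)
print(table[0])  # [1, 1, 0, 0, 0, 0]: both northerly samples land in sector 0
print(table[4])  # [0, 0, 0, 0, 0, 1]: the 12 m/s easterly sample is in the last bin
```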

grid_wind(speed_property, direction_property, method='linear', resolution=100)

Creates a gridded representation of the wind measurements. This method is useful for visualizing and analyzing spatial variations in wind patterns, particularly in applications like meteorology, environmental monitoring, and renewable energy studies.

Parameters:

  • speed_property: The name of the column representing wind speed.
  • direction_property: The name of the column representing wind direction.
  • method: The interpolation method for gridding (default is 'linear'). Other options are available as per scipy.interpolate.griddata.
  • resolution: The resolution of the grid (default is 100). Higher values provide finer grids.

Example:

# Assuming wind_data is an instance of WindData
gridded_wind_data = wind_data.grid_wind('TrueWindSpeed', 'TrueWindDirection', method='linear', resolution=100)
