Converts GIS annotations to Microsoft's Common Objects In Context (COCO) dataset format

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Easily transform your GIS annotations into Microsoft's Common Objects In Context (COCO) datasets with GeoCOCO. This tool lets you use the advanced digitizing tools of modern GIS software to annotate image objects in geographic imagery.

Built with Pydantic and pycocotools, it features a complete implementation of the COCO standard for object detection with out-of-the-box support for JSON-encoding and RLE compression. The resulting datasets are versioned, easily extendable with new annotations and fully compatible with other data applications that accept the COCO format.
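For readers new to the format, the sketch below shows the general layout of a COCO object-detection dataset and a JSON round trip using only the standard library. All values (file names, category names, coordinates) are illustrative placeholders, and the exact fields GeoCOCO writes may differ (for example, it may store segmentations as RLE rather than polygons).

```python
import json

# Minimal COCO object-detection layout; every value here is a placeholder.
coco = {
    "info": {"version": "0.1.0", "description": "Test dataset", "contributor": "User"},
    "images": [{"id": 1, "file_name": "tile_1.tif", "width": 512, "height": 512}],
    "categories": [{"id": 1, "name": "building", "supercategory": "structure"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,      # refers to an entry in "images"
            "category_id": 1,   # refers to an entry in "categories"
            "bbox": [10.0, 20.0, 100.0, 50.0],  # [x, y, width, height] in pixels
            "area": 5000.0,
            "iscrowd": 0,
            # Polygon segmentation as a flat list of x, y pairs
            "segmentation": [[10.0, 20.0, 110.0, 20.0, 110.0, 70.0, 10.0, 70.0]],
        }
    ],
}

# JSON-encode and decode again to show that the structure is plain JSON
encoded = json.dumps(coco)
decoded = json.loads(encoded)
```

Images, categories and annotations are linked only through their integer ids, which is what makes it possible to append new annotations to an existing dataset.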

Key features

User-friendly: GeoCOCO is designed for ease of use, requiring minimal configuration and domain knowledge
Version Control: Datasets created with GeoCOCO are versioned and designed for expansion with future annotations
Command-line Tool: Use GeoCOCO from your terminal for quick conversions
Python Module: Integrate GeoCOCO in your own data applications with the geococo package
Representation: GeoCOCO maximizes label representation through an adaptive moving window approach
COCO Standard: Output datasets are fully compatible with other COCO-accepting applications
Compact File Size: JSON-encoding and RLE compression are employed to ensure compact file sizes
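To give an idea of why RLE keeps masks compact, here is a minimal pure-Python sketch of COCO-style uncompressed run-length encoding. It is not GeoCOCO's or pycocotools' implementation (pycocotools additionally compresses the counts into a string); the function names rle_encode and rle_decode are hypothetical helpers for illustration.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows of 0/1) as COCO-style uncompressed RLE."""
    height, width = len(mask), len(mask[0])
    # COCO flattens masks column by column (column-major order)
    flat = [mask[row][col] for col in range(width) for row in range(height)]
    counts, current, run = [], 0, 0  # the first run always counts zeros
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def rle_decode(rle):
    """Rebuild the binary mask from its run lengths."""
    height, width = rle["size"]
    flat, value = [], 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value  # runs alternate between zeros and ones
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
rle = rle_encode(mask)
print(rle)  # {'size': [3, 4], 'counts': [4, 1, 1, 2, 1, 2, 1]}
```

The saving grows with mask size: a 512 x 512 tile with a handful of objects needs only a few hundred run lengths instead of 262,144 pixel values.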

Installation

Installing from the Python Package Index (PyPI):

# Install from PyPI with Python's package installer (pip)
pip install geococo

Example of usage

After installing geococo, there are a number of ways you can interact with its API.

Command line interface

The easiest way to use geococo is to call it from your preferred terminal, providing the paths to your input data and output locations, followed by the desired width and height of the output images:

# Example with local data and a JSON file that does not exist yet
# Argument order follows the --help output: JSON_PATH comes before OUTPUT_DIR
geococo image.tif labels.shp dataset.json coco_folder 512 512

Creating new dataset..
Dataset version: 0.1.0
Dataset description: Test dataset
Dataset contributor: User
Dataset date: 2023-09-05 18:12:31.435591
100%|████████████████████████████████████████| 234/234 [00:04<00:00, 50.36it/s]

For more information on the available arguments and options, call geococo with --help:

geococo --help
Usage: cli.py [OPTIONS] IMAGE_PATH LABELS_PATH JSON_PATH OUTPUT_DIR WIDTH HEIGHT

  Transform your GIS annotations into a COCO dataset.

  This method generates a COCO dataset by moving across the given image
  (image_path) with a moving window (image_size), constantly checking for
  intersecting annotations (labels_path)  that represent image objects in said
  image (e.g. buildings in satellite imagery; denoted  by category_attribute).
  Each valid intersection will add n Annotations entries to the dataset
  (json_path) and save a subset of the input image that contained these entries
  (output_dir).

  The output data size depends on your input labels, as the moving window
  adjusts its step size  to accommodate the average annotation size, optimizing
  dataset representation and minimizing  tool configuration.

Arguments:
  IMAGE_PATH                 Path to the geospatial image containing image
                             objects (e.g. buildings in satellite imagery)
                             [required]
  LABELS_PATH                Path to the annotations representing these image
                             objects (='category_id')  [required]
  JSON_PATH                  Path to the json file that will store the COCO
                             dataset (will be appended to if already exists)
                             [required]
  OUTPUT_DIR                 Path to the output directory for image subsets
                             [required]
  WIDTH                      Width of the output images  [required]
  HEIGHT                     Height of the output images  [required]

Options:
  --category-attribute TEXT  Column that contains category_id values per
                             annotation feature  [default: category_id]
  --help                     Show this message and exit.
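The moving window described in the help text can be pictured with the following sketch, which only computes window offsets for a given image size, window size and step. It is an illustration under simplifying assumptions, not GeoCOCO's implementation: the step is passed in explicitly here, whereas GeoCOCO derives it from the average annotation size, and the names axis_offsets and window_offsets are hypothetical.

```python
def axis_offsets(length, window, step):
    """Offsets along one axis so that windows of size `window` cover `length` pixels."""
    if window >= length:
        return [0]
    offsets = list(range(0, length - window, step))
    offsets.append(length - window)  # the final window sits flush with the image edge
    return offsets


def window_offsets(image_size, window_size, step):
    """Return (col_off, row_off) pairs for every window position, row by row."""
    (img_w, img_h), (win_w, win_h), (step_x, step_y) = image_size, window_size, step
    return [
        (x, y)
        for y in axis_offsets(img_h, win_h, step_y)
        for x in axis_offsets(img_w, win_w, step_x)
    ]


# A 1024 x 1000 pixel image, 512 x 512 windows, no horizontal overlap, 50% vertical overlap
offsets = window_offsets((1024, 1000), (512, 512), (512, 256))
print(offsets)
# [(0, 0), (512, 0), (0, 256), (512, 256), (0, 488), (512, 488)]
```

A smaller step produces more overlapping windows, so each annotation is more likely to appear fully inside at least one output image, at the cost of a larger dataset.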

Python module

Using geococo as a Python module is recommended for most developers, as it gives you more granular control over the individual steps. It does assume a basic understanding of the geopandas and rasterio packages.

import pathlib
from datetime import datetime

import geopandas as gpd
import rasterio
from geococo import create_dataset, load_dataset, save_dataset, labels_to_dataset

# Replace these with the paths to your input data
image_path = pathlib.Path("path/to/your/geospatial/image")
labels_path = pathlib.Path("path/to/your/annotations")

# Replace these with your preferred output paths
data_path = pathlib.Path("path/to/your/coco/output/images")
json_path = pathlib.Path("path/to/your/coco/json/file")

# Dimensions of the moving window and output images
width, height = 512, 512

# Creating dataset instance from scratch
version = "0.1.0"  # initial version, as in the command line example above
description = "My First Dataset"
contributor = "User"
date_created = datetime.now()

dataset = create_dataset(
    version=version,
    description=description,
    contributor=contributor,
    date_created=date_created,
)

# You can also load existing COCO datasets
# dataset = load_dataset(json_path=json_path)

# Loading GIS data with geopandas and rasterio
labels = gpd.read_file(labels_path)

# The context manager closes the raster file once processing is done
with rasterio.open(image_path) as raster_source:
    # Moving across raster_source and appending all intersecting annotations
    dataset = labels_to_dataset(
        dataset=dataset,
        images_dir=data_path,
        src=raster_source,
        labels=labels,
        window_bounds=[(width, height)],
    )

# Encode CocoDataset instance as JSON and save to json_path
save_dataset(dataset=dataset, json_path=json_path)
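Once a dataset has been saved, a quick sanity check is to count annotations per category using only the standard library. The sketch below assumes the file follows the standard COCO layout ("categories" with id and name, "annotations" with category_id); the helper name annotations_per_category and the sample data are hypothetical.

```python
import json
from collections import Counter


def annotations_per_category(coco):
    """Map each category name to the number of annotations that reference it."""
    names = {c["id"]: c.get("name", str(c["id"])) for c in coco.get("categories", [])}
    counts = Counter(a["category_id"] for a in coco.get("annotations", []))
    return {names.get(cat_id, str(cat_id)): n for cat_id, n in counts.items()}


# In practice you would read your own file: coco = json.loads(json_path.read_text())
sample_json = """
{
  "categories": [{"id": 1, "name": "building"}, {"id": 2, "name": "road"}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1},
    {"id": 2, "image_id": 1, "category_id": 1},
    {"id": 3, "image_id": 2, "category_id": 2}
  ]
}
"""
summary = annotations_per_category(json.loads(sample_json))
print(summary)  # {'building': 2, 'road': 1}
```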

Visualization with FiftyOne

As with datasets from the official COCO project, you can use the open-source tool FiftyOne to visualize and evaluate your datasets. This requires the fiftyone and pycocotools packages. geococo does not install fiftyone, so you need to install it separately (see https://docs.voxel51.com/getting_started/install.html for instructions). After installing fiftyone, you can run the following to inspect your data in your browser.

# requires pycocotools and fiftyone
import pathlib

import fiftyone as fo

data_path = pathlib.Path("path/to/your/coco/output/images")
json_path = pathlib.Path("path/to/your/coco/json/file")

# Load COCO formatted dataset
coco_dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path=data_path,
    labels_path=json_path,
    include_id=True,
)

# Launch the app
session = fo.launch_app(coco_dataset, port=5151)

Release history

0.5.4: Sep 26, 2023
0.5.3: Sep 26, 2023
0.5.2: Sep 26, 2023
0.5.1: Sep 25, 2023
0.5.0: Sep 21, 2023
0.4.2: Sep 20, 2023
0.4.1: Sep 20, 2023
0.4.0: Sep 19, 2023
0.3.0: Sep 13, 2023 (this version)
0.2.1: Sep 12, 2023
0.2.0: Sep 12, 2023
0.1.3: Sep 5, 2023
0.1.2: Sep 5, 2023
0.1.0a0 (pre-release): Aug 21, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

geococo-0.3.0.tar.gz (26.5 kB), uploaded Sep 13, 2023

Built Distribution

geococo-0.3.0-py3-none-any.whl (27.2 kB), uploaded Sep 13, 2023, Python 3

Hashes for geococo-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`ddbac035e0b198c745d71935c7217dcb2aa63705bea1cd5f6bb481a2e10579e7`
MD5	`2abe83c6ee55f714781680534a15b1bd`
BLAKE2b-256	`3f112fe36148989d78662ef3244236075906e15520c03f4176c769fc6fdd2290`

Hashes for geococo-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2f26eb9f17ba2802f127b4b25d63ff9c1e8257032e9c58c0cff5ec1c5276e46b`
MD5	`dbca91b2d44826276dc9c0aeec05f26d`
BLAKE2b-256	`ff0b96e6fc44fd87212b73392b491af1bf34d20b2b9f90bb5c6e689a75a33861`