A python implementation of the npm package geojsplit. Used to split GeoJSON files into smaller pieces.

geojsplit

A python implementation of the node package geojsplit: https://github.com/woodb/geojsplit

Installation

With poetry

For an introduction to poetry, see the poetry documentation.

$ poetry add geojsplit

This adds geojsplit to your current virtual environment and updates your poetry.lock file. If you would like to contribute to or develop geojsplit, clone the repository and install it:

$ git clone https://github.com/underchemist/geojsplit.git
$ cd geojsplit
$ poetry install

You may need some extra configuration to make poetry work well with conda virtual environments:

poetry config settings.virtualenvs.path <path_to_conda_install>/envs  # tell poetry where your virtualenvs are stored
poetry config settings.virtualenvs.create 0  # tell poetry not to try to create its own virtualenvs.

See https://github.com/sdispater/poetry/issues/105#issuecomment-498042062 for more info.

$ poetry config settings.virtualenvs.path $CONDA_ENV_PATH
$ poetry config settings.virtualenvs.create 0

With pip

Though geojsplit is developed using poetry (and as such does not have a setup.py), pip's PEP 517 support means it can be installed directly:

$ pip install geojsplit

Usage

Although both the library code and the command line tool of geojsplit are relatively simple, there are use cases for both. You may want to use the backend GeoJSONBatchStreamer class directly in order to do more sophisticated manipulations with GeoJSON documents. As a command line tool, geojsplit also works well as a preprocessing step for large GeoJSON documents, e.g. before piping them into GDAL’s ogr2ogr tool.

As a library

Once installed, geojsplit can be imported and used like this:

from geojsplit import geojsplit

geojson = geojsplit.GeoJSONBatchStreamer("/path/to/some.geojson")

for feature_collection in geojson.stream():
    do_something(feature_collection)
    ...

If /path/to/some.geojson does not exist, FileNotFoundError is raised.

You can control how many features are streamed into each Feature Collection using the batch parameter (default is 100).

>>> g = geojson.stream(batch=2)  # instantiate generator object
>>> data = next(g)
>>> print(data)
{"features": [{"geometry": {"coordinates": [[[-118.254638, 33.7843], [-118.254637,
33.784231], [-118.254556, 33.784232], [-118.254559, 33.784339], [-118.254669,
33.784338], [-118.254668, 33.7843], [-118.254638, 33.7843]]], "type": "Polygon"},
"properties": {}, "type": "Feature"}, {"geometry": {"coordinates": [[[-118.254414,
33.784255], [-118.254232, 33.784255], [-118.254232, 33.784355], [-118.254414,
33.784355], [-118.254414, 33.784255]]], "type": "Polygon"}, "properties": {}, "type":
"Feature"}], "type": "FeatureCollection"}
>>> print(len(data["features"]))
2
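
To make the batching behaviour concrete, here is a minimal standard-library sketch that groups features into Feature Collections of at most batch items. It is not geojsplit's own implementation (which streams from disk through ijson); batch_features is a hypothetical helper and the point features are made up for illustration.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator


def batch_features(
    features: Iterable[Dict[str, Any]], batch: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield FeatureCollection dicts holding at most `batch` features each.

    Hypothetical helper for illustration only, not part of geojsplit.
    """
    if batch < 1:
        raise ValueError("batch must be a positive integer")
    iterator = iter(features)
    while True:
        # Pull up to `batch` features without materialising the whole input.
        chunk = list(islice(iterator, batch))
        if not chunk:
            return
        yield {"type": "FeatureCollection", "features": chunk}


# Five made-up point features, batched two at a time.
points = [
    {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Point", "coordinates": [i, i]},
    }
    for i in range(5)
]
sizes = [len(fc["features"]) for fc in batch_features(points, batch=2)]
print(sizes)  # [2, 2, 1]
```

The final collection holds the single leftover feature, which mirrors how a last batch can be smaller than the requested size.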

If your GeoJSON document has a different format or you want to iterate over different elements of your document, you can also pass a different value to the prefix keyword argument (default is 'features.item'). This argument is passed directly to an ijson.items call; for more information see https://github.com/ICRAR/ijson.

As a command line tool

After installing you should have the geojsplit executable in your PATH.

$ geojsplit -h
usage: geojsplit [-h] [-l GEOMETRY_COUNT] [-a SUFFIX_LENGTH] [-o OUTPUT]
                [-n LIMIT] [-v] [-d] [--version]
                geojson

Split a geojson file into many geojson files.

positional arguments:
geojson               filename of geojson file to split

optional arguments:
-h, --help            show this help message and exit
-l GEOMETRY_COUNT, --geometry-count GEOMETRY_COUNT
                        the number of features to be distributed to each file.
-a SUFFIX_LENGTH, --suffix-length SUFFIX_LENGTH
                        number of characters in the suffix length for split
                        geojsons
-o OUTPUT, --output OUTPUT
                        output directory to save split geojsons
-n LIMIT, --limit LIMIT
                        limit number of split geojson file to at most LIMIT,
                        with GEOMETRY_COUNT number of features.
-v, --verbose         increase output verbosity
-d, --dry-run         see output without actually writing to file
--version             show geojsplit version number

By default, split GeoJSON files are saved as filename_x<SUFFIX_LENGTH characters long>.geojson. The default SUFFIX_LENGTH is 4, meaning that 456976 unique files can be generated. If you need more, use -a or --suffix-length to increase this value appropriately.
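
The figure 456976 equals 26 to the power of 4, which suggests the suffix is drawn from the 26 lowercase letters; treat that alphabet size as an assumption rather than a documented guarantee. Under that assumption, the sketch below estimates how long the suffix needs to be for a given number of output files. Both functions are hypothetical helpers, not part of geojsplit.

```python
def suffix_capacity(suffix_length: int, alphabet_size: int = 26) -> int:
    """Number of distinct suffixes of the given length.

    Assumes a 26-letter alphabet, inferred from 456976 == 26 ** 4.
    """
    return alphabet_size ** suffix_length


def min_suffix_length(n_files: int, alphabet_size: int = 26) -> int:
    """Smallest suffix length that can label `n_files` distinct files."""
    if n_files < 1:
        raise ValueError("n_files must be a positive integer")
    length = 1
    while suffix_capacity(length, alphabet_size) < n_files:
        length += 1
    return length


print(suffix_capacity(4))         # 456976
print(min_suffix_length(456976))  # 4
print(min_suffix_length(456977))  # 5
```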

The --geometry-count flag corresponds to the batch keyword argument of the GeoJSONBatchStreamer.stream method. Note that if GEOMETRY_COUNT does not divide evenly into the number of features in the Feature Collection, the last batch will contain fewer than GEOMETRY_COUNT features.

Finally, to only iterate over the first n elements of a GeoJSON document, use --limit.
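
As a rough way to reason about how --geometry-count and --limit interact, the sketch below computes how many features each output file would hold. It assumes, following the help text, that --limit caps the number of output files; plan_split is a hypothetical helper and does not call geojsplit.

```python
import math
from typing import List, Optional


def plan_split(
    total_features: int, geometry_count: int, limit: Optional[int] = None
) -> List[int]:
    """Return the number of features each output file would hold.

    Assumption: `limit` caps the number of files, as the CLI help describes.
    """
    if geometry_count < 1:
        raise ValueError("geometry_count must be a positive integer")
    if total_features < 0:
        raise ValueError("total_features must not be negative")
    n_files = math.ceil(total_features / geometry_count)
    if limit is not None:
        n_files = min(n_files, limit)
    sizes = []
    remaining = total_features
    for _ in range(n_files):
        # Every file is full except possibly the last one.
        size = min(geometry_count, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


print(plan_split(250, 100))           # [100, 100, 50]
print(plan_split(250, 100, limit=2))  # [100, 100]
```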
