A Python implementation of the npm package geojsplit, used to split GeoJSON files into smaller pieces.
geojsplit
A Python implementation of the Node package geojsplit: https://github.com/woodb/geojsplit
Installation
With poetry
For an introduction to poetry, see the poetry documentation.
$ poetry add geojsplit
This adds geojsplit to your current virtual environment and updates your poetry.lock file. If you would like to contribute to or develop geojsplit, clone the repository and install it with poetry:
$ git clone https://github.com/underchemist/geojsplit.git
$ cd geojsplit
$ poetry install
You may need some extra configuration to make poetry work well with conda virtual environments:
poetry config settings.virtualenvs.path <path_to_conda_install>/envs # tell poetry where your virtualenvs are stored
poetry config settings.virtualenvs.create 0 # tell poetry not to create its own virtualenvs
See https://github.com/sdispater/poetry/issues/105#issuecomment-498042062 for more info.
$ poetry config settings.virtualenvs.path $CONDA_ENV_PATH
$ poetry config settings.virtualenvs.create 0
With pip
Although geojsplit is developed using poetry (and as such does not have a setup.py), PEP 517 support in pip means it can be installed directly:
$ pip install geojsplit
Usage
Although both the library code and the command line tool of geojsplit are relatively simple, there are use cases for both. You may want to use the backend GeoJSONBatchStreamer class directly in order to do more sophisticated manipulations with GeoJSON documents. As a command line tool, geojsplit also works well as a preprocessing step for working with large GeoJSON documents, e.g. for piping into GDAL’s ogr2ogr tool.
As a library
Once installed, geojsplit can be imported like so:
from geojsplit import geojsplit
geojson = geojsplit.GeoJSONBatchStreamer("/path/to/some.geojson")
for feature_collection in geojson.stream():
    do_something(feature_collection)
    ...
If /path/to/some.geojson does not exist, FileNotFoundError will be raised.
You can control how many features are streamed into a Feature Collection using the batch parameter (Default is 100).
>>> g = geojson.stream(batch=2) # instantiate generator object
>>> data = next(g)
>>> print(data)
{"features": [{"geometry": {"coordinates": [[[-118.254638, 33.7843], [-118.254637,
33.784231], [-118.254556, 33.784232], [-118.254559, 33.784339], [-118.254669,
33.784338], [-118.254668, 33.7843], [-118.254638, 33.7843]]], "type": "Polygon"},
"properties": {}, "type": "Feature"}, {"geometry": {"coordinates": [[[-118.254414,
33.784255], [-118.254232, 33.784255], [-118.254232, 33.784355], [-118.254414,
33.784355], [-118.254414, 33.784255]]], "type": "Polygon"}, "properties": {}, "type":
"Feature"}], "type": "FeatureCollection"}
>>> print(len(data["features"]))
2
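The batching behaviour described above can be sketched with the standard library alone. This is not geojsplit's implementation (which streams from disk); it is a minimal in-memory illustration, and the helper name stream_batches and the dummy point features are made up for the example.

```python
from itertools import islice


def stream_batches(features, batch=100):
    """Yield FeatureCollection dicts holding at most `batch` features each.

    Hypothetical helper: it only mimics the shape of what a batched
    stream returns, working on an in-memory iterable of features.
    """
    iterator = iter(features)
    while True:
        chunk = list(islice(iterator, batch))
        if not chunk:
            return
        yield {"type": "FeatureCollection", "features": chunk}


# Five dummy point features; the coordinates are invented for illustration.
features = [
    {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Point", "coordinates": [i, i]},
    }
    for i in range(5)
]

sizes = [len(fc["features"]) for fc in stream_batches(features, batch=2)]
print(sizes)  # [2, 2, 1]
```

Because five features do not divide evenly by a batch size of 2, the final Feature Collection holds a single feature.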
If your GeoJSON document has a different format, or you want to iterate over different elements of your document, you can pass a different value to the prefix keyword argument (default is 'features.item'). This argument is passed directly to an ijson.items call; for more information see https://github.com/ICRAR/ijson.
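To get a feel for what a prefix such as 'features.item' selects, here is a rough in-memory approximation. It is not ijson and does not stream: the function items_at_prefix is a hypothetical helper that walks an already-parsed document, treating each dotted component as an object key and the special component 'item' as "every element of this array".

```python
def items_at_prefix(document, prefix="features.item"):
    """Yield every value located at a dotted prefix in a parsed document.

    In-memory approximation only: ijson resolves the same kind of prefix
    incrementally while reading the file, without loading it fully.
    """
    def walk(node, parts):
        if not parts:
            yield node
            return
        head, rest = parts[0], parts[1:]
        if head == "item" and isinstance(node, list):
            # 'item' stands for each element of an array.
            for element in node:
                yield from walk(element, rest)
        elif isinstance(node, dict) and head in node:
            yield from walk(node[head], rest)

    parts = prefix.split(".") if prefix else []
    yield from walk(document, parts)


document = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "a"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "b"}, "geometry": None},
    ],
}

# Default prefix: one result per feature.
print(len(list(items_at_prefix(document))))  # 2

# A deeper prefix selects a nested element of every feature instead.
names = [p["name"] for p in items_at_prefix(document, "features.item.properties")]
print(names)  # ['a', 'b']
```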
As a command line tool
After installing, you should have the geojsplit executable in your PATH.
$ geojsplit -h
usage: geojsplit [-h] [-l GEOMETRY_COUNT] [-a SUFFIX_LENGTH] [-o OUTPUT]
                 [-n LIMIT] [-v] [-d] [--version]
                 geojson

Split a geojson file into many geojson files.

positional arguments:
  geojson               filename of geojson file to split

optional arguments:
  -h, --help            show this help message and exit
  -l GEOMETRY_COUNT, --geometry-count GEOMETRY_COUNT
                        the number of features to be distributed to each file.
  -a SUFFIX_LENGTH, --suffix-length SUFFIX_LENGTH
                        number of characters in the suffix length for split
                        geojsons
  -o OUTPUT, --output OUTPUT
                        output directory to save split geojsons
  -n LIMIT, --limit LIMIT
                        limit number of split geojson file to at most LIMIT,
                        with GEOMETRY_COUNT number of features.
  -v, --verbose         increase output verbosity
  -d, --dry-run         see output without actually writing to file
  --version             show geojsplit version number
By default, split GeoJSON files are saved as filename_x<SUFFIX_LENGTH characters long>.geojson. The default SUFFIX_LENGTH is 4, meaning that 456976 unique files can be generated. If you need more, use -a or --suffix-length to increase this value appropriately.
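The figure 456976 equals 26 to the power of 4, which suggests suffixes drawn from the 26 lowercase letters. The sketch below generates names under that assumption. The alphabetical ordering (aaaa, aaab, ...) and the helper name split_filenames are assumptions for illustration, not taken from geojsplit's source.

```python
from itertools import islice, product
from string import ascii_lowercase


def split_filenames(stem, suffix_length=4):
    """Yield output names of the form <stem>_x<suffix>.geojson.

    Assumes suffixes are lowercase letters generated in alphabetical
    order; this ordering is a guess modelled on Unix `split`.
    """
    for letters in product(ascii_lowercase, repeat=suffix_length):
        yield f"{stem}_x{''.join(letters)}.geojson"


first_three = list(islice(split_filenames("some"), 3))
print(first_three)
# ['some_xaaaa.geojson', 'some_xaaab.geojson', 'some_xaaac.geojson']

# Number of distinct suffixes for a given length.
print(len(ascii_lowercase) ** 4)  # 456976
print(len(ascii_lowercase) ** 5)  # 11881376
```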
The --geometry-count flag corresponds to the batch keyword argument of the GeoJSONBatchStreamer.stream method. Note that if GEOMETRY_COUNT does not divide evenly into the number of features in the Feature Collection, the last batch will contain fewer than GEOMETRY_COUNT features.
Finally, to only iterate over the first n elements of a GeoJSON document, use --limit.
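The interaction between --geometry-count and --limit comes down to simple arithmetic. The sketch below assumes, following the help text, that LIMIT caps the number of output files. The function plan_split is a hypothetical helper and does not call geojsplit.

```python
import math


def plan_split(total_features, geometry_count, limit=None):
    """Return (number_of_files, features_in_last_file).

    Assumes `limit`, when given, caps the number of files written,
    as the --limit help text describes.
    """
    if total_features <= 0:
        return 0, 0
    files = math.ceil(total_features / geometry_count)
    # A remainder of 0 means the last file is completely full.
    last = total_features % geometry_count or geometry_count
    if limit is not None and limit < files:
        files = max(limit, 0)
        # The short final batch is never reached, so every file is full.
        last = geometry_count if files else 0
    return files, last


print(plan_split(250, 100))           # (3, 50)
print(plan_split(200, 100))           # (2, 100)
print(plan_split(250, 100, limit=2))  # (2, 100)
```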