Fast file-based format for geometries with Geopandas
A faster file-based format for geometries with geopandas.
This project capitalizes on the very fast feather file format to store geometry (points, lines, polygons) data for interoperability with geopandas.
Why does this exist?
This project exists because reading and writing standard spatial formats (e.g., shapefile) in
geopandas is slow. I was working with millions of geometries in multiple processing steps, and needed a fast way to read and write intermediate files.
In our benchmarks, we see about 5-6x faster file writes than writing from geopandas to shapefile via .to_file() on a GeoDataFrame.
We see about 2x faster reads compared to geopandas.
How does it work?
The feather format works brilliantly for standard pandas data frames. To leverage it, we simply convert the geometry data from shapely objects into Well Known Binary (WKB) format and store that column as raw bytes.
We store the coordinate reference system in JSON format in a sidecar file.
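The two ideas above (geometries serialized to WKB bytes, CRS kept in a JSON sidecar) can be illustrated with a minimal standard-library sketch. This is not geofeather's implementation: it hand-encodes only 2D points rather than using shapely, and the helper names, the sidecar file name, and the CRS dictionary are hypothetical, chosen purely for illustration.

```python
import json
import struct
import tempfile
from pathlib import Path


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (illustrative helper)."""
    # Layout: 1 byte for byte order (1 = little endian), uint32 geometry
    # type (1 = Point), then x and y as float64 values.
    return struct.pack("<BIdd", 1, 1, x, y)


def point_from_wkb(data):
    """Decode a little-endian 2D point WKB blob back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian 2D points")
    return (x, y)


def write_crs_sidecar(path, crs):
    """Write the CRS as JSON next to the data file (hypothetical layout)."""
    Path(path).write_text(json.dumps(crs))


def read_crs_sidecar(path):
    """Read the CRS back from the JSON sidecar file."""
    return json.loads(Path(path).read_text())


points = [(-122.3, 47.6), (0.0, 51.5)]

# This list of raw bytes plays the role of the geometry column that would
# be stored in the feather file.
wkb_column = [point_to_wkb(x, y) for x, y in points]

with tempfile.TemporaryDirectory() as tmp:
    # Assumed sidecar name and CRS content, for illustration only.
    sidecar = Path(tmp) / "test.feather.crs"
    write_crs_sidecar(sidecar, {"init": "epsg:4326"})
    crs = read_crs_sidecar(sidecar)

# Reading reverses the process: bytes back to coordinates.
decoded = [point_from_wkb(blob) for blob in wkb_column]
```

Because each geometry is just an opaque bytes value, the column can be stored like any other binary column; only the reader and writer need to know it holds WKB.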
Available on PyPI at: https://pypi.org/project/geofeather/
pip install geofeather
Given an existing GeoDataFrame my_gdf, pass it into to_geofeather to write a feather file. To read the file back into a GeoDataFrame:
my_gdf = from_geofeather('test.feather')
pygeos provides much faster geospatial operations over arrays of geometries.
geopandas is in the process of migrating to pygeos geometries as its internal data storage instead of shapely objects. Until pygeos is fully integrated, there are shims in geofeather to support interoperability with pandas DataFrames containing pygeos geometries. If you are already using pygeos against data you read from geofeather, the following shims yield 3-7x speedups reading and writing data compared to geofeather reading into GeoDataFrames.
Internally, the feather file is identical to the one created above.
pygeos is required in order to use this functionality.
WARNING: this will be deprecated as soon as pygeos is integrated into geopandas.
from geofeather.pygeos import to_geofeather, from_geofeather

# given a DataFrame df containing pygeos geometries in 'geometry' column
# and a crs object
to_geofeather(df, 'test.feather', crs=crs)

df = from_geofeather('test.feather')

Note: no CRS information is returned when reading from geofeather into a DataFrame, in order to keep the function signature the same as above.
Right now, indexes are not supported in feather files. To get around this, simply reset your index before calling to_geofeather.
- allow serializing to / from pandas DataFrames containing pygeos geometries (see notes above).
- use new CRS object in geopandas data frames (#4)
to_shp; use geopandas
- allow reading a subset of columns from a feather file
- store geometry in 'geometry' column instead of 'wkb' column (simplification to avoid renaming columns)
- Initial release
Everything that makes this fast is due to the hard work of contributors to