Point cloud data processing
Project description
PDAL Python support allows you to process point cloud data with PDAL into Numpy arrays. It provides a PDAL extension module to control Python interaction with PDAL. Additionally, you can use it to fetch schema and metadata from PDAL operations.
Installation
PyPI
PDAL Python support is installable via PyPI:
pip install PDAL
GitHub
The repository for PDAL’s Python extension is available at https://github.com/PDAL/python
Python support is released independently from PDAL itself as of PDAL 1.7.
Usage
Simple
Given the following pipeline, which simply reads an ASPRS LAS file and sorts it by the X dimension:
json = """ { "pipeline": [ "1.2-with-color.las", { "type": "filters.sort", "dimension": "X" } ] }""" import pdal pipeline = pdal.Pipeline(json) count = pipeline.execute() arrays = pipeline.arrays metadata = pipeline.metadata log = pipeline.log
Programmatic Pipeline Construction
The previous example specified the pipeline as a JSON string. Alternatively, a pipeline can be constructed by creating Stage instances and piping them together. For example, the previous pipeline can be specified as:
```python
pipeline = pdal.Reader("1.2-with-color.las") | pdal.Filter.sort(dimension="X")
```
Stage Objects
- A stage is an instance of pdal.Reader, pdal.Filter or pdal.Writer.
- A stage can be instantiated by passing as keyword arguments the options applicable to the respective PDAL stage. For more on PDAL stages and their options, check the PDAL documentation on Stage Objects.
- The filename option of Readers and Writers as well as the type option of Filters can be passed positionally as the first argument.
- The inputs option specifies a sequence of stages to be set as input to the current stage. Each input can be either the string tag of another stage, or the Stage instance itself (see the sketch after this list).
- The Reader, Filter and Writer classes come with static methods for all the respective PDAL drivers. For example, pdal.Filter.head() is a shortcut for pdal.Filter(type="filters.head"). These methods are auto-generated by introspecting pdal and the available options are included in each method’s docstring:
```
>>> help(pdal.Filter.head)
Help on function head in module pdal.pipeline:

head(**kwargs)
    Return N points from beginning of the point cloud.

    user_data: User JSON
    log: Debug output filename
    option_file: File from which to read additional options
    where: Expression describing points to be passed to this filter
    where_merge='auto': If 'where' option is set, describes how skipped points should be merged with kept points in standard mode.
    count='10': Number of points to return from beginning.  If 'invert' is true, number of points to drop from the beginning.
    invert='false': If true, 'count' specifies the number of points to skip from the beginning.
```
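As a minimal sketch of the inputs and tag mechanics (assuming 1.2-with-color.las is available locally; the tag names here are arbitrary):

```python
import pdal

reader = pdal.Reader("1.2-with-color.las", tag="input")
# An input can be referenced by the string tag of another stage ...
thinned = pdal.Filter.decimation(step=2, inputs=["input"])
# ... or by passing the Stage instance itself.
sorted_pts = pdal.Filter.sort(dimension="X", inputs=[thinned])

# Build the pipeline from the sequence of stages and run it.
pipeline = pdal.Pipeline([reader, thinned, sorted_pts])
print(pipeline.execute())
```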
Pipeline Objects
A pdal.Pipeline instance can be created from:
- a JSON string: Pipeline(json_string)
- a sequence of Stage instances: Pipeline([stage1, stage2])
- a single Stage with the Stage.pipeline method: stage.pipeline()
- nothing: Pipeline() creates a pipeline with no stages.
- joining Stage and/or other Pipeline instances together with the pipe operator (|):
- stage1 | stage2
- stage1 | pipeline1
- pipeline1 | stage1
- pipeline1 | pipeline2
Every application of the pipe operator creates a new Pipeline instance. To update an existing Pipeline use the respective in-place pipe operator (|=):
```python
# update pipeline in-place
pipeline = pdal.Pipeline()
pipeline |= stage
pipeline |= pipeline2
```
Reading using Numpy Arrays
The following more complex scenario demonstrates a full round trip between PDAL and Python:
- Read a small test file from GitHub into a Numpy array
- Filter the array with Numpy for Intensity
- Pass the filtered array back to PDAL to be filtered again
- Write the final filtered array to a LAS file and a TileDB array via the TileDB-PDAL integration using the TileDB writer plugin
```python
import pdal

data = "https://github.com/PDAL/PDAL/blob/master/test/data/las/1.2-with-color.las?raw=true"

pipeline = pdal.Reader.las(filename=data).pipeline()
print(pipeline.execute())  # 1065 points

# Get the data from the first array
# [array([(637012.24, 849028.31, 431.66, 143, 1,
#          1, 1, 0, 1, -9., 132, 7326, 245380.78254963, 68, 77, 88),
# dtype=[('X', '<f8'), ('Y', '<f8'), ('Z', '<f8'), ('Intensity', '<u2'),
#        ('ReturnNumber', 'u1'), ('NumberOfReturns', 'u1'), ('ScanDirectionFlag', 'u1'),
#        ('EdgeOfFlightLine', 'u1'), ('Classification', 'u1'), ('ScanAngleRank', '<f4'),
#        ('UserData', 'u1'), ('PointSourceId', '<u2'),
#        ('GpsTime', '<f8'), ('Red', '<u2'), ('Green', '<u2'), ('Blue', '<u2')])]
arr = pipeline.arrays[0]

# Keep only entries with intensity > 30
intensity = arr[arr["Intensity"] > 30]
print(len(intensity))  # 704 points

# Now use PDAL to keep points with intensity 100 <= v < 300
pipeline = pdal.Filter.range(limits="Intensity[100:300)").pipeline(intensity)
print(pipeline.execute())  # 387 points
clamped = pipeline.arrays[0]

# Write our intensity data to a LAS file and a TileDB array. For TileDB it is
# recommended to use Hilbert ordering by default with geospatial point cloud data,
# which requires specifying a domain extent. This can be determined automatically
# from a stats filter that computes statistics about each dimension (min, max, etc.).
pipeline = pdal.Writer.las(
    filename="clamped.las",
    offset_x="auto",
    offset_y="auto",
    offset_z="auto",
    scale_x=0.01,
    scale_y=0.01,
    scale_z=0.01,
).pipeline(clamped)
pipeline |= pdal.Filter.stats() | pdal.Writer.tiledb(array_name="clamped")
print(pipeline.execute())  # 387 points

# Dump the TileDB array schema
import tiledb
with tiledb.open("clamped") as a:
    print(a.schema)
```
Executing Streamable Pipelines
Streamable pipelines (pipelines that consist exclusively of streamable PDAL stages) can be executed in streaming mode via Pipeline.iterator(). This returns an iterator object that yields Numpy arrays of up to chunk_size points (default 10000) at a time.
```python
import pdal

pipeline = pdal.Reader("test/data/autzen-utm.las") | pdal.Filter.range(limits="Intensity[80:120)")
for array in pipeline.iterator(chunk_size=500):
    print(len(array))

# or to concatenate all arrays into one
# full_array = np.concatenate(list(pipeline))
```
Pipeline.iterator() also takes an optional prefetch parameter (default=0) to allow prefetching up to this number of arrays in parallel and buffering them until they are yielded to the caller.
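For example, reusing the pipeline from the example above (a sketch; the prefetch depth of 2 is arbitrary):

```python
# Prefetch up to 2 chunks in the background while the caller consumes them.
for array in pipeline.iterator(chunk_size=500, prefetch=2):
    print(len(array))
```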
If you just want to execute a streamable pipeline in streaming mode and don’t need to access the data points (typically when the pipeline has Writer stage(s)), you can use the Pipeline.execute_streaming(chunk_size) method instead. This is functionally equivalent to sum(map(len, pipeline.iterator(chunk_size))) but more efficient as it avoids allocating and filling any arrays in memory.
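A minimal sketch of such a write-only streaming run (the filenames are placeholders):

```python
import pdal

# Stream points from a reader straight into a writer without materializing
# arrays in Python; execute_streaming returns the total point count.
pipeline = pdal.Reader("test/data/autzen-utm.las") | pdal.Writer.las(filename="copy.las")
print(pipeline.execute_streaming(chunk_size=500))
```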
Accessing Mesh Data
Some PDAL stages (for instance filters.delaunay) create TIN-type mesh data. This data can be accessed in Python using the Pipeline.meshes property, which returns a numpy.ndarray of shape (1, n) where n is the number of triangles in the mesh. If the PointView contains no mesh data, then n = 0. Each triangle is a tuple (A, B, C) where A, B and C are indices into the PointView identifying the points that are the vertices of the triangle.
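A short sketch of reading this mesh data, assuming 1.2-with-color.las is available locally:

```python
import pdal

# Triangulate a point cloud and inspect the resulting triangle index array.
pipeline = pdal.Reader("1.2-with-color.las") | pdal.Filter.delaunay()
pipeline.execute()

triangles = pipeline.meshes[0]  # shape (1, n); fields hold the A, B, C vertex indices
print(triangles.shape)
print(triangles[0][:3])  # first three triangles
```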
Meshio Integration
The meshes property provides the face data but is not easy to use as a mesh. Therefore, we have provided optional integration with the meshio library.
The pdal.Pipeline class provides the get_meshio(idx: int) -> meshio.Mesh method. This method creates a Mesh object from the PointView array and mesh properties.
Note
The meshio integration requires that meshio is installed (e.g. pip install meshio). If it is not, then the method fails with an informative RuntimeError.
Simple use of the functionality could be as follows:
```python
import pdal
...
pl = pdal.Pipeline(pipeline)
pl.execute()
mesh = pl.get_meshio(0)
mesh.write('test.obj')
```
Advanced Mesh Use Case
Use case: take a LiDAR map, create a mesh from the ground points, split it into tiles, and store the tiles in PostGIS.
Note
Like Pipeline.arrays, Pipeline.meshes returns a list of numpy.ndarray to provide for the case where the output from a Pipeline is multiple PointViews.
(This example uses 1.2-with-color.las and skips the ground classification for clarity.)
```python
import pdal
import psycopg2
import io

pl = (
    pdal.Reader(".../python/test/data/1.2-with-color.las")
    | pdal.Filter.splitter(length=1000)
    | pdal.Filter.delaunay()
)
pl.execute()

conn = psycopg2.connect(%CONNECTION_STRING%)
for idx in range(len(pl.meshes)):
    m = pl.get_meshio(idx)
    if m:
        # Serialize each mesh to WKT in a fresh buffer
        buffer = io.StringIO()
        m.write(buffer, file_format="wkt")
        with conn.cursor() as curr:
            curr.execute(
                "INSERT INTO %table-name% (mesh) VALUES (ST_GeomFromEWKT(%(ewkt)s))",
                {"ewkt": buffer.getvalue()},
            )
        buffer.close()

conn.commit()
conn.close()
```
Requirements
- PDAL 2.4+
- Python >=3.7
- Pybind11 (e.g. `pip install pybind11[global]`)
- Numpy (e.g. `pip install numpy`)
- scikit-build (e.g. `pip install scikit-build`)
Changes
3.1.0
- Breaking change – pipeline.metadata now returns a dictionary from json.loads instead of a string.
- pipeline.quickinfo will fetch the PDAL preview() information for a data source. You can use this to fetch header or other information without reading data. https://github.com/PDAL/python/pull/109
- PDAL driver and option collection now uses the PDAL library directly rather than shelling out to the pdal command https://github.com/PDAL/python/pull/107
- Pipelines now support pickling for use with things like Dask https://github.com/PDAL/python/pull/110
3.0.0
- Pythonic pipeline creation https://github.com/PDAL/python/pull/91
- Support streaming pipeline execution https://github.com/PDAL/python/pull/94
- Replace Cython with PyBind11 https://github.com/PDAL/python/pull/102
- Remove pdal.pio module https://github.com/PDAL/python/pull/101
- Move readers.numpy and filters.python to separate repository https://github.com/PDAL/python/pull/104
- Miscellaneous refactorings and cleanups
2.3.5
- Fix memory leak https://github.com/PDAL/python/pull/74
- Handle metadata with invalid unicode by erroring https://github.com/PDAL/python/pull/74
2.3.0
- PDAL Python support 2.3.0 requires PDAL 2.1+. Older PDAL base libraries likely will not work.
- Python support built using scikit-build
- readers.numpy and filters.python are installed along with the extension.
- Pipeline can take in a list of arrays that are passed to readers.numpy
- readers.numpy now supports functions that return arrays. See https://pdal.io/stages/readers.numpy.html for more detail.
2.0.0
- PDAL Python extension is now in its own repository on its own release schedule at https://github.com/PDAL/python
- Extension now builds and works under PDAL OSGeo4W64 on Windows.