geoclustering

📍 command-line tool for clustering geolocations.

Features

  • Uses DBSCAN or OPTICS to perform clustering.
  • Outputs clustering results as json, geojson, txt and csv.
  • Creates a kepler.gl visualization of clusters.

Clustering Method

A cluster is created when at least a minimum number of points (set with --size) each lie within a given distance (set with --distance, in km) of at least one other point in the cluster.
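
The rule above can be read literally as: link any two points that are at most --distance apart, then keep every linked group that has at least --size points. The sketch below implements that literal reading with the standard library only. It is an illustration, not the tool's DBSCAN or OPTICS implementation, and the names haversine_km and naive_clusters are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def naive_clusters(points, distance_km, size):
    """Group points chained by hops of at most distance_km; keep groups with >= size points.

    Hypothetical helper: a literal reading of the rule, not the tool's algorithm.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = [start], [start]
        while frontier:
            current = frontier.pop()
            near = [j for j in unvisited
                    if haversine_km(points[current], points[j]) <= distance_km]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        if len(group) >= size:
            clusters.append(sorted(group))
    return sorted(clusters)


# Three points in central Berlin and one in Paris (illustrative coordinates).
points = [(52.5200, 13.4050), (52.5205, 13.4060), (52.5210, 13.4049), (48.8566, 2.3522)]
print(naive_clusters(points, distance_km=1.0, size=3))
```

With a 1 km distance and a minimum size of 3, the three Berlin points form one group and the lone Paris point is dropped. DBSCAN and OPTICS additionally reason about point density, so their results can differ from this simplified version.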

Install

Install with pip:

# with kepler.gl visualization support
pip install "geoclustering[full]"

# only text-based output
pip install geoclustering

If the full install fails, you might need to install kepler.gl build dependencies:

# macos
brew install proj gdal

Usage

Usage: geoclustering [OPTIONS] FILENAME

  Tool to cluster geolocations. A cluster is created when a certain number of
  points (defined with --size) each are within a given distance (defined with
  --distance) of at least one other point in the cluster. Input is supplied as
  a csv file. At a minimum, each row needs to have a 'lat' and a 'lon' column.
  Other rows are reflected to the output.

Options:
  -d, --distance FLOAT            (in km) Max. distance between two points in
                                  a cluster.  [required]
  -s, --size INTEGER              Min. number of points in a cluster.
                                  [required]
  -o, --output PATH               Output directory for results. Default:
                                  ./output
  -a, --algorithm [dbscan|optics]
                                  Clustering algorithm to be used. `optics`
                                  produces tighter clusters but is slower.
                                  Default: dbscan
  --open                          Open the generated visualization in the
                                  default browser automatically.
  --debug                         Print debug output.
  --help                          Show this message and exit.
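
When the tool is driven from a script, the options above can be assembled programmatically. The following is a minimal sketch that only uses flags listed in the help text. build_command is a hypothetical helper and points.csv is a placeholder file name.

```python
import shlex


def build_command(filename, distance_km, size, output=None, algorithm=None):
    """Build the argument list for a geoclustering run (hypothetical helper)."""
    cmd = ["geoclustering", "--distance", str(distance_km), "--size", str(size)]
    if output is not None:
        cmd += ["--output", str(output)]
    if algorithm is not None:
        # The help text lists exactly these two choices.
        if algorithm not in ("dbscan", "optics"):
            raise ValueError(f"unknown algorithm: {algorithm}")
        cmd += ["--algorithm", algorithm]
    cmd.append(filename)
    return cmd


# points.csv is a placeholder; print the command instead of running it.
print(shlex.join(build_command("points.csv", 0.5, 3, algorithm="optics")))
```

The returned list can be passed to subprocess.run(cmd, check=True) once geoclustering is installed and on the PATH.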

Input

Input is supplied as a .csv file. At a minimum, each row needs a lat and a lon column. Other columns are passed through to the output.

id,name,lat,lon
1,Bonnibelle Mathwen,40.1324085,64.4911086
...
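
It can help to check an input file before handing it to the tool. The sketch below is one possible pre-check, not something geoclustering itself is documented to do: it verifies that the lat and lon columns exist and hold plausible numbers. load_points is a hypothetical helper, and the range check is an added assumption.

```python
import csv
import io

REQUIRED = ("lat", "lon")


def load_points(handle):
    """Read CSV rows, ensuring the required lat/lon columns exist and are numeric."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: lat/lon must be numeric") from None
        # Assumption: coordinates are decimal degrees.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"line {line_no}: coordinates out of range")
        rows.append({**row, "lat": lat, "lon": lon})
    return rows


# The sample row from above, read from an in-memory file.
sample = io.StringIO("id,name,lat,lon\n1,Bonnibelle Mathwen,40.1324085,64.4911086\n")
points = load_points(sample)
print(points[0]["name"], points[0]["lat"], points[0]["lon"])
```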

Output

If at least one cluster is found, the tool writes the results to the output directory as json, geojson, txt and csv files. A kepler.gl html file is generated as well.

JSON

Encodes an array of clusters, each containing an array of points.

[
  {
    "cluster_id": 0,
    "points": [
      {
        "id": 9,
        "name": "Rosanna Foggo",
        "lat": -6.2074293,
        "lon": 106.8915948
      }
    ]
  }
]
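
A structure like this is straightforward to consume with the standard json module. The sketch below parses the sample above from an inline string and counts the points per cluster. The output file name is not stated here, so no path is assumed.

```python
import json

# Inline copy of the sample above; in practice this would be read from the
# json file in the output directory.
raw = """
[
  {
    "cluster_id": 0,
    "points": [
      {"id": 9, "name": "Rosanna Foggo", "lat": -6.2074293, "lon": 106.8915948}
    ]
  }
]
"""

clusters = json.loads(raw)

# Map each cluster_id to the number of points it contains.
sizes = {cluster["cluster_id"]: len(cluster["points"]) for cluster in clusters}
print(sizes)
```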

GeoJSON

Encodes a single FeatureCollection, containing all points as Feature objects.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          106.891595,
          -6.207429
        ]
      },
      "properties": {
        "id": 9,
        "name": "Rosanna Foggo",
        "cluster_id": 0
      }
    }
  ]
}
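
Note that GeoJSON stores coordinates as [longitude, latitude], the reverse of the lat, lon order used in the input. The sketch below groups the features of a collection like the one above by cluster_id and returns (lat, lon) tuples. points_by_cluster is a hypothetical helper.

```python
import json
from collections import defaultdict


def points_by_cluster(feature_collection):
    """Group Point features by their cluster_id property as (lat, lon) tuples."""
    grouped = defaultdict(list)
    for feature in feature_collection["features"]:
        # GeoJSON positions are ordered longitude first, then latitude.
        lon, lat = feature["geometry"]["coordinates"][:2]
        grouped[feature["properties"]["cluster_id"]].append((lat, lon))
    return dict(grouped)


# Inline copy of the sample above.
raw = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [106.891595, -6.207429]},
      "properties": {"id": 9, "name": "Rosanna Foggo", "cluster_id": 0}
    }
  ]
}
"""

grouped = points_by_cluster(json.loads(raw))
print(grouped)
```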

Text

Encodes clusters as blocks separated by a blank line, where each line in a cluster block contains one point.

Cluster 0
id 9, name Rosanna Foggo, lat -6.2074293, lon 106.8915948

// ...

CSV

Encodes each point on one line, together with its cluster_id.

cluster_id,name,lat,lon
9,Rosanna Foggo,-6.2074293,106.8915948
...

kepler.gl

[Image: kepler.gl instance]

Develop

Development assumes Python 3.9+. Setting up a virtualenv is recommended.

    # install dependencies & dev-dependencies
    # PIP
    pip install -e ".[dev,full]"
    # PIPENV
    pipenv install --dev -e .

    # install a git hook that runs the code formatter before each commit.
    pre-commit install

We use Black as our code formatter. If you don't want to use the pre-commit hook, you can run the formatter manually or via an editor plugin.

Release

  1. Update version.py
  2. Run scripts/release.sh
  3. Confirm that the GitHub Action completed successfully
