A Python package for density-adaptive DBSCAN clustering


AdaptiveDBSCAN

This is a normalized form of the DBSCAN algorithm based on a varying number of neighbours. The algorithm is useful when your data has varying density patterns. For more information about the algorithm, please refer to the paper cited below.

Installation

For the best performance, it is recommended to create a new environment, activate it, and then install the package:

conda create -n dadbscan python
conda activate dadbscan

To install the package, you can use pip:

pip install dadbscan

Getting Started

After installing the package, you can import its modules as follows:

from dadbscan.density import EQ_Density
from dadbscan.clustering import dbscan

------------------------------------------------------------------------------------

Phase 1.

The first import is used to create a density map; the second applies the Density-Adaptive DBSCAN algorithm. Once you have your database as a CSV file and have chosen a value for N, you can run the density algorithm:

Initiating the EQ_Density class:

In the new version of AdaptiveDBSCAN, we have added a filtering option that can be used as below:

filters = {"col": ["num_picks", "Lat", "Lat", "Lon", "Lon"],  # columns to filter on
           "op": [">", ">=", "<=", ">=", "<="],               # comparison operators as strings, e.g. >, =, >=, !=
           "val": [3, 55, 70, -105, -70]}                     # values to compare against; can be strings or numbers

N = 65  # number of cells, i.e. an N x N grid
density = EQ_Density(N, data_file='YOUR_FILE_PATH', min_year=1900, max_year=2050,
                     min_mag=1, max_mag=9, filters=filters,
                     map_extenion_value=0.1)  # map_extenion_value extends the map frame beyond the data

! Remember to configure filtering parameters such as min_year, max_year, and the others appropriately for your catalogue. Setting these values correctly for your dataset is crucial; failing to do so may filter out part or all of your catalogue.
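Conceptually, each (col, op, val) triple keeps only the rows for which the comparison holds. The sketch below illustrates that element-wise filtering for a pandas DataFrame; it is a hypothetical illustration, not the package's internal implementation:

import operator
import pandas as pd

# Hypothetical illustration of the filter semantics described above.
OPS = {">": operator.gt, ">=": operator.ge, "<=": operator.le,
       "<": operator.lt, "==": operator.eq, "!=": operator.ne}

def apply_filters(df, filters):
    # Keep only the rows where every (column, operator, value) comparison is True.
    for col, op, val in zip(filters["col"], filters["op"], filters["val"]):
        df = df[OPS[op](df[col], val)]
    return df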

To test the program, you can download the test file from the GitHub repo and use decl_cat.csv as a database.

YOUR_FILE_PATH = 'decl_cat.csv'

! Note that your dataset must have a header like the one below (column order is not important, but names are case-sensitive): Lat, Lon, (Depth, Year, Month, Mw). If your dataset has more columns, you do NOT need to remove them. Lat and Lon are essential and case-sensitive.
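As a quick sanity check (a sketch, assuming your catalogue loads with pandas), you can confirm the required columns exist before running the density phase:

import pandas as pd

cat = pd.read_csv('decl_cat.csv')
missing = {"Lat", "Lon"} - set(cat.columns)  # required columns; names are case-sensitive
if missing:
    raise ValueError(f"Catalogue is missing required columns: {missing}")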

Running the calc_density method:

heat_matrix = density.calc_density(minimum_density=10)  # minimum_density sets the background value

In the command above, minimum_density defines the threshold for the minimum density value of each cell (the background value). The default value is 10.
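As a conceptual sketch only (not the package's internal code), applying such a floor to a density grid can be pictured as raising low-density cells to the background value:

import numpy as np

# Hypothetical illustration: cells below the minimum density become the background value.
heat = np.array([[3, 12], [25, 7]])
heat_floored = np.maximum(heat, 10)  # 3 and 7 are raised to 10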

Plotting the density map:

density.plot_density()

A useful feature is smoothing the density map, which can be done with the following method:

smoothed_heat_matrix = density.cell_smoother(apply_smooth=True)
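Conceptually, grid smoothing replaces each cell with a local average of its neighbours. A minimal sketch of that idea using SciPy (an assumption for illustration; the package's cell_smoother implementation may differ):

from scipy.ndimage import uniform_filter

# Hypothetical illustration: a 3x3 moving-average filter over the density grid.
smoothed = uniform_filter(heat_matrix, size=3, mode="nearest")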

! All matrices are saved to the 'Results' folder in two formats, PNG and CSV.

------------------------------------------------------------------------------------

Phase 2.

Now that you have the density map, you can run the Density-Adaptive DBSCAN algorithm. To do so, you need to define the following parameters:

radius = density.radius
density_file_name = "Results/den_decl_cat__65_smooth.csv"

! Be careful to set density_file_name correctly; the file written in Phase 1 encodes the input file name and N (here decl_cat and 65).

As can be seen above, the radius can be derived from the density class. Now it is time to initiate the dbscan class and run the algorithm:

clustering = dbscan(radius, density_file_name)

The step below can take a few minutes to complete:

final = clustering.clustering()
clustering.plot_clusters()

If you have a shapefile to plot in the background, you can pass it here:

clustering.plot_clusters(shape_file_address="data/ShapeFiles/World_Countries_Generalized.shp")

Finally, you can save the calculation results to a file with the command below:

final.to_csv("Results/R__final.csv")

When plotting the clustered data, you have some options:

def plot_clusters(self, **kwargs):
        """
        **kwargs:
        cmap_shp: str, default="grey"
            The colormap to use for the shape file in the background
        
        cmap_scatter: str, default="turbo"
            The colormap to use for the scatter plot
        
        shp_linewidth: float, default=2
            The linewidth of the shape file
        
        save_fig: bool, default=False
            Whether to save the figure or not, if so, it will be saved in the ExampleData folder
        
        save_fig_format: str, default="pdf"
            The format to save the figure in 
        
shape_file_address: str, default=False
    The address of the shapefile to plot in the background; you can use the World_Countries_Generalized.shp file in the ShapeFiles folder, e.g.
    shape_file_address="ShapeFiles/World_Countries_Generalized.shp"
        """

Reference

Sina Sabermahani, Andrew W. Frederiksen (2023), Improved Earthquake Clustering Using a Density-Adaptive DBSCAN Algorithm: An Example from Iran. Seismological Research Letters, doi: https://doi.org/10.1785/0220220305

License

This project is licensed under the MIT License; see the LICENSE file for details.
