Temporal Mapper

V.0.4.0 - August 19 '24


This is a library for using the Mapper algorithm for temporal topic modelling. Although things broadly work now, the edge cases have not been thoroughly tested.

Direct questions to Kaleb D. Ruscitti: kaleb.ruscitti at uwaterloo.ca.

Complete documentation is under construction on Read The Docs.

Example:

arXiv Papers

From the arXiv API, we can retrieve ~500,000 article titles and abstracts, use SBERT to embed them, and then UMAP to reduce to 2D.

Using DataMapPlot and TopicNaming we can produce a static plot of this data:

[Figure: a DataMapPlot of arXiv papers]

Now, using this repository we can additionally analyse the temporal information. Using the Mapper algorithm with time as our lens function, we create a temporal graph of the topics (clusters) through time. The code includes two types of plots to visualize this graph:

[Figures: Centroid Plot; Temporal-Semantic Plot]
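The Mapper construction described above can be sketched in a few lines. This is a toy illustration only, not the package's implementation: it swaps HDBSCAN for a simple 1D gap-based clustering, ignores point weights, and all function names (`cluster_1d`, `temporal_mapper_graph`) are hypothetical. Points are clustered inside each time slice, and two clusters in consecutive slices are joined by an edge when they share at least one point.

```python
def cluster_1d(points, gap):
    """Toy stand-in for HDBSCAN: split sorted (index, value) pairs wherever
    consecutive values are more than `gap` apart."""
    ordered = sorted(points, key=lambda p: p[1])
    clusters, current = [], []
    for idx, val in ordered:
        if current and val - current[-1][1] > gap:
            clusters.append({i for i, _ in current})
            current = []
        current.append((idx, val))
    if current:
        clusters.append({i for i, _ in current})
    return clusters


def temporal_mapper_graph(times, values, slices, gap):
    """Hypothetical helper: cluster within each (start, end) time slice, then
    link clusters in consecutive slices that share points."""
    nodes = []  # each node is (slice_index, set of point indices)
    for s, (start, end) in enumerate(slices):
        members = [(i, values[i]) for i, t in enumerate(times) if start <= t <= end]
        for cluster in cluster_1d(members, gap):
            nodes.append((s, cluster))

    edges = []
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            (slice_a, pts_a), (slice_b, pts_b) = nodes[a], nodes[b]
            # An edge needs consecutive slices and a non-empty intersection.
            if slice_b == slice_a + 1 and pts_a & pts_b:
                edges.append((a, b))
    return nodes, edges


# Made-up data: two "topics" (values near 0 and near 5) observed over time.
times = [0, 1, 2, 3, 4, 5]
values = [0.0, 0.1, 0.2, 5.0, 0.3, 5.1]
slices = [(0, 3), (2, 5)]  # overlapping time slices
nodes, edges = temporal_mapper_graph(times, values, slices, gap=1.0)
```

Because the slices overlap, points 2 and 3 fall in both, and they are what connects each topic's cluster in the first slice to its continuation in the second.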

Installation

Clone the repo and install:

    git clone https://github.com/TutteInstitute/temporal-mapper.git
    cd temporal-mapper && pip install .

Usage

The file doc/DemoNotebook.ipynb is a start-to-finish example of how to generate a Sankey diagram with this package.

Parameters

For a complete listing of the parameters, check the repo's GitHub wiki. However, the most impactful choices are:

HDBSCAN parameters

HDBSCAN(min_cluster_size=n): this is the usual HDBSCAN parameter, but because the points are now weighted, and the weights are at most 1, you generally want to set it a bit lower than you usually would.
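A small numeric illustration of why a lower value helps, assuming the weighted clustering compares a cluster's total weight (rather than its raw point count) against min_cluster_size. That assumption, the helper name, and the weights below are all illustrative and not taken from the package.

```python
def effective_cluster_size(weights):
    """Total weight of a cluster; hypothetical helper, not part of the package."""
    return sum(weights)


# Made-up weights for five points in one candidate cluster, each at most 1.
weights = [1.0, 0.9, 0.6, 0.4, 0.2]

raw_count = len(weights)
effective = effective_cluster_size(weights)
print(raw_count, round(effective, 2))
```

Under this assumption, five points count for noticeably less than five, so a min_cluster_size tuned for unweighted data would reject clusters that you might want to keep.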

tm.TemporalMapper() parameters

Mapper works by clustering inside time slices, which are determined by two parameters: checkpoints and overlap. The checkpoints define the centers of the bins, and overlap, which should lie in (0,1), defines how much adjacent bins intersect each other.
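To make the roles of checkpoints and overlap concrete, here is a minimal sketch of one plausible way to turn them into intervals. The exact convention used by the package may differ: this version assumes each bin reaches halfway to the neighbouring checkpoint and is then widened by the overlap fraction, and `time_slices` is a hypothetical name.

```python
def time_slices(checkpoints, overlap):
    """Return (start, end) intervals centred on each checkpoint.

    Assumed convention (not confirmed by the package): with evenly spaced
    checkpoints a distance s apart, adjacent bins intersect over s * overlap.
    """
    if not 0 < overlap < 1:
        raise ValueError("overlap should lie in (0, 1)")
    cps = sorted(checkpoints)
    if len(cps) < 2:
        raise ValueError("need at least two checkpoints")

    slices = []
    for i, c in enumerate(cps):
        # Gap to each neighbour; the end bins reuse their only neighbour's gap.
        left_gap = c - cps[i - 1] if i > 0 else cps[1] - cps[0]
        right_gap = cps[i + 1] - c if i < len(cps) - 1 else cps[-1] - cps[-2]
        half_left = 0.5 * left_gap * (1 + overlap)
        half_right = 0.5 * right_gap * (1 + overlap)
        slices.append((c - half_left, c + half_right))
    return slices


slices = time_slices([0, 10, 20], overlap=0.5)
```

With checkpoints 10 apart and overlap 0.5, consecutive bins under this convention share an interval of length 5.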

Checkpoints

You can either pass tm.TemporalMapper() a list of checkpoints with checkpoints = arrayLike, or use the N_checkpoints = int and slice_method = str parameters to have it generate checkpoints for you.

slice_method takes either 'time' or 'data'. The 'time' option generates checkpoints evenly spaced in time, and the 'data' option generates checkpoints evenly spaced in the number of data points.
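The difference between the two options can be sketched as follows. This is an illustration under assumptions, not the package's code: `make_checkpoints` is a hypothetical helper, and it places checkpoints strictly inside the time range, which may not match how the package treats the endpoints.

```python
def make_checkpoints(times, n_checkpoints, slice_method="time"):
    """Hypothetical helper: generate interior checkpoints from timestamps."""
    ts = sorted(times)
    if slice_method == "time":
        # Evenly spaced along the time axis, regardless of where the data sits.
        lo, hi = ts[0], ts[-1]
        step = (hi - lo) / (n_checkpoints + 1)
        return [lo + step * (i + 1) for i in range(n_checkpoints)]
    if slice_method == "data":
        # Evenly spaced in rank, so each gap holds about the same number of points.
        last = len(ts) - 1
        return [ts[last * (i + 1) // (n_checkpoints + 1)] for i in range(n_checkpoints)]
    raise ValueError("slice_method must be 'time' or 'data'")


# Made-up timestamps: ten early points and one late outlier.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
by_time = make_checkpoints(times, 1, "time")
by_data = make_checkpoints(times, 1, "data")
```

On skewed timestamps like these, the 'time' checkpoint lands in the empty stretch before the outlier, while the 'data' checkpoint lands where most of the points are.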

Overlap

The default value of overlap=0.5 should work in most cases, but if you find your graph is highly disconnected you can increase the overlap.

Neighbours

Passing neighbours = k for some positive integer k determines the number of nearest neighbours used to compute the temporal density of the data. If you have a lot of data, you should increase this parameter as much as your computational constraints will allow.
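As a rough illustration of what a k-nearest-neighbour density estimate looks like, the sketch below estimates density along the time axis alone. How the package actually computes temporal density is not described here, so treat this as a generic kNN density estimate under that assumption; `knn_temporal_density` is a hypothetical name.

```python
def knn_temporal_density(times, k):
    """Generic 1D kNN density estimate: k points per interval of length
    2 * (distance to the k-th nearest neighbour in time)."""
    n = len(times)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    densities = []
    for i, t in enumerate(times):
        # Brute-force distances to every other point (fine for a small demo).
        dists = sorted(abs(t - s) for j, s in enumerate(times) if j != i)
        radius = dists[k - 1]
        densities.append(k / (2 * radius) if radius > 0 else float("inf"))
    return densities


# Made-up timestamps: a dense burst early on, then sparse points.
times = [0, 1, 2, 3, 10, 20]
density = knn_temporal_density(times, k=2)
```

A larger k averages over more neighbours, which gives a smoother estimate at the cost of more computation.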

Temporal kernel parameters

The temporal kernel is used to give the points weight in time. You can pass a kernel function to tm.TemporalGraph with kernel=myFunc. The default is temporal_mapper.weighted_clusters.gaussian, which is a Gaussian kernel. If your kernel function takes parameters, you can pass kernel_params = (param1, param2, ...).

The parameter rate_sensitivity can be any number >= 0, or -1. It controls how sensitive the temporal kernel is to changes in the temporal density of your data. It acts as an exponent: at the default setting (rate_sensitivity = 1), points with double the temporal density have a kernel that is half as wide. At sensitivity 2, double the density gives a kernel 1/4 as wide, and so on. The option -1 sets the scale to be logarithmic: 10x as dense gives 1/2 as wide.
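The exponent behaviour for non-negative sensitivities can be written as a one-line formula. The sketch below assumes the width scales as base_width / density_ratio ** rate_sensitivity, which reproduces the examples given above; `kernel_width` and its arguments are hypothetical, and the logarithmic -1 option is deliberately left out because its exact formula is not stated.

```python
def kernel_width(base_width, density_ratio, rate_sensitivity=1.0):
    """Hypothetical helper: kernel width at a point whose temporal density is
    `density_ratio` times the reference density."""
    if density_ratio <= 0:
        raise ValueError("density_ratio must be positive")
    if rate_sensitivity < 0:
        raise ValueError("this sketch covers only rate_sensitivity >= 0")
    return base_width / density_ratio ** rate_sensitivity


half = kernel_width(4.0, 2.0, rate_sensitivity=1.0)       # double density, sensitivity 1
quarter = kernel_width(4.0, 2.0, rate_sensitivity=2.0)    # double density, sensitivity 2
unchanged = kernel_width(4.0, 2.0, rate_sensitivity=0.0)  # density is ignored
```

At sensitivity 0 the width no longer depends on density at all.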

If you want to recover the original (non-fuzzy) Mapper, you can pass kernel = temporal_mapper.weighted_clusters.square and rate_sensitivity = 0.
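To show why a square kernel gives non-fuzzy behaviour, here is a sketch comparing two stand-in kernels. These are not the package's functions and their signatures are assumptions made for illustration: each takes a timestamp, a checkpoint, and a width, and returns a weight in [0, 1].

```python
import math


def gaussian_weight(t, checkpoint, width):
    """Stand-in Gaussian kernel: weight decays smoothly away from the checkpoint."""
    return math.exp(-0.5 * ((t - checkpoint) / width) ** 2)


def square_weight(t, checkpoint, width):
    """Stand-in square kernel: full weight inside the window, none outside."""
    return 1.0 if abs(t - checkpoint) <= width else 0.0


# Made-up timestamps around a checkpoint at t = 5 with width 2.
for t in [5, 6, 8]:
    print(t, round(gaussian_weight(t, 5, 2), 3), square_weight(t, 5, 2))
```

With the square kernel every point is either fully in a slice or fully out of it, which is the hard membership of classical Mapper; the Gaussian kernel instead lets points fade out gradually with distance in time.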
