Temporal Mapper
V.0.4.0 - August 19 '24
This is a library for using the Mapper algorithm for temporal topic modelling. Things broadly work now, but the edge cases have not been thoroughly tested.
Direct questions to Kaleb D. Ruscitti: kaleb.ruscitti at uwaterloo.ca.
Complete documentation is under construction on Read The Docs.
Example:
arXiv Papers
From the arXiv API, we can retrieve ~500,000 article titles and abstracts,
use SBERT to embed them, and then UMAP to reduce to 2D.
Using DataMapPlot and TopicNaming we can produce a static plot of this data:
Now, using this repository we can additionally analyse the temporal information. Using the Mapper algorithm with time as our lens function, we create a temporal graph of the topics (clusters) through time. The code includes two types of plots to visualize this graph: the Centroid Plot and the Temporal-Semantic Plot.
Installation
Clone the repo and install:
git clone https://github.com/TutteInstitute/temporal-mapper.git
cd temporal-mapper && pip install .
Usage
The file doc/DemoNotebook.ipynb is a start-to-finish
example of how to generate a Sankey diagram with this package.
Parameters
For a complete listing of the parameters, check the repo's GitHub wiki. However, the most impactful choices are:
HDBSCAN parameters
HDBSCAN(min_cluster_size=n): This is the usual HDBSCAN parameter, but
because the points are now weighted, and the weights are always <= 1,
you generally want to set it somewhat lower than you otherwise would.
tm.TemporalMapper() parameters
Mapper works by clustering
inside time slices, and these time slices are determined by two
parameters, checkpoints and overlap. The checkpoints define the
centers of the bins, and overlap, which should lie in (0,1),
defines how much the bins intersect each other.
Checkpoints
You can either pass tm.TemporalMapper() a list of
checkpoints, checkpoints = arrayLike, or use the
N_checkpoints = int and slice_method = str parameters to have it
generate checkpoints for you.
slice_method takes either 'time' or 'data'. The time option
generates checkpoints evenly spaced in time, and the data option
generates checkpoints evenly spaced in the number of data points.
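The difference between the two options can be illustrated with a short NumPy sketch. This is not the library's implementation: make_checkpoints is a hypothetical helper, and the choice to place the first and last checkpoints at the earliest and latest timestamps is an assumption made for illustration.

```python
import numpy as np

def make_checkpoints(times, n_checkpoints, slice_method="time"):
    """Hypothetical helper illustrating the two slicing strategies.

    Not part of temporal-mapper; endpoint handling is an assumption.
    """
    times = np.asarray(times, dtype=float)
    if slice_method == "time":
        # Evenly spaced along the time axis, ignoring where the data lies.
        return np.linspace(times.min(), times.max(), n_checkpoints)
    if slice_method == "data":
        # Evenly spaced in rank: roughly the same number of points
        # falls between each pair of consecutive checkpoints.
        return np.quantile(times, np.linspace(0.0, 1.0, n_checkpoints))
    raise ValueError("slice_method must be 'time' or 'data'")

# Four points bunched near the start, one far away.
times = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
time_cp = make_checkpoints(times, 3, "time")
data_cp = make_checkpoints(times, 3, "data")
print(time_cp)  # middle checkpoint sits halfway through the time range
print(data_cp)  # middle checkpoint sits at the median timestamp
```

With bursty data, the 'time' option can leave some slices nearly empty, while the 'data' option keeps slice populations balanced at the cost of uneven time spans.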
Overlap
The default value of overlap=0.5 should work in most
cases, but if you find your graph is highly disconnected you can
increase the overlap.
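To see why a larger overlap connects the graph, here is a minimal sketch of how bins centered on checkpoints might be widened by overlap. The convention used (each bin reaches halfway to its neighbour, then is stretched by a factor of 1 + overlap) is an assumption for illustration; slice_bounds is a hypothetical helper and the library's actual rule may differ.

```python
import numpy as np

def slice_bounds(checkpoints, overlap):
    """Hypothetical sketch: (lower, upper) bounds of each time slice.

    Assumed convention, not taken from temporal-mapper: a bin extends
    halfway to each neighbouring checkpoint, stretched by (1 + overlap).
    Needs at least two checkpoints.
    """
    c = np.asarray(checkpoints, dtype=float)
    if not 0.0 < overlap < 1.0:
        raise ValueError("overlap should lie in (0, 1)")
    gaps = np.diff(c)
    # Gap to the neighbour on each side; end bins reuse their only gap.
    left = np.concatenate(([gaps[0]], gaps))
    right = np.concatenate((gaps, [gaps[-1]]))
    lower = c - 0.5 * (1.0 + overlap) * left
    upper = c + 0.5 * (1.0 + overlap) * right
    return np.column_stack((lower, upper))

bounds = slice_bounds([0.0, 10.0, 20.0], overlap=0.5)
# Length of the interval shared by each pair of consecutive bins.
shared = bounds[:-1, 1] - bounds[1:, 0]
print(bounds)
print(shared)
```

Points that fall in a shared interval are clustered in both slices, and those shared points are what link clusters across time. Under this assumed convention the shared interval grows linearly with overlap.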
Neighbours
Passing neighbours = k for some positive integer
k determines the number of nearest neighbours used to compute the
temporal density of the data. If you have a lot of data, you should
increase this parameter as much as your computational constraints will
allow.
Temporal kernel parameters
The temporal kernel gives
each point a weight in time. You can pass a kernel function to
tm.TemporalGraph with kernel=myFunc. The default is
temporal_mapper.weighted_clusters.gaussian, which is a Gaussian
kernel. If your kernel function takes parameters, you can pass them as
kernel_params = (param1, param2, ...).
The parameter rate_sensitivity can be any number >= 0, or -1. This
controls how sensitive the temporal kernel is to changes in the
temporal density of your data. It acts as an exponent: at the
default setting (1.0), points with double the temporal density
have a kernel that is half as wide. At sensitivity 2, double the density
gives a kernel 1/4 as wide, and so on. The option -1 makes the scaling
logarithmic: 10x as dense gives 1/2 as wide.
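The power-law case can be checked with a few lines of arithmetic. The sketch below assumes the kernel width scales as density raised to the power -rate_sensitivity, which is consistent with the two examples given but is not taken from the library's source; relative_kernel_width is a hypothetical helper, and the logarithmic option (-1) is deliberately left out.

```python
def relative_kernel_width(density_ratio, rate_sensitivity=1.0):
    """Hypothetical sketch of the power-law width scaling.

    density_ratio: temporal density relative to a reference point.
    Returns the kernel width relative to that reference point.
    Assumes width is proportional to density ** (-rate_sensitivity).
    """
    if density_ratio <= 0:
        raise ValueError("density_ratio must be positive")
    if rate_sensitivity < 0:
        # The logarithmic option (-1) is not modelled in this sketch.
        raise ValueError("only rate_sensitivity >= 0 is covered here")
    return density_ratio ** (-rate_sensitivity)

half = relative_kernel_width(2.0, 1.0)     # double density, default setting
quarter = relative_kernel_width(2.0, 2.0)  # double density, sensitivity 2
flat = relative_kernel_width(2.0, 0.0)     # sensitivity 0 ignores density
print(half, quarter, flat)
```

At sensitivity 0 the width no longer depends on density, so every point gets a kernel of the same width.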
If you want to recover the original (non-fuzzy) Mapper, you can pass
kernel = temporal_mapper.weighted_clusters.square and
rate_sensitivity = 0.
Download files
File details
Details for the file temporal_mapper-0.4.1.tar.gz.
File metadata
- Download URL: temporal_mapper-0.4.1.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98b95552ef346c715d46815d2403c9a66eb0ecef5110d71e425d92bf2e3b3dfa |
| MD5 | a2fbdbfe0db69cd0fcaca1159f57656c |
| BLAKE2b-256 | af80e6276806c802baa8eba81318b5bbb56d96a48913bdf5fded0e664528f90b |
File details
Details for the file temporal_mapper-0.4.1-py3-none-any.whl.
File metadata
- Download URL: temporal_mapper-0.4.1-py3-none-any.whl
- Upload date:
- Size: 14.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 48709746d44eaa342f572efdf14bcff25b40b28678d3b70d6e25ee9c5aa6eb0b |
| MD5 | b744038509bd2d05430513beeb395e5d |
| BLAKE2b-256 | 4b4b867af57346a77afcd077cf8c5459f0476a4a8177ad544656cdb00f6f5420 |