Beautiful and interactive visualisations for NLP Topics
Doc2Map
Doc2Map is an algorithm for topic modeling and visualization. It can read any type of document file, but it cannot OCR them. It finds topics based on the core idea of Top2Vec and displays them hierarchically on a map similar to Google Maps: Live Demo 1 With Wikipedia Dataset
Live Demo 2 With 20 News Groups
Or on a scatter plot with a manual zoom level: Live Demo 1 With Wikipedia Dataset
Live Demo 2 With 20 News Groups
Why use Doc2Map?
With Doc2Map, you can create beautiful, intuitive, and interactive visuals that summarise your document corpus as a map, similar to Google Maps, with topics, clusters, and documents in place of the names of countries, states, and cities.
Thanks to Apache Tika, a software tool able to detect and extract text from over a thousand different file types, Doc2Map can read virtually any kind of file.
Note: Doc2Map does not perform OCR, so it cannot extract text from pictures.
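To make the extraction step more concrete, here is a minimal sketch of pulling plain text out of a file with Tika from Python. It assumes the separate tika-python package (pip install tika, which also needs a Java runtime); it is not Doc2Map's own code, and extract_text is a hypothetical helper name.

```python
from pathlib import Path
from typing import Callable, Optional


def _tika_parse(path: str) -> dict:
    # Imported lazily: tika-python starts a local Tika server on first use.
    from tika import parser
    return parser.from_file(path)


def extract_text(path: Path, parse: Callable[[str], dict] = _tika_parse) -> Optional[str]:
    """Return the plain text of one file, or None when no text was found."""
    parsed = parse(str(path))
    # "content" is None when the file has no text layer, e.g. a photo.
    content = parsed.get("content")
    if content is None or not content.strip():
        return None
    return content.strip()
```

A scanned image typically comes back with no content, which is why the note above matters: without OCR, such files contribute nothing to the map.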
Using Doc2Map
There are two ways to use Doc2Map:
- Launching the Python module directly
- Importing the Doc2Map library in your script
Launching Doc2Map Module
Your first option is to launch the module directly. Once launched, the program takes a moment to start, then asks which folder you want to analyse:
Select the folder containing the documents you want to map.
The next step requires some patience. Doc2Map will convert your documents into plain text, analyse them, then organise them. Depending on the format, size, and number of documents, this may take a long time...
When it has finished, two web pages will open automatically in your browser, each showing a different map of your documents.
The visualisations are loaded from newly created HTML files. You can find their location by looking at the address bar of your browser, where you will see something like file://Your/Path/To/Your/Visuals
These files can easily be exported to another machine, with a few caveats:
- If your visualisation is based on local files, those files may no longer be reachable from the visualisation once it has been exported.
- However, there will be no problem if you and the people you share the visualisations with use a common shared drive (as is often the case in companies, in the form of a local network).
- For the visualisation DocMap.html, you will also have to include the files DocMapdensity.svg and data.js.
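The export described above can be sketched as a small copy script. The three file names come from the text; the function name export_docmap and the folder arguments are hypothetical and only serve the illustration.

```python
import shutil
from pathlib import Path

# Files that DocMap.html needs next to it, as listed above.
DOCMAP_FILES = ("DocMap.html", "DocMapdensity.svg", "data.js")


def export_docmap(visuals_dir: Path, export_dir: Path) -> list:
    """Copy DocMap.html and its companion files into export_dir."""
    missing = [name for name in DOCMAP_FILES if not (visuals_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {visuals_dir}: {', '.join(missing)}")
    export_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps file timestamps and returns the destination path.
    return [Path(shutil.copy2(visuals_dir / name, export_dir / name)) for name in DOCMAP_FILES]
```

Checking for missing files before copying avoids shipping a DocMap.html that opens but shows an empty map. Links from the visualisation back to the original documents will still only work if those documents are reachable from the target machine.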
Importing in a Python Script
If you want to use Doc2Map from Python, you first have to install it:
pip install Doc2Map
Then import it:
from Doc2Map import Doc2Map
How Does It Work?
Doc2Map is mainly based on the Top2Vec principle, and relies on Plotly and Leaflet to create beautiful visuals.
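As a rough illustration of the Top2Vec principle: documents and words are embedded in the same vector space, documents are grouped into dense clusters, each topic is represented by the centroid of its cluster, and the words closest to that centroid describe the topic. The toy sketch below uses hand-made 2-D vectors and hand-assigned cluster labels in place of real embeddings and density-based clustering. All names in it are illustrative, and it is not Doc2Map's implementation.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_words(doc_vectors, labels, word_vectors, top_n=2):
    """For each cluster label, return the top_n words nearest the cluster centroid."""
    clusters = defaultdict(list)
    for vec, label in zip(doc_vectors, labels):
        clusters[label].append(vec)
    topics = {}
    for label, vecs in clusters.items():
        topic_vec = centroid(vecs)  # the topic vector
        ranked = sorted(word_vectors, key=lambda w: cosine(topic_vec, word_vectors[w]), reverse=True)
        topics[label] = ranked[:top_n]
    return topics


# Toy data: two documents about sport, two about politics.
doc_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = [0, 0, 1, 1]  # assigned by hand here, standing in for the clustering step
word_vectors = {
    "football": [1.0, 0.0],
    "goal": [0.8, 0.3],
    "election": [0.0, 1.0],
    "vote": [0.3, 0.8],
}
topics = topic_words(doc_vectors, labels, word_vectors)
print(topics)  # {0: ['football', 'goal'], 1: ['election', 'vote']}
```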
If you want the complete story of how Doc2Map works, I invite you to read the Medium article about it:
File details
Details for the file doc2map-1.0.3-py3-none-any.whl.
File metadata
- Download URL: doc2map-1.0.3-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 20e2a45603c3150fa5ae27f0d2e11af089147273c2360830d634beed0f6870bf
MD5 | 367efe63e6980f0108a05d386b0a547a
BLAKE2b-256 | 1d3165dd9966d914ce726e349a6a59b98e4aa2df994648db8f83f942eeba917d