Wowool Semantic Themes
Categorizing your documents
The themes app identifies the most relevant themes (categories) by collecting entities with the attribute theme. It then determines the most appropriate categories for the document.
Themes are predefined categories into which you want to sort your documents, whereas topics are salient noun groups extracted from the processed documents.
Prerequisites
The themes.app uses the Theme entity and the annotation attributes theme and sector to collect information to identify potential categories. To enable this functionality, ensure that the semantic-themes domain is included in your processing pipeline or that you have a custom domain that produces Theme entities.
Options
ThemesOptions
interface ThemesOptions {
  collect?: Record<string, UriInfo>;
  attributes?: string[];
  count?: number;
  threshold?: number;
}
with:
| Property | Description |
|---|---|
| collect | Specifies which entities, and which of their information, are considered during the categorization process |
| attributes | Attributes that are considered as theme candidates. Default: ['theme', 'sector'] |
| count | Maximum number of themes to collect |
| threshold | Minimal probability, expressed as a percentage, for a theme candidate to be considered a theme |
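To make the interplay of count and threshold concrete, here is a minimal sketch in plain Python. It does not use the Wowool library and is not the app's actual algorithm: the function `select_themes` is hypothetical, and it assumes that relevancy is a candidate's frequency relative to the most frequent candidate, expressed as a percentage.

```python
from collections import Counter


def select_themes(candidates, count=None, threshold=0):
    """Rank theme candidates and keep the strongest ones (illustrative only)."""
    tally = Counter(candidates)
    if not tally:
        return []
    top = max(tally.values())
    # Assumption: relevancy is the frequency relative to the top candidate, in percent.
    ranked = [
        {"name": name, "relevancy": round(100 * hits / top)}
        for name, hits in tally.most_common()
    ]
    # threshold drops weak candidates; count caps how many themes are returned.
    kept = [theme for theme in ranked if theme["relevancy"] >= threshold]
    return kept[:count]


candidates = ["ceo", "ceo", "business", "aerospace"]
result = select_themes(candidates, count=2, threshold=50)
print(result)
```

In this sketch the threshold is applied before the count limit, so a low count never hides the fact that a candidate fell below the threshold.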
UriInfo
UriInfo lets you include or exclude specific URIs during the categorization process.
interface UriInfo {
  uri: boolean;
  attributes: string[];
}
with:
| Property | Description |
|---|---|
| uri | Specifies whether the entity canonical should be considered a theme candidate |
| attributes | Attributes that are considered as theme candidates. Default: ['theme', 'sector'] |
Results
ThemesResults
type ThemesResults = ThemesResult[];
ThemesResult
interface ThemesResult {
  name: string;
  relevancy: number;
}
with:
| Property | Description |
|---|---|
| name | Name of the theme |
| relevancy | Relevancy of the theme within the document |
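If the results reach you as JSON in the shape described above, they can be loaded into typed objects for further filtering. The sketch below is one possible way to do that in plain Python; the `ThemeResult` dataclass, the `parse_themes` helper, and the sample payload are hypothetical and not part of the Wowool API.

```python
import json
from dataclasses import dataclass


@dataclass
class ThemeResult:
    """Mirrors the ThemesResult interface: a theme name and its relevancy."""
    name: str
    relevancy: float


def parse_themes(payload: str) -> list[ThemeResult]:
    """Turn a JSON array of {name, relevancy} objects into ThemeResult items."""
    return [ThemeResult(**item) for item in json.loads(payload)]


# Sample payload in the ThemesResults shape (illustrative values).
payload = '[{"name": "ceo", "relevancy": 100}, {"name": "business", "relevancy": 50}]'
themes = parse_themes(payload)

# Keep only the themes that clear an application-specific cut-off.
strong = [theme.name for theme in themes if theme.relevancy >= 75]
print(strong)
```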
Examples
Persons and events
Let's say you want to consider the position attribute from Person, the name of the Country and the global attributes for Event. Then the following configuration can be used:
which yields as output:
[
{ "name": "ceo", "relevancy": 100 },
{ "name": "terrorism", "relevancy": 50 },
{ "name": "business", "relevancy": 50 },
{ "name": "aerospace", "relevancy": 50 },
{ "name": "USA", "relevancy": 50 }
]
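A configuration matching the description in this example might look like the sketch below, written as a Python dictionary that follows the ThemesOptions and UriInfo interfaces. It is an assumption based on those interfaces, not a configuration confirmed by the package: in particular, "the global attributes for Event" is read here as the default attribute list ['theme', 'sector'], and the `check_options` helper is hypothetical.

```python
# Hypothetical options following the ThemesOptions / UriInfo shapes.
options = {
    "collect": {
        # Person: use the 'position' attribute, not the canonical name.
        "Person": {"uri": False, "attributes": ["position"]},
        # Country: use the canonical name of the entity itself.
        "Country": {"uri": True, "attributes": []},
        # Event: assumed to mean the default (global) attributes.
        "Event": {"uri": False, "attributes": ["theme", "sector"]},
    }
}


def check_options(opts):
    """Verify that every collect entry has the two UriInfo fields."""
    for uri, info in opts["collect"].items():
        if not isinstance(info.get("uri"), bool):
            raise ValueError(f"{uri}: 'uri' must be a boolean")
        if not isinstance(info.get("attributes"), list):
            raise ValueError(f"{uri}: 'attributes' must be a list")
    return True


is_valid = check_options(options)
print(is_valid)
```

Whether these options can be passed to the app as keyword arguments, the way count is passed in the API example, is not confirmed here and should be checked against the package itself.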
API
Examples
Using the Semantic Themes
This sample demonstrates how to use the Themes app from the wowool.semantic_themes package to extract and rank semantic themes from a document.
- Pipeline is used to create a text analysis pipeline.
- Themes is the app that extracts and ranks themes.
from wowool.sdk import Pipeline
from wowool.semantic_themes import Themes

analysis = Pipeline("english,entity,semantic-theme")
themes = Themes(count=2)

# Run the document analysis, then apply the Themes app to the result
document = themes(analysis("EyeOnID works on cybercrime prevention"))
for item in document.themes:
    print(f" - {item.name}: {item.relevancy}")
License
For both non-commercial and commercial use, you need to acquire a license file at https://www.wowool.com
Non-Commercial
This library is licensed under the GNU AGPLv3 for non-commercial use.
For commercial use, a separate license must be purchased.
Commercial License Terms
1. Grants the right to use this library in proprietary software.
2. Requires a valid license key.
3. Redistribution in SaaS requires a commercial license.