Wowool Semantic Themes

Categorizing your documents

The themes app identifies the most relevant themes (categories) by collecting entities with the attribute theme. It then determines the most appropriate categories for the document.

Themes are pre-defined categories into which you want to sort your documents, whereas topics are salient noun groups extracted from the processed documents.

Prerequisites

The themes app uses the Theme entity and the annotation attributes theme and sector to identify potential categories. To enable this functionality, make sure that the semantic-themes domain is included in your processing pipeline, or that you have a custom domain that produces Theme entities.

Options

ThemesOptions

interface ThemesOptions {
  collect?: Record<string, UriInfo>;
  attributes?: string[];
  count?: number;
  threshold?: number;
}

with:

Property    Description
collect     Specifies which entities, and which of their information, are considered during the categorization process
attributes  Attributes that are considered as theme candidates. Default: ['theme', 'sector']
count       Maximum number of themes to collect
threshold   Minimum probability, expressed as a percentage, for a theme candidate to be considered a theme
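
To illustrate how count and threshold interact, here is a minimal sketch in plain Python. It is not the app's implementation: the function select_themes is hypothetical, and it assumes that the threshold is inclusive and that candidates are ranked by descending relevancy before the count limit is applied.

```python
def select_themes(candidates, count=None, threshold=0):
    """Keep candidates at or above `threshold` percent, best first, at most `count`."""
    # Assumption: a candidate exactly at the threshold is kept.
    kept = [c for c in candidates if c["relevancy"] >= threshold]
    # Assumption: the most relevant candidates come first.
    kept.sort(key=lambda c: c["relevancy"], reverse=True)
    return kept if count is None else kept[:count]


# Made-up candidates, for illustration only.
candidates = [
    {"name": "business", "relevancy": 50},
    {"name": "ceo", "relevancy": 100},
    {"name": "sports", "relevancy": 10},
]
top = select_themes(candidates, count=2, threshold=25)
print(top)
```

With these made-up values, "sports" falls below the threshold and the two remaining candidates fit within the count limit.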

UriInfo

Use UriInfo when you want the app to include or exclude specific URIs during the categorization process.

interface UriInfo {
  uri: boolean;
  attributes: string[];
}

with:

Property    Description
uri         Specifies whether the entity canonical should be considered a theme candidate
attributes  Attributes that are considered as theme candidates. Default: ['theme', 'sector']

Results

ThemesResults

type ThemesResults = ThemesResult[];

ThemesResult

interface ThemesResult {
  name: string;
  relevancy: number;
}

with:

Property    Description
name        Name of the theme
relevancy   Relevancy of the theme within the document
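
When the results arrive as JSON, they can be loaded into typed objects on the consuming side. The sketch below assumes a JSON payload shaped like ThemesResults; the ThemesResult dataclass is a local mirror of the interface above, not a class exported by the package, and parse_themes is a hypothetical helper.

```python
import json
from dataclasses import dataclass


@dataclass
class ThemesResult:
    # Local mirror of the ThemesResult interface, not part of the package.
    name: str
    relevancy: float


def parse_themes(payload: str) -> list[ThemesResult]:
    """Turn a JSON array of {name, relevancy} objects into ThemesResult items."""
    return [ThemesResult(**item) for item in json.loads(payload)]


# Made-up payload, for illustration only.
payload = '[{"name": "ceo", "relevancy": 100}, {"name": "USA", "relevancy": 50}]'
results = parse_themes(payload)
for result in results:
    print(f"{result.name}: {result.relevancy}")
```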

Examples

Persons and events

Let's say you want to consider the position attribute from Person, the name of the Country and the global attributes for Event. A configuration along those lines yields the following output:

[
  { "name": "ceo", "relevancy": 100 },
  { "name": "terrorism", "relevancy": 50 },
  { "name": "business", "relevancy": 50 },
  { "name": "aerospace", "relevancy": 50 },
  { "name": "USA", "relevancy": 50 }
]
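
A configuration matching the scenario described above might look like the following sketch. The key names follow the ThemesOptions and UriInfo interfaces, but the values are assumptions: in particular, "the global attributes" is read here as the default list ['theme', 'sector'], and an empty attributes list for Country is assumed to mean that only the canonical is used.

```python
import json

# Hypothetical options for the scenario above; the structure follows the
# ThemesOptions and UriInfo interfaces, the values are assumptions.
options = {
    "collect": {
        # Use the "position" attribute of Person, not the person's name.
        "Person": {"uri": False, "attributes": ["position"]},
        # Use the country name itself (the entity canonical).
        "Country": {"uri": True, "attributes": []},
        # Assumed reading of "global attributes": the default attribute list.
        "Event": {"uri": False, "attributes": ["theme", "sector"]},
    },
}
print(json.dumps(options, indent=2))
```

How these options are handed to the app (for example as keyword arguments, as with count in the Python sample) is not shown here.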

API

Examples

Using the Semantic Themes

This sample demonstrates how to use the Themes app from the wowool.semantic_themes package to extract and rank semantic themes from a document.

  • Pipeline is used to create a text analysis pipeline.
  • Themes is the app that extracts and ranks themes.

from wowool.sdk import Pipeline
from wowool.semantic_themes import Themes

analysis = Pipeline("english,entity,semantic-theme")
themes = Themes(count=2)
# run the document analysis
document = themes(analysis("EyeOnID works on cybercrime prevention"))
for item in document.themes:
    print(f" - {item.name}: {item.relevancy}")
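
If you need the themes as a plain mapping, for instance to store them next to other document metadata, a small helper is enough. This sketch assumes only what the sample above shows, namely that each item exposes name and relevancy; themes_to_dict is a hypothetical helper, and the stand-in objects and their values are made up so the snippet runs without the package or a license.

```python
from types import SimpleNamespace


def themes_to_dict(items):
    """Map theme name to relevancy for objects exposing `name` and `relevancy`."""
    return {item.name: item.relevancy for item in items}


# Stand-in objects with made-up values; in practice pass `document.themes`.
sample = [
    SimpleNamespace(name="theme-a", relevancy=100),
    SimpleNamespace(name="theme-b", relevancy=50),
]
print(themes_to_dict(sample))
```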

License

Both non-commercial and commercial use require a license file, which you can acquire at https://www.wowool.com

Non-Commercial

This library is licensed under the GNU AGPLv3 for non-commercial use.  
For commercial use, a separate license must be purchased.  

Commercial License Terms

1. Grants the right to use this library in proprietary software.  
2. Requires a valid license key.  
3. Redistribution in SaaS requires a commercial license.  
