
The Wowool Semantic Themes API

Project description

Categorizing your documents

The themes app identifies the most relevant themes (categories) of a document by collecting the entities that carry the theme attribute. It then determines which categories fit the document best.

Themes are predefined categories into which you want to classify your documents, whereas topics are salient noun groups extracted from the processed documents.
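As a rough illustration of the collect-then-rank idea described above, the following sketch counts the theme and sector values found on a list of entities and turns those counts into percentages. The entity tuples, the rank_themes function, and the scoring rule (a candidate's share of all collected values) are assumptions made for this example; they are not the app's actual algorithm or data model.

```python
from collections import Counter

# Hypothetical entity shape: (uri, canonical, attributes), where attributes
# maps an attribute name to its values. This is NOT the Wowool entity type.
Entity = tuple[str, str, dict[str, list[str]]]


def rank_themes(entities: list[Entity], attributes=("theme", "sector"),
                count=5, threshold=0.0):
    """Collect attribute values as candidates and rank them.

    Relevancy is each candidate's share of all collected values, as a
    percentage. This scoring rule is an assumption for illustration only.
    """
    counts = Counter()
    for _uri, _canonical, attrs in entities:
        for name in attributes:
            counts.update(attrs.get(name, []))
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = [
        {"name": name, "relevancy": round(100.0 * n / total, 2)}
        for name, n in counts.most_common()
    ]
    # Drop candidates below the threshold, then keep at most `count` themes.
    return [r for r in ranked if r["relevancy"] >= threshold][:count]


# Illustrative input only.
entities = [
    ("Theme", "phishing", {"theme": ["security"]}),
    ("Company", "ExampleCorp", {"sector": ["security", "technology"]}),
]
print(rank_themes(entities, count=2))
# [{'name': 'security', 'relevancy': 66.67}, {'name': 'technology', 'relevancy': 33.33}]
```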

Prerequisites

The themes app uses the Theme entity and the annotation attributes theme and sector to collect the information it needs to identify candidate categories. To enable this functionality, make sure that the semantic-themes domain is included in your processing pipeline, or that you use a custom domain that produces Theme entities.

Options

ThemesOptions

interface ThemesOptions {
    collect?: Record<string, UriInfo>;
    attributes?: string[];
    count?: number;
    threshold?: number;
}

with:

Property Description
collect Specifies which entities, and which of their information, are considered during the categorization process
attributes Attributes that are considered as theme candidates. Default: ['theme', 'sector']
count Maximum number of themes to collect
threshold Minimum probability, expressed as a percentage, that a theme candidate must reach to be considered a theme
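For readers working in Python, the options above can be mirrored in a small dataclass. This is a hypothetical sketch: the ThemesOptions class here is not part of wowool.semantic_themes, collect is kept as a plain dictionary, and None stands in for "use the app's default" because the defaults for count and threshold are not documented.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ThemesOptions:
    # Hypothetical Python mirror of the ThemesOptions interface.
    collect: Optional[dict[str, dict]] = None  # URI -> UriInfo-like dict
    attributes: list[str] = field(default_factory=lambda: ["theme", "sector"])
    count: Optional[int] = None  # None: leave it to the app's default
    threshold: Optional[float] = None  # percentage, 0..100

    def __post_init__(self):
        # threshold is documented as a percentage, so keep it in range.
        if self.threshold is not None and not 0 <= self.threshold <= 100:
            raise ValueError("threshold must be a percentage between 0 and 100")
        if self.count is not None and self.count < 1:
            raise ValueError("count must be a positive integer")


opts = ThemesOptions(count=2, threshold=10)
print(asdict(opts))
```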

UriInfo

UriInfo lets you include or exclude specific URIs, and choose which of their information is used, during the categorization process.

interface UriInfo {
    uri: boolean;
    attributes: string[];
}

with:

Property Description
uri Specifies whether the entity's canonical form should be considered a theme candidate
attributes Attributes that are considered as theme candidates. Default: ['theme', 'sector']
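To show how a collect entry might be interpreted, the sketch below assumes that each key in collect is an entity URI and each value is a UriInfo-like dictionary. The candidates_for function and the rule that unlisted URIs contribute nothing are illustrative assumptions, not documented behavior.

```python
def candidates_for(uri, canonical, attrs, collect):
    """Return the theme candidates one entity contributes under `collect`.

    Hypothetical helper: `collect` maps an entity URI to a dict with the
    UriInfo fields `uri` (bool) and `attributes` (list of names).
    """
    info = collect.get(uri)
    if info is None:
        return []  # URI not listed in collect: contributes nothing (assumption)
    # uri=True: the entity's canonical form itself becomes a candidate.
    found = [canonical] if info.get("uri", False) else []
    # Add the values of the selected attributes (documented default shown).
    for name in info.get("attributes", ["theme", "sector"]):
        found.extend(attrs.get(name, []))
    return found


collect = {
    "Theme": {"uri": True, "attributes": []},
    "Company": {"uri": False, "attributes": ["sector"]},
}
print(candidates_for("Company", "ExampleCorp", {"sector": ["finance"], "theme": ["x"]}, collect))
# ['finance']
print(candidates_for("Theme", "cybercrime", {}, collect))
# ['cybercrime']
```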

Results

ThemesResults

type ThemesResults = ThemesResult[];

ThemesResult

interface ThemesResult {
    name: string;
    relevancy: number;
}

with:

Property Description
name Name of the theme
relevancy Relevancy of the theme within the document
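The result shape is simple enough to type and post-process directly. The sketch below mirrors ThemesResult as a TypedDict and renders the themes in order of decreasing relevancy; format_themes and the sample values are hypothetical and are not output produced by the library.

```python
from typing import TypedDict


class ThemesResult(TypedDict):
    # Mirrors the documented ThemesResult interface.
    name: str
    relevancy: float


def format_themes(results: list[ThemesResult]) -> list[str]:
    """Sort by relevancy (highest first) and render one line per theme."""
    ordered = sorted(results, key=lambda r: r["relevancy"], reverse=True)
    return [f" - {r['name']}: {r['relevancy']}" for r in ordered]


# Illustrative values only, not output from the library.
sample: list[ThemesResult] = [
    {"name": "finance", "relevancy": 40.0},
    {"name": "security", "relevancy": 60.0},
]
for line in format_themes(sample):
    print(line)
```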

API

Examples

Using the Semantic Themes

This sample demonstrates how to use the Themes app from the wowool.semantic_themes package to extract and rank semantic themes from a document.

  • Pipeline is used to create a text analysis pipeline.
  • Themes is the app that extracts and ranks themes.

from wowool.sdk import Pipeline
from wowool.semantic_themes import Themes

analysis = Pipeline("english,entity,semantic-theme")
themes = Themes(count=2)
# run the document analysis
document = themes(analysis("EyeOnID works on cybercrime prevention"))
for item in document.results("wowool_themes"):
    print(f" - {item['name']}: {item['relevancy']}")
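Building on the example, a small helper can pick the single best theme from the results. It assumes, as the loop above suggests, that document.results("wowool_themes") yields dictionaries with name and relevancy keys; the best_theme function and its min_relevancy parameter are hypothetical, and empty or missing results are treated as "no themes".

```python
def best_theme(results, min_relevancy=0.0):
    """Return the name of the most relevant theme, or None.

    Hypothetical helper: `results` is assumed to be a list of dicts with
    'name' and 'relevancy' keys, as iterated in the example above.
    """
    eligible = [r for r in (results or []) if r["relevancy"] >= min_relevancy]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["relevancy"])["name"]


# Stand-in data; in practice pass document.results("wowool_themes").
print(best_theme([{"name": "a", "relevancy": 10}, {"name": "b", "relevancy": 30}]))
# b
```

Under that assumption, best_theme(document.results("wowool_themes"), min_relevancy=20) would return the top theme name, or None when no theme reaches 20.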


Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


wowool_semantic_themes-3.1.2-py3-none-any.whl (9.4 kB)

Uploaded Python 3

File details

Details for the file wowool_semantic_themes-3.1.2-py3-none-any.whl.


File hashes

Hashes for wowool_semantic_themes-3.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 04372deab4c77730e22619e77aa095d8c69f97ee4b477fe94793d910dfb6a1c3
MD5 1ea2977b1f6fc6c481b0804bd4489e9d
BLAKE2b-256 a50b6a0ee579fe362bef5849fd4e4230c982fb6416b7945820a6f5313dee5b9c

