NarrativeMapper is a text analysis pipeline that uncovers the dominant narratives and emotional tones within online communities.

Project description

NarrativeMapper

Overview:

The NarrativeMapper package is a discourse analysis pipeline that uncovers the dominant narratives and emotional tones within online communities.

This project processes textual messages from .csv files, then applies OpenAI’s embedding API (text-embedding-3-large) to convert each message into semantic vectors. These embeddings are clustered using UMAP for dimensionality reduction and HDBSCAN for density-based clustering.

For each discovered cluster, the tool:

Extracts the main talking points OpenAI Chat Completions
Analyzes the emotional tone using a Hugging Face sentiment classifier
Outputs structured summaries of the narrative + emotion pairs

Install via PyPI:

Example Output:

This example is based off of 1800 r/antiwork comments from the top 300 posts within the last year (Date of Writing: 2025-04-03).

Output using format_to_dict() function. Useful for JSON export.

{
    'online_group_name': 'r/antiwork',
    'clusters': [
        {
            'label': 'The core theme of this cluster revolves around frustrations and criticisms of modern job application processes, including exploitative practices, ineffective interviews, and the use of AI and personality tests that often discriminate against neurodiverse individuals.',
            'tone': 'NEGATIVE',
            'comment_count': 74
        },
        {
            'label': 'The core themes of this cluster revolve around the challenges of low wages in the fast food and service industries, the rising cost of living, and the perceived disconnect between corporate profits and employee compensation.',
            'tone': 'NEGATIVE',
            'comment_count': 109
        },
        {
            'label': 'The core theme of this cluster revolves around employee dissatisfaction with workplace policies, management practices, and the struggle for work-life balance, often highlighting issues of wage theft, lack of respect for personal time, and the negative impact of corporate culture on mental health.',
            'tone': 'NEGATIVE',
            'comment_count': 500
        },
        {
            'label': "The core theme of this cluster revolves around the dissatisfaction with traditional work schedules, advocating for shorter workweeks and better work-life balance, while highlighting the negative impact of long hours and inadequate parental leave on individuals' well-being.",
            'tone': 'NEGATIVE',
            'comment_count': 83
        },
        {
            'label': "The core theme of this cluster revolves around workers' struggles for fair wages, unionization, and collective action against corporate exploitation, particularly in the context of Boeing.",
            'tone': 'NEGATIVE',
            'comment_count': 56
        },
        {
            'label': 'The comments primarily express strong criticism of Elon Musk and the corporate culture surrounding wealth accumulation, highlighting issues of exploitation, inequality, and the disconnect between CEOs and their employees.',
            'tone': 'NEGATIVE',
            'comment_count': 50
        },
        {
            'label': 'The core theme of this cluster revolves around the critique of wealth inequality and capitalism, highlighting the exploitation of workers, the concentration of wealth among the elite, and the systemic issues that perpetuate economic disparity and social injustice.',
            'tone': 'NEGATIVE',
            'comment_count': 157
        },
        {
            'label': 'The comments reflect widespread frustration and despair among younger generations regarding financial instability, lack of affordable housing, inadequate retirement planning, and the perception of being exploited in the workforce, often contrasting their struggles with the experiences of older generations.',
            'tone': 'NEGATIVE',
            'comment_count': 92
        }
    ]
}

Two other formatting functions are available, format_by_text() and format_by_cluster(), both return pandas DataFrames that are well-suited for CSV export.

format_by_cluster() returns columns:

online_group_name - online group name
cluster_label - cluster summary/label
comment_count - sampled textual messages per cluster
aggregated_sentiment - net sentiment, of form 'NEGATIVE', 'POSITIVE', 'NEUTRAL'
all_sentiments - this is a list containing dict items of the form '{'label': 'NEGATIVE', 'score': 0.9896971583366394}' for each message (sentiment calculated by distilbert-base-uncased-finetuned-sst-2-english).

example to showcase output format:

click to view CSV

format_by_text() returns columns:

online_group_name - online group name
cluster_label - cluster summary/label the textual message belongs to
text - the sampled textual message (this function returns all of them row by row)
sentiment - dict item holding sentiment calculation, of the form '{'label': 'NEGATIVE', 'score': 0.9896971583366394}' (sentiment calculated by distilbert-base-uncased-finetuned-sst-2-english).

example output to showcase output format:

click to view CSV

Pipeline Architecture:

CSV Text Data --> Embeddings (embeddings.py) --> Cluster (clustering.py) --> Summarize (summarize.py) --> Formatting (formatters.py)

embeddings.py: Converts textual messages into 3072 dimensional vectors (OPEN AI's text-embedding-3-large).

clustering.py: Clusters embedding vectors using UMAP for reduction and HDBSCAN for clustering.

summarize.py: Determines summaries/label-names (4o-gpt-mini Chat Completion) and sentiment (distilbert-base-uncased-finetuned-sst-2-english) for each cluster.

formatters.py: Formats summarized clusters into useful forms for data analysis.

How to Use:

IMPORTANT:

To use this package, you'll need an OpenAI API key. Create a .env file in your root directory (same folder where your script runs).

Inside the .env file, add your API key like this:

   OPENAI_API_KEY="your-api-key-here"

The package will automatically load your key using python-dotenv. (Make sure to keep your .env file private and add it to your .gitignore if you're using Git.)

Option 1: High-Level Class-Based Interface

#initialize NarrativeMapper object
mapper = NarrativeMapper("r/antiwork")

#embeds semantic vectors
mapper.load_embeddings("path/to/your/file.csv")

#clustering: n_components, n_neighbors are UMAP variables. min_cluser_size, min_samples are HDBSCAN variables.
mapper.cluster(n_components=20, n_neighbors=20, min_cluster_size=40, min_samples=15)

#summarize each cluster's topic and sentiment
mapper.summarize()

#export in your preferred format
summary_dict = mapper.format_to_dict()
text_df = mapper.format_by_text()
cluster_df = mapper.format_by_cluster()

#saving DataFrames to csv
text_df.to_csv("comments_by_cluster.csv", index=False)
cluster_df.to_csv("cluster_summary.csv", index=False)

Option 2: Low-Level Functional Interface

#manual control over each step:
embeddings = get_embeddings("path/to/your/file.csv")
cluster_df = cluster_embeddings(embeddings, n_components=20, n_neighbors=20, min_cluster_size=40, min_samples=15)
summary_df = summarize_clusters(cluster_df)

#export/format options
summary_dict = format_to_dict(summary_df, online_group_name="r/antiwork")
text_df = format_by_text(summary_df, online_group_name="r/antiwork")
cluster_df = format_by_cluster(summary_df, online_group_name="r/antiwork")

Project details

Release history Release notifications | RSS feed

0.3.4

Apr 12, 2025

0.3.3

Apr 12, 2025

0.3.2

Apr 12, 2025

0.3.1

Apr 11, 2025

0.3.0

Apr 11, 2025

0.2.9

Apr 11, 2025

0.2.8

Apr 10, 2025

0.2.7

Apr 10, 2025

0.2.6

Apr 10, 2025

0.2.5

Apr 10, 2025

0.2.4

Apr 10, 2025

0.2.3

Apr 10, 2025

0.2.2

Apr 9, 2025

0.2.1

Apr 8, 2025

0.2.0

Apr 8, 2025

0.1.9

Apr 7, 2025

0.1.8

Apr 7, 2025

0.1.7

Apr 7, 2025

0.1.6

Apr 6, 2025

0.1.5

Apr 5, 2025

0.1.4

Apr 5, 2025

This version

0.1.3

Apr 4, 2025

0.1.2

Apr 4, 2025

0.1.1

Apr 4, 2025

0.1.0

Apr 4, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

narrativemapper-0.1.3.tar.gz (9.8 kB view details)

Uploaded Apr 4, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

narrativemapper-0.1.3-py3-none-any.whl (12.0 kB view details)

Uploaded Apr 4, 2025 Python 3

File details

Details for the file narrativemapper-0.1.3.tar.gz.

File metadata

Download URL: narrativemapper-0.1.3.tar.gz
Upload date: Apr 4, 2025
Size: 9.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for narrativemapper-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`2113e77c3d45510436f9c02519858fa190c682611a8fd39f364289f5942e1ba0`
MD5	`f52c3bd4bf2162fae5af1643a6098ae2`
BLAKE2b-256	`0823be1bb4af9812d52e08fc3f617ca99eea7dafea1f79935d687aa8e6674f15`

See more details on using hashes here.

File details

Details for the file narrativemapper-0.1.3-py3-none-any.whl.

File metadata

Download URL: narrativemapper-0.1.3-py3-none-any.whl
Upload date: Apr 4, 2025
Size: 12.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for narrativemapper-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89912d52424227ebcc7fc4d43a40956c3b4a0623f7612fbeecf628126f086a02`
MD5	`8aaced0eb9512d64029964c159e8b28b`
BLAKE2b-256	`1ce5efe0f631d9b14bada386321d06f239f18010909521035a0a522e72ae538e`

See more details on using hashes here.

NarrativeMapper 0.1.3

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

NarrativeMapper

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes