A generalized implementation of a dictionary-based content coder.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

ContentCoder

AI Reading Machine

ContentCoder is a Python-based text analysis tool that enables users to process and analyze text using custom linguistic dictionaries. It is inspired by tools like LIWC (Linguistic Inquiry and Word Count) and provides robust methods for tokenization, text analysis, and frequency calculations. As noted in a much older version of the README.MD, this is a stripped-down, feature-incomplete version of several tools used in past projects.

Note that like 98% of this readme was generated by ChatGPT — it may not be entirely accurate, but at a quick glance, it looks pretty spot-on 😅🤞

🔥 Features

Custom Dictionary-Based Analysis
Support for LIWC-style dictionaries (2007 & 2022 formats)
Efficient text tokenization
Wildcard and abbreviation handling
Punctuation and big word analysis
Dictionary export in multiple formats (JSON, CSV, Poster format, etc.)
High-performance wildcard matching with memory optimization

🚀 Installation

Make sure you have Python 3.9+ installed (although it'll probably work with older versions as well). This package is pretty much entirely native Python, so it doesn't have any dependencies for installation. Well, none that I can recall, anyways 😄

pip install contentcoder

📁 Folder Structure

src/contentcoder/
│── __init__.py
│── ContentCoder.py
│── ContentCodingDictionary.py
│── happiestfuntokenizing.py
│── create_export_dir.py

📌 Quick Start

1. Import the `ContentCoder` class

from contentcoder.ContentCoder import ContentCoder

2. Initialize the Analyzer

cc = ContentCoder(dicFilename='path/to/dictionary.dic', fileEncoding='utf-8-sig')

3. Analyze a Text Sample

text = "Libraries are crucial to our society."
results = cc.Analyze(text, relativeFreq=True, dropPunct=True, retainCaptures=True, returnTokens=False, wildcardMem=True)
print(results)

Expected output:

{
  "WC": 6,
  "Dic": 4.5,
  "BigWords": 2.0,
  "Numbers": 0.0,
  "AllPunct": 0.0,
  "Period": 0.0,
  "Comma": 0.0,
  "QMark": 0.0,
  "Exclam": 0.0,
  "Apostro": 0.0,
  "Libraries": 1.0,
  "crucial": 1.0,
  "society": 1.0
}

📖 Main Functions & Usage

1️⃣ `Analyze(text, **options)`

Analyzes a given text and returns a dictionary of results.

Parameters:

inputText (str): The text to analyze.
relativeFreq (bool): If True, returns relative frequencies. Otherwise, raw frequencies.
dropPunct (bool): If True, punctuation is removed before processing.
retainCaptures (bool): If True, captures and stores wildcard-matched words.
returnTokens (bool): If True, returns tokenized text.
wildcardMem (bool): If True, speeds up wildcard processing by storing past matches.
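
To make the `relativeFreq` and `dropPunct` options more concrete, here is a minimal standalone sketch. It does not use the package: `TOY_DICT`, `toy_analyze`, and the category names are hypothetical, and it assumes that a relative frequency is a category count expressed as a percentage of the total word count (the LIWC convention), which may differ from what ContentCoder actually does.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: term -> list of categories (not from the package).
TOY_DICT = {
    "library": ["institution"],
    "crucial": ["importance"],
    "society": ["social"],
}

def toy_analyze(text, relative_freq=True, drop_punct=True):
    """Count category hits; optionally scale them to percent of total tokens."""
    # Words become tokens; each punctuation mark is a separate token.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    if drop_punct:
        tokens = [t for t in tokens if re.match(r"\w", t)]
    word_count = len(tokens)
    counts = Counter()
    for token in tokens:
        for category in TOY_DICT.get(token, []):
            counts[category] += 1
    results = {"WC": word_count}
    for category, n in counts.items():
        if relative_freq and word_count:
            results[category] = 100.0 * n / word_count
        else:
            results[category] = n
    return results

raw = toy_analyze("Society needs a library.", relative_freq=False)
rel = toy_analyze("Society needs a library.", relative_freq=True)
print(raw)  # {'WC': 4, 'social': 1, 'institution': 1}
print(rel)  # {'WC': 4, 'social': 25.0, 'institution': 25.0}
```

With `drop_punct=False` the final period would count as a token, so `WC` would be 5 and the percentages would shrink accordingly.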

Example Usage:

result = cc.Analyze("Hello world! This is a test sentence.", relativeFreq=True, returnTokens=True)

2️⃣ `GetResultsHeader()`

Returns a list of all available output categories.

Example Usage:

print(cc.GetResultsHeader())

Expected output:

["WC", "Dic", "BigWords", "Numbers", "AllPunct", "Period", "Comma", "QMark", "Exclam", "Apostro"]

3️⃣ `GetResultsArray(resultsDICT, rounding=4)`

Formats the results of Analyze() into a CSV-friendly list.

Example Usage:

text = "The government plays an important role."
result = cc.Analyze(text)
csv_row = cc.GetResultsArray(result)
print(csv_row)

Expected output:

[6, 4.3, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
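
The idea of a "CSV-friendly list" is that each value lands in the same position as its name in the header. The sketch below illustrates that pairing with a hypothetical helper, `results_to_row`; it is an assumption about the general approach, not the package's actual `GetResultsArray` code, and the header and values are made up.

```python
def results_to_row(results, header, rounding=4):
    """Order a results dict by a header list, rounding floats; missing keys become 0."""
    row = []
    for name in header:
        value = results.get(name, 0)
        row.append(round(value, rounding) if isinstance(value, float) else value)
    return row

header = ["WC", "Dic", "BigWords"]  # shortened, illustrative header
results = {"BigWords": 33.333333, "WC": 6, "Dic": 50.0}  # made-up values
row = results_to_row(results, header)
print(row)  # [6, 50.0, 33.3333]
```

With the real package, the header would come from `cc.GetResultsHeader()` and the row from `cc.GetResultsArray(result)`.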

4️⃣ `ExportCaptures(filename, fileEncoding='utf-8-sig', wildcardsOnly=False, fullset=True)`

Exports wildcard-captured words and their frequencies to a CSV file.

Example Usage:

cc.ExportCaptures("captured_words.csv")

5️⃣ `ExportDict2007Format(dicOutFilename, fileEncoding, separateDicts=False, separateDictsFolder=None)`

Exports the loaded dictionary in LIWC-2007 format.

Example Usage:

cc.dict.ExportDict2007Format("dictionary_2007.dic")

6️⃣ `ExportDict2022Format(dicOutFilename, fileEncoding, **options)`

Exports the loaded dictionary in LIWC-22 format.

Example Usage:

cc.dict.ExportDict2022Format("dictionary_2022.dicx")

7️⃣ `ExportDictJSON(filename, fileEncoding, indent=4)`

Exports the dictionary mapping to a JSON file.

Example Usage:

cc.dict.ExportDictJSON("dictionary.json")

8️⃣ `UpdateCategories(dicTerm, newCategories)`

Updates the categories associated with a dictionary term.

Example Usage:

cc.dict.UpdateCategories(dicTerm="happiness", newCategories={"positive_emotion": 1.0, "joy": 0.5})

🔄 Example: Processing a Large CSV File with `tqdm`

This script reads a large CSV file and processes each text in the "body" column.

import csv
from tqdm import tqdm
from contentcoder.ContentCoder import ContentCoder

cc = ContentCoder(dicFilename='dictionary.dic', fileEncoding='utf-8-sig')

with open("Comments.csv", "r", encoding="utf-8-sig") as csvfile:
    reader = csv.DictReader(csvfile)

    for row in tqdm(reader, desc="Processing", unit=" comments"):
        text = row["body"]
        result = cc.Analyze(text)

        # some other stuff to export your result here
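
One way to fill in the export step is to write one output row per input row as you go, so nothing accumulates in memory. The sketch below is standalone: `fake_analyze` is a made-up stand-in for `cc.Analyze()`, the two-column header is illustrative, and in-memory files replace real ones so it runs without any data on disk.

```python
import csv
import io

def fake_analyze(text):
    """Stand-in for cc.Analyze(); returns a tiny made-up result dict."""
    words = text.split()
    return {"WC": len(words), "BigWords": sum(len(w) > 6 for w in words)}

# With the real package, cc.GetResultsHeader() would supply the header.
header = ["WC", "BigWords"]

# In-memory files keep the sketch runnable; real code would use open(...).
source = io.StringIO("body\nLibraries are crucial\nShort text\n")
target = io.StringIO()

reader = csv.DictReader(source)
writer = csv.writer(target, lineterminator="\n")
writer.writerow(header)
for row in reader:
    result = fake_analyze(row["body"])
    # Write each row immediately instead of collecting results in a list.
    writer.writerow([result[name] for name in header])

output = target.getvalue()
print(output)
```

In the real loop, the row would more likely be `cc.GetResultsArray(result)`, written under the header from `cc.GetResultsHeader()`.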

⚡ Performance Optimizations

Uses wildcard caching to speed up regex evaluations.
Tokenization is optimized for handling social media text.
Processes large datasets efficiently using streaming CSV reads.
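
The wildcard caching idea can be sketched as follows. This is an illustration only, not the package's code: `make_matcher` is a hypothetical name, only trailing `*` wildcards are handled, and plain prefix checks stand in for the regex evaluation mentioned above.

```python
def make_matcher(terms):
    """Return a lookup function for exact terms and trailing-* wildcard terms."""
    exact = {t for t in terms if not t.endswith("*")}
    prefixes = [t[:-1] for t in terms if t.endswith("*")]
    cache = {}  # token -> matched wildcard term (or None), filled lazily

    def match(token):
        if token in exact:
            return token
        if token not in cache:
            # Scan the wildcard prefixes only the first time a token is seen.
            cache[token] = next(
                (p + "*" for p in prefixes if token.startswith(p)), None
            )
        return cache[token]

    match.cache = cache  # exposed so the cache can be inspected
    return match

match = make_matcher(["happy", "librar*", "govern*"])
hits = [match(t) for t in ["libraries", "library", "libraries", "happy", "cat"]]
print(hits)                 # ['librar*', 'librar*', 'librar*', 'happy', None]
print(sorted(match.cache))  # ['cat', 'libraries', 'library']
```

The second lookup of "libraries" is answered from the cache. The trade-off is memory: the cache grows with the number of distinct tokens seen, which is presumably why `wildcardMem` is an option rather than always on.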

📜 Dictionary Formats Supported

LIWC-2007 (.dic)
LIWC-22 (.dicx, .csv)
JSON Exports
Custom Hierarchical Category Mapping
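
For readers who have not seen a LIWC-2007 style `.dic` file: it opens with a category header enclosed between two `%` lines (category number, then name), followed by one term per line with the numbers of the categories it belongs to. The parser below is a simplified sketch of that layout, not the package's loader. `parse_dic_2007` and the sample entries are made up, and it ignores features real dictionaries may use, such as multi-word entries.

```python
def parse_dic_2007(text):
    """Parse a simplified LIWC-2007 style .dic string into (categories, terms)."""
    categories = {}  # category id -> category name
    terms = {}       # dictionary term -> list of category names
    in_header = False
    header_done = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "%":
            # The first % opens the category header; the second closes it.
            if in_header:
                in_header, header_done = False, True
            elif not header_done:
                in_header = True
            continue
        parts = line.split()
        if in_header:
            categories[parts[0]] = parts[1]
        elif header_done:
            terms[parts[0]] = [categories[c] for c in parts[1:]]
    return categories, terms

sample = """%
1\tposemo
2\tsocial
%
happ*\t1
friend\t1\t2
"""
categories, terms = parse_dic_2007(sample)
print(categories)  # {'1': 'posemo', '2': 'social'}
print(terms)       # {'happ*': ['posemo'], 'friend': ['posemo', 'social']}
```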

🤝 Contributing

Pull requests are welcome! If you find bugs or have feature requests, open an issue.

📄 License

MIT License.

📝 Acknowledgments

Developed by Ryan L. Boyd, Ph.D.
For academic and research purposes. Or, you know, whatever.

Release history

1.0.5 (Sep 16, 2025)
1.0.4 (Feb 13, 2025)
1.0.3 (Feb 13, 2025, this version)
1.0.2 (Feb 13, 2025)
1.0.1 (Feb 13, 2025)

Download files

Source Distribution

contentcoder-1.0.3.tar.gz (23.8 kB)

Uploaded Feb 13, 2025 Source

Built Distribution

contentcoder-1.0.3-py3-none-any.whl (22.9 kB)

Uploaded Feb 13, 2025 Python 3

File details

Details for the file contentcoder-1.0.3.tar.gz.

File metadata

Download URL: contentcoder-1.0.3.tar.gz
Upload date: Feb 13, 2025
Size: 23.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for contentcoder-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`b857f9a6dd40a48e65ad124b2c9dcc9aea85fc9ad11807321afbf8f0961ca269`
MD5	`7e3aca92b55e96c6ee9563e5e3682630`
BLAKE2b-256	`715c2ecd124a3a0efbc1f86517815c9e9004344d5799aec5be3e9349a02936f1`

File details

Details for the file contentcoder-1.0.3-py3-none-any.whl.

File metadata

Download URL: contentcoder-1.0.3-py3-none-any.whl
Upload date: Feb 13, 2025
Size: 22.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for contentcoder-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`380b46cd4e47832e530c2b05f3bfdfbe3c7ba1fa8e22ab1818fa2850dd11d93d`
MD5	`f449de7eee4013b510794bbfad20bc11`
BLAKE2b-256	`64acc3319d9ff1708e1ade14e598b1c07c3458bbf583ffc0aaea532a6d1111bc`
