Linguistic Inquiry and Word Count (LIWC) analyzer (proprietary data not included)

liwc

Linguistic Inquiry and Word Count (LIWC) analyzer.

This Python package requires the LIWC lexicon, which is proprietary and therefore not included in this repository. The lexicon data can be purchased from liwc.net. This package reads from the LIWC2007_English100131.dic file (MD5: 2a8c06ee3748218aa89b975574b4e84d), which must be available on any system where this package is used.
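
Since the MD5 digest of the expected file is given above, one way to confirm that a purchased copy is the same file is to hash it locally. The following is a minimal sketch using only the standard library; the helper names `md5_of_file` and `is_expected_lexicon` are illustrative and not part of this package.

```python
import hashlib

# MD5 digest quoted above for LIWC2007_English100131.dic
EXPECTED_MD5 = '2a8c06ee3748218aa89b975574b4e84d'

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as fp:
        # read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: fp.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def is_expected_lexicon(path):
    """True if the file's MD5 matches the digest listed above."""
    return md5_of_file(path) == EXPECTED_MD5
```

Calling `is_expected_lexicon('LIWC2007_English100131.dic')` before loading the lexicon would flag a different edition or a corrupted download early.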

The LIWC2007 .dic format looks like this:

%
1   funct
2   pronoun
[...]
%
a   1   10
abdomen*    146 147
about   1   16  17
[...]
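
In the format above, the lines between the two % markers map category numbers to category names, and each later line is a lexicon entry followed by the numbers of its categories; a trailing * marks a prefix that matches any word beginning with it. The sketch below reads text in that shape and looks up a token. It is an illustration under those assumptions, not this package's implementation: `read_dic_text` and `match_categories` are hypothetical names, the lookup is a plain linear scan, and category numbers missing from the (truncated) table are left as raw strings.

```python
def read_dic_text(text):
    """Parse .dic-style text into (lexicon, category_names).

    lexicon maps each entry (e.g. 'abdomen*') to a list of category names.
    """
    id_to_name = {}
    lexicon = {}
    percent_lines = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('%'):
            percent_lines += 1
            continue
        parts = line.split()
        if percent_lines == 1:
            # inside the category table: "<id> <name>"
            id_to_name[parts[0]] = parts[1]
        elif percent_lines >= 2:
            # lexicon entry: "<word> <id> <id> ..."; unknown ids stay as-is
            lexicon[parts[0]] = [id_to_name.get(i, i) for i in parts[1:]]
    return lexicon, list(id_to_name.values())

def match_categories(lexicon, token):
    """Return categories of exact entries and of 'prefix*' entries matching token."""
    categories = []
    for entry, names in lexicon.items():
        if entry.endswith('*'):
            if token.startswith(entry[:-1]):
                categories.extend(names)
        elif entry == token:
            categories.extend(names)
    return categories

# the sample from above, without the elided "[...]" lines
SAMPLE = """%
1   funct
2   pronoun
%
a   1   10
abdomen*    146 147
about   1   16  17
"""
lexicon, category_names = read_dic_text(SAMPLE)
print(match_categories(lexicon, 'abdomens'))  # ['146', '147']
print(match_categories(lexicon, 'about'))     # ['funct', '16', '17']
```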

Setup

Install from PyPI:

pip install -U liwc

Example

import re
from collections import Counter

def tokenize(text):
    # you may want to use a smarter tokenizer
    for match in re.finditer(r'\w+', text, re.UNICODE):
        yield match.group(0)

import liwc
parse, category_names = liwc.load_token_parser('LIWC2007_English100131.dic')
  • parse is a function from a token of text (a string) to a list of matching LIWC categories (a list of strings)
  • category_names is all LIWC categories in the lexicon (a list of strings)
gettysburg = '''Four score and seven years ago our fathers brought forth on
  this continent a new nation, conceived in liberty, and dedicated to the
  proposition that all men are created equal. Now we are engaged in a great
  civil war, testing whether that nation, or any nation so conceived and so
  dedicated, can long endure. We are met on a great battlefield of that war.
  We have come to dedicate a portion of that field, as a final resting place
  for those who here gave their lives that that nation might live. It is
  altogether fitting and proper that we should do this.'''
# the lexicon entries shown above are lowercase, so lowercase the text before matching
gettysburg_tokens = tokenize(gettysburg.lower())
# now flatmap over all the categories in all of the tokens using a generator:
gettysburg_counts = Counter(category for token in gettysburg_tokens for category in parse(token))
# and print the results:
print(gettysburg_counts)
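
Raw counts grow with the length of the text, so it can be useful to report each category as a share of all tokens, with a zero for categories that never matched. The following is a small sketch of that idea; `category_proportions` is a hypothetical helper, not part of this package, and it only assumes that `parse` and `category_names` behave as described above.

```python
from collections import Counter

def category_proportions(tokens, parse, category_names):
    """Return {category: count / number of tokens} for every category name."""
    tokens = list(tokens)  # tokenize() returns a generator; materialize it once
    counts = Counter(category for token in tokens for category in parse(token))
    total = float(len(tokens))
    # Counter returns 0 for missing keys, so unmatched categories map to 0.0
    return {name: (counts[name] / total if total else 0.0)
            for name in category_names}
```

With the names from the example above, this could be called as `category_proportions(tokenize(gettysburg.lower()), parse, category_names)`.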

License

Copyright (c) 2012-2019 Christopher Brown. MIT Licensed.
