Skip to main content

Linguistic Inquiry and Word Count (LIWC) analyzer (proprietary data not included)

Project description

# `liwc`

[![PyPI version](https://badge.fury.io/py/liwc.svg)](https://pypi.org/project/liwc/)
[![Travis CI Build Status](https://travis-ci.org/chbrown/liwc-python.svg?branch=master)](https://travis-ci.org/chbrown/liwc-python)

Linguistic Inquiry and Word Count (LIWC) analyzer.

The LIWC lexicon is proprietary, so it is _not_ included in this repository,
but this Python package requires it.
The lexicon data can be acquired (purchased) from [liwc.net](http://liwc.net/).
This package reads from the `LIWC2007_English100131.dic` (MD5: `2a8c06ee3748218aa89b975574b4e84d`) file,
which must be available on any system where this package is used.

The LIWC2007 `.dic` format looks like this:

%
1 funct
2 pronoun
[...]
%
a 1 10
abdomen* 146 147
about 1 16 17
[...]


## Setup

Install from [PyPI](https://pypi.python.org/pypi/liwc):

pip install -U liwc


## Example

```python
import re
from collections import Counter

def tokenize(text):
# you may want to use a smarter tokenizer
for match in re.finditer(r'\w+', text, re.UNICODE):
yield match.group(0)

import liwc
parse, category_names = liwc.load_token_parser('LIWC2007_English100131.dic')
```

* `parse` is a function from a token of text (a string) to a list of matching LIWC categories (a list of strings)
* `category_names` is all LIWC categories in the lexicon (a list of strings)

```python
gettysburg = '''Four score and seven years ago our fathers brought forth on
this continent a new nation, conceived in liberty, and dedicated to the
proposition that all men are created equal. Now we are engaged in a great
civil war, testing whether that nation, or any nation so conceived and so
dedicated, can long endure. We are met on a great battlefield of that war.
We have come to dedicate a portion of that field, as a final resting place
for those who here gave their lives that that nation might live. It is
altogether fitting and proper that we should do this.'''
gettysburg_tokens = tokenize(gettysburg)
# now flatmap over all the categories in all of the tokens using a generator:
gettysburg_counts = Counter(category for token in gettysburg_tokens for category in parse(token))
# and print the results:
print(gettysburg_counts)
```


## License

Copyright (c) 2012-2018 Christopher Brown. [MIT Licensed](LICENSE.txt).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

liwc-0.4.0.tar.gz (3.5 kB view hashes)

Uploaded Source

Built Distribution

liwc-0.4.0-py2.py3-none-any.whl (4.0 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page