Twitter Demographer
Twitter Demographer provides a simple API to enrich your Twitter data with additional variables such as sentiment, user location, gender, and age. The tool is extensible: you can add your own components to the system.
Free software: MIT license
Documentation: https://twitter-demographer.readthedocs.io.
Features
Starting from a simple set of tweet ids, Twitter Demographer rehydrates the tweets and adds additional variables to your dataset.
You are not forced to use a specific component. The design of this tool should be modular enough to allow you to decide what to add and what to remove.
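    from twitter_demographer.twitter_demographer import Demographer
    from twitter_demographer.components import Rehydrate
    # Assumed module path for GeoNamesDecoder: the original snippet omits this
    # import, so check the documentation for the exact location.
    from twitter_demographer.geolocation.geonames import GeoNamesDecoder
    from twitter_demographer.demographics.m3 import GenderAndAge
    import pandas as pd

    # Placeholder credentials: replace with your own tokens.
    twitter_bearer_token = "TWITTER BEARER"
    geonames_token = "GEONAMES TOKEN"

    demo = Demographer()

    component_1 = Rehydrate(twitter_bearer_token)  # tweet ids -> tweet and user fields
    component_2 = GeoNamesDecoder(geonames_token)  # location string -> country and address
    component_3 = GenderAndAge()                   # adds age, gender, and is_org

    data = pd.DataFrame({"tweet_ids": ["1431271582861774854", "1467887357668077581",
                                       "1467887350084689928", "1467887352647462912"]})
    print(data)

    # Rehydrate is added first: GeoNamesDecoder reads the "location" field it fetches.
    demo.add_component(component_1)
    demo.add_component(component_2)
    demo.add_component(component_3)

    print(demo.infer(data))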
tweet_ids screen_name name location user_id_str ... geo_location_country geo_location_address age gender is_org
0 1431271582861774854 federicobianchy Federico Bianchi Milano, Lombardia 2332157006 ... Italy Milan 19-29 male non-org
1 1467887357668077581 federicobianchy Federico Bianchi Milano, Lombardia 2332157006 ... Italy Milan 19-29 male non-org
2 1467887350084689928 federicobianchy Federico Bianchi Milano, Lombardia 2332157006 ... Italy Milan 19-29 male non-org
3 1467887352647462912 federicobianchy Federico Bianchi Milano, Lombardia 2332157006 ... Italy Milan 19-29 male non-org
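Once the extra columns are in place, they can be summarized with ordinary pandas operations. The following is a minimal sketch, assuming that infer returns a pandas DataFrame (which the printed output suggests). The frame is rebuilt by hand from the sample rows above so the snippet runs without any tokens; the names enriched, people, and summary are illustrative only and not part of the package.

```python
import pandas as pd

# Rows hand-copied from the sample output above (only a few columns kept).
enriched = pd.DataFrame({
    "tweet_ids": ["1431271582861774854", "1467887357668077581",
                  "1467887350084689928", "1467887352647462912"],
    "user_id_str": ["2332157006"] * 4,
    "geo_location_country": ["Italy"] * 4,
    "age": ["19-29"] * 4,
    "gender": ["male"] * 4,
    "is_org": ["non-org"] * 4,
})

# Keep accounts labelled as individuals, then count tweets and distinct
# users per country, age bracket, and gender.
people = enriched[enriched["is_org"] == "non-org"]
summary = (
    people.groupby(["geo_location_country", "age", "gender"])
    .agg(n_tweets=("tweet_ids", "count"), n_users=("user_id_str", "nunique"))
    .reset_index()
)
print(summary)
```

Counting distinct users alongside tweets matters here because all four sample tweets come from the same account, so a tweet-level count alone would overstate how many people the group contains.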
Use-Case
Say you want to use a Hugging Face classifier on some Twitter data you have. For example, you might want to detect the sentiment of the data you have. The data you have might
Components
Twitter Demographer is built from components that can be chained together to build tools. For example, the GeoNamesDecoder component, which predicts a user's location from a string of text, looks like this:
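    import geocoder

    # Assumed import path for the base class; the original snippet does not show it.
    from twitter_demographer.components import Component


    class GeoNamesDecoder(Component):
        """Resolve a free-text location into a country and an address via GeoNames."""

        def __init__(self, key):
            super().__init__()
            self.key = key  # GeoNames credential passed to geocoder

        def outputs(self):
            # Columns this component adds.
            return ["geo_location_country", "geo_location_address"]

        def inputs(self):
            # Columns this component reads.
            return ["location"]

        def infer(self, data):
            geo = self.initialize_return_dict()
            for val in data["location"]:
                if val is None:
                    # No location string: append None to keep rows aligned.
                    geo["geo_location_country"].append(None)
                    geo["geo_location_address"].append(None)
                else:
                    g = geocoder.geonames(val, key=self.key)
                    geo["geo_location_country"].append(g.country)
                    geo["geo_location_address"].append(g.address)
            return geo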
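The contract shown above (inputs, outputs, infer, and initialize_return_dict) can be tried out without installing the library or holding any tokens. In the sketch below, the Component base class and the run_pipeline helper are stand-ins written for illustration. They are an assumption about how the pieces fit together, not Twitter Demographer's actual implementation, and HasLocation is a hypothetical toy component.

```python
import pandas as pd


class Component:
    """Stand-in base class (an assumption, not the library's own code)."""

    def inputs(self):
        raise NotImplementedError

    def outputs(self):
        raise NotImplementedError

    def initialize_return_dict(self):
        # One empty list per declared output column.
        return {name: [] for name in self.outputs()}

    def infer(self, data):
        raise NotImplementedError


class HasLocation(Component):
    """Hypothetical toy component: flags rows with a non-empty location."""

    def inputs(self):
        return ["location"]

    def outputs(self):
        return ["has_location"]

    def infer(self, data):
        result = self.initialize_return_dict()
        for val in data["location"]:
            # pd.isna covers both None and NaN missing values.
            flag = False if pd.isna(val) else bool(val.strip())
            result["has_location"].append(flag)
        return result


def run_pipeline(data, components):
    """Hypothetical helper: apply components in order, adding their columns."""
    data = data.copy()
    for component in components:
        missing = [c for c in component.inputs() if c not in data.columns]
        if missing:
            raise ValueError(f"missing input columns: {missing}")
        for name, values in component.infer(data).items():
            data[name] = values
    return data


frame = pd.DataFrame({"location": ["Milano, Lombardia", None]})
enriched = run_pipeline(frame, [HasLocation()])
print(enriched)
```

Declaring inputs and outputs up front lets a pipeline check that a required column exists before a component runs, which is why the stand-in helper raises an error on missing inputs instead of failing inside infer.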
Limitations and Ethical Considerations
Inferring user attributes always carries the risk of compromising user privacy. While this process can be useful for understanding and explaining phenomena in the social sciences, one should always consider the issues it can create.
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2021-12-16)
First release on PyPI.