Location based social network (LBSN) data structure format & transfer tool

Project description

LBSNTransform

A Python package that uses the common location-based social network (LBSN) data structure (ProtoBuf) to import, transform and export social media data, such as data from Twitter and Flickr.

Figure: Illustration of functions

Motivation

The goal is to provide a common interface for handling social media data, without the need to adapt individually to the myriad API endpoints available. As an example, consider the ProtoBuf spec lbsn.Post, which can be a Tweet on Twitter, a photo shared on Flickr, or a post on Reddit. Despite their different origins, all of these objects share a common set of attributes, which is reflected in the lbsnstructure.
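The idea of a shared attribute set can be sketched in a few lines of Python. This is only an illustration: `CommonPost`, its fields, and the simplified input layouts for tweets and Flickr photos are assumptions made for this example and do not reproduce the actual lbsn.Post ProtoBuf definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommonPost:
    """Simplified stand-in for a cross-network post (hypothetical fields)."""
    origin: str
    post_id: str
    user_id: str
    body: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def from_tweet(tweet: dict) -> CommonPost:
    """Map a simplified, assumed tweet layout to the common record."""
    # Assumed layout: {"coordinates": {"coordinates": [lng, lat]}}
    coords = (tweet.get("coordinates") or {}).get("coordinates")
    return CommonPost(
        origin="twitter",
        post_id=str(tweet["id"]),
        user_id=str(tweet["user"]["id"]),
        body=tweet.get("text", ""),
        latitude=coords[1] if coords else None,
        longitude=coords[0] if coords else None,
    )


def from_flickr_photo(photo: dict) -> CommonPost:
    """Map a simplified, assumed Flickr photo layout to the common record."""
    return CommonPost(
        origin="flickr",
        post_id=str(photo["photo_id"]),
        user_id=str(photo["owner"]),
        body=photo.get("title", ""),
        latitude=photo.get("latitude"),
        longitude=photo.get("longitude"),
    )
```

Both mapping functions return the same type, so any downstream code only needs to know `CommonPost`, not the network the record came from.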

The tool is based on a 4-Facet conceptual framework for LBSN, introduced in a paper by Dunkel et al. (2018).

The GDPR directly requires social media network operators to allow users to transfer accounts and data between services. While there are attempts by Google, Facebook etc. (e.g. see the data-transfer-project), this is not currently possible. A primary motivation for the lbsnstructure is to systematically characterize LBSN data aspects in a common, cross-network data scheme that enables privacy-by-design for connected software, data handling and database design.

Description

This tool enables data import from a Postgres database, JSON, or CSV, and export to CSV, LBSN ProtoBuf, or the hll and raw versions of the LBSN-prepared Postgres databases. The tool maps social media endpoints (e.g. Twitter tweets) to a common LBSN Interchange Structure format in ProtoBuf. LBSNTransform can be used from the command line (CLI) or imported into other Python projects with import lbsntransform, for on-the-fly conversion.
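The import, map and export flow described above can be pictured with a small standard-library sketch. It does not use the lbsntransform API; the function names, the two-column output and the input layout (`id`, `text`) are assumptions chosen for illustration only.

```python
import csv
import io
import json
from itertools import islice
from typing import Iterable, Iterator, Optional, TextIO


def read_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line of line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_common(record: dict) -> dict:
    """Map an assumed input layout to a small common set of columns."""
    return {"post_id": str(record["id"]), "body": record.get("text", "")}


def export_csv(records: Iterable[dict], out: TextIO,
               limit: Optional[int] = None) -> int:
    """Write at most `limit` mapped records as CSV; return the number written."""
    writer = csv.DictWriter(out, fieldnames=["post_id", "body"])
    writer.writeheader()
    count = 0
    for record in islice(records, limit):
        writer.writerow(to_common(record))
        count += 1
    return count


# Example: convert two of three records, in memory
source = ['{"id": 1, "text": "first"}', '{"id": 2, "text": "second"}', '{"id": 3}']
buffer = io.StringIO()
written = export_csv(read_json_lines(source), buffer, limit=2)
```

Because the reader is a generator, records are processed one at a time, which is the general shape needed for on-the-fly conversion of large inputs.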

Quick Start

The recommended way to install lbsntransform, for both Linux and Windows, is through the conda package manager.

Create a conda env using environment.yml

First, create an environment with the dependencies for lbsntransform using the environment.yml that is provided in the root of the repository.

git clone https://github.com/Sieboldianus/lbsntransform.git
cd lbsntransform
# not necessary, but recommended:
conda config --env --set channel_priority strict
conda env create -f environment.yml

Install lbsntransform without dependencies

Afterwards, install lbsntransform using pip, without dependencies.

conda activate lbsntransform
pip install lbsntransform --no-deps --upgrade
# or locally, from the latest commits on master
# pip install . --no-deps --upgrade
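To confirm that the package is visible in the active environment, the installed version can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is an assumption for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if lbsntransform is installed, otherwise None
print(installed_version("lbsntransform"))
```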

Import data using a mapping

For each data source, a mapping must be provided that defines how data is mapped to the lbsnstructure.

The default mapping is lbsnraw.

Additional mappings can be dynamically loaded from a folder.

We have provided two example mappings, for the Flickr YFCC100M dataset (CSV) and for Twitter (JSON).
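Loading Python modules dynamically from a folder can be done with the standard library alone. The following is a sketch of the general technique, not the loader used by lbsntransform; the function name `load_mappings` and the file pattern `field_mapping_*.py` (derived from the example file name field_mapping_twitter.py) are assumptions.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Dict


def load_mappings(folder: str) -> Dict[str, ModuleType]:
    """Import every field_mapping_*.py file in `folder`, keyed by module name."""
    modules: Dict[str, ModuleType] = {}
    for path in sorted(Path(folder).glob("field_mapping_*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue  # not importable, skip
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the module's top-level code
        modules[path.stem] = module
    return modules
```

Calling `load_mappings("./resources/mappings/")` would return a dictionary such as `{"field_mapping_twitter": <module>}` if that file exists in the folder. Note that importing a module executes its code, so only trusted mapping files should be placed there.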

For example, to import the first 1000 records of JSON data from Twitter to the lbsn raw database, copy field_mapping_twitter.py to a local folder ./resources/mappings/, start the Docker rawdb container, and use:

lbsntransform --origin 3 \
              --mappings_path ./resources/mappings/ \
              --file_input \
              --file_type "json" \
              --dbpassword_output "sample-key" \
              --dbuser_output "postgres" \
              --dbserveraddress_output "127.0.0.1:5432" \
              --dbname_output "rawdb" \
              --dbformat_output "lbsn" \
              --transferlimit 1000

With the above input args, the tool will:

read local json from ./01_Input/
and store lbsn records to the lbsn rawdb.
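When the CLI is driven from a script, the argument list can be assembled from a dictionary and passed to `subprocess`. The sketch below only builds and prints the command; the helper `build_command` is an assumption for this example, and the flags and values are the ones shown in the rawdb example.

```python
import shlex
from typing import Dict, List, Optional


def build_command(options: Dict[str, Optional[str]]) -> List[str]:
    """Turn {"flag": value} pairs into an argument list; None marks a switch."""
    cmd = ["lbsntransform"]
    for flag, value in options.items():
        cmd.append(f"--{flag}")
        if value is not None:
            cmd.append(str(value))
    return cmd


options = {
    "origin": "3",
    "mappings_path": "./resources/mappings/",
    "file_input": None,  # switch without a value
    "file_type": "json",
    "dbpassword_output": "sample-key",
    "dbuser_output": "postgres",
    "dbserveraddress_output": "127.0.0.1:5432",
    "dbname_output": "rawdb",
    "dbformat_output": "lbsn",
    "transferlimit": "1000",
}
command = build_command(options)
print(shlex.join(command))
# To actually run the import (requires lbsntransform and a running rawdb):
# import subprocess; subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, and keeping the options in a dictionary makes it impossible to give the same flag twice by accident.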

Alternatively, to import data directly to the privacy-aware version of lbsnstructure, called hlldb, start the Docker container, and use:

lbsntransform --origin 3 \
              --mappings_path ./resources/mappings/ \
              --file_input \
              --file_type "json" \
              --dbpassword_output "sample-key" \
              --dbuser_output "postgres" \
              --dbserveraddress_output "127.0.0.1:25432" \
              --dbname_output "hlldb" \
              --dbformat_output "hll" \
              --dbpassword_hllworker "sample-key" \
              --dbuser_hllworker "postgres" \
              --dbserveraddress_hllworker "127.0.0.1:25432" \
              --dbname_hllworker "hlldb" \
              --include_lbsn_objects "origin,post" \
              --include_lbsn_bases "hashtag,place,date,community" \
              --transferlimit 1000

With the above input args, the tool will:

read local json from ./01_Input/
and store lbsn records to the privacy-aware lbsn hlldb
by converting only lbsn objects of type origin and post
and updating the HyperLogLog (HLL) target tables hashtag, place, date and community
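For readers unfamiliar with HyperLogLog, the following sketch shows why HLL tables can count distinct items without storing them: only a fixed array of small registers is kept, and two sets can be combined by taking the register-wise maximum. This is a textbook-style illustration in pure Python, not the implementation used by hlldb or by the Postgres hll extension; the class name, the SHA-1 based hash and the default precision are assumptions for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # the bias constant below assumes p >= 7
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)              # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # rank: position of the leftmost 1-bit within the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Return the union of two sketches (register-wise maximum)."""
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        """Estimate the number of distinct items added so far."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


# Example: duplicates do not increase the estimate
users = HyperLogLog()
for user_id in ("u1", "u2", "u2", "u3"):
    users.add(user_id)
print(users.count())  # an estimate close to 3
```

The sketch never stores the original identifiers, only the maximum rank seen per register, which is the property that makes HLL-based tables attractive for privacy-aware counting.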

A full list of possible input and output args is available in the documentation.

Built With

lbsnstructure - A common language-independent and cross-network social media data scheme
protobuf - Google's data interchange format
psycopg2 - Python-PostgreSQL Database Adapter
ppygis3 - A PPyGIS port for Python
shapely - Geometric objects processing in Python
emoji - Emoji handling in Python

Authors

Alexander Dunkel - Initial work

License

This project is licensed under the GNU GPLv3 or any later version - see the LICENSE.md file for details.

Project details

Release history

Version	Release date
0.26.0 (this version)	Aug 1, 2023
0.25.1	May 16, 2023
0.25.0	May 15, 2023
0.24.2	May 9, 2023
0.24.1	May 5, 2023
0.23.0	Nov 23, 2022
0.22.1	Nov 22, 2022
0.22.0	Apr 8, 2022
0.21.3	Mar 18, 2022
0.21.2	Mar 18, 2022
0.21.1	Mar 17, 2022
0.21.0	Mar 15, 2022
0.20.0	May 11, 2021
0.19.0	May 11, 2021
0.18.3	May 5, 2021
0.18.2	Apr 26, 2021
0.18.1	Apr 19, 2021
0.18.0	Apr 16, 2021
0.17.0	Apr 15, 2021
0.16.1	Mar 13, 2021
0.16.0	Jan 14, 2021
0.15.0	Jan 9, 2021
0.14.1	Jan 6, 2021
0.14.0	Dec 11, 2020
0.13.0	May 12, 2020
0.12.2	Mar 3, 2020
0.12.1	Feb 11, 2020
0.12.0	Jan 22, 2020
0.11.0	Dec 20, 2019
0.10.2	Nov 21, 2019
0.10.1	Nov 20, 2019
0.10.0	Nov 19, 2019
0.9.1	Oct 23, 2019
0.9.0	Oct 23, 2019
0.8.3	Oct 21, 2019
0.8.2	Sep 17, 2019
0.8.1	Sep 17, 2019
0.8.0	Aug 22, 2019
0.7.3	Jul 12, 2019
0.7.2	Jul 12, 2019
0.7.1	Jun 11, 2019
0.7.0	Jun 10, 2019
0.6.0	Jun 3, 2019
0.5.0	Jun 3, 2019
0.3.21	Jan 11, 2019
0.3.20	Jan 11, 2019
0.3.19	Jan 9, 2019
0.3.18	Jan 6, 2019
0.3.17	Jan 6, 2019
0.3.16	Jan 6, 2019
0.3.15	Jan 6, 2019
0.3.14	Jan 6, 2019
0.3.13	Jan 6, 2019
0.3.12	Jan 6, 2019
0.3.11	Jan 6, 2019
0.3.10	Jan 6, 2019
0.3.9	Jan 6, 2019
0.3.8	Jan 6, 2019
0.3.7	Jan 5, 2019
0.3.6	Jan 5, 2019
0.3.5	Jan 5, 2019
0.3.1	Jan 4, 2019
0.2.0	Jan 4, 2019
0.1.600	Jan 3, 2019
0.1.521	Dec 23, 2018
0.1.520	Dec 19, 2018
0.1.519	Dec 19, 2018
0.1.518	Dec 19, 2018
0.1.517	Dec 18, 2018
0.1.516	Dec 10, 2018
0.1.515	Dec 6, 2018
0.1.514	Dec 6, 2018
0.1.513	Dec 6, 2018
0.1.511	Dec 5, 2018
0.1.510	Dec 5, 2018
0.1.22	Jan 4, 2019
0.1.21	Jan 4, 2019
0.1.20	Jan 4, 2019
0.1.19	Jan 4, 2019
0.1.4	Jul 26, 2018

Download files

Source Distribution

lbsntransform-0.26.0.tar.gz (208.8 kB)

Uploaded Aug 1, 2023 Source

Built Distribution

lbsntransform-0.26.0-py3-none-any.whl (90.5 kB)

Uploaded Aug 1, 2023 Python 3

File details

Details for the file lbsntransform-0.26.0.tar.gz.

File metadata

Download URL: lbsntransform-0.26.0.tar.gz
Upload date: Aug 1, 2023
Size: 208.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for lbsntransform-0.26.0.tar.gz
Algorithm	Hash digest
SHA256	`2b7978550aade62e4402c58a54d638e9d78f50494c63d958fba0dd558f3654d6`
MD5	`eb20a93e00029317f547a8006ab74779`
BLAKE2b-256	`bccf0fdb595084f7180df2f3a35b900e8755ae6284ce9fb2ee4a7651bfcf1199`

File details

Details for the file lbsntransform-0.26.0-py3-none-any.whl.

File metadata

Download URL: lbsntransform-0.26.0-py3-none-any.whl
Upload date: Aug 1, 2023
Size: 90.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12

File hashes

Hashes for lbsntransform-0.26.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0c4298678cf81df09c8e2578aa5c8417954122817239035b222c54730c711637`
MD5	`f125c0bab0cff6a590f848d42430a4b8`
BLAKE2b-256	`5012d6a0d9bd3312617ecb50bd94706867de02de50cc2f7bf668da878c116bf1`
