
Makes a network out of the URLs in a dataset of tweets

Project description

Domain Network

A package to create a domain network of the URLs mentioned in a dataset of texts. The current version works only for tweets; future versions may process any kind of text.
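A "domain" here is the netloc of a URL: the network-location part (typically the host name) returned by Python's `urllib.parse.urlparse`. The sketch below shows one way to pull netlocs out of a tweet's text. It is not the package's own extraction code. The function name `extract_netlocs` and the simple `https?://` regex are assumptions for illustration. Raw tweet text from the Twitter API usually contains t.co short links, so a real pipeline would more likely read the expanded URLs from the tweet's metadata.

```python
import re
from urllib.parse import urlparse

# Hypothetical, simplified URL pattern (an assumption, not the package's rule).
# Tweets from the Twitter API usually show t.co short links in the text;
# the expanded URLs live in the tweet metadata, which this sketch ignores.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_netlocs(text):
    """Return the lower-cased netloc of every http(s) URL found in `text`."""
    netlocs = []
    for url in URL_PATTERN.findall(text):
        netloc = urlparse(url).netloc.lower()
        if netloc:  # skip malformed matches with an empty network location
            netlocs.append(netloc)
    return netlocs
```

For example, `extract_netlocs("Read https://www.Example.com/a and http://news.org/x?y=1")` returns `["www.example.com", "news.org"]`.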

Installation

The easiest way to install the domain_network package is to use the following command in a terminal:

pip install domain-network

Usage

To run the module from the command-line interface (CLI), use one of the following commands:

  • For the whole process, starting from raw tweets:
python -m domainNetwork --input_dir ["data/twitterAPI_lang_en/*/*.json"] --conf_dir ["config/sample_config.ini"] --min_edge_weight [20] --min_node_size [20] \
--min_stand_alone_size [50] --urls_file_name ["output/urls.csv"] \
--network_output_file_name ["output/network.csv"] --netloc_output_file_name ["output/netloc.csv"] \
--netloc_origin_output_file_name ["output/netloc_origin.csv"]
  • For building the domain network from a pre-processed file that already contains the extracted netlocs:
python -m domainNetwork --conf_dir ["config/sample_config.ini"] --min_edge_weight [20] --min_node_size [20] \
--min_stand_alone_size [50] --network_only true --urls_file_name ["data/urls.csv"] \
--network_output_file_name ["output/network.csv"] --netloc_output_file_name ["output/netloc.csv"] \
--netloc_origin_output_file_name ["output/netloc_origin.csv"]

Parameters:

--input_dir : Path of the tweet files. As in the example above, this can be a glob pattern such as data/twitterAPI_lang_en/*/*.json.

--conf_dir : File path of the config file. See the Config file section for more details.

--min_edge_weight : Minimum number of users who mentioned both the source and the target of an edge in their tweets.

--min_node_size : Minimum total number of times a website must be mentioned to be kept as a connected node.

--min_stand_alone_size : Minimum total number of times a website must be mentioned to be kept as a stand-alone node.

--network_only : Set to true to skip tweet processing and build the network from a pre-processed file that already includes the netlocs (the file given by --urls_file_name).

--urls_file_name : File path of the pre-processed tweets with their netlocs. It is an output file when running the whole process and an input file when --network_only is set.

--network_output_file_name : File path of the generated network, in .csv format.

--netloc_output_file_name : File path of the list of websites after filtering, in .csv format.

--netloc_origin_output_file_name : File path of the original (unfiltered) list of websites, in .csv format.

--selected_users_fp : Specifies the target group of users, i.e. the active users whose domain network we are interested in.
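The three thresholds above can be read as a filtering pipeline over (user, website) mentions. The sketch below is one interpretation of the parameter descriptions, not the package's actual implementation. Its assumptions are that an edge's weight is the number of distinct users who mentioned both websites, that a node's size is its total mention count, and that `selected_users` plays the role of --selected_users_fp. The function name `build_domain_network` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_domain_network(mentions, min_edge_weight, min_node_size,
                         min_stand_alone_size, selected_users=None):
    """Hypothetical sketch of the filtering implied by the CLI parameters.

    mentions: iterable of (user, netloc) pairs, one pair per URL mention.
    Returns (edges, nodes): edges maps (source, target) -> weight,
    nodes maps netloc -> total mention count.
    """
    node_size = Counter()   # total mentions per website
    netlocs_by_user = {}    # distinct websites each user mentioned
    for user, netloc in mentions:
        if selected_users is not None and user not in selected_users:
            continue  # assumed analogue of --selected_users_fp
        node_size[netloc] += 1
        netlocs_by_user.setdefault(user, set()).add(netloc)

    # Edge weight = number of distinct users who mentioned both endpoints.
    edge_weight = Counter()
    for netlocs in netlocs_by_user.values():
        for source, target in combinations(sorted(netlocs), 2):
            edge_weight[(source, target)] += 1

    # --min_edge_weight and --min_node_size: keep heavy edges between large nodes.
    edges = {
        (s, t): w for (s, t), w in edge_weight.items()
        if w >= min_edge_weight
        and node_size[s] >= min_node_size
        and node_size[t] >= min_node_size
    }
    connected = {n for edge in edges for n in edge}

    # --min_stand_alone_size: nodes without a surviving edge need this threshold.
    nodes = {
        n: size for n, size in node_size.items()
        if n in connected or size >= min_stand_alone_size
    }
    return edges, nodes
```

Keeping a separate stand-alone threshold lets a frequently shared website that nobody co-mentions still appear in the output, while rarely mentioned isolated websites are dropped.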

Output

The main output of this package is network.csv, which lists the edges of the network with their source, target, and weight. This file can be passed to a visualization tool, e.g. networkx in Python.
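A minimal sketch of loading network.csv for use with networkx follows. It assumes the file has a header row with columns named exactly source, target and weight. Check the actual header of your output file, since the column names are an assumption here. The helper name `read_weighted_edges` is hypothetical, and it uses only the standard library.

```python
import csv


def read_weighted_edges(csv_file):
    """Yield (source, target, weight) tuples from an open network.csv file.

    Assumes a header row with columns named source, target and weight;
    extra columns (e.g. an index column) are ignored.
    """
    for row in csv.DictReader(csv_file):
        yield row["source"], row["target"], float(row["weight"])
```

With networkx installed, open the file with `open("output/network.csv", newline="")`, then call `G = networkx.Graph()` and `G.add_weighted_edges_from(read_weighted_edges(f))`. `networkx.draw(G, with_labels=True)` gives a quick plot if matplotlib is also installed.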

Download files

Source Distribution

domain_network-0.1.2.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

domain_network-0.1.2-py3-none-any.whl (10.3 kB)

Uploaded Python 3

File details

Details for the file domain_network-0.1.2.tar.gz.

File metadata

  • Download URL: domain_network-0.1.2.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4

File hashes

Hashes for domain_network-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       82dd9ffb455e1e2d75ad9e2a13d3b91c2dbf461317b45de7c91d20ba3298dc70
MD5          ad62dba7f8992780d641342e8187d1be
BLAKE2b-256  170f2d05d575a3aee3d99824a84aaf16f06d4b093f224e48239d9538417a927f

File details

Details for the file domain_network-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: domain_network-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4

File hashes

Hashes for domain_network-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       6bde854b967e8daf52616b14f8f48fe4e4795bd74c4ab64ad47ac385b9649e61
MD5          3db3d93fc3c833946a50fff3d1fe2642
BLAKE2b-256  0000b55e106b3fdae01b9cf20dbdb114ccf69734fd18506fc6e2b7f78e5796dd