
GitHub Collaboration Relation Extraction

Project description

Collaboration relation extraction from GitHub logs. Collaboration relations fall into two categories: EventAction relations and Reference relations. This is a relation extraction tool for the project https://github.com/birdflyi/OSDB_STN_reference_coupling_data_analysis.

Quick Start

  1. Download the directory etc/ and the file main.py from GitHub_Collaboration_Relation_Extraction into the root directory of your new project.
  2. Change the default settings in etc/authConf.py.
  • AuthConfig
    • Set DEFAULT_INTMED_MODE to one of [I_AUTH_SETTINGS_LOCAL_HOSTS, I_AUTH_SETTINGS_ALIYUN_HOSTS, I_AUTH_SETTINGS_ALIYUN_INTERMEDIATE_HOSTS], and fill in the corresponding auth_settings_xxx_hosts dict.
    • If you have an Aliyun Cloud or other database service hosting the GitHub log tables, set the server authorization information below the line Aliyun.
    • If you want a sample dataset to start with, download the ClickHouse sample data for your docker container, and set the server authorization information below the line local docker image.
  • GITHUB_TOKENS
    • Set the GitHub tokens used when querying the GitHub API.
  3. Change the settings in main.py and run it.
  • Change the repo_names and year settings.
    • Note: processing all records may take a long time. Set limit to a positive integer to cap the number of records when you only want to run a quick test.
  • Create the data/ directory structure.
    • Create the following directories in the root directory of your project:
import os

base_dir = '' or os.getcwd()  # you can set a base dir or use the current dir by default.
data_dirs = ['data', 'data/github_osdb_data', 'data/global_data', 'data/github_osdb_data/repos', 'data/github_osdb_data/repos_dedup_content', 'data/github_osdb_data/GitHub_Collaboration_Network_repos']
for rel_data_dir in data_dirs:
    os.makedirs(os.path.join(base_dir, rel_data_dir), exist_ok=True)  # avoid the FileExistsError
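For orientation, the settings changed in step 2 might look roughly like the sketch below. Every key name and value here is an assumption made for illustration; the authoritative field names are the ones in the etc/authConf.py file shipped with the project.

```python
# Hypothetical sketch of the AuthConfig settings in etc/authConf.py.
# All key names and values are assumptions; check the real etc/authConf.py.

# ClickHouse service inside a local docker container (sample dataset).
auth_settings_local_hosts = {
    "host": "127.0.0.1",
    "port": 9000,              # ClickHouse native protocol port
    "user": "default",
    "password": "",
    "database": "opensource",  # database holding the GitHub log tables
}

# Aliyun-hosted database service with the GitHub log tables.
auth_settings_aliyun_hosts = {
    "host": "your-instance.aliyuncs.com",
    "port": 9000,
    "user": "your_user",
    "password": "your_password",
    "database": "opensource",
}

# Mode constants mirroring the choices listed in step 2; DEFAULT_INTMED_MODE
# selects which auth_settings_xxx_hosts dict is active.
I_AUTH_SETTINGS_LOCAL_HOSTS = 0
I_AUTH_SETTINGS_ALIYUN_HOSTS = 1
DEFAULT_INTMED_MODE = I_AUTH_SETTINGS_LOCAL_HOSTS

# GitHub personal access tokens used when querying the GitHub API.
GITHUB_TOKENS = ["ghp_your_token_here"]
```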

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gh_core-2.2.4.tar.gz (119.7 kB)

Uploaded: Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

GH_CoRE-2.2.4-py2.py3-none-any.whl (85.3 kB)

Uploaded: Python 2, Python 3
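As a quick illustration of the wheel file-name format (PEP 427), the compatibility tags can be read directly off the name above:

```python
# PEP 427 wheel names follow the pattern:
#   {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
name = "GH_CoRE-2.2.4-py2.py3-none-any.whl"
dist, version, py_tag, abi_tag, plat_tag = name[: -len(".whl")].split("-")
print(dist, version, py_tag, abi_tag, plat_tag)
# → GH_CoRE 2.2.4 py2.py3 none any
```

py2.py3 (runs on Python 2 and 3), none (no ABI dependency), and any (any platform) together mark this as a pure-Python wheel.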

File details

Details for the file gh_core-2.2.4.tar.gz.

File metadata

  • Download URL: gh_core-2.2.4.tar.gz
  • Upload date:
  • Size: 119.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for gh_core-2.2.4.tar.gz
  • SHA256: 9a0ecb55df8db84fb279a13335bf3ad74a9983a80abbd20772ac1900ae964ff1
  • MD5: 2a2a498c7e904d2714e563937aa24121
  • BLAKE2b-256: 6e78a7b081bc0bb714936d2290c861ff5b4e4cae36f3b2357ca0d68e4cd65b8b

See more details on using hashes here.
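To check a downloaded file against the published digests, the standard-library hashlib is enough; a minimal sketch (sha256_of is a helper name chosen here, not part of the package):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the SHA256 published above for the sdist:
expected = "9a0ecb55df8db84fb279a13335bf3ad74a9983a80abbd20772ac1900ae964ff1"
# assert sha256_of("gh_core-2.2.4.tar.gz") == expected
```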

File details

Details for the file GH_CoRE-2.2.4-py2.py3-none-any.whl.

File metadata

  • Download URL: GH_CoRE-2.2.4-py2.py3-none-any.whl
  • Upload date:
  • Size: 85.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for GH_CoRE-2.2.4-py2.py3-none-any.whl
  • SHA256: 811fe480f672a7cf755c0739794a0d9424369b170e673fa845afd3770b6b1c3f
  • MD5: ac0fbf9a80c4b697c3eee280c7127e16
  • BLAKE2b-256: 5b976f77520c4f87e8c809f2f588e5e6dd5f857e03b74adc3220d197c1a7d86a

See more details on using hashes here.
