GitHub Collaboration Relation Extraction

Project description

GitHub_Collaboration_Relation_Extraction

A tool for extracting collaboration relations from GitHub event logs. Collaboration relations fall into two categories: EventAction relations and Reference relations. This tool was built for the project https://github.com/birdflyi/OSDB_STN_reference_coupling_data_analysis.

Quick Start

  1. Download the directory etc/ and the file main.py from GitHub_Collaboration_Relation_Extraction into the root directory of your new project.
  2. Change the default settings in etc/authConf.py.
  • AuthConfig
    • Set DEFAULT_INTMED_MODE to one of [I_AUTH_SETTINGS_LOCAL_HOSTS, I_AUTH_SETTINGS_ALIYUN_HOSTS, I_AUTH_SETTINGS_ALIYUN_INTERMEDIATE_HOSTS], and fill in the corresponding auth_settings_xxx_hosts dict.
    • If you have an Aliyun Cloud or other database service hosting the GitHub log tables, set the server authorization information below the line Aliyun.
    • If you want a sample dataset to start with, download a ClickHouse sample dataset for your Docker container and set the server authorization information below the line local docker image.
  • GITHUB_TOKENS
  3. Change the settings in main.py and run it.
  • Change the repo_names and year settings.
    • Note: processing all records can take a long time. Set limit to a positive integer to cap the number of records when you just want to run a quick test.
  • Create the data/ directory
    • Create the directories in the root directory of your project: data_dirs = ['data', 'data/github_osdb_data', 'data/global_data', 'data/github_osdb_data/repos', 'data/github_osdb_data/repos_dedup_content', 'data/github_osdb_data/GitHub_Collaboration_Network_repos']. Make directories:
import os

base_dir = '' or os.getcwd()  # set a base dir here, or fall back to the current working directory
data_dirs = ['data', 'data/github_osdb_data', 'data/global_data', 'data/github_osdb_data/repos', 'data/github_osdb_data/repos_dedup_content', 'data/github_osdb_data/GitHub_Collaboration_Network_repos']
for rel_data_dir in data_dirs:
    os.makedirs(os.path.join(base_dir, rel_data_dir), exist_ok=True)  # exist_ok avoids the FileExistsError
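To make step 2 concrete, here is a minimal sketch of what the settings in etc/authConf.py might look like. The field names inside the auth_settings dict (host, port, user, password, database) and the token format are assumptions for illustration; check the shipped etc/authConf.py for the actual keys.

```python
# Hypothetical sketch of etc/authConf.py settings -- field names are assumed,
# not taken from the actual file; adapt to the shipped etc/authConf.py.
I_AUTH_SETTINGS_LOCAL_HOSTS = 0
I_AUTH_SETTINGS_ALIYUN_HOSTS = 1
I_AUTH_SETTINGS_ALIYUN_INTERMEDIATE_HOSTS = 2

DEFAULT_INTMED_MODE = I_AUTH_SETTINGS_LOCAL_HOSTS  # pick the host profile to use

# local docker image
auth_settings_local_hosts = {
    "host": "127.0.0.1",       # ClickHouse server in your Docker container
    "port": 9000,
    "user": "default",
    "password": "",
    "database": "opensource",  # database holding the GitHub log tables (assumed name)
}

# GitHub personal access tokens used when the tool queries the GitHub API
GITHUB_TOKENS = ["ghp_your_token_here"]
```

With DEFAULT_INTMED_MODE set to the local profile, only auth_settings_local_hosts needs to be filled in; the Aliyun profiles can stay untouched.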

Download files

Download the file for your platform. If you're not sure which to choose, see the Python Packaging User Guide on installing packages.

Source Distribution

gh_core-1.2.0.tar.gz (115.9 kB)

Tags: Source

Built Distribution

GH_CoRE-1.2.0-py2.py3-none-any.whl (81.0 kB)

Tags: Python 2, Python 3

File details

Details for the file gh_core-1.2.0.tar.gz.

File metadata

  • Download URL: gh_core-1.2.0.tar.gz
  • Upload date:
  • Size: 115.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for gh_core-1.2.0.tar.gz
  • SHA256: 998cc4018ad62886eb241e88fedc1a1c03dfbad4aaa501d9300f4861651f1913
  • MD5: 68b18d3886dcf211dd6dc9b27eb71e94
  • BLAKE2b-256: 61d7b2b88d364918d361f306e48fffe10b88fc0d6471a4f236c2bc9615cb4083

See the pip documentation for more details on using hashes.
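The digests above can be checked locally after downloading. A minimal sketch using Python's standard hashlib; the chunked read keeps memory use constant for large archives:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "998cc4018ad62886eb241e88fedc1a1c03dfbad4aaa501d9300f4861651f1913"
# After downloading the sdist next to this script:
# assert sha256_of("gh_core-1.2.0.tar.gz") == expected, "hash mismatch: corrupted or tampered download"
```

The same function works for the wheel; just swap in the BLAKE2b digest with hashlib.blake2b if you prefer that algorithm.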

File details

Details for the file GH_CoRE-1.2.0-py2.py3-none-any.whl.

File metadata

  • Download URL: GH_CoRE-1.2.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 81.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for GH_CoRE-1.2.0-py2.py3-none-any.whl
  • SHA256: 2d6462c8615cbb9a7ce26346d5d75f763ad3416f7e379951d3224d45dbef7706
  • MD5: 14e6a38a3f6a64970a695a0363d5c5d7
  • BLAKE2b-256: 63133319d3e0443ad989529a383cdcb72abe19e506f7dbe261382012c353b5e7

