GitHub Collaboration Relation Extraction
Collaboration relation extraction from GitHub logs. Collaboration relations fall into two categories: EventAction relations and Reference relations. This is the relation extraction tool for the project https://github.com/birdflyi/OSDB_STN_reference_coupling_data_analysis.
Quick Start
- Download the directory etc/ and the file main.py from GitHub_Collaboration_Relation_Extraction into the root directory of your new project.
- Change the default settings in etc/authConf.py.
  - AuthConfig
    - Set DEFAULT_INTMED_MODE to one of [I_AUTH_SETTINGS_LOCAL_HOSTS, I_AUTH_SETTINGS_ALIYUN_HOSTS, I_AUTH_SETTINGS_ALIYUN_INTERMEDIATE_HOSTS], and fill in the corresponding auth_settings_xxx_hosts dict.
    - If you have an Aliyun Cloud or other database service that hosts the GitHub log tables, set the server authorization information below the line Aliyun.
    - If you want a sample dataset to start with, you can download a ClickHouse sample dataset for your Docker container, and set the server authorization information below the line local docker image.
  - GITHUB_TOKENS
    - Replace GITHUB_TOKENS with valid GitHub tokens, which start with 'gh'. If you do not have a GitHub token, see Creating a fine-grained personal access token.
- Change the settings in main.py and run it.
  - Change the repo_names and year settings.
    - Note: It may take a long time to process all records. Set limit to a positive integer to cap the number of records when you only want to run a test.
  - Create the data/ directory.
    - Create the following directories in the root directory of your project: data_dirs = ['data', 'data/github_osdb_data', 'data/global_data', 'data/github_osdb_data/repos', 'data/github_osdb_data/repos_dedup_content', 'data/github_osdb_data/GitHub_Collaboration_Network_repos']. Make directories:
import os

# Set a base directory, or leave it empty to use the current working directory.
base_dir = '' or os.getcwd()
data_dirs = [
    'data',
    'data/github_osdb_data',
    'data/global_data',
    'data/github_osdb_data/repos',
    'data/github_osdb_data/repos_dedup_content',
    'data/github_osdb_data/GitHub_Collaboration_Network_repos',
]
for rel_data_dir in data_dirs:
    # exist_ok=True avoids FileExistsError when a directory already exists.
    os.makedirs(os.path.join(base_dir, rel_data_dir), exist_ok=True)
Download files
Source Distribution
- gh_core-2.3.0.0.tar.gz (123.5 kB)
Built Distribution
- gh_core-2.3.0.0-py2.py3-none-any.whl (130.3 kB)
File details
Details for the file gh_core-2.3.0.0.tar.gz.
File metadata
- Download URL: gh_core-2.3.0.0.tar.gz
- Upload date:
- Size: 123.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f617b77d0d41f0aa83e11a8265c48d40c0e8b408951a513e8f28f793be42c027 |
| MD5 | 860c316aa4d50760bdf87e029ea023b0 |
| BLAKE2b-256 | d0b7aec6e80b158be428290d61b0992148448fd7d38e5c04300e712313711e9f |
File details
Details for the file gh_core-2.3.0.0-py2.py3-none-any.whl.
File metadata
- Download URL: gh_core-2.3.0.0-py2.py3-none-any.whl
- Upload date:
- Size: 130.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 615740574e79f46cc42ae8dc46a0414b9a2315c99ee95ee872a9555922e6d058 |
| MD5 | 7be0b1509a9e278eaf9c20dc69b97017 |
| BLAKE2b-256 | 3402ab55b52f67c083f922ece26840e671daaadb623c705ba5cf3b7806d9f3e4 |
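
The SHA256 digests listed above can be used to check a downloaded file. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and verify_download are illustrative, and the file is assumed to have been saved under its original name in the current directory.

```python
import hashlib
import os

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "gh_core-2.3.0.0.tar.gz":
        "f617b77d0d41f0aa83e11a8265c48d40c0e8b408951a513e8f28f793be42c027",
    "gh_core-2.3.0.0-py2.py3-none-any.whl":
        "615740574e79f46cc42ae8dc46a0414b9a2315c99ee95ee872a9555922e6d058",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file's SHA256 digest equals expected_hex."""
    return sha256_of_file(path) == expected_hex.lower()


if __name__ == "__main__":
    # Only files that are present locally are checked.
    for name, expected in EXPECTED_SHA256.items():
        if os.path.exists(name):
            status = "OK" if verify_download(name, expected) else "MISMATCH"
            print(name, status)
```

Reading in chunks keeps memory use flat regardless of file size. A mismatch usually indicates an incomplete or altered download.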