An interface for interacting with hdsr github repos
Context
- Created: November 2021
- Author: Renier Kramer, renier.kramer@hdsr.nl
- Python version: >3.5
Description
A Python project that enables interaction with the GitHub API v3, e.g. to read or download files from the GitHub organisation hdsr-mid. This project uses the GitHub account HdsrMidReadOnly, which has read-only access to some of the hdsr-mid repos. The organisation requires 2FA, so authentication via the GitHub API is only possible with a personal access token. The following applies to this account:
- A personal access token, which is required for the GitHub API, has been created
- The token has no expiration date
- To change the personal access token for account HdsrMidReadOnly:
  - log in to github.com with account HdsrMidReadOnly
    - email: hdsrmidgithub@gmail.com
    - password: please contact renier.kramer@hdsr.nl
  - go to settings >> developer settings >> personal access token >> generate new token
- To change authorization for account HdsrMidReadOnly:
  - get admin rights for hdsr-mid
  - go to authorisation page
There are usually three ways to log in with the GitHub API:
- Github(login_or_token=, password=)
- Github(login_or_token=<personal_access_token>)
- Github(base_url="https://{hostname}/api/v3", login_or_token=<personal_access_token>)
However, the first option has not been possible for the GitHub organisation 'hdsr-mid' since 13 September 2021, because it requires two-factor authentication for everyone in the organisation: logging in through the GitHub API is now only possible with a token (options 2 and 3). In this project we use option 2.
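To illustrate what token-based authentication looks like at the HTTP level, here is a minimal sketch that builds (but does not send) a request to the GitHub REST API v3 contents endpoint using only the standard library. It is not how hdsr_pygithub works internally; the helper name `build_contents_request` and the default branch `main` are assumptions made for this example, and the token is a placeholder.

```python
from urllib.parse import quote
from urllib.request import Request

GITHUB_API = "https://api.github.com"


def build_contents_request(token, organisation, repo_name, file_path, branch="main"):
    """Build (but do not send) an authenticated GitHub REST API v3 request."""
    url = "{base}/repos/{org}/{repo}/contents/{path}?ref={ref}".format(
        base=GITHUB_API,
        org=quote(organisation),
        repo=quote(repo_name),
        path=quote(file_path),  # quote() keeps "/" between path segments
        ref=quote(branch, safe=""),
    )
    headers = {
        # A personal access token replaces the username/password pair.
        "Authorization": "token " + token,
        "Accept": "application/vnd.github.v3+json",
    }
    return Request(url, headers=headers)


request = build_contents_request(
    token="<personal_access_token>",  # placeholder, not a real token
    organisation="hdsr-mid",
    repo_name="startenddate",
    file_path="data/output/results/mwm_peilschalen_short.csv",
)
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual call, which only succeeds with a valid token that has access to the repo.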
Usage (simple)
pip install hdsr-pygithub
from hdsr_pygithub import GithubFileDownloader
from pathlib import Path
github_downloader = GithubFileDownloader(
repo_name="startenddate", # any hdsr-mid repo for which account HdsrMidReadOnly has access (see README.md)
target_file=Path("data/output/results/mwm_peilschalen_short.csv"), # this file must exist in the master branch
)
# download files to disk
download_directory = github_downloader.download_files(download_directory=<a_dir>)
downloaded_filepath = download_directory / "data/output/results/mwm_peilschalen_short.csv"
assert downloaded_filepath.exists()
# or read file in memory using e.g. pandas
import pandas as pd
url = github_downloader.get_download_url()
# in case filetype is a .csv (other filetypes see: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html):
dataframe_file = pd.read_csv(filepath_or_buffer=url)
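If pandas is not available, reading a CSV file into memory can also be sketched with the standard library. The helper names `parse_csv_text` and `read_csv_from_url` are hypothetical, the comma delimiter and UTF-8 encoding are assumptions, and the sample content below is made up purely to show the parsing step.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv_text(text, delimiter=","):
    """Parse CSV text held in memory into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_csv_from_url(url, delimiter=",", encoding="utf-8"):
    """Download a CSV file and parse it without writing it to disk."""
    with urlopen(url) as response:
        return parse_csv_text(response.read().decode(encoding), delimiter)


# Made-up sample content; the columns of a real file will differ.
sample = "id,value\na,1\nb,2\n"
rows = parse_csv_text(sample)
print(len(rows), rows[0]["id"], rows[1]["value"])
```

With a real download URL, `read_csv_from_url(url)` would return the same kind of list of dictionaries, one per CSV row.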
Usage (sophisticated)
pip install hdsr-pygithub
from hdsr_pygithub import GithubDirDownloader
from datetime import timedelta
from pathlib import Path
github_downloader = GithubDirDownloader(
repo_name="startenddate", # any hdsr-mid repo for which account HdsrMidReadOnly has access (see README.md)
branch_name="main", # defaults to 'main' if not specified
target_dir=Path("data/output/results/"), # this dir must exist in the branch specified above
allowed_period_no_updates=timedelta(weeks=10), # defaults to 1 year if not specified
personal_access_token=<personal_access_token>, # you can use your own github account's token
repo_organisation='hdsr-mid', # defaults to 'hdsr-mid'
)
# download complete github directory (recursive) to disk
download_directory = github_downloader.download_files(download_directory=<a_dir>)
assert download_directory.is_dir()
# or download complete github directory (recursive) to disk to your Temp directory (C:/Users/<user>/AppData/Local/Temp/..)
download_directory = github_downloader.download_files(use_tmp_dir=True)
assert download_directory.is_dir()
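The `allowed_period_no_updates` argument suggests a freshness check on the target. The sketch below shows one way such a check could work, assuming it compares the time of the last update with the allowed period; the function `is_stale` is hypothetical and is not taken from hdsr_pygithub.

```python
from datetime import datetime, timedelta


def is_stale(last_update, allowed_period_no_updates, now=None):
    """Return True if the last update is older than the allowed period."""
    now = now or datetime.now()
    return now - last_update > allowed_period_no_updates


# Fixed reference time so the example is reproducible.
reference_time = datetime(2022, 1, 1)
recent = is_stale(datetime(2021, 12, 1), timedelta(weeks=10), now=reference_time)
old = is_stale(datetime(2021, 6, 1), timedelta(weeks=10), now=reference_time)
print(recent, old)
```

Here an update 31 days before the reference time falls within the 10-week period, while one from seven months earlier does not.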
License
Releases
Contributions
All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome. Issues are posted on: https://github.com/hdsr-mid/hdsr_pygithub/issues
Test coverage (release v1.3)
----------- coverage: platform win32, python 3.9.7-final-0 ---
Name                               Stmts   Miss  Cover
------------------------------------------------------
hdsr_pygithub\__init__.py              2      0   100%
hdsr_pygithub\constants.py             8      1    88%
hdsr_pygithub\downloader\base.py     162     21    87%
hdsr_pygithub\downloader\dir.py       88      0   100%
hdsr_pygithub\downloader\file.py      40      3    92%
hdsr_pygithub\exceptions.py           15      1    93%
setup.py                              10     10     0%
------------------------------------------------------
TOTAL                                325     36    89%
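The totals in the table can be checked with a few lines of arithmetic. This sketch assumes that Cover is the share of statements that were not missed, rounded to a whole percentage; the helper `cover_percentage` is hypothetical.

```python
def cover_percentage(statements, missed):
    """Share of executed statements as a whole percentage."""
    if statements == 0:
        return 100
    return round(100 * (statements - missed) / statements)


# (name, statements, missed, reported cover) copied from the table above.
report = [
    ("hdsr_pygithub/__init__.py", 2, 0, 100),
    ("hdsr_pygithub/constants.py", 8, 1, 88),
    ("hdsr_pygithub/downloader/base.py", 162, 21, 87),
    ("hdsr_pygithub/downloader/dir.py", 88, 0, 100),
    ("hdsr_pygithub/downloader/file.py", 40, 3, 92),
    ("hdsr_pygithub/exceptions.py", 15, 1, 93),
    ("setup.py", 10, 10, 0),
]

total_statements = sum(row[1] for row in report)
total_missed = sum(row[2] for row in report)
total_cover = cover_percentage(total_statements, total_missed)
print(total_statements, total_missed, total_cover)
```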
Conda general tips
Build conda environment (on Windows) from any directory using environment.yml:
> conda env create --name <conda_env_name> --file <path_to_project>/environment.yml # pin the Python version inside environment.yml (e.g. - python=<python_version>)
> conda info --envs # verify that <conda_env_name> is in this list
Start the application from any directory:
> conda activate <conda_env_name>
At any location:
> (<conda_env_name>) python <path_to_project>/main.py
Test the application:
> conda activate <conda_env_name>
> cd <path_to_project>
> pytest # make sure pytest is installed (conda install pytest)
List all conda environments on your machine:
At any location:
> conda info --envs
Delete a conda environment:
Get directory where environment is located
> conda info --envs
Remove the environment
> conda env remove --name <conda_env_name>
Finally, remove the left-over directory by hand
Write dependencies to environment.yml:
The goal is to keep the .yml as short as possible (without sub-dependencies), yet make the environment reproducible. Why? If you do 'conda install matplotlib', you also install sub-dependencies like pyqt, qt, icu, and sip. You should not include these sub-dependencies in your .yml because:
- including sub-dependencies results in an unnecessarily strict environment (difficult to solve when packages conflict)
- sub-dependencies are installed automatically when the dependencies are installed
> conda activate <conda_env_name>
Recommended:
> conda env export --from-history --no-builds | findstr -v "prefix" > <path_to_project>/environment_new.yml
Alternative:
> conda env export --no-builds | findstr -v "prefix" > <path_to_project>/environment_new.yml
--from-history:
Only include packages that you explicitly asked for, as opposed to every package in the environment. This flag works regardless of how you created the environment (through CMD or Anaconda Navigator).
--no-builds:
By default, the YAML includes platform-specific build constraints. If you transfer the environment across platforms (e.g. from 32-bit to 64-bit Windows), omit the build info with '--no-builds'.
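The `findstr` filter in the export commands above is Windows-specific. As a platform-independent alternative, the same step (dropping every line that contains "prefix") can be sketched in Python. The function name `drop_prefix_lines` and the sample export text are hypothetical.

```python
def drop_prefix_lines(yaml_text):
    """Remove every line containing 'prefix' (case-sensitive)."""
    kept = [line for line in yaml_text.splitlines() if "prefix" not in line]
    return "\n".join(kept) + "\n"


# Made-up example of what an exported environment might look like.
exported = (
    "name: my_env\n"
    "channels:\n"
    "  - defaults\n"
    "dependencies:\n"
    "  - pandas\n"
    "prefix: C:\\Users\\someone\\miniconda3\\envs\\my_env\n"
)
cleaned = drop_prefix_lines(exported)
print(cleaned)
```

Removing the prefix line keeps the machine-specific install path out of the shared environment file.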
Pip and Conda:
If a package is not available on any conda channel but is available as a pip package, you can install pip as a dependency and use it to install that package. Note that mixing packages from conda and pip is always a potential problem: conda calls pip, but pip does not know how to satisfy missing dependencies with packages from Anaconda repositories.
> conda activate <conda_env_name>
> conda install pip
> pip install <pip_package>
The environment.yml might look like:
channels:
  - defaults
dependencies:
  - <a conda package>=<version>
  - pip
  - pip:
    - <a pip package>==<version>
You can also write a requirements.txt file:
> pip list --format=freeze > <path_to_project>/requirements.txt