Command-line filter for GitHub repositories that contain "samples" instead of a real project, framework, or library
samples-filter
Samples-filter is a command-line filter for GitHub repositories that contain samples instead of a real project, framework, or library, e.g. leeowenowen/rxjava-examples, streaming-with-flink/examples-java, redisson/redisson-examples.
Motivation. While working on the cam project, where we build datasets of open-source Java programs, we discovered the need to filter repositories that contain samples, tutorials, or examples rather than real code. This repository is a portable command-line tool that filters those sample repositories.
How to use
First, install it from PyPI:
pip install samples-filter
Then run:
samples-filter filter --repositories=repos.csv --out=filtered.csv
For --repositories, provide the name of an existing CSV dataset of GitHub repositories; for --out, provide the name of the output file (it will be created automatically).
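If the filter is called from a Python script rather than a shell, the same invocation can be wrapped as below. This is only a sketch: the helper names `build_command` and `run_filter` are hypothetical, and the only thing taken from this README is the `samples-filter filter --repositories=... --out=...` command line itself.

```python
import subprocess
from pathlib import Path


def build_command(repositories: str, out: str) -> list[str]:
    """Build the argv for the `samples-filter filter` call shown above."""
    return [
        "samples-filter",
        "filter",
        f"--repositories={repositories}",
        f"--out={out}",
    ]


def run_filter(repositories: str, out: str) -> None:
    """Run the filter, failing early if the input CSV does not exist."""
    if not Path(repositories).is_file():
        raise FileNotFoundError(f"input dataset not found: {repositories}")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_command(repositories, out), check=True)
```

Calling `run_filter("repos.csv", "filtered.csv")` then corresponds to the command above, assuming `samples-filter` is installed and on the PATH.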
Filtering method
We take the input in the format of repositories.csv:
full_name,default_branch,stars,forks,created_at,size,open_issues_count,description,topics
apache/kafka,trunk,27266,13448,2011-08-15T18:06:16Z,182971,1085,"Mirror of Apache Kafka",kafka scala
apache/flink,master,23128,12938,2014-06-07T07:00:10Z,489079,1169,"Apache Flink",big-data flink java python scala sql
apache/cassandra,trunk,8506,3537,2009-05-21T02:10:09Z,427867,474,"Mirror of Apache Cassandra",cassandra database java
joyoyao/superCleanMaster,master,1898,884,2015-02-12T03:37:41Z,12302,18,"[DEPRECATED] ",
manifold-systems/manifold,master,2209,120,2017-06-07T02:37:23Z,126336,64,"Manifold is a Java compiler plugin, its features include Metaprogramming, Properties, Extension Methods, Operator Overloading, Templates, a Preprocessor, and more.",android-studio delegation duck-typing extension-methods graphql graphql-java intellij java java-development java-sql java-tooling js-java-interoperability json manifold metaprogramming preprocessor reflection-framework structural-typing template-engine type-providers
datageartech/datagear,master,1322,316,2020-02-22T04:06:51Z,87397,2,"数据可视化分析平台,自由制作任何您想要的数据看板",bi business-intelligence chart data-analysis data-analytics data-visualization echarts
CodingDocs/springboot-guide,master,5063,1390,2018-11-28T01:05:07Z,5354,16,"SpringBoot2.0+从入门到实战!",asynchronous dubbo mybatis rabbitmq spring-data-jpa springboot
hanks-zyh/SmallBang,master,1005,158,2015-12-24T14:48:37Z,6379,6," twitter like animation for any view :heartbeat:",animation heartbeat like-button twitter
...
Rendered as a Markdown table, the same data looks like this:
full_name | default_branch | stars | forks | created_at | size | open_issues_count | description | topics |
---|---|---|---|---|---|---|---|---|
apache/kafka | trunk | 27266 | 13448 | 2011-08-15T18:06:16Z | 182971 | 1085 | Mirror of Apache Kafka | kafka scala |
apache/flink | master | 23128 | 12938 | 2014-06-07T07:00:10Z | 489079 | 1169 | Apache Flink | big-data flink java python scala sql |
apache/cassandra | trunk | 8506 | 3537 | 2009-05-21T02:10:09Z | 427867 | 474 | Mirror of Apache Cassandra | cassandra database java |
joyoyao/superCleanMaster | master | 1898 | 884 | 2015-02-12T03:37:41Z | 12302 | 18 | [DEPRECATED] | |
manifold-systems/manifold | master | 2209 | 120 | 2017-06-07T02:37:23Z | 126336 | 64 | Manifold is a Java compiler plugin, its features include Metaprogramming, Properties, Extension Methods, Operator Overloading, Templates, a Preprocessor, and more. | android-studio delegation duck-typing extension-methods graphql graphql-java intellij java java-development java-sql java-tooling js-java-interoperability json manifold metaprogramming preprocessor reflection-framework structural-typing template-engine type-providers |
datageartech/datagear | master | 1322 | 316 | 2020-02-22T04:06:51Z | 87397 | 2 | 数据可视化分析平台,自由制作任何您想要的数据看板 | bi business-intelligence chart data-analysis data-analytics data-visualization echarts |
CodingDocs/springboot-guide | master | 5063 | 1390 | 2018-11-28T01:05:07Z | 5354 | 16 | SpringBoot2.0+从入门到实战! | asynchronous dubbo mybatis rabbitmq spring-data-jpa springboot |
hanks-zyh/SmallBang | master | 1005 | 158 | 2015-12-24T14:48:37Z | 6379 | 6 | twitter like animation for any view :heartbeat: | animation heartbeat like-button twitter |
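To illustrate the input format, here is a minimal sketch of reading such a dataset with Python's standard `csv` module. It is not the tool's own loader: the function name `read_repositories` and the header check are assumptions, and the split of `topics` on whitespace is inferred from the sample rows above.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "full_name", "default_branch", "stars", "forks", "created_at",
    "size", "open_issues_count", "description", "topics",
]


def read_repositories(stream) -> list[dict]:
    """Parse the dataset; quoted descriptions may contain commas."""
    reader = csv.DictReader(stream)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        # Topics look space-separated in the sample; empty means no topics.
        row["topics"] = (row["topics"] or "").split()
        rows.append(row)
    return rows


# Two rows copied from the sample above.
sample = io.StringIO(
    "full_name,default_branch,stars,forks,created_at,size,"
    "open_issues_count,description,topics\n"
    "apache/kafka,trunk,27266,13448,2011-08-15T18:06:16Z,182971,1085,"
    '"Mirror of Apache Kafka",kafka scala\n'
    "joyoyao/superCleanMaster,master,1898,884,2015-02-12T03:37:41Z,12302,18,"
    '"[DEPRECATED] ",\n'
)
repos = read_repositories(sample)
print(repos[0]["full_name"], repos[0]["topics"])
```

Numeric columns such as `stars` are left as strings here; convert them with `int()` if they are needed for arithmetic.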
For each repo (identified by the full_name column) in the dataset, we fetch its README.md file from GitHub. Then we copy all existing columns and add a new readme column. Finally, we extract the values of the full_name, description, topics, and readme columns from the dataset and prepare this data for further analysis.
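The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the tool's actual code: the function names are hypothetical, and fetching README.md through `raw.githubusercontent.com` on the repository's `default_branch` is one possible way to get the file, which this README does not specify.

```python
import urllib.request


def fetch_readme(full_name: str, branch: str) -> str:
    """Download README.md from the given branch; return "" if unavailable."""
    # Assumed URL scheme; the real tool may fetch the file differently.
    url = f"https://raw.githubusercontent.com/{full_name}/{branch}/README.md"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError (e.g. 404) and timeouts are OSError subclasses.
        return ""


def add_readme_column(rows, fetch=fetch_readme):
    """Copy every row and add a new `readme` column."""
    return [
        {**row, "readme": fetch(row["full_name"], row["default_branch"])}
        for row in rows
    ]


def text_fields(row) -> dict:
    """Keep only the columns that are prepared for further analysis."""
    keys = ("full_name", "description", "topics", "readme")
    return {key: row[key] for key in keys}


# Usage with a stub fetcher, so the example runs without network access.
rows = [{
    "full_name": "apache/kafka",
    "default_branch": "trunk",
    "description": "Mirror of Apache Kafka",
    "topics": "kafka scala",
}]
enriched = add_readme_column(rows, fetch=lambda name, branch: f"# {name}")
print(text_fields(enriched[0]))
```

Passing the fetcher as a parameter keeps the column logic testable offline; omitting the `fetch` argument uses the network-backed `fetch_readme`.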
TBD..
How to contribute
Fork the repository, make changes, and send us a pull request. We will review your changes and apply them to the master branch shortly, provided they don't violate our quality standards. To avoid frustration, please run the full build before sending us your pull request:
make install test check
To set up a virtual environment, use these commands:
python3 -m venv venv
source $(pwd)/venv/bin/activate
You will need Python 3.9+ installed.
Hashes for samples_filter-0.1.5-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | fc5a29f10d46484491aaee955e3db0360b18a4d2a8c0cbe40d9d22b07c16389c |
MD5 | d621711c9d5167d4e60790945d3886a5 |
BLAKE2b-256 | 1d29a789f4d9974fec447e1503b05f8e3be224606a388c96084c62af50b14261 |