Command-line filter for GitHub repositories that contain "samples" rather than a real project, framework, or library

Project description

samples-filter

Samples-filter is a command-line filter for GitHub repositories that contain samples rather than a real project, framework, or library, for example leeowenowen/rxjava-examples, streaming-with-flink/examples-java, and redisson/redisson-examples.

Motivation. While working on the cam project, where we build datasets of open source Java programs, we discovered the need to filter repositories that contain not real code but samples, tutorials, or examples. This repository is a portable command-line tool that filters those sample repositories.

How to use

First, install it from PyPI:

pip install samples-filter

Then run:

samples-filter filter --repositories=repos.csv --out=filtered.csv

For --repositories, provide the name of an existing CSV dataset of GitHub repositories; for --out, provide the name of the output file (it will be created automatically).
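If the filter needs to run from a Python script rather than a shell, the same command can be wrapped with `subprocess`. This is a minimal sketch: only the `samples-filter filter --repositories=... --out=...` command line comes from this README, while `build_command` and `run_filter` are hypothetical helper names introduced here for illustration.

```python
import subprocess
from pathlib import Path


def build_command(repositories: str, out: str) -> list[str]:
    """Build the argv list for the documented `samples-filter filter` call."""
    return [
        "samples-filter",
        "filter",
        f"--repositories={repositories}",
        f"--out={out}",
    ]


def run_filter(repositories: str, out: str) -> Path:
    """Run the CLI and return the output path (hypothetical helper)."""
    # The input dataset must already exist; the output file is created by the tool.
    if not Path(repositories).is_file():
        raise FileNotFoundError(f"input dataset not found: {repositories}")
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(build_command(repositories, out), check=True)
    return Path(out)
```

Calling `run_filter("repos.csv", "filtered.csv")` requires the `samples-filter` package to be installed in the active environment.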

Filtering method

We take the input in the format of repositories.csv:

full_name,default_branch,stars,forks,created_at,size,open_issues_count,description,topics
apache/kafka,trunk,27266,13448,2011-08-15T18:06:16Z,182971,1085,"Mirror of Apache Kafka",kafka scala
apache/flink,master,23128,12938,2014-06-07T07:00:10Z,489079,1169,"Apache Flink",big-data flink java python scala sql
apache/cassandra,trunk,8506,3537,2009-05-21T02:10:09Z,427867,474,"Mirror of Apache Cassandra",cassandra database java
joyoyao/superCleanMaster,master,1898,884,2015-02-12T03:37:41Z,12302,18,"[DEPRECATED] ",
manifold-systems/manifold,master,2209,120,2017-06-07T02:37:23Z,126336,64,"Manifold is a Java compiler plugin, its features include Metaprogramming, Properties, Extension Methods, Operator Overloading, Templates, a Preprocessor, and more.",android-studio delegation duck-typing extension-methods graphql graphql-java intellij java java-development java-sql java-tooling js-java-interoperability json manifold metaprogramming preprocessor reflection-framework structural-typing template-engine type-providers
datageartech/datagear,master,1322,316,2020-02-22T04:06:51Z,87397,2,"数据可视化分析平台,自由制作任何您想要的数据看板",bi business-intelligence chart data-analysis data-analytics data-visualization echarts
CodingDocs/springboot-guide,master,5063,1390,2018-11-28T01:05:07Z,5354,16,"SpringBoot2.0+从入门到实战!",asynchronous dubbo mybatis rabbitmq spring-data-jpa springboot
hanks-zyh/SmallBang,master,1005,158,2015-12-24T14:48:37Z,6379,6,"  twitter like animation for any view :heartbeat:",animation heartbeat like-button twitter
...

This data in Markdown format looks like this:

| full_name | default_branch | stars | forks | created_at | size | open_issues_count | description | topics |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| apache/kafka | trunk | 27266 | 13448 | 2011-08-15T18:06:16Z | 182971 | 1085 | Mirror of Apache Kafka | kafka scala |
| apache/flink | master | 23128 | 12938 | 2014-06-07T07:00:10Z | 489079 | 1169 | Apache Flink | big-data flink java python scala sql |
| apache/cassandra | trunk | 8506 | 3537 | 2009-05-21T02:10:09Z | 427867 | 474 | Mirror of Apache Cassandra | cassandra database java |
| joyoyao/superCleanMaster | master | 1898 | 884 | 2015-02-12T03:37:41Z | 12302 | 18 | [DEPRECATED] | |
| manifold-systems/manifold | master | 2209 | 120 | 2017-06-07T02:37:23Z | 126336 | 64 | Manifold is a Java compiler plugin, its features include Metaprogramming, Properties, Extension Methods, Operator Overloading, Templates, a Preprocessor, and more. | android-studio delegation duck-typing extension-methods graphql graphql-java intellij java java-development java-sql java-tooling js-java-interoperability json manifold metaprogramming preprocessor reflection-framework structural-typing template-engine type-providers |
| datageartech/datagear | master | 1322 | 316 | 2020-02-22T04:06:51Z | 87397 | 2 | 数据可视化分析平台,自由制作任何您想要的数据看板 | bi business-intelligence chart data-analysis data-analytics data-visualization echarts |
| CodingDocs/springboot-guide | master | 5063 | 1390 | 2018-11-28T01:05:07Z | 5354 | 16 | SpringBoot2.0+从入门到实战! | asynchronous dubbo mybatis rabbitmq spring-data-jpa springboot |
| hanks-zyh/SmallBang | master | 1005 | 158 | 2015-12-24T14:48:37Z | 6379 | 6 | twitter like animation for any view :heartbeat: | animation heartbeat like-button twitter |
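To check that a dataset matches the layout shown above before passing it to the tool, the file can be read with Python's standard `csv` module. The sketch below is illustrative only and is not taken from the samples-filter code base: `read_repositories` is a hypothetical helper, and splitting `topics` on whitespace is an assumption based on how the sample rows look.

```python
import csv
import io

# Two rows copied from the sample dataset above.
SAMPLE = '''full_name,default_branch,stars,forks,created_at,size,open_issues_count,description,topics
apache/kafka,trunk,27266,13448,2011-08-15T18:06:16Z,182971,1085,"Mirror of Apache Kafka",kafka scala
joyoyao/superCleanMaster,master,1898,884,2015-02-12T03:37:41Z,12302,18,"[DEPRECATED] ",
'''


def read_repositories(stream):
    """Parse the CSV into dicts keyed by the header row (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(stream):
        # Numeric columns arrive as strings; convert the ones used here.
        row["stars"] = int(row["stars"])
        row["forks"] = int(row["forks"])
        # Assumption: topics are separated by whitespace; an empty cell gives [].
        row["topics"] = row["topics"].split()
        rows.append(row)
    return rows


repos = read_repositories(io.StringIO(SAMPLE))
```

For a real file, pass `open("repos.csv", newline="", encoding="utf-8")` instead of the `io.StringIO` wrapper.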

For each repo in the dataset (identified by the full_name column), we fetch its README.md file from GitHub. We then copy all existing columns and add a new readme column. Finally, we extract the values of the full_name, description, topics, and readme columns and prepare this data for further analysis.
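The steps above can be sketched roughly as follows. This README does not say how the README.md file is fetched, so the `raw.githubusercontent.com` URL layout, the use of the default_branch column, and the helper names `readme_url`, `fetch_text`, `add_readme`, and `extract_fields` are all assumptions made for illustration, not the tool's actual implementation.

```python
import urllib.error
import urllib.request


def readme_url(full_name: str, branch: str) -> str:
    # Assumed location: README.md at the root of the given branch.
    return f"https://raw.githubusercontent.com/{full_name}/{branch}/README.md"


def fetch_text(url: str) -> str:
    """Download a URL as text; return "" on an HTTP error such as 404."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        return ""


def add_readme(rows, fetch=fetch_text):
    """Copy every row, keeping all existing columns, and add a readme column."""
    return [
        {**row, "readme": fetch(readme_url(row["full_name"], row["default_branch"]))}
        for row in rows
    ]


def extract_fields(rows):
    """Keep only the four columns used for further analysis."""
    keys = ("full_name", "description", "topics", "readme")
    return [{key: row[key] for key in keys} for row in rows]
```

The `fetch` parameter makes it possible to swap in a stub, which keeps the column handling testable without network access.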

TBD..

How to contribute

Fork the repository, make changes, and send us a pull request. We will review your changes and apply them to the master branch shortly, provided they don't violate our quality standards. To avoid frustration, please run the full build before sending us your pull request:

make install test check

To set up a virtual environment, use these commands:

python3 -m venv venv
source $(pwd)/venv/bin/activate

You will need Python 3.9+ installed.


