
An intelligent license scanner.


Atarashi


Open source software is distributed under open source licenses. There are many such licenses, and a single software package sometimes involves multiple licenses for different files.

Atarashi provides different methods for scanning for license statements in open source software. Unlike existing rule-based approaches, such as the Nomos license scanner from the FOSSology project, Atarashi implements multiple text statistics and information retrieval algorithms.

The anticipated advantages are improved precision and the easiest possible way to add new license texts or new license references.
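To give a feel for what a text-statistics approach looks like, here is a minimal sketch that ranks reference texts by the cosine similarity of their word counts. It is an illustration only and not Atarashi's implementation: the function names, the tokenisation, and the two shortened reference texts are assumptions made for this example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, heavily shortened reference texts (not real license data).
references = {
    "license-A": "permission is hereby granted free of charge to any person",
    "license-B": "this program is free software you can redistribute it",
}
header = "This program is free software; you can redistribute it and/or modify it"

# Score the header against every reference and keep the closest one.
scores = {
    name: cosine_similarity(word_counts(header), word_counts(text))
    for name, text in references.items()
}
best = max(scores, key=scores.get)
print(best)  # prints license-B
```

Adding a new license in such a scheme only means adding one more reference text, which is the kind of simplicity the paragraph above is aiming for.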

Atarashi is designed to work stand-alone and with FOSSology. More info at http://fossology.github.io/atarashi

Requirements

  • Python >= v3.5
  • pip >= 18.1
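
A quick way to check these minimums from Python is sketched below. The helper names are made up for this example, and the version parsing is deliberately simple (it only looks at the leading numeric part of the version string).

```python
import re
import sys

MIN_PYTHON = (3, 5)
MIN_PIP = (18, 1)


def parse_version(text):
    """Turn '18.1' or '20.2.4' into a tuple of ints; ignore any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_requirements(python_version, pip_version):
    """Compare an interpreter version tuple and a pip version string
    against the minimums listed above."""
    python_ok = tuple(python_version[:2]) >= MIN_PYTHON
    pip_ok = parse_version(pip_version) >= MIN_PIP
    return python_ok and pip_ok


try:
    import pip  # pip exposes its version as pip.__version__
    print(meets_requirements(sys.version_info, pip.__version__))
except ImportError:
    print("pip is not importable from this interpreter")
```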

Steps for Installation

Install

Install from PyPI

  • pip install atarashi

Source install

  • pip install .
  • This downloads all required dependencies and triggers the build as well.
  • The build generates 3 new files in your current directory:
    1. data/Ngram_keywords.json
    2. licenses/<SPDX-version>.csv
    3. licenses/processedList.csv
  • The install script moves these files to their appropriate locations.

Installing just dependencies

  • pip install -r requirements.txt

Build (optional)

  • $ python3 setup.py build

How to run

Get help by running atarashi -h or atarashi --help.

Example

  • Running DLD agent

    atarashi -a DLD /path/to/file.c

  • Running wordFrequencySimilarity agent

    atarashi -a wordFrequencySimilarity /path/to/file.c

  • Running tfidf agent

    • With Cosine similarity

      atarashi -a tfidf /path/to/file.c

      atarashi -a tfidf -s CosineSim /path/to/file.c

    • With Score similarity

      atarashi -a tfidf -s ScoreSim /path/to/file.c

  • Running Ngram agent

    • With Cosine similarity

      atarashi -a Ngram /path/to/file.c

      atarashi -a Ngram -s CosineSim /path/to/file.c

    • With Dice similarity

      atarashi -a Ngram -s DiceSim /path/to/file.c

    • With Bigram Cosine similarity

      atarashi -a Ngram -s BigramCosineSim /path/to/file.c

  • Running in verbose mode

    atarashi -a DLD -v /path/to/file.c

  • Running with custom CSVs and JSONs

    • Please refer to the build instructions to produce CSV and JSON files in the format atarashi understands.
    • atarashi -a DLD -l /path/to/processedList.csv /path/to/file.c
    • atarashi -a Ngram -l /path/to/processedList.csv -j /path/to/ngram.json /path/to/file.c
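
When scanning from a script, the command lines above can be assembled programmatically. The sketch below uses only the options shown in this README (-a, -s, -l, -j, -v); the function name and its parameters are hypothetical and not part of atarashi.

```python
import shlex


def atarashi_command(path, agent, similarity=None, processed_list=None,
                     ngram_json=None, verbose=False):
    """Build the argument list for one atarashi run."""
    cmd = ["atarashi", "-a", agent]
    if similarity:
        cmd += ["-s", similarity]
    if processed_list:
        cmd += ["-l", processed_list]
    if ngram_json:
        cmd += ["-j", ngram_json]
    if verbose:
        cmd.append("-v")
    cmd.append(path)
    return cmd


# Print the same invocations as in the examples above.
runs = [
    atarashi_command("/path/to/file.c", "DLD", verbose=True),
    atarashi_command("/path/to/file.c", "tfidf", similarity="ScoreSim"),
    atarashi_command("/path/to/file.c", "Ngram", similarity="DiceSim"),
]
for cmd in runs:
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each list can be passed as-is to subprocess.run once atarashi is installed and on the PATH.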

Running Docker image

  1. Pull Docker image

    docker pull fossology/atarashi:latest

  2. Run the image

    docker run --rm -v <path/to/scan>:/project fossology/atarashi:latest <options> /project/<path/to/file>

Since Docker cannot access the host file system directly, mount the directory containing the files to scan as a volume at /project in the container. Then pass the options and the path to the file relative to the mounted directory.
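
The path translation is easy to get wrong by hand, so here is a small sketch that derives the container path from the host paths. The function name is hypothetical, and POSIX-style host paths are assumed.

```python
from pathlib import PurePosixPath

IMAGE = "fossology/atarashi:latest"


def docker_command(scan_dir, target, options=()):
    """Build the docker run arguments for scanning `target`,
    which must live somewhere below `scan_dir` on the host."""
    scan_dir = PurePosixPath(scan_dir)
    # relative_to raises ValueError if target is outside scan_dir.
    relative = PurePosixPath(target).relative_to(scan_dir)
    return [
        "docker", "run", "--rm",
        "-v", "{}:/project".format(scan_dir),
        IMAGE,
        *options,
        str(PurePosixPath("/project") / relative),
    ]


print(" ".join(docker_command("/home/me/src", "/home/me/src/lib/file.c",
                              ["-a", "DLD"])))
```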

Test

  • Run imtihaan (meaning "exam" in Hindi) with the name of the agent.
  • e.g. python atarashi/imtihaan.py /path/to/processedList.csv <DLD|tfidf|Ngram> <testfile>
  • See python atarashi/imtihaan.py --help for more options.

Creating Debian packages

  • Install dependencies

    # apt-get install python3-setuptools python3-all debhelper
    # pip install stdeb

  • Create Debian packages

    $ python3 setup.py --command-packages=stdeb.command bdist_deb

  • Locate the files under deb_dist
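
To list the generated packages from a script, something like the following sketch may help. The function name is hypothetical; it only assumes that the packages end in .deb and sit directly under deb_dist.

```python
from pathlib import Path


def find_debs(project_root="."):
    """Return the .deb files directly under <project_root>/deb_dist, sorted."""
    return sorted(Path(project_root, "deb_dist").glob("*.deb"))


debs = find_debs()
if debs:
    for deb in debs:
        print(deb)
else:
    print("no .deb files found under deb_dist")
```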

License

SPDX-License-Identifier: GPL-2.0

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

How to generate the documentation using sphinx

  1. Go to the project directory 'atarashi'.

  2. Install Sphinx and m2r with pip install sphinx m2r (pip is already installed, since this project is based on Python).

  3. Initialise docs/ directory with sphinx-quickstart

    mkdir docs
    cd docs/
    sphinx-quickstart
    
    • Root path for the documentation [.]: .
    • Separate source and build directories (y/n) [n]: n
    • autodoc: automatically insert docstrings from modules (y/n) [n]: y
    • intersphinx: link between Sphinx documentation of different projects (y/n) [n]: y
    • Otherwise, use the default options
  4. Set up conf.py and include README.md

    • Enable the following lines and change the insert path:

      import os
      import sys
      sys.path.insert(0, os.path.abspath('../'))
      
    • Enable m2r to insert .md files in Sphinx documentation:

      [...]
      extensions = [
        ...
        'm2r',
      ]
      [...]
      source_suffix = ['.rst', '.md']
      
    • Include README.md by editing index.rst

      .. toctree::
          [...]
          readme
      
      .. mdinclude:: ../README.md
      
  5. Auto-generate the .rst files in docs/source which will be used to generate documentation

    cd docs/
    sphinx-apidoc -o source/ ../atarashi
    
  6. cd docs

  7. make html

This generates the HTML files in docs/_build/html. Open index.html to view the documentation.

You can change the theme of the documentation by changing html_theme in the conf.py file in the docs/ folder. You can choose from {'alabaster', 'classic', 'sphinxdoc', 'scrolls', 'agogo', 'traditional', 'nature', 'haiku', 'pyramid', 'bizstyle'}.
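
Sphinx reads its settings from docs/conf.py, which is a plain Python file, so switching the theme is a one-line change. The fragment below picks 'nature' purely as an example; any name from the list above should work the same way.

```python
# docs/conf.py (fragment)

# Name of the HTML theme used by "make html".
html_theme = "nature"
```

Run make html again afterwards to rebuild the pages with the new theme.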
