Text Similarity Index Processor

What is the project intended to solve?

The tool resolves technical debt in "Test/Requirement/Issues/Any-text" repositories whose entries carry a unique ID, using Natural Language Processing. A continuous de-duplication monitoring system is in place to check any new text added to the bank for duplicates. Grouping similar entries reduces the size of the bank while the quality quotient remains the same.
Test execution cycle time comes down as similar tests are identified for merging. Repeated requirements can be reduced, and issue lists can be merged or reduced.

Technology stack

Python, with a few Python packages listed in INSTALL.md

Status

This is a development release. There are known issues, improvements, and limitations, which will be taken up in subsequent releases. The tool is open for the community to make changes such as enhancements and bug fixes.

Dependencies

Python 3.7.3 (64bit)

[packages]

pip, mutmut, pytest, xlrd, xlsxwriter, pandas, codecov, pytest-cov, pylint

Installation

INSTALL.md

Usage & Configuration

How to use the tool:

Run the tool from any editor that supports Python (preferably PyCharm; set similarity_processor and text-de-duplication_monitoring as root by right-clicking and selecting the option).

Make sure the correct Python interpreter is set and that it lists all the mandatory packages.

Option 1: UI

Execute similarity_ui.py, which opens the UI window where you enter the following options:

Path to the test/requirement/other document to be analyzed.
Similarity to be processed (for example, 100% match, 99%, etc.).
Unique ID column index in the csv/xlsx (0, 1, etc.).
Steps/Description column IDs for content matching (columns of interest in the csv/xlsx, separated by commas, like 1,2,3).
If a new requirement/test is to be checked against the existing ones, enable the check box and paste the content to be checked in the new text box.
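
The options above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the scoring uses the standard library's difflib.SequenceMatcher as a stand-in for whatever similarity measure the tool applies, the rows are hypothetical in-memory data rather than a real csv/xlsx file, and all function names are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical rows standing in for a csv/xlsx sheet:
# column 0 is the unique ID, columns 1 and 2 hold the text of interest.
ROWS = [
    ["T1", "Open the login page", "Enter valid credentials"],
    ["T2", "Open the login page", "Enter valid credentials"],
    ["T3", "Export the report", "Verify the csv file"],
]


def row_text(row, columns_of_interest):
    """Join the columns of interest into one normalized string."""
    return " ".join(str(row[i]).strip().lower() for i in columns_of_interest)


def similarity_percent(a, b):
    """Score two strings from 0 to 100 (difflib ratio, an assumed measure)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def similar_pairs(rows, unique_id_column, columns_of_interest, threshold):
    """Return (id_a, id_b, score) for each pair scoring at or above threshold."""
    matches = []
    for a, b in combinations(rows, 2):
        score = similarity_percent(
            row_text(a, columns_of_interest), row_text(b, columns_of_interest)
        )
        if score >= threshold:
            matches.append((a[unique_id_column], b[unique_id_column], score))
    return matches


def matches_for_new_text(new_text, rows, unique_id_column,
                         columns_of_interest, threshold):
    """Return (id, score) for existing rows similar to a newly pasted text."""
    candidate = new_text.strip().lower()
    results = []
    for row in rows:
        score = similarity_percent(candidate, row_text(row, columns_of_interest))
        if score >= threshold:
            results.append((row[unique_id_column], score))
    return results
```

Calling similar_pairs(ROWS, 0, [1, 2], 100) mirrors the UI choices "unique ID column 0, columns of interest 1,2, similarity 100%", and matches_for_new_text corresponds to the check box for comparing new text against the existing bank.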

Option 2: Command line

C:\Projects\PythonRepo\text-de-duplication>python similarity_processor\similarity_cmd.py --h
usage: similarity_cmd.py [-h] [--path --p] [--simindex --s] [--uniqid --u] [--colint --c]

Text Similarity Index Processor

optional arguments:
  -h, --help      show this help message and exit
  --path --p      the Input file path
  --simindex --s  the Similarity index to be processed
  --uniqid --u    uniq id index(column) of the input file
  --colint --c    the col of interest
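
For readers who want to see how a parser with these four options could be declared, here is a minimal argparse sketch. It is an illustration only, not the code in similarity_cmd.py: the integer types for --simindex and --uniqid, the comma-separated parsing of --colint, and the helper names build_parser and parse_columns are all assumptions.

```python
import argparse


def build_parser():
    """Declare the four options listed in the help text above."""
    parser = argparse.ArgumentParser(
        prog="similarity_cmd.py",
        description="Text Similarity Index Processor",
    )
    parser.add_argument("--path", help="the Input file path")
    # Assumption: the similarity index is a whole-number percentage.
    parser.add_argument("--simindex", type=int,
                        help="the Similarity index to be processed")
    # Assumption: the unique ID column is given as a zero-based integer index.
    parser.add_argument("--uniqid", type=int,
                        help="uniq id index(column) of the input file")
    # Assumption: columns of interest arrive as one comma-separated string.
    parser.add_argument("--colint", help="the col of interest")
    return parser


def parse_columns(value):
    """Turn a string such as '1,2,3' into a list of column indexes."""
    return [int(part) for part in value.split(",") if part.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.path, args.simindex, args.uniqid, parse_columns(args.colint or ""))
```

Under these assumptions, an invocation would pass values such as --path with the input file, --simindex 99, --uniqid 0, and --colint 1,2,3.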

How to test the software

To test the tool, navigate to "text_de_duplication_monitoring", which is the root directory, and issue pytest -v to run all the tests.

To report the pytest results in HTML: pytest --html=report.html
To run the tests with coverage: pytest --cov-report html --cov="similarity_processor"
To create pydoc documentation: python -m pydoc -w module_name
To run mutation testing using mutmut: mutmut --paths-to-mutate "path_to \ similarity_processor" run
To run pylint on the code: pylint similarity_processor test >"path_to_save_file\pylint.txt"
To run jscpd on the root folder: jscpd --min-tokens 20 --reporters "html" --mode "strict" --format "python" --output . .

Limitations

Input is accepted only via xlsx
Standalone application, not web enabled
Users have to fetch the input into csv/xlsx themselves
The tool is not yet integrated with TFS, ALM, etc.

Improvements/ Road-map

Increase the test efficiency based on mutation testing output.
Make the tool web enabled (for example, using Python Flask).
Create hooks to TFS, ALM, etc. so that the tool can download the tests/requirements/defects and do further processing.
Enable the tool to do similarity check on code base.

Contact / Getting help

MAINTAINERS.md

License

License.md

Release history

0.1.5 (Sep 29, 2022)
0.1.4 (Jul 15, 2022)
0.1.3 (Jul 8, 2022)
0.1.2 (Jul 8, 2022)
0.1.1 (Sep 22, 2020)
0.1.0 (Aug 20, 2020)
0.0.11 (Aug 3, 2020)
0.0.10 (Aug 3, 2020)
0.0.3 (Mar 5, 2020)
0.0.1 (Feb 4, 2020), this version

Download files

Source Distribution

similarity_processor-0.0.1.tar.gz (8.9 kB), uploaded Feb 4, 2020, Source

Built Distribution

similarity_processor-0.0.1-py3-none-any.whl (10.9 kB), uploaded Feb 4, 2020, Python 3

Hashes for similarity_processor-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`2b0abaf0339f852393f2ad78c6db7949d2f4bc312e1b0352f64a1b43fe3777a4`
MD5	`f19e1f994d99ee1a1f1c0a40a4549015`
BLAKE2b-256	`73b04ac24aee4690d6432ca35cf14e057c8b075de2d317111a9ea77e72cf9bec`

Hashes for similarity_processor-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`2e7502aa312f981fdaa54a10484404d03997f1040da6edf3f68dc7624dfaaae8`
MD5	`2ae9116ab56be8d3694dabc76af408ad`
BLAKE2b-256	`348ac87f73b136f117e2939e8dd6737435e92de067079e6b49d55e8b1df7c0b0`