Text Similarity Index Processor
What is the project intended to solve?
- Resolves technical debt in "Test/Requirement/Issues/Any-text" repositories that have a unique ID, using Natural Language Processing.
- Puts a continuous de-duplication monitoring system in place to check any new text added to the "Test/Requirement/Issues/Any-text" bank for duplication.
- Grouping similar "Test/Requirement/Issues/Any-text" entries helps reduce their number while the quality quotient remains the same.
- Cycle time of test execution comes down as similar tests are identified for merging.
- Repeated requirements can be reduced.
- Issue lists can be merged/reduced.
Technology stack
Python, with a few Python packages listed in INSTALL.md
Status
This is a development release. There are known issues, improvements and limitations which will be taken up in subsequent releases. The tool is open for the community to make changes such as enhancements and bug fixes.
Dependencies
Python 3.7.3 (64bit)
[packages]
pip, mutmut, pytest, xlrd, xlsxwriter, pandas, codecov, pytest-cov, pylint
Installation
Usage & Configuration
How to use the tool:
From any editor which supports Python (preferably PyCharm; set similarity_processor and text-de-duplication_monitoring as root by right-clicking and selecting the option).
Make sure to set the right Python interpreter and make sure it lists all the packages mentioned as mandatory.
Option 1: UI
Execute similarity_ui.py, which opens the UI window where you need to enter options such as:
- Path to the test/requirement/other document to be analyzed.
- Similarity to be processed (find out 100% match, 99%, etc.)
- Column index of the unique ID in the csv/xlsx (0, 1, etc.)
- Steps/Description IDs for content matching (IDs of the columns of interest in the csv/xlsx, separated by commas, like 1,2,3)
- If a new requirement/test is to be checked against the existing ones, enable the check box and paste the content to be checked in the new text box.
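To make the options above concrete, here is a minimal sketch of how a unique ID column, columns of interest and a similarity percentage could fit together. It is not the tool's actual algorithm: the scoring uses `difflib.SequenceMatcher` from the standard library as a stand-in, and the function names (`load_rows`, `similarity`, `similar_pairs`, `check_new_text`) and the sample data are hypothetical.

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import combinations


def load_rows(handle, uniq_id_col, cols_of_interest):
    """Map each unique ID to the joined text of its columns of interest."""
    reader = csv.reader(handle)
    next(reader)  # assumption: the first row is a header
    return {
        row[uniq_id_col]: " ".join(row[col] for col in cols_of_interest)
        for row in reader
    }


def similarity(text_a, text_b):
    """Case-insensitive similarity as a percentage (stand-in metric)."""
    return 100.0 * SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()


def similar_pairs(rows, threshold):
    """Return (id_a, id_b, score) for each pair scoring at or above threshold."""
    result = []
    for id_a, id_b in combinations(rows, 2):
        score = similarity(rows[id_a], rows[id_b])
        if score >= threshold:
            result.append((id_a, id_b, round(score, 1)))
    return result


def check_new_text(rows, new_text, threshold):
    """Return the IDs of existing entries that the new text duplicates."""
    return [
        uid for uid, text in rows.items()
        if similarity(text, new_text) >= threshold
    ]


# Hypothetical input: column 0 is the unique ID, columns 1 and 2 are compared.
SAMPLE = """id,title,steps
T1,Login,Open the page and log in with a valid user
T2,Login,Open the page and log in with a valid user
T3,Export,Export the report as a spreadsheet
"""

rows = load_rows(io.StringIO(SAMPLE), uniq_id_col=0, cols_of_interest=[1, 2])
pairs = similar_pairs(rows, threshold=100)
matches = check_new_text(
    rows, "login open the page and log in with a valid user", threshold=99
)
print(pairs)
print(matches)
```

Lowering the threshold widens the groups of candidates for merging; comparing every pair is quadratic in the number of rows, which is acceptable for a sketch but may need refinement for large banks.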
Option 2: commandline
C:\Projects\PythonRepo\text-de-duplication>python similarity_processor\similarity_cmd.py --h
usage: similarity_cmd.py [-h] [--path --p] [--simindex --s] [--uniqid --u] [--colint --c]

Text Similarity Index Processor

optional arguments:
  -h, --help      show this help message and exit
  --path --p      the Input file path
  --simindex --s  the Similarity index to be processed
  --uniqid --u    uniq id index(column) of the input file
  --colint --c    the col of interest
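For readers who want to see how such options are typically wired up, the following is a sketch of an `argparse` parser that accepts the same option names. It is not the code of similarity_cmd.py: the types, the default for `--simindex`, the `required` settings, the helper `parse_columns` and the file name `tests.xlsx` are all assumptions made for illustration.

```python
import argparse


def parse_columns(value):
    """Turn a comma-separated string such as "1,2,3" into [1, 2, 3]."""
    return [int(part) for part in value.split(",")]


def build_parser():
    """Build a parser with the option names shown in the usage text."""
    parser = argparse.ArgumentParser(
        prog="similarity_cmd.py",
        description="Text Similarity Index Processor",
    )
    parser.add_argument("--path", "--p", required=True,
                        help="the Input file path")
    # Assumption: the similarity index is an integer percentage, default 100.
    parser.add_argument("--simindex", "--s", type=int, default=100,
                        help="the Similarity index to be processed")
    # Assumption: column positions are zero-based integers.
    parser.add_argument("--uniqid", "--u", type=int, required=True,
                        help="uniq id index(column) of the input file")
    parser.add_argument("--colint", "--c", type=parse_columns, required=True,
                        help="the col of interest")
    return parser


# Parse a hypothetical command line instead of sys.argv.
args = build_parser().parse_args(
    ["--p", "tests.xlsx", "--s", "99", "--u", "0", "--c", "1,2,3"]
)
print(args.path, args.simindex, args.uniqid, args.colint)
```

By default `argparse` also accepts unambiguous prefixes of long options, which is why `--h` in the command above is treated as `--help`.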
How to test the software
- To test the tool, navigate to "text_de_duplication_monitoring", which is the root directory.
- To run all the tests, issue: pytest -v
- To report the pytest results in HTML, issue: pytest --html=report.html
- To run the tests with coverage: pytest --cov-report html --cov="similarity_processor"
- pydoc creation: python -m pydoc -w module_name
- Mutation testing using mutmut: mutmut --paths-to-mutate "path_to \ similarity_processor" run
- pylint execution on the code: pylint similarity_processor test >"path_to_save_file\pylint.txt"
- jscpd execution on the root folder: jscpd --min-tokens 20 --reporters "html" --mode "strict" --format "python" --output . .
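As an illustration of how the pytest and mutmut commands above work together, here is a small sketch of a test module. The function `is_duplicate` is hypothetical and not part of the package; the point is that a test exactly at the threshold fails if a mutation tool changes `>=` into `>`, so that mutant does not survive.

```python
def is_duplicate(score, threshold):
    """Hypothetical helper: a pair is a duplicate once the score reaches the threshold."""
    return score >= threshold


def test_is_duplicate_at_threshold():
    # Boundary case: fails if ">=" is mutated to ">".
    assert is_duplicate(99.0, 99.0)


def test_is_duplicate_below_threshold():
    # Just under the threshold must not count as a duplicate.
    assert not is_duplicate(98.9, 99.0)


def test_is_duplicate_above_threshold():
    # Fails if the comparison is inverted.
    assert is_duplicate(100.0, 99.0)
```

pytest collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py`, so saving this under such a name in the test folder is enough for `pytest -v` to pick it up.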
Limitations
- Input is accepted only via xlsx.
- Standalone application, not web enabled.
- Users have to export the input to csv/xlsx themselves.
- The tool is not yet plugged into TFS, ALM, etc.
Improvements/ Road-map
- Increase the test efficiency based on the mutation testing output.
- Make the tool web enabled (using Python Flask...).
- Create hooks to TFS, ALM, etc. so that the tool can download the tests/requirements/defects and do further processing.
- Enable the tool to do similarity checks on a code base.
Contact / Getting help
License
Hashes for similarity_processor-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 2b0abaf0339f852393f2ad78c6db7949d2f4bc312e1b0352f64a1b43fe3777a4
MD5 | f19e1f994d99ee1a1f1c0a40a4549015
BLAKE2b-256 | 73b04ac24aee4690d6432ca35cf14e057c8b075de2d317111a9ea77e72cf9bec
Hashes for similarity_processor-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2e7502aa312f981fdaa54a10484404d03997f1040da6edf3f68dc7624dfaaae8
MD5 | 2ae9116ab56be8d3694dabc76af408ad
BLAKE2b-256 | 348ac87f73b136f117e2939e8dd6737435e92de067079e6b49d55e8b1df7c0b0