A Python library for easy calculation of tree edit distances with visualization capabilities.
EasyTED Library
The EasyTED Library calculates the syntactic tree edit distance (TED) between two sentences. It parses each sentence into its constituency tree and compares the resulting trees, so linguistic analyses need minimal setup. Beyond calculating distances, it provides tree visualization and transformation tools for linguistics research and NLP applications.
Features
- Tree Edit Distance Calculation: Compute the TED between any two sentences.
- Constituency Tree Parsing: Transform sentences into their underlying constituency tree structures.
- Tree Visualization: Generate and save visual representations of constituency trees.
- Bracketed String Transformation: Convert constituency trees into a bracketed string format for easy comparison and analysis.
- Simple Integration: Designed to seamlessly integrate with broader NLP and linguistic analysis workflows.
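To make the bracketed string feature concrete, here is a minimal, stdlib-only sketch of one such conversion: from a Penn-style parse like `(NP (DT a) (NN test))` to the curly-brace notation `{NP{DT{a}}{NN{test}}}` that APTED-style tools commonly read. The function name `penn_to_apted` is hypothetical and is not part of EasyTED's API; the exact format EasyTED emits is an assumption here.

```python
import re


def penn_to_apted(parse):
    """Convert a Penn-style bracketed parse to curly-brace notation.

    Hypothetical helper for illustration only; not EasyTED's API.
    """
    # Tokens are "(", ")", or any run of characters without spaces/parens.
    tokens = re.findall(r"\(|\)|[^\s()]+", parse)
    out, depth, prev = [], 0, None
    for tok in tokens:
        if tok == "(":
            out.append("{")
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            out.append("}")
        elif prev == "(":
            out.append(tok)  # a node label directly follows "("
        else:
            out.append("{" + tok + "}")  # a word becomes a leaf node
        prev = tok
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return "".join(out)


if __name__ == "__main__":
    print(penn_to_apted("(NP (DT a) (NN test))"))
```

The sketch treats every word as its own leaf node, which is one reasonable choice when comparing parses; dropping the words and keeping only the syntactic labels would be another.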
Installation
Install EasyTED directly from the Python Package Index (PyPI) using pip:
pip install easyted
Usage
Calculating Tree Edit Distance
from easyted import TreeEditDistanceCalculator
# Initialize the calculator
calculator = TreeEditDistanceCalculator()
# Calculate the tree edit distance between two sentences
distance = calculator.calculate_ted("This is a test.", "This is only a test.")
print(f"Tree Edit Distance: {distance}")
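For readers unfamiliar with the metric, the following is a naive reference sketch of ordered tree edit distance with unit costs (insert, delete, and rename each cost 1). It is not how EasyTED computes the distance: the library lists APTED as a dependency, which is far more efficient. The tuple-based tree representation and the function names are assumptions made for illustration.

```python
from functools import lru_cache

# A tree is a (label, children) pair; children is a tuple of trees.
# A forest is a tuple of trees. Nested tuples keep everything hashable.


def size(forest):
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert everything in f2
    if not f2:
        return size(f1)  # delete everything in f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the root of the rightmost tree in f1; its children move up.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the root of the rightmost tree in f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other (rename if labels differ).
    match = (
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (label1 != label2)
    )
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


if __name__ == "__main__":
    a = ("a", (("b", ()), ("c", ())))      # a(b, c)
    b = ("a", (("b", (("d", ()),)),))      # a(b(d))
    print(tree_edit_distance(a, b))
```

The memoized recursion is only practical for small trees, which is why production code relies on algorithms such as APTED.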
Visualizing Constituency Trees
# Draw and save the constituency tree to a file
calculator.draw_and_save_tree("A visualization of a constituency tree.", "tree_visualization.ps")
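When a graphical file is not needed, a constituency tree can also be inspected as indented text. The sketch below is a stdlib-only illustration that parses a Penn-style bracketed string and prints one node per line; it does not reproduce what `draw_and_save_tree` writes, and `parse_penn` and `render` are hypothetical names, not EasyTED functions.

```python
import re


def parse_penn(parse):
    """Parse "(NP (DT a) (NN test))" into nested (label, children) pairs.

    Assumes well-formed input; malformed strings raise IndexError.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", parse)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return (tok, [])  # a bare word is a leaf
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def render(tree, indent=0):
    """Return the tree as a list of lines, two spaces per depth level."""
    label, children = tree
    lines = ["  " * indent + label]
    for child in children:
        lines.extend(render(child, indent + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(parse_penn("(NP (DT a) (NN test))"))))
```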
Requirements
- Python 3.6+
- NLTK
- stanza
- APTED
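A quick way to confirm that an environment meets these requirements is to check the interpreter version and whether each dependency can be imported. This is a small sketch, assuming the import names `nltk`, `stanza`, and `apted`; `check_environment` is a hypothetical helper and not something EasyTED ships.

```python
import importlib.util
import sys

# Assumed import names for the listed requirements.
REQUIRED_MODULES = ("nltk", "stanza", "apted")


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 6)):
    """Return a dict mapping each requirement to True/False."""
    report = {"python": sys.version_info >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'ok' if ok else 'missing'}")
```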
Contributing
We welcome contributions to the EasyTED Library! To suggest an improvement or contribute a new feature, open an issue or submit a pull request. Please make sure your contributions follow the project's coding standards.
License
EasyTED is licensed under the MIT License. See the LICENSE file in the project repository for more details.
Acknowledgments
Thanks to NLTK for providing the foundational tools for working with natural language data, to the Stanford NLP Group for developing the stanza library, which powers the linguistic analysis capabilities of EasyTED, and to the developers of the APTED algorithm for their work on efficient tree edit distance computation.
Citations
If you use EasyTED in your research, please consider citing the following:
@inproceedings{qi2020stanza,
title={Stanza: A {Python} Natural Language Processing Toolkit for Many Human Languages},
author={Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D.},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
year={2020}
}
@article{pawlik2016tree,
title={Tree edit distance: Robust and memory-efficient},
author={Pawlik, Mateusz and Augsten, Nikolaus},
journal={Information Systems},
volume={56},
year={2016}
}
@article{pawlik2015efficient,
title={Efficient Computation of the Tree Edit Distance},
author={Pawlik, Mateusz and Augsten, Nikolaus},
journal={ACM Transactions on Database Systems (TODS)},
volume={40},
number={1},
year={2015}
}
@article{pawlik2011rted,
title={RTED: A Robust Algorithm for the Tree Edit Distance},
author={Pawlik, Mateusz and Augsten, Nikolaus},
journal={PVLDB},
volume={5},
number={4},
year={2011}
}
File details
Details for the file easyted-0.0.1.tar.gz.
File metadata
- Download URL: easyted-0.0.1.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61921790d3a00af069087ec627703d81deb5927b60e1f530ee88c0b470a63298 |
| MD5 | aee5460b2582db8bb37a52d004780457 |
| BLAKE2b-256 | 95505913c910264d77f8cdeedebefe66fcdf736a2c460d465ee9c2824722e2e3 |
File details
Details for the file easyted-0.0.1-py3-none-any.whl.
File metadata
- Download URL: easyted-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c58d7570d2201e234eb88d922f7153f0f092bc6d478385124e9349fe9a690ada |
| MD5 | befa60a3e9b02ef7a4d688d507191624 |
| BLAKE2b-256 | 0c34b03189f440545854c710f403d00f2c0471c51dc5352a6ed06d70c5cbb466 |