A short set of functions to help analyze the cosine similarity between texts.
tf_idf
Install
pip install tf_idf_cosimm
How to use
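The example below preprocesses two short texts, computes TF-IDF weights for their stems, and scores how similar the texts are with cosine similarity: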
import tf_idf.core as tf_idf
import pandas as pd
AI = 'For instance, in the design phase of a structural engineering project, Monte Carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety'
ME = 'For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness'
# Preprocess each text, then stack the results into one DataFrame
compare = tf_idf.preprocess_text(AI)
compare = pd.concat([compare, tf_idf.preprocess_text(ME)], ignore_index=True)
compare
| | DOCUMENT | LOWERCASE | CLEANING | TOKENIZATION | STOP-WORDS | STEMMING |
---|---|---|---|---|---|---|
0 | For instance, in the design phase of a structural engineering project, Monte Carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety | for instance, in the design phase of a structural engineering project, monte carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety | for instance in the design phase of a structural engineering project monte carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties providing valuable insights into its reliability and safety | [for, instance, in, the, design, phase, of, a, structural, engineering, project, monte, carlo, simulations, can, help, evaluate, the, performance, of, a, proposed, design, under, different, loading, conditions, and, material, properties, providing, valuable, insights, into, its, reliability, and, safety] | [instance, design, phase, structural, engineering, project, monte, carlo, simulations, evaluate, performance, proposed, design, different, loading, conditions, material, properties, providing, valuable, insights, reliability, safety] | [instanc, design, phase, structur, engin, project, mont, carlo, simul, evalu, perform, propos, design, differ, load, condit, materi, properti, provid, valuabl, insight, reliabl, safeti] |
1 | For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness | for instance, monte carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness | for instance monte carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness | [for, instance, monte, carlo, simulations, can, simulate, hundreds, or, thousands, of, different, combinations, of, loading, conditions, and, material, properties, to, create, statistical, predictions, of, structure, stiffness] | [instance, monte, carlo, simulations, simulate, hundreds, thousands, different, combinations, loading, conditions, material, properties, create, statistical, predictions, structure, stiffness] | [instanc, mont, carlo, simul, simul, hundr, thousand, differ, combin, load, condit, materi, properti, creat, statist, predict, structur, stiff] |
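The five stages shown in the table (lowercase, cleaning, tokenization, stop-word removal, stemming) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the package's `preprocess_text`: the function name `preprocess` and the `STOP_WORDS` set are hypothetical (the set was picked to cover the words dropped from the two example texts), tokenization is a plain whitespace split, and stemming is a pluggable callable that defaults to doing nothing. The stems in the table (`instanc`, `engin`, `safeti`) look like Porter-style output, so passing something like NLTK's `PorterStemmer().stem` should come close.

```python
import re

# Hypothetical, illustrative stop-word list (the package's real list is not shown).
# It covers the function words removed from the two example texts.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "can", "for", "help", "in", "into",
    "is", "its", "of", "on", "or", "the", "to", "under", "with",
}


def preprocess(text, stem=lambda word: word):
    """Return the intermediate result of every preprocessing stage.

    `stem` is any str -> str callable; by default it leaves words unchanged.
    """
    lowercase = text.lower()
    # Keep only letters, digits and whitespace (drops commas, periods, ...)
    cleaning = re.sub(r"[^a-z0-9\s]", "", lowercase)
    # Whitespace tokenization
    tokens = cleaning.split()
    # Drop stop words, keeping the original order of the remaining tokens
    content = [token for token in tokens if token not in STOP_WORDS]
    stems = [stem(token) for token in content]
    return {
        "DOCUMENT": text,
        "LOWERCASE": lowercase,
        "CLEANING": cleaning,
        "TOKENIZATION": tokens,
        "STOP-WORDS": content,
        "STEMMING": stems,
    }
```

Feeding the returned dicts to `pd.DataFrame([preprocess(AI), preprocess(ME)])` would give a table with the same column names as `compare`.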
# Add one TF-IDF weight column per distinct stem
compare_tfidf = tf_idf.calculate_tfidf(compare)
compare_tfidf
| | DOCUMENT | LOWERCASE | CLEANING | TOKENIZATION | STOP-WORDS | STEMMING | carlo | combin | condit | creat | ... | propos | provid | reliabl | safeti | simul | statist | stiff | structur | thousand | valuabl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | For instance, in the design phase of a structural engineering project, Monte Carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety | for instance, in the design phase of a structural engineering project, monte carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety | for instance in the design phase of a structural engineering project monte carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties providing valuable insights into its reliability and safety | [for, instance, in, the, design, phase, of, a, structural, engineering, project, monte, carlo, simulations, can, help, evaluate, the, performance, of, a, proposed, design, under, different, loading, conditions, and, material, properties, providing, valuable, insights, into, its, reliability, and, safety] | [instance, design, phase, structural, engineering, project, monte, carlo, simulations, evaluate, performance, proposed, design, different, loading, conditions, material, properties, providing, valuable, insights, reliability, safety] | [instanc, design, phase, structur, engin, project, mont, carlo, simul, evalu, perform, propos, design, differ, load, condit, materi, properti, provid, valuabl, insight, reliabl, safeti] | 0.158850 | 0.000000 | 0.158850 | 0.000000 | ... | 0.223259 | 0.223259 | 0.223259 | 0.223259 | 0.158850 | 0.000000 | 0.000000 | 0.158850 | 0.000000 | 0.223259 |
1 | For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness | for instance, monte carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness | for instance monte carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness | [for, instance, monte, carlo, simulations, can, simulate, hundreds, or, thousands, of, different, combinations, of, loading, conditions, and, material, properties, to, create, statistical, predictions, of, structure, stiffness] | [instance, monte, carlo, simulations, simulate, hundreds, thousands, different, combinations, loading, conditions, material, properties, create, statistical, predictions, structure, stiffness] | [instanc, mont, carlo, simul, simul, hundr, thousand, differ, combin, load, condit, materi, properti, creat, statist, predict, structur, stiff] | 0.193068 | 0.271351 | 0.193068 | 0.271351 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.386137 | 0.271351 | 0.271351 | 0.193068 | 0.271351 | 0.000000 |
2 rows × 35 columns
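The weights in `compare_tfidf` are consistent with smoothed, L2-normalised TF-IDF, which is the default weighting of scikit-learn's `TfidfVectorizer`: idf(t) = ln((1 + n) / (1 + df(t))) + 1, raw term counts as tf, and each row scaled to unit length. The sketch below implements that scheme over stemmed token lists; treating it as the package's actual method is an assumption, and `tfidf` is a hypothetical name. With the two STEMMING lists from the table it finds 29 distinct stems (6 text columns + 29 = 35 columns) and reproduces the values shown, e.g. `carlo` ≈ 0.158850 in row 0 and `simul` ≈ 0.386137 in row 1.

```python
import math
from collections import Counter


def tfidf(docs):
    """Smoothed, L2-normalised TF-IDF for a list of token lists.

    Returns one sparse {term: weight} dict per document.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Document frequency: iterating a Counter yields each distinct term once
    df = Counter(term for c in counts for term in c)
    # Smoothed idf, as if one extra document contained every term
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for c in counts:
        weights = {term: tf * idf[term] for term, tf in c.items()}
        # Scale each document vector to unit (L2) length
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Example: the STEMMING column of the two documents above
doc0 = ("instanc design phase structur engin project mont carlo simul evalu "
        "perform propos design differ load condit materi properti provid "
        "valuabl insight reliabl safeti").split()
doc1 = ("instanc mont carlo simul simul hundr thousand differ combin load "
        "condit materi properti creat statist predict structur stiff").split()
vectors = tfidf([doc0, doc1])
```

Stems absent from a document are simply missing from its dict, which corresponds to the 0.000000 entries in the table.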
tf_idf.cosineSimilarity(compare)
| | DOCUMENT | STEMMING | COSIM |
---|---|---|---|
0 | For instance, in the design phase of a structural engineering project, Monte Carlo simulations can help evaluate the performance of a proposed design under different loading conditions and material properties, providing valuable insights into its reliability and safety | [instanc, design, phase, structur, engin, project, mont, carlo, simul, evalu, perform, propos, design, differ, load, condit, materi, properti, provid, valuabl, insight, reliabl, safeti] | 1.000000 |
1 | For instance, Monte Carlo simulations can simulate hundreds or thousands of different combinations of loading conditions and material properties to create statistical predictions of structure stiffness | [instanc, mont, carlo, simul, simul, hundr, thousand, differ, combin, load, condit, materi, properti, creat, statist, predict, structur, stiff] | 0.337359 |
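The COSIM column appears to score each document against the first one, which is why row 0 is exactly 1.0. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so vectors pointing the same way score 1 and vectors with no shared terms score 0. Below is a minimal sketch of that comparison over sparse {term: weight} vectors; the function names are hypothetical, and this is not necessarily how `cosineSimilarity` is implemented.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    # Only terms present in both vectors contribute to the dot product
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        # An empty vector shares nothing with anything
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_to_first(vectors):
    """Score every vector against the first one (the first score is 1.0)."""
    return [cosine_similarity(vectors[0], v) for v in vectors]
```

When the inputs are already L2-normalised TF-IDF vectors, the denominator is 1 and the score reduces to a dot product over the stems the two documents share.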
Download files
Source Distribution
tf_idf_cosimm-0.0.2.tar.gz (10.4 kB)
Built Distribution
tf_idf_cosimm-0.0.2-py3-none-any.whl (9.2 kB)
File details
Details for the file tf_idf_cosimm-0.0.2.tar.gz.
File metadata
- Download URL: tf_idf_cosimm-0.0.2.tar.gz
- Size: 10.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | a3e9a38c4cd53e5720bca687215abdc273a71d5a39f7e59ae659f9abc4e69c96
MD5 | a01ba3d2cea0953d7717ed50a3c0a26e
BLAKE2b-256 | dedaf1897e332602ef43985bac7c301285a94e371c7ccdfd84935835037cc94b
File details
Details for the file tf_idf_cosimm-0.0.2-py3-none-any.whl.
File metadata
- Download URL: tf_idf_cosimm-0.0.2-py3-none-any.whl
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | daac75b3065830310aa19fb844109e622f138a809e2d7d958545ce4a2e8cd667
MD5 | 055f30b238d0e168dc78460686e06c91
BLAKE2b-256 | c6a0b83d5cd1985bc46ccab65f313708019a370deed2deee39de7e017f06cefd
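To check a downloaded file against the published SHA256 digests, a small helper built on Python's standard `hashlib` module is enough. This is a generic sketch, not part of `tf_idf`; the names `sha256_of`, `verify` and `EXPECTED_SHA256` are hypothetical. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above
EXPECTED_SHA256 = {
    "tf_idf_cosimm-0.0.2.tar.gz":
        "a3e9a38c4cd53e5720bca687215abdc273a71d5a39f7e59ae659f9abc4e69c96",
    "tf_idf_cosimm-0.0.2-py3-none-any.whl":
        "daac75b3065830310aa19fb844109e622f138a809e2d7d958545ce4a2e8cd667",
}


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file's SHA256 matches the published digest for its name."""
    return sha256_of(path) == EXPECTED_SHA256[Path(path).name]
```

For example, `verify("tf_idf_cosimm-0.0.2.tar.gz")` run in the download directory should return `True` for an untampered archive.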