Gene Feature Extraction and Keyword Analysis
This repository contains Python functions for extracting gene features from a GFF3 file and performing keyword analysis using TF-IDF (Term Frequency-Inverse Document Frequency).
Functions
Extraction.py
This file contains the functions needed to extract information from a genome and from descriptions of organisms.
extract_keywords(text: str) -> set
Returns important keywords from a given text using TF-IDF. The function tokenizes the text, removes stopwords, and computes the TF-IDF scores. It then selects the top N words as keywords.
Args:
text (str): The input text from which keywords are drawn.
Returns:
extraction (set): A set of important keywords.
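The tokenize, filter, score, and select steps described above can be sketched in plain Python. This is only an illustration, not the package's implementation: the name tfidf_keywords_sketch, the top_n parameter, the small STOPWORDS set, the choice to treat each sentence as a separate document, and the smoothed IDF formula are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the package's list).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
             "of", "on", "or", "the", "to", "with"}


def tfidf_keywords_sketch(text: str, top_n: int = 5) -> set:
    """Return up to top_n words, ranking each word by its best TF-IDF score."""
    # Treat each sentence as one "document" so that IDF is defined (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    docs = []
    for sentence in sentences:
        tokens = [t for t in re.findall(r"[a-z0-9]+", sentence.lower())
                  if t not in STOPWORDS]
        if tokens:
            docs.append(tokens)
    if not docs:
        return set()

    # Document frequency: number of sentences that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)

    scores = {}
    for doc in docs:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1  # smoothed IDF
            scores[term] = max(scores.get(term, 0.0), tf * idf)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return set(ranked[:top_n])
```

Calling tfidf_keywords_sketch(description, top_n=10) on a multi-sentence description would return at most ten words, none of which come from the stopword list.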
extract_gene_features(gff_file: str, output_filename: str) -> None
Extracts gene features from a GFF3 file and writes them to a new file.
Args:
gff_file (str): Path to the GFF3 file.
output_filename (str): Path to the output file.
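As a rough illustration of this kind of extraction, the sketch below copies only the rows whose feature type is "gene" to a new file. It is not the package's code: the name gene_rows_sketch is hypothetical, and it assumes that "gene features" means rows with "gene" in the third of the nine tab-separated GFF3 columns.

```python
def gene_rows_sketch(gff_file: str, output_filename: str) -> None:
    """Copy every row whose feature type (column 3) is 'gene' to output_filename."""
    with open(gff_file) as src, open(output_filename, "w") as dst:
        for line in src:
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            columns = line.rstrip("\n").split("\t")
            # A GFF3 feature line has nine tab-separated columns.
            if len(columns) == 9 and columns[2] == "gene":
                dst.write(line if line.endswith("\n") else line + "\n")
```

Rows of other types (mRNA, exon, CDS, and so on) and any embedded sequence data are dropped because they either carry a different type or do not have nine columns.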
extract_gene_info(gff_file: str, output_filename: str) -> None
Extracts gene IDs and descriptions from a GFF3 file and writes them to a new file.
Args:
gff_file (str): Path to the GFF3 file.
output_filename (str): Path to the output file.
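One possible way to pull IDs and descriptions out of the GFF3 attributes column is sketched below. The name gene_info_sketch, the "description" attribute key, and the tab-separated "ID, description" output layout are assumptions made for the example; the package may use a different key or output format.

```python
from urllib.parse import unquote


def gene_info_sketch(gff_file: str, output_filename: str) -> None:
    """Write one 'gene_id<TAB>description' line per gene row of a GFF3 file."""
    with open(gff_file) as src, open(output_filename, "w") as dst:
        for line in src:
            if line.startswith("#"):
                continue  # skip directives and comments
            columns = line.rstrip("\n").split("\t")
            if len(columns) != 9 or columns[2] != "gene":
                continue
            # Column 9 holds semicolon-separated tag=value pairs.
            attributes = {}
            for pair in columns[8].split(";"):
                if "=" in pair:
                    key, value = pair.split("=", 1)
                    # GFF3 percent-encodes reserved characters in values.
                    attributes[key.strip()] = unquote(value.strip())
            gene_id = attributes.get("ID")
            if gene_id:
                # 'description' is an assumed attribute key; empty if absent.
                dst.write(f"{gene_id}\t{attributes.get('description', '')}\n")
```

Genes without a description attribute are still written, with an empty description field, so the output has one line per gene that has an ID.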
Read.py
This file contains a function that reads gene descriptions and returns those that match the generated keywords.
read_descriptions(fp: str, kwords: set) -> set
Finds matching gene IDs and descriptions based on provided keywords.
Args:
fp (str): The filepath to the gene descriptions text file.
kwords (set): A set of keywords to search for.
Returns:
matching_descriptions (set): A set of tuples containing matching gene IDs, descriptions, and keywords.
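The matching step might look roughly like the sketch below. It is an illustration under stated assumptions, not the package's code: the name matching_descriptions_sketch is hypothetical, the descriptions file is assumed to hold one "gene_id<TAB>description" pair per line, and matching is assumed to be case-insensitive on whole words.

```python
import re


def matching_descriptions_sketch(fp: str, kwords: set) -> set:
    """Return (gene_id, description, keyword) tuples for every keyword hit."""
    wanted = {k.lower() for k in kwords}
    matches = set()
    with open(fp) as handle:
        for line in handle:
            # Assumed layout: gene ID, a tab, then the free-text description.
            gene_id, sep, description = line.rstrip("\n").partition("\t")
            if not sep:
                continue  # skip lines that do not follow the assumed layout
            words = set(re.findall(r"[a-z0-9]+", description.lower()))
            for keyword in wanted & words:
                matches.add((gene_id, description, keyword))
    return matches
```

A gene whose description contains several of the keywords appears once per matching keyword, which is consistent with each tuple carrying a single keyword.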