
Gene Feature Extraction and Keyword Analysis

This repository contains Python functions for extracting gene features from a GFF3 file and performing keyword analysis using TF-IDF (Term Frequency-Inverse Document Frequency).

Functions


Extraction.py

This file contains the functions that extract information from a genome annotation (GFF3) file and from text descriptions of organisms.


extract_keywords(text: str) -> set

Returns important keywords from a given text using TF-IDF. The function tokenizes the text, removes stopwords, and computes a TF-IDF score for each remaining word. It then selects the N highest-scoring words as keywords.

Args:

  • text (str): The input text from which keywords are drawn.

Returns:

  • extraction (set): A set of important keywords.
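The steps described for extract_keywords (tokenize, remove stopwords, score with TF-IDF, keep the top N) can be sketched with the standard library alone. This is an illustration, not the package's implementation: the name sketch_extract_keywords, the top_n parameter, the tiny STOPWORDS set, and the choice to treat each sentence as one "document" are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to",
             "that", "for", "with", "by", "on", "this"}


def sketch_extract_keywords(text: str, top_n: int = 5) -> set:
    """Return up to top_n words with the highest TF-IDF score.

    Assumption: each sentence of the input is treated as one document.
    """
    # Split into sentences, then into lowercase alphabetic tokens.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    docs = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    docs = [d for d in docs if d]
    if not docs:
        return set()

    # Document frequency: number of sentences that contain each word.
    df = Counter(w for d in docs for w in set(d))
    n_docs = len(docs)

    # Keep the best TF-IDF score each word reaches in any sentence.
    best = {}
    for d in docs:
        for word, count in Counter(d).items():
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1  # smoothed IDF
            score = (count / len(d)) * idf
            best[word] = max(best.get(word, 0.0), score)

    # Rank by score, break ties alphabetically, keep the top_n words.
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return set(ranked[:top_n])
```

Words that occur in every sentence receive a lower IDF weight, so terms specific to one sentence tend to rank first.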

extract_gene_features(gff_file: str, output_filename: str) -> None

Extracts gene features from a GFF3 file and writes them to a new file.

Args:

  • gff_file (str): Path to the GFF3 file.
  • output_filename (str): Path to the output file.
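As a rough sketch of what such an extraction can look like, the function below copies every GFF3 row whose type column (column 3) is "gene" into the output file. The name sketch_extract_gene_features is hypothetical, and the reading of "gene features" as rows of type "gene" is an assumption; the package may select rows differently.

```python
def sketch_extract_gene_features(gff_file: str, output_filename: str) -> None:
    """Copy GFF3 rows of type 'gene' to output_filename (illustrative sketch)."""
    with open(gff_file) as src, open(output_filename, "w") as dst:
        for line in src:
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            columns = line.rstrip("\n").split("\t")
            # A GFF3 feature row has nine tab-separated columns;
            # the third column holds the feature type.
            if len(columns) == 9 and columns[2] == "gene":
                dst.write(line if line.endswith("\n") else line + "\n")
```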

extract_gene_info(gff_file: str, output_filename: str) -> None

Extracts gene IDs and descriptions from a GFF3 file and writes them to a new file.

Args:

  • gff_file (str): Path to the GFF3 file.
  • output_filename (str): Path to the output file.
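One way to pull IDs and descriptions out of a GFF3 file is to parse the attributes column (column 9) of each gene row. The sketch below is an illustration under stated assumptions, not the package's code: the names parse_attributes and sketch_extract_gene_info are hypothetical, the description is assumed to be stored under a "description" attribute key, and the output is assumed to be one tab-separated "ID, description" pair per line.

```python
from urllib.parse import unquote


def parse_attributes(column9: str) -> dict:
    """Parse 'key=value;key=value' GFF3 attributes into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters such as ',' and ';'.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def sketch_extract_gene_info(gff_file: str, output_filename: str) -> None:
    """Write 'gene_id<TAB>description' for every gene row (illustrative sketch)."""
    with open(gff_file) as src, open(output_filename, "w") as dst:
        for line in src:
            if line.startswith("#"):
                continue
            columns = line.rstrip("\n").split("\t")
            if len(columns) != 9 or columns[2] != "gene":
                continue
            attrs = parse_attributes(columns[8])
            gene_id = attrs.get("ID")
            if gene_id is None:
                continue
            # Assumption: the description lives under the 'description' key;
            # genes without one get an empty description.
            dst.write(f"{gene_id}\t{attrs.get('description', '')}\n")
```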


Read.py

This file contains a function that reads gene descriptions and returns those that match the generated keywords.


read_descriptions(fp: str, kwords: set) -> set

Finds matching gene IDs and descriptions based on provided keywords.

Args:

  • fp (str): The filepath to the gene descriptions text file.
  • kwords (set): A set of keywords to search for.

Returns:

  • matching_descriptions (set): A set of tuples containing matching gene IDs, descriptions, and keywords.
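The matching step can be sketched as follows. This is a minimal illustration, not the package's implementation: the name sketch_read_descriptions is hypothetical, the descriptions file is assumed to hold one tab-separated "gene ID, description" pair per line, and matching is assumed to be case-insensitive on whole words.

```python
import re


def sketch_read_descriptions(fp: str, kwords: set) -> set:
    """Return (gene_id, description, keyword) tuples for matching lines."""
    lowered = {k.lower() for k in kwords}
    matching = set()
    with open(fp) as handle:
        for line in handle:
            gene_id, sep, description = line.rstrip("\n").partition("\t")
            if not sep:
                continue  # line has no tab, so it is not an ID/description pair
            # Compare whole lowercase words so 'kinase' does not match 'kinases'.
            words = set(re.findall(r"[a-z0-9]+", description.lower()))
            for keyword in lowered & words:
                matching.add((gene_id, description, keyword))
    return matching
```

Under these assumptions, a gene whose description contains several of the keywords contributes one tuple per matched keyword.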

