EasyStructure

Introduction

EasyStructure is a cleaned-up and repackaged version of fastStructure. It runs on Python 3, does not depend on any non-Python libraries, and can be installed with pip install ezstructure. Compared to the original fastStructure program, this version is about 50% slower when using the --prior=logistic option; with the default, --prior=simple, there is no such slowdown.

fastStructure is a fast algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference, and the original implementation was written in Python 2.x.

Citation

Anil Raj, Matthew Stephens, and Jonathan K. Pritchard. fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets. Genetics, June 2014, 197:573-589. (Also available on bioRxiv.)

Installation

EasyStructure can be installed easily by using the pip package installer. If you have pip installed, just run the following command to install the latest release:

pip install -U ezstructure

Using EasyStructure

EasyStructure is command-line software. The main command is ezstructure. You can view the command-line help by running the command:

ezstructure --help

EasyStructure, like its ancestor fastStructure, performs inference for the simplest, independent-loci, admixture model, with two choices of priors that can be specified using the --prior flag. Thus, unlike Structure, EasyStructure does not require the mainparams and extraparams files. The inference algorithm used by fastStructure is fundamentally different from that of Structure and requires far fewer options to be set. All options can be passed via the flags listed in the output of ezstructure --help.

Main options

The key options to pass to ezstructure are the input file, the output file and the number of populations. Assuming the input file is named genotypes.bed (with corresponding genotypes.fam and genotypes.bim), the output prefix is genotypes_output and the number of populations you would like is 3, you can run the algorithm as follows:

ezstructure -K 3 --input=genotypes --output=genotypes_output

This generates a genotypes_output.3.log file that tracks how the algorithm proceeds, and files genotypes_output.3.meanQ and genotypes_output.3.meanP containing the posterior mean of admixture proportions and allele frequencies, respectively. The orders of samples and SNPs in the output files match those in the .fam file and .bim file, respectively. Note that input file names need not include suffixes (e.g., .bed).
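As an illustration of how the .meanQ output might be consumed downstream, here is a minimal sketch. It assumes the file is plain text with one row per sample and K whitespace-separated proportions per row; that layout is an assumption, not something stated above, so check it against your own output. The helper names load_mean_q and hard_assignments are hypothetical and are not part of EasyStructure.

```python
def load_mean_q(path):
    """Read a meanQ-style text file into a list of rows of floats.

    Assumption: one sample per line, K whitespace-separated values per line.
    """
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                rows.append([float(x) for x in fields])
    return rows


def hard_assignments(q):
    """For each sample, return the index of its largest admixture proportion."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Because the rows follow the order of the .fam file, the i-th entry of the result can be paired with the i-th sample ID. Note that a hard assignment discards the admixture information, so it is only a rough summary.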

Input data format

The current implementation can import data from plink bed format and the original Structure format. If the data are in plink format, ensure that bed, bim and fam files for the dataset are all present in the same path.

While the original Structure program allowed for a more flexible input format, fastStructure expects a more specific Structure-like input format. Specifically, rows in the data file correspond to samples, with two rows per sample (note that only diploids are handled by this software), and columns correspond to SNPs. The first 6 columns of the file will be ignored; these typically would include IDs, metadata, etc. This software only handles bi-allelic loci. The two alleles at each locus can be encoded as desired; however, missing data should be encoded as -9.
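To make the Structure-like layout concrete, the following sketch converts such lines into per-sample allele counts. It is not the package's own parse_str; the function name read_structure_like, the choice of which allele to count, and the use of None for missing genotypes are all assumptions made for illustration.

```python
MISSING = "-9"  # missing-data code described above


def read_structure_like(lines, n_skip=6):
    """Turn Structure-like rows into a samples x SNPs table of allele counts.

    Each sample occupies two consecutive rows (diploid); the first n_skip
    columns are ignored. Each entry counts copies (0, 1 or 2) of one of the
    two alleles at that locus, or is None if either copy is missing.
    """
    rows = [line.split()[n_skip:] for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected two rows per diploid sample")
    n_samples = len(rows) // 2
    n_snps = len(rows[0]) if rows else 0

    geno = [[None] * n_snps for _ in range(n_samples)]
    for j in range(n_snps):
        column = [row[j] for row in rows]
        alleles = sorted({a for a in column if a != MISSING})
        if len(alleles) > 2:
            raise ValueError(f"locus {j} is not bi-allelic")
        # Arbitrary illustrative choice: count the allele that sorts last.
        counted = alleles[1] if len(alleles) == 2 else None
        for i in range(n_samples):
            a, b = column[2 * i], column[2 * i + 1]
            if a != MISSING and b != MISSING:
                geno[i][j] = (a == counted) + (b == counted)
    return geno
```

For example, two samples with the row pairs (1 2 -9 / 1 1 2) and (2 2 1 / 2 1 1) after the six ignored columns give [[0, 1, None], [2, 1, 0]].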

Running on test data

A test simulated dataset is provided in test/testdata.bed in the source repository at GitHub with genotypes sampled for 200 individuals at 500 SNP loci. The output files in test/ were generated as follows:

ezstructure -K 3 --input=test/testdata --output=testoutput_simple --full --seed=100
ezstructure -K 3 --input=test/testdata --output=testoutput_logistic --full --seed=100 --prior=logistic

Running these commands on the provided test data should generate log files identical to the ones in test/ (except for the numbers in the Iteration_Time (secs) column); this serves as a final check that the software has been installed correctly. The algorithm scales linearly with the number of samples, the number of loci and the value of K, so the expected runtime for a new dataset can be extrapolated from the runtimes recorded in these log files.
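The linear-scaling statement can be turned into a back-of-the-envelope estimate. The sketch below simply multiplies a reference runtime by the ratios of samples, loci and K; the function name estimate_runtime is hypothetical, and the 2.0-second reference time in the usage note is a made-up placeholder, not a measured value.

```python
def estimate_runtime(ref_seconds, ref_shape, new_shape):
    """Extrapolate runtime assuming linear scaling in samples, loci and K.

    ref_shape and new_shape are (n_samples, n_loci, K) tuples.
    """
    ref_n, ref_l, ref_k = ref_shape
    new_n, new_l, new_k = new_shape
    scale = (new_n / ref_n) * (new_l / ref_l) * (new_k / ref_k)
    return ref_seconds * scale
```

If the test run (200 individuals, 500 loci, K=3) took a hypothetical 2.0 seconds, then estimate_runtime(2.0, (200, 500, 3), (2000, 50000, 6)) gives 2.0 x 10 x 100 x 2 = 4000.0 seconds. Treat this as an order-of-magnitude guide only, since the number of iterations needed to converge can also differ between datasets.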

Choosing model complexity

To choose the appropriate number of model components that explain structure in the dataset, we recommend running the algorithm for multiple choices of K. A utility tool, ezstructure_choosek, parses the output of these runs and reports a reasonable range of values for the model complexity appropriate for the dataset.

Assuming the algorithm was run on the test dataset for choices of K ranging from 1 to 10, and the output flag was --output=test/testoutput_simple, you can obtain the model complexity by doing the following:

ezstructure_choosek --input=test/testoutput_simple

The output would look like:

Model complexity that maximizes marginal likelihood = 2
Model components used to explain structure in data = 4
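The runs for K = 1 to 10 assumed in this example could be scripted along the following lines. This is only a sketch: it uses just the -K, --input and --output flags shown earlier, and the helper names build_commands and run_all are hypothetical.

```python
import subprocess


def build_commands(input_prefix, output_prefix, k_values):
    """Build one ezstructure command line (as an argument list) per value of K."""
    return [
        [
            "ezstructure",
            "-K", str(k),
            f"--input={input_prefix}",
            f"--output={output_prefix}",
        ]
        for k in k_values
    ]


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(build_commands("test/testdata", "test/testoutput_simple", range(1, 11))) would then produce the ten sets of output files that ezstructure_choosek reads. Because every run shares the same output prefix, the K value in each file name keeps the results apart.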

Visualizing admixture proportions

In order to visualize the expected admixture proportions inferred by EasyStructure, we have provided a simple tool to generate Distruct plots using the mean of the variational posterior distribution over admixture proportions. The samples in the plot will be grouped according to population labels inferred by EasyStructure. However, if the user would like to group the samples according to some other categorical label (e.g., geographic location), these labels can be provided as a separate file using the flag --popfile. The order of labels in this file (one label per row) should match the order of samples in the input data files.

Assuming the algorithm was run on the test dataset for K=5, and the output flag was --output=test/testoutput_simple, you can generate a Distruct plot by doing the following:

ezdistruct -K 5 --input=test/testoutput_simple --output=test/testoutput_simple_distruct.svg
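Since the labels passed via --popfile must follow the sample order of the input data, one way to build such a file for PLINK input is to walk the .fam file and look up each sample's label. The sketch below assumes that samples are identified by the second .fam column (the within-family ID in the PLINK format) and that these IDs are unique; the function name write_popfile and the labels_by_sample mapping are hypothetical.

```python
def write_popfile(fam_path, labels_by_sample, popfile_path):
    """Write one label per row, in the same order as the samples in a .fam file.

    labels_by_sample maps a sample ID (second .fam column) to its label.
    A KeyError is raised if a sample has no label, so gaps are not silent.
    """
    with open(fam_path) as fam, open(popfile_path, "w") as out:
        for line in fam:
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            sample_id = fields[1]
            out.write(labels_by_sample[sample_id] + "\n")
```

The resulting file can then be supplied to ezdistruct with the --popfile flag.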

Python interface

As EasyStructure can be installed using pip, it is possible to use it as a dependency for other packages. To use EasyStructure from within Python code, use the following example:

from ezstructure.io import parse_bed, parse_str, write_output
from ezstructure.structure import run_structure

# Parse input file.
G = parse_bed("example.bed")  # Or parse_str("example.str")

# Set parameters.
K = 3
out_prefix = "example"
tol = 1e-6
prior = "simple"
cv = 0

# Run algorithm.
Q, P, other = run_structure(G, K, out_prefix, tol, prior, cv)

# Write output.
write_output(Q, P, other, K, out_prefix, full=True)

Changelog

Version 1.0.0

Initial repackaged version.

Version 1.0.1

Corrected python_requires declaration to exclude Python 3.5.

Version 1.0.2

Updated to support Cython3 and use language_level=3.

Version 1.0.3

Final version to use numpy<2. Provided for compatibility purposes.

Version 1.0.4

First version to use numpy>=2. Requires Python 3.9 or newer.
