EasyStructure

Introduction

EasyStructure is a cleaned-up and repackaged version of fastStructure. It runs on Python 3 and can be installed with pip install ezstructure. It does not depend on any non-Python libraries. Compared to the original fastStructure program, this version is about 50% slower when using the --prior=logistic option (but not with the default value, simple).

fastStructure is a fast algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference and was originally written for Python 2.x.

Citation

Anil Raj, Matthew Stephens, and Jonathan K. Pritchard. fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets. Genetics, June 2014, 197:573-589 [Genetics, Biorxiv]

Installation

EasyStructure can be installed easily by using the pip package installer. If you have pip installed, just run the following command to install the latest release:

pip install -U ezstructure

Using EasyStructure

EasyStructure is command-line software. The main command is ezstructure. You can view the command-line help by running the command:

ezstructure --help

EasyStructure, like its ancestor fastStructure, performs inference for the simplest, independent-loci admixture model, with two choices of prior (simple and logistic) that can be specified using the --prior flag. Thus, unlike Structure, EasyStructure does not require the mainparams and extraparams files. The inference algorithm used by fastStructure is fundamentally different from that of Structure and requires far fewer options to be set. All options can be passed via the command-line flags listed in the help output above.

Main options

The key options to pass to the scripts are the input file, the output file and the number of populations. Assuming the input file is named genotypes.bed (with corresponding genotypes.fam and genotypes.bim), the output file is named genotypes_output and the number of populations you would like is 3, you can run the algorithm as follows:

ezstructure -K 3 --input=genotypes --output=genotypes_output

This generates a genotypes_output.3.log file that tracks how the algorithm proceeds, and files genotypes_output.3.meanQ and genotypes_output.3.meanP containing the posterior mean of admixture proportions and allele frequencies, respectively. The orders of samples and SNPs in the output files match those in the .fam file and .bim file, respectively. Note that input file names need not include suffixes (e.g., .bed).
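
As a rough illustration of how the meanQ output might be used downstream, the sketch below parses rows of admixture proportions and assigns each sample to its largest component. The whitespace-separated layout (one row per sample, K columns) is an assumption rather than something stated above, and load_mean_q, dominant_component and the sample values are hypothetical names and data made up for this example.

```python
import io


def load_mean_q(handle):
    """Read rows of whitespace-separated floats (assumed: one row per sample)."""
    rows = []
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(x) for x in fields])
    return rows


def dominant_component(q_row):
    """Return the index of the largest admixture proportion in one row."""
    return max(range(len(q_row)), key=lambda k: q_row[k])


# Made-up stand-in for the contents of genotypes_output.3.meanQ.
example = io.StringIO("0.90  0.05  0.05\n0.10  0.20  0.70\n")
Q = load_mean_q(example)
labels = [dominant_component(row) for row in Q]
print(labels)  # [0, 2]
```

To read a real file, pass open("genotypes_output.3.meanQ") instead of the in-memory example.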

Input data format

The current implementation can import data from plink bed format and the original Structure format. If the data are in plink format, ensure that bed, bim and fam files for the dataset are all present in the same path.

While the original Structure program allowed for a more flexible input format, fastStructure expects a more specific Structure-like input format. Specifically, rows in the data file correspond to samples, with two rows per sample (note that only diploids are handled by this software), and columns correspond to SNPs. The first 6 columns of the file will be ignored; these typically would include IDs, metadata, etc. This software only handles bi-allelic loci. The two alleles at each locus can be encoded as desired; however, missing data should be encoded as -9.
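
The layout described above can be made concrete with a small reader. This is only a sketch of the format, not the parser shipped with the package: parse_structure_like is a hypothetical helper, the choice of which allele to count is arbitrary, and treating a genotype as missing when either of its two alleles is -9 is an assumption.

```python
def parse_structure_like(lines, n_skip=6, missing="-9"):
    """Convert a Structure-like table into per-sample genotype counts.

    Each sample yields a list with one entry per locus: 0, 1 or 2 copies
    of an arbitrarily chosen allele, or None if either allele is missing.
    """
    # Drop the leading ID/metadata columns; keep only the allele columns.
    rows = [line.split()[n_skip:] for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected two rows per diploid sample")
    n_loci = len(rows[0]) if rows else 0
    if any(len(row) != n_loci for row in rows):
        raise ValueError("all rows must have the same number of loci")

    # For each locus, pick the allele whose copies are counted.
    counted = []
    for j in range(n_loci):
        alleles = sorted({row[j] for row in rows} - {missing})
        if len(alleles) > 2:
            raise ValueError(f"locus {j} is not bi-allelic: {alleles}")
        counted.append(alleles[-1] if alleles else None)

    genotypes = []
    for i in range(0, len(rows), 2):
        first, second = rows[i], rows[i + 1]
        sample = []
        for j in range(n_loci):
            if missing in (first[j], second[j]):
                sample.append(None)
            else:
                sample.append((first[j] == counted[j]) + (second[j] == counted[j]))
        genotypes.append(sample)
    return genotypes


# Two made-up samples at three loci; the first six columns are ignored.
example = [
    "s1 pop1 x x x x 1 2 -9",
    "s1 pop1 x x x x 2 2 -9",
    "s2 pop1 x x x x 1 1 2",
    "s2 pop1 x x x x 1 2 1",
]
genotypes = parse_structure_like(example)
print(genotypes)  # [[1, 2, None], [0, 1, 1]]
```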

Running on test data

A test simulated dataset is provided in test/testdata.bed in the source repository at GitHub with genotypes sampled for 200 individuals at 500 SNP loci. The output files in test/ were generated as follows:

ezstructure -K 3 --input=test/testdata --output=testoutput_simple --full --seed=100
ezstructure -K 3 --input=test/testdata --output=testoutput_logistic --full --seed=100 --prior=logistic

Executing the code with the provided test data should generate log files identical to the ones in test/ (except for the numbers in the Iteration_Time (secs) column), as a final check that the source code has been downloaded and compiled correctly. The algorithm scales linearly with the number of samples, the number of loci and the value of K, so the expected runtime for a new dataset can be extrapolated from the runtimes in these log files.
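
The linear scaling can be turned into a back-of-the-envelope estimate. The sketch below simply multiplies a reference runtime by the ratios of samples, loci and K; estimate_runtime is a hypothetical helper and the 2.0 second reference time is a made-up figure, not a measured one. The number of iterations needed to converge may also differ between datasets, so the result should be read as a rough guide only.

```python
def estimate_runtime(ref_seconds, ref_shape, new_shape):
    """Scale a reference runtime linearly in samples, loci and K.

    Both shapes are (n_samples, n_loci, K) tuples.
    """
    ref_n, ref_l, ref_k = ref_shape
    new_n, new_l, new_k = new_shape
    return ref_seconds * (new_n / ref_n) * (new_l / ref_l) * (new_k / ref_k)


# Hypothetical: the test run (200 samples, 500 loci, K=3) took 2.0 seconds.
estimate = estimate_runtime(2.0, (200, 500, 3), (2000, 50000, 6))
print(f"{estimate:.0f} seconds")  # 4000 seconds
```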

Choosing model complexity

To choose the appropriate number of model components that explain structure in the dataset, we recommend running the algorithm for multiple choices of K. We have provided a utility tool, ezstructure_choosek, that parses the output of these runs and reports a reasonable range of values for the model complexity appropriate for this dataset.

Assuming the algorithm was run on the test dataset for choices of K ranging from 1 to 10, and the output flag was --output=test/testoutput_simple, you can obtain the model complexity by doing the following:

ezstructure_choosek --input=test/testoutput_simple

The output would look like:

Model complexity that maximizes marginal likelihood = 2
Model components used to explain structure in data = 4
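
The runs for K = 1 to 10 assumed above could be launched with a small driver script. The following is a sketch only: build_commands and run_all are hypothetical helpers, the commands use just the -K, --input and --output flags shown earlier, and the script defaults to a dry run that prints the commands without executing them.

```python
import subprocess


def build_commands(input_prefix, output_prefix, k_values):
    """Build one ezstructure command line per value of K."""
    return [
        ["ezstructure", "-K", str(k),
         f"--input={input_prefix}", f"--output={output_prefix}"]
        for k in k_values
    ]


def run_all(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = build_commands("test/testdata", "test/testoutput_simple", range(1, 11))
run_all(commands)  # dry run: prints the ten command lines only
```

Calling run_all(commands, dry_run=False) would execute the runs one after another, after which ezstructure_choosek can be pointed at the same output prefix.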

Visualizing admixture proportions

In order to visualize the expected admixture proportions inferred by EasyStructure, we have provided a simple tool to generate Distruct plots using the mean of the variational posterior distribution over admixture proportions. The samples in the plot will be grouped according to population labels inferred by EasyStructure. However, if the user would like to group the samples according to some other categorical label (e.g., geographic location), these labels can be provided as a separate file using the flag --popfile. The order of labels in this file (one label per row) should match the order of samples in the input data files.

Assuming the algorithm was run on the test dataset for K=5, and the output flag was --output=test/testoutput_simple, you can generate a Distruct plot by doing the following:

ezdistruct -K 5 --input=test/testoutput_simple --output=test/testoutput_simple_distruct.svg
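
For grouping by a custom label, the label file described above can be written with a few lines of Python. This is a sketch under assumptions: write_popfile is a hypothetical helper, the labels are made up, and passing the file as --popfile=PATH (with an equals sign, like the other flags) is assumed rather than confirmed by the text.

```python
import os
import tempfile


def write_popfile(path, labels):
    """Write one label per row, in the same order as the input samples."""
    with open(path, "w") as handle:
        for label in labels:
            handle.write(f"{label}\n")


# Hypothetical labels for four samples, listed in input-file order.
labels = ["north", "north", "south", "east"]
popfile = os.path.join(tempfile.mkdtemp(), "labels.txt")
write_popfile(popfile, labels)

# Command line that would use the label file (printed, not executed).
command = [
    "ezdistruct", "-K", "5",
    "--input=test/testoutput_simple",
    f"--popfile={popfile}",
    "--output=test/testoutput_simple_distruct.svg",
]
print(" ".join(command))
```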

Python interface

As EasyStructure can be installed using pip, it is possible to use it as a dependency for other packages. To use EasyStructure from within Python code, use the following example:

from ezstructure.io import parse_bed, parse_str, write_output
from ezstructure.structure import run_structure

# Parse input file.
G = parse_bed("example.bed")  # Or parse_str("example.str")

# Set parameters.
K = 3
out_prefix = "example"
tol = 1e-6
prior = "simple"
cv = 0

# Run algorithm.
Q, P, other = run_structure(G, K, out_prefix, tol, prior, cv)

# Write output.
write_output(Q, P, other, K, out_prefix, full=True)

Changelog

Version 1.0.0

Initial repackaged version.

Version 1.0.1

Corrected python_requires declaration to exclude Python 3.5.

Version 1.0.2

Updated to support Cython3 and use language_level=3.
