EasyStructure
Introduction
EasyStructure is a cleaned-up and repackaged version of fastStructure. It runs on Python 3 and can be installed with pip install ezstructure. It does not depend on any non-Python libraries. Compared to the original fastStructure program, this version is about 50% slower when using the --prior=logistic option (but not with the default value, simple).
fastStructure is a fast algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference and was written in Python 2.x.
Citation
Anil Raj, Matthew Stephens, and Jonathan K. Pritchard. fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets, (Genetics) June 2014 197:573-589 [Genetics, Biorxiv]
Installation
EasyStructure can be installed with the pip package installer. If you have pip installed, run the following command to install the latest release:
pip install -U ezstructure
Using EasyStructure
EasyStructure is command-line software. The main command is ezstructure. You can view the command-line help by running:
ezstructure --help
EasyStructure, like its ancestor fastStructure, performs inference for the simplest, independent-loci admixture model, with two choices of priors that can be specified using the --prior flag. Thus, unlike Structure, EasyStructure does not require the mainparams and extraparams files. The inference algorithm used by fastStructure is fundamentally different from that of Structure and requires far fewer options to be set. All options can be passed via the flags listed in the command-line help.
Main options
The key options to pass to the ezstructure command are the input file, the output file prefix and the number of populations. Assuming the input file is named genotypes.bed (with corresponding genotypes.fam and genotypes.bim), the output prefix is genotypes_output and the number of populations you would like is 3, you can run the algorithm as follows:
ezstructure -K 3 --input=genotypes --output=genotypes_output
This generates a genotypes_output.3.log file that tracks how the algorithm proceeds, and files genotypes_output.3.meanQ and genotypes_output.3.meanP containing the posterior means of admixture proportions and allele frequencies, respectively. The orders of samples and SNPs in the output files match those in the .fam file and .bim file, respectively.
Note that input file names need not include suffixes (e.g., .bed).
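As an illustration of working with the admixture proportions, the following minimal sketch reads a meanQ-style table and reports the component with the largest posterior mean for each sample. It assumes the file is a whitespace-delimited numeric table with one row per sample and K columns; that layout, and the helper names load_mean_q and dominant_component, are assumptions made for this example and are not part of the EasyStructure API.

```python
def load_mean_q(lines):
    """Parse rows of whitespace-delimited floats (assumed meanQ layout)."""
    rows = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(x) for x in fields])
    return rows


def dominant_component(q_rows):
    """Return, for each sample, the 0-based index of its largest proportion."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_rows]


# Hypothetical content for three samples and K = 3 (not real output).
example_lines = [
    "0.90 0.05 0.05",
    "0.10 0.80 0.10",
    "0.30 0.30 0.40",
]
Q = load_mean_q(example_lines)
labels = dominant_component(Q)
print(labels)
```

To apply this to a real run, pass an open file handle instead of the example list, for instance the handle returned by open("genotypes_output.3.meanQ"). Because sample order follows the .fam file, the i-th label belongs to the i-th .fam row.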
Input data format
The current implementation can import data from plink bed format and the original Structure format. If the data are in plink format, ensure that bed, bim and fam files for the dataset are all present in the same path.
While the original Structure program allowed for a more flexible input format, fastStructure expects a more specific Structure-like input format. Specifically, rows in the data file correspond to samples, with two rows per sample (note that only diploids are handled by this software), and columns correspond to SNPs. The first 6 columns of the file will be ignored; these typically would include IDs, metadata, etc. This software only handles bi-allelic loci. The two alleles at each locus can be encoded as desired; however, missing data should be encoded as -9.
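The layout described above can be checked before a run with a small validation script. The sketch below is not the parser used by EasyStructure; it is a hypothetical pre-flight check that assumes whitespace-delimited fields, pairs up the two rows of each diploid sample, skips the first 6 columns, and flags loci with more than two distinct non-missing alleles. The function names are invented for this example.

```python
MISSING = "-9"  # missing-data code expected by the Structure-like format


def parse_structure_like(lines, n_meta=6):
    """Return one list of (allele_a, allele_b) pairs per sample.

    Assumes two consecutive rows per diploid sample and ignores the
    first n_meta columns of every row.
    """
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected two rows per diploid sample")
    samples = []
    for first, second in zip(rows[0::2], rows[1::2]):
        a, b = first[n_meta:], second[n_meta:]
        if len(a) != len(b):
            raise ValueError("rows of one sample differ in number of loci")
        samples.append(list(zip(a, b)))
    return samples


def non_biallelic_loci(samples):
    """Return 0-based indices of loci with more than two observed alleles."""
    n_loci = len(samples[0]) if samples else 0
    bad = []
    for j in range(n_loci):
        alleles = {x for s in samples for x in s[j] if x != MISSING}
        if len(alleles) > 2:
            bad.append(j)
    return bad


# Hypothetical data: 2 samples, 3 loci, 6 leading metadata columns.
example = [
    "s1 pop1 0 0 0 0 1 2 2",
    "s1 pop1 0 0 0 0 1 1 -9",
    "s2 pop1 0 0 0 0 2 2 1",
    "s2 pop1 0 0 0 0 1 2 3",
]
samples = parse_structure_like(example)
bad_loci = non_biallelic_loci(samples)
print(bad_loci)
```

In this example the third locus carries alleles 1, 2 and 3, so it is reported as not bi-allelic and would need to be removed or recoded before running the software.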
Running on test data
A simulated test dataset is provided in test/testdata.bed in the source repository on GitHub, with genotypes sampled for 200 individuals at 500 SNP loci. The output files in test/ were generated as follows:
ezstructure -K 3 --input=test/testdata --output=testoutput_simple --full --seed=100
ezstructure -K 3 --input=test/testdata --output=testoutput_logistic --full --seed=100 --prior=logistic
Executing the code with the provided test data should generate log files identical to the ones in test/ (except for the numbers in the Iteration_Time (secs) column), as a final check that the software has been installed correctly. The algorithm scales linearly with the number of samples, the number of loci and the value of K; the expected runtime for a new dataset can be estimated from the runtime in the above log files.
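The linear scaling stated above suggests a simple back-of-the-envelope estimate: multiply the runtime measured on the test data by the ratio of samples, the ratio of loci and the ratio of K. The sketch below takes the test dataset (200 individuals, 500 SNP loci, K=3) as the reference; the function name and the 2-second reference time are placeholders for illustration, not measured values.

```python
def estimate_runtime(ref_seconds, n_samples, n_loci, k,
                     ref_samples=200, ref_loci=500, ref_k=3):
    """Extrapolate runtime assuming linear scaling in samples, loci and K."""
    scale = (n_samples / ref_samples) * (n_loci / ref_loci) * (k / ref_k)
    return ref_seconds * scale


# Placeholder reference time; read the real value from your own log file.
reference_seconds = 2.0
estimate = estimate_runtime(reference_seconds, n_samples=2000, n_loci=50000, k=6)
print(f"estimated runtime: {estimate:.0f} s")
```

The result is only a rough guide, since the number of iterations needed for convergence can differ between datasets.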
Choosing model complexity
In order to choose the appropriate number of model components that explain structure in the dataset, we recommend running the algorithm for multiple choices of K. We have provided a utility tool, ezstructure_choosek, to parse through the output of these runs and provide a reasonable range of values for the model complexity appropriate for this dataset.
Assuming the algorithm was run on the test dataset for choices of K ranging from 1 to 10, and the output flag was --output=test/testoutput_simple, you can obtain the model complexity by doing the following:
ezstructure_choosek --input=test/testoutput_simple
The output would look like:
Model complexity that maximizes marginal likelihood = 2
Model components used to explain structure in data = 4
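The runs for K from 1 to 10 assumed in this section could be scripted rather than typed by hand. The following minimal sketch builds one ezstructure command per value of K using only the -K, --input and --output flags shown earlier; the helper name ezstructure_commands is hypothetical, and the line that would actually launch each run is left commented out so the script can be inspected first.

```python
import shlex
import subprocess  # only needed if the run line below is uncommented


def ezstructure_commands(input_prefix, output_prefix, k_values):
    """Build one ezstructure argument list per value of K."""
    return [
        ["ezstructure", "-K", str(k),
         f"--input={input_prefix}", f"--output={output_prefix}"]
        for k in k_values
    ]


commands = ezstructure_commands("test/testdata", "test/testoutput_simple",
                                range(1, 11))
for cmd in commands:
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to execute each run
```

Once all runs have finished, ezstructure_choosek can be pointed at the shared output prefix as shown above.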
Visualizing admixture proportions
In order to visualize the expected admixture proportions inferred by EasyStructure, we have provided a simple tool to generate Distruct plots using the mean of the variational posterior distribution over admixture proportions. The samples in the plot will be grouped according to population labels inferred by EasyStructure. However, if the user would like to group the samples according to some other categorical label (e.g., geographic location), these labels can be provided as a separate file using the flag --popfile. The order of labels in this file (one label per row) should match the order of samples in the input data files.
Assuming the algorithm was run on the test dataset for K=5, and the output flag was --output=test/testoutput_simple, you can generate a Distruct plot by doing the following:
ezdistruct -K 5 --input=test/testoutput_simple --output=test/testoutput_simple_distruct.svg
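A label file for --popfile needs one label per row in the same order as the samples in the input data. For PLINK input, that order is given by the .fam file, whose columns are family ID, individual ID, father, mother, sex and phenotype. The sketch below derives the label lines from .fam rows and a lookup table keyed by individual ID; the function name, the lookup table and the "unknown" fallback are assumptions made for this example.

```python
def popfile_labels(fam_lines, label_by_iid, default="unknown"):
    """Return one label per sample, in .fam order."""
    labels = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        iid = fields[1]  # second .fam column is the individual ID
        labels.append(label_by_iid.get(iid, default))
    return labels


# Hypothetical .fam rows and geographic labels.
fam_example = [
    "F1 ind1 0 0 1 -9",
    "F2 ind2 0 0 2 -9",
    "F3 ind3 0 0 1 -9",
]
regions = {"ind1": "north", "ind3": "south"}
labels = popfile_labels(fam_example, regions)
popfile_text = "\n".join(labels) + "\n"
print(popfile_text, end="")
```

Writing popfile_text to a file and passing that file with --popfile groups the samples by these labels instead of the inferred populations.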
Python interface
As EasyStructure can be installed using pip, it is possible to use it as a dependency for other packages. To use EasyStructure from within Python code, use the following example:
from ezstructure.io import parse_bed, parse_str, write_output
from ezstructure.structure import run_structure
# Parse input file.
G = parse_bed("example.bed") # Or parse_str("example.str")
# Set parameters.
K = 3
out_prefix = "example"
tol = 1e-6
prior = "simple"
cv = 0
# Run algorithm.
Q, P, other = run_structure(G, K, out_prefix, tol, prior, cv)
# Write output.
write_output(Q, P, other, K, out_prefix, full=True)
Changelog
Version 1.0.0
Initial repackaged version.
Version 1.0.1
Corrected python_requires declaration to exclude Python 3.5.
Version 1.0.2
Updated to support Cython 3 and use language_level=3.