Skip to main content

Manipulate tree sequences produced by SLiM.

Project description

# pySLiM

SLiM can now read, and write tree sequences, which store genealogical data of populations.
SLiM can use a tree sequence produced by the coalescent simulator `msprime` to initialize
a simulation, but to do so we need to add the relevant metadata. SLiM can also write out
history as a tree sequence, and in so doing it records extra information in metadata.
This package makes it easy to add the relevant metadata to a tree sequence so that SLiM
can use it, and to read the metadata in a SLiM-produced tree sequence.

The SLiM manual documents how the extra metadata is stored in the tree sequence files,
and provides additional examples of how to use this package.

## Installation

To install `pyslim`, do
```
git clone https://github.com/tskit-dev/pyslim.git
cd pyslim
python setup.py install --user
```
You should also be able to install it with `pip install pyslim`.
You'll also need an up-to-date [msprime](https://github.com/tskit-dev/msprime) and [SLiM](https://messerlab.org/slim/).

To run the tests to make sure everything is working, do:
```
python -m nose tests
```

*Note:* if you use `python3` you may need to replace `python` with `python3` above.


## Quickstart: coalescent simulation for SLiM

Coalescent simulation is more biologically limited, but is much faster than
forwards simluation, so it can be helpful to run a "hybrid" simulation, by
endowing a SLiM simulation with a history derived from a msprime simulation.
The `pyslim.annotate()` command helps make this easy, by adding default
information to a tree sequence, allowing it to be read in by SLiM. This will
simulate a tree sequence with msprime, add SLiM information, and write it out
to a `.trees` file:
```
import msprime
import pyslim

# simulate a tree sequence of 12 nodes
ts = msprime.simulate(12, mutation_rate=1.0, recombination_rate=1.0)
new_ts = pyslim.annotate_defaults(ts, model_type="nonWF", slim_generation=1)
new_ts.dump("initialize_nonWF.trees")
```

The resulting file `slim_ts.trees` can be read into SLiM to be used as a starting state,
as illustrated in this minimal example:
```
initialize()
{
setSeed(23);
initializeSLiMModelType("nonWF");
initializeTreeSeq();
initializeMutationRate(1e-2);
initializeMutationType("m1", 0.5, "f", -0.1);
initializeGenomicElementType("g1", m1, 1.0);
initializeGenomicElement(g1, 0, 99);
initializeRecombinationRate(1e-2);
}

1 early() {
sim.readFromPopulationFile("initialize_nonWF.trees");
}

10 {
sim.treeSeqOutput("nonWF_restart.trees");
catn("Done.");
sim.simulationFinished();
}
```
See the SLiM manual for more about this operation.


## Quickstart: finishing an uncoalesced SLiM simulation

Although we can initialize a SLiM simulation with the results of a coalescent simulation,
if during the simulation we don't actually use the genotypes for anything, it can be much
more efficient to only coalesce the portions of the first-generation ancestors that have
not yet coalesced. (See the SLiM manual for more explanation, and the upcoming paper.)
Doing this is as simple as:
```
ts = pyslim.load("unfinished.trees")
ts.recapitate(recombination_rate = 1e-6, Ne=1000)
```

## Quickstart: adding neutral mutations to a SLiM simulation

If you have recorded a tree sequence in SLiM, likely you have not included any neutral mutations.
To add these (in a completely equivalent way to having included them during the simulation),
you can use the `msprime.mutate( )` function, which returns a new tree sequence with additional mutations.
This works as follows:
```
ts = pyslim.load("unmutated.trees")
mut_ts = pyslim.SlimTreeSequence(msprime.mutate(ts, rate=1e-8, keep=True))
```
This will add infinite-sites mutations at a rate of 1e-8 per site, and will
`keep` any existing mutations (and not add any new ones to the sites where they
exist already). We have wrapped the call to `msprime.mutate` in a call to
`pyslim.SlimTreeSequence`, because `msprime.mutate` returns an *msprime* tree sequence,
and by converting it back into a `pyslim` tree sequence we can still use the methods
defined by `pyslim`. (The conversion does not modify the tree sequence at all,
it only adds the `.slim_generation` attribute.) The output of other `msprime`
functions that return tree sequences may be converted back to
`pyslim.SlimTreeSequence` in the same way.


## Quickstart: reading SLiM metadata

To retrieve the extra information that SLiM stores in a tree sequence, use the `extract_X_metadata()`
functions, where `X` is one of `mutation`, `population`, `node`, or `individual`.
For instance, to see the age of each Individual produced by annotation
in the previous example:
```
for ind in pyslim.extract_individual_metadata(new_ts.tables):
print(ind.age)
```
In this example, all the ages are 0 (the default).

## Quickstart: modifying SLiM metadata

To *modify* the metadata that `pyslim` has introduced into a coalescent simulation,
or the metadata in a SLiM-produced tree sequence, use the `annotate_X_metadata()` functions.
For instance, to set the ages of the individuals in the tree sequence to random numbers between 1 and 4,
and write out the resulting tree sequence:
```
import random

tables = new_ts.dump_tables()
ind_md = list(pyslim.extract_individual_metadata(tables))
for ind in ind_md:
ind.age = random.choice([1,2,3,4])

pyslim.annotate_individual_metadata(tables, ind_md)
mod_ts = pyslim.load_tables(tables, slim_format=True)

for ind in pyslim.extract_individual_metadata(mod_ts.tables):
print(ind.age)

mod_ts.dump("modified_ts.trees")
```

# Documentation

Here we describe the technical details.
Currently, the python package `msprime` simulates tree sequences
and provides tools to work with them.
In the future, tools for working with tree sequences will be separated
into a package called `tskit`.
`pyslim` provides a thin interface between `msprime/tskit`.

## Metadata entries

SLiM records additional information in the metadata columns of Population, Individual, Node, and Mutation tables.
The information is recorded in a binary format, and is extracted and written by `pyslim` using the python `struct` module.
Nothing besides this binary information can be stored in the metadata of these tables for the tree sequence to be used by SLiM,
and so when `pyslim` annotates an existing tree sequence, anything in those columns is overwritten.
For more detailed documentation on the contents and format of the metadata, see the SLiM manual.

## Other important notes:

1. `tskit` "nodes" correspond to SLiM "genomes". Individuals in SLiM are diploid, so each has two nodes.

2. Since in SLiM, all individual are diploid, every individual will be associated with two nodes.
The Individual table contains entries for

a. the currently alive individuals,
b. any individuals that have been remembered with `treeSeqRememberIndividuals()`, and
c. the *first* generation of the SLiM simulation.

This last category is here because they are necessary for recapitation (described above);
but they are *not* marked as samples, so will most likely be removed if you `simplify` the tree sequence.

3. The "remembered nodes" will be the *first* nodes.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyslim-0.1.tar.gz (31.4 kB view hashes)

Uploaded Source

Built Distributions

pyslim-0.1-py3.6.egg (76.3 kB view hashes)

Uploaded Source

pyslim-0.1-py3-none-any.whl (39.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page