Population¶
The Population
object is the most important object of simuPOP. It consists
of one or more generations of individuals, grouped by subpopulations, and a
local Python dictionary to hold arbitrary population information. This class
provides a large number of functions to access and modify population structure,
individuals and their genotypes and information fields. The following sections
explain these features in detail.
Access and change individual genotype¶
From a user’s point of view, genotypes of all individuals in a population are
arranged sequentially. Similar to functions Individual.genotype
() and
Individual.setGenotype
(), genotypes of a population can be accessed in
batch using functions Population.genotype
() and
Population.setGenotype
(). However, because it is error prone to locate
an allele of a particular individual in this long array, these functions are
usually used to perform population-level genotype operations such as clearing
all alleles (e.g. pop.setGenotype(0)
) or counting the number of a particular
allele across all individuals (e.g. pop.genotype().count(1)
).
Another way to change alleles across the whole population is to recode existing alleles to other numbers. This is sometimes needed if you need to change allele states to conform with a particular mutation model, assumptions of other software applications or genetic samples. For example, if your dataset uses 1, 2, 3, 4 for A, C, T, G alleles, and you would like to use alleles 0, 1, 2 and 3 for A, C, G, T (a convention for simuPOP when nucleotide mutation models are involved), you can use
pop.recodeAlleles([0, 0, 1, 3, 2], alleleNames=['A', 'C', 'G', 'T'])
to convert and rename the alleles (1 allele to 0, 2 allele to 1, etc). This operation will be applied to all subpopulations for all ancestral generations, but can be restricted to selected loci.
Subpopulations¶
A simuPOP population consists of one or more subpopulations. If a population is not structured, it has one subpopulation that is the population itself. Subpopulations serve as barriers of individuals in the sense that mating only happens between individuals in the same subpopulation. A number of functions are provided to merge, remove, resize subpopulations, and move individuals between subpopulations (migration).
Example subPopName demonstrates how to use some of the subpopulation related functions.
Example: Manipulation of subpopulations
>>> import simuPOP as sim
>>> pop = sim.Population(size=[3, 4, 5], ploidy=1, loci=1, infoFields='x')
>>> # individual 0, 1, 2, ... will have an allele 0, 1, 2, ...
>>> pop.setGenotype(range(pop.popSize()))
>>> #
>>> pop.subPopSize(1)
4
>>> # merge subpopulations
>>> pop.mergeSubPops([1, 2])
1
>>> # split subpopulations
>>> pop.splitSubPop(1, [2, 7])
(1, 2)
>>> pop.subPopSizes()
(3, 2, 7)
>>> # remove subpopulations
>>> pop.removeSubPops(1)
>>> pop.subPopSizes()
(3, 7)
now exiting runScriptInteractively...
Some population operations change the IDs of subpopulations. For example, if a population has three subpopulations 0, 1, and 2, and subpopulation 1 is split into two subpouplations, subpopulation 2 will become subpopulation 3. Tracking the ID of a subpopulation can be problematic, especially when conditional or random subpopulation operations are involved. In this case, you can specify names to subpopulations. These names will follow their associated subpopulations during population operations so you can identify the ID of a subpopulation by its name. Note that simuPOP allows duplicate subpopulation names.
Example: Use of subpopulation names
>>> import simuPOP as sim
>>> pop = sim.Population(size=[3, 4, 5], subPopNames=['x', 'y', 'z'])
>>> pop.removeSubPops([1])
>>> pop.subPopNames()
('x', 'z')
>>> pop.subPopByName('z')
1
>>> pop.splitSubPop(1, [2, 3])
(1, 2)
>>> pop.subPopNames()
('x', 'z', 'z')
>>> pop.setSubPopName('z-1', 1)
>>> pop.subPopNames()
('x', 'z-1', 'z')
>>> pop.subPopByName('z')
2
now exiting runScriptInteractively...
Virtual subpopulations and virtual splitters *¶
simuPOP subpopulations can be further divided into virtual subpopulations (VSP), which are groups of individuals who share certain properties. For example, all male individuals, all unaffected individuals, all individuals with information field age > 20, all individuals with genotype 0, 0 at a given locus, can form VSPs. VSPs do not have to add up to the whole subpopulation, nor do they have to be non-overlapping. Unlike subpopulations that have strict boundaries, VSPs change easily with the changes of individual properties.
VSPs are defined by virtual splitters. It is a definition for groups of
individuals in each subpopulation. A splitter defines the same number of VSPs
in all subpopulations, although sizes of these VSPs vary across subpopulations
due to subpopulation differences. For example, a SexSplitter
()
defines two VSPs, the first with all male individuals and the second with all
female individuals, and a InfoSplitter
(field='x', values=[1, 2,
4]
) defines three VSPs whose members have values 1
, 2
and 4
at
information field x
, respectively. This splitter also allows the use of
cutoff values and ranges to define VSPs. If different types of VSPs are needed,
a combined splitter can be used to combine VSPs defined by several splitters.
A VSP is represented by a [sp, vsp]
pair where sp
and vsp
can be
subpopulation indexes or names. Its name and size can be obtained using
functions subPopName()
and subPopSize()
. Example virtualSplitter demonstrates how to apply virtual splitters to a population,
and how to check VSP names and sizes.
Example: Define virtual subpopulations in a population
>>> import simuPOP as sim
>>> import random
>>> pop = sim.Population(size=[200, 400], loci=[30], infoFields='x')
>>> # assign random information fields
>>> sim.initSex(pop)
>>> sim.initInfo(pop, lambda: random.randint(0, 3), infoFields='x')
>>> # define a virtual splitter by sex
>>> pop.setVirtualSplitter(sim.SexSplitter())
>>> pop.numVirtualSubPop() # Number of defined VSPs
2
>>> pop.subPopName([0, 0]) # Each VSP has a name
'Male'
>>> pop.subPopSize([0, 1]) # Size of VSP 1 in subpopulation 0
109
>>> pop.subPopSize([0, 'Female']) # Refer to vsp by its name
109
>>> # define a virtual splitter by information field 'x'
>>> pop.setVirtualSplitter(sim.InfoSplitter(field='x', values=[0, 1, 2, 3]))
>>> pop.numVirtualSubPop() # Number of defined VSPs
4
>>> pop.subPopName([0, 0]) # Each VSP has a name
'x = 0'
>>> pop.subPopSize([0, 0]) # Size of VSP 0 in subpopulation 0
46
>>> pop.subPopSize([1, 0]) # Size of VSP 0 in subpopulation 1
92
now exiting runScriptInteractively...
VSP provides an easy way to access groups of individuals in a subpopulation and allows finer control of an evolutionary process. For example, mating schemes can be applied to VSPs which makes it possible to apply different mating schemes to, for example, individuals with different ages. By applying migration, mutation etc to VSPs, it is easy to implement advanced features such as sex-biased migrations, different mutation rates for individuals at different stages of a disease. Example virtualSubPop demonstrates how to initialize genotype and information fields to individuals in male and female VSPs.
Example: Applications of virtual subpopulations
>>> import simuPOP as sim
>>> import random
>>> pop = sim.Population(10, loci=[2, 3], infoFields='Sex')
>>> sim.initSex(pop)
>>> pop.setVirtualSplitter(sim.SexSplitter())
>>> # initialize male and females with different genotypes.
>>> sim.initGenotype(pop, genotype=[0]*5, subPops=[(0, 0)])
>>> sim.initGenotype(pop, genotype=[1]*5, subPops=[(0, 1)])
>>> # set Sex information field to 0 for all males, and 1 for all females
>>> pop.setIndInfo([sim.MALE], 'Sex', [0, 0])
>>> pop.setIndInfo([sim.FEMALE], 'Sex', [0, 1])
>>> # Print individual genotypes, followed by values at information field Sex
>>> sim.dump(pop, structure=False)
SubPopulation 0 (), 10 Individuals:
0: FU 11 111 | 11 111 | 2
1: FU 11 111 | 11 111 | 2
2: MU 00 000 | 00 000 | 1
3: MU 00 000 | 00 000 | 1
4: MU 00 000 | 00 000 | 1
5: MU 00 000 | 00 000 | 1
6: MU 00 000 | 00 000 | 1
7: FU 11 111 | 11 111 | 2
8: FU 11 111 | 11 111 | 2
9: FU 11 111 | 11 111 | 2
now exiting runScriptInteractively...
Advanced virtual subpopulation splitters **¶
simuPOP provides a number of virtual splitters that can define VSPs using
specified properties. For example, InfoSplitter
(field='a',
values=[1,2,3]
) defines three VSPs whose individuals have values 1
, 2
,
and 3
at information field a
, respectively; SexSplitter
()
defines two VSPs of male and female individuals, respectively; and
RangeSplitter
(ranges=[[0, 2000], [2000, 5000]]
) defines two VSPs
using two blocks of individuals.
A CombinedSplitter
can be used if your simulation needs more than one
sets of VSPs. For example, you may want to split your subpopulations both by sex
and by affection status. In this case, you can define a combined splitter using
CombinedSplitter(splitters=[SexSplitter(), AffectionSplitter()])
This splitter simply stacks VSPs defined in AffectionSplitter
() after
SexSplitter
() so that unaffected and affected VSPs are now VSPs 2 and
3 (0 and 1 are used for male and female VSPs).
There are also scenarios when you would like to define finer VSPs with
individuals belonging to more than one VSPs. For example, you may want to have a
look of frequencies of certain alleles in affected male vs affected females, or
count the number of males and females with certain value at an information
field. In this case, a ProductSplitter
can be used to define VSPs using
interactions of several VSPs. For example,
ProductSplitter(splitters=[SexSplitter(), AffectionSplitter()])
defines 4 subpopulations by splitting VSPs defined by SexSplitter
()
with affection status. These four VSPs will then have unaffected male, affected
male, unaffected female and affected female individuals, respectively.
If you consider ProductSplitter
as an intersection splitter that
defines new VSPs as intersections of existing VSPs, you may wonder how to define
unions of VSPs. For example, you can make a case where you want to consider
Individuals with information field a < 0 or a > 100 together. A regular
InfoSplitter
(field='a', cutoff=[0, 100]
) cannot do that because
it defines three VSPs with \(a<0\), \(0\leq a<100\) and
\(a\geq100\), respectively. The trick here is to use parameter vspMap
of
a CombinedSplitter
. If this parameter is defined, multiple VSPs could
be groups or reordered to define a new set of VSPs. For example,
CombinedSplitter(splitters=[InfoSplitter(field='a', cutoff=[0, 100])], vspMap=[[0,2], 1])
defines two VSPs using VSPs 0 and 2, and VSP 1 defined by the
InfoSplitter
so that the first VSP contains individuals with
\(a<0\) or \(a\geq100\).
Example advancedVSP demonstrates some advanced usages of virtual splitters.
Example: Advanced virtual subpopulation usages.
>>> import simuPOP as sim
>>> import random
>>> pop = sim.Population(size=[2000, 4000], loci=[30], infoFields='x')
>>> # assign random information fields
>>> sim.initSex(pop)
>>> sim.initInfo(pop, lambda: random.randint(0, 3), infoFields='x')
>>> #
>>> # 1, use a combined splitter
>>> pop.setVirtualSplitter(sim.CombinedSplitter(splitters = [
... sim.SexSplitter(),
... sim.InfoSplitter(field='x', values=[0, 1, 2, 3])
... ]))
>>> pop.numVirtualSubPop() # Number of defined VSPs
6
>>> pop.subPopName([0, 0]) # Each VSP has a name
'Male'
>>> pop.subPopSize([0, 0]) # sim.MALE
1011
>>> pop.subPopSize([1, 4]) # individuals in sp 1 with value 2 at field x
1048
>>> #
>>> # use a product splitter that defines additional VSPs by sex and info
>>> pop.setVirtualSplitter(sim.ProductSplitter(splitters = [
... sim.SexSplitter(names=['M', 'F']), # give a new set of names
... sim.InfoSplitter(field='x', values=[0, 1, 2, 3])
... ]))
>>> pop.numVirtualSubPop() # Number of defined VSPs
8
>>> pop.subPopName([0, 0]) # Each VSP has a name
'M, x = 0'
>>> pop.subPopSize([0, 0]) # sim.MALE with value 1 in sp 0
240
>>> pop.subPopSize([1, 5]) # sim.FEMALE with value 1 in sp 1
453
>>> #
>>> # use a combined splitter to join VSPs defined by a
>>> # product splitter
>>> pop.setVirtualSplitter(sim.CombinedSplitter([
... sim.ProductSplitter([
... sim.SexSplitter(),
... sim.InfoSplitter(field='x', values=[0, 1, 2, 3])])],
... vspMap = [[0,1,2], [4,5,6], [7]],
... names = ['Male x<=3', 'Female x<=3', 'Female x=4']))
>>> pop.numVirtualSubPop() # Number of defined VSPs
3
>>> pop.subPopName([0, 0]) # Each VSP has a name
'Male x<=3'
>>> pop.subPopSize([0, 0]) # sim.MALE with value 0, 1, 2 at field x
770
>>> pop.subPopSize([1, 1]) # sim.FEMALE with value 0, 1 or 2 at field x
1493
now exiting runScriptInteractively...
Access individuals and their properties¶
There are many ways to access individuals of a population. For example, function
Population.Individual(idx)
returns a reference to the idx
-th
individual in a population. An optional parameter subPop
can be specified to
return the idx
-th individual in the subPop
-th subpopulation.
If you would like to access a group of individuals, either from a whole
population, a subpopulation, or from a virtual subpopulation,
Population.individuals
([subPop]
) is easier to use. This function
returns a Python iterator that can be used to iterate through individuals. An
advantage of this function is that subPop
can be a virtual subpopulation
which makes it easy to iterate through Individuals with certain properties (such
as all male Individuals). If you would like to iterate through multiple virtual
subpopulations in one or more ancestral generations, you can use another
function Population.allIndividuals
(subPops, ancGens
).
If more than one generations are stored in a population, function
ancestor(idx, [subPop], gen)
can be used to access Individual from an
ancestral generation (see Section subsec_Ancestral_populations for details). Because there is no group access
function for ancestors, it may be more convenient to use useAncestralGen
to
make an ancestral generation the current generation, and use
Population.Individuals
. Note that ancestor() function can always access
individuals at a certain generation, regardless which generation the current
generation is. Example batchAccess demonstrates how to use
all these Individual-access functions.
If an unique ID is assigned to all individuals in a population, you can look up
individuals from their IDs using function Population.indByID
(). The
information field to save individual ID is usually ind_id
and you can use
operator IdTagger
and its function form tagID
to set this
field. Note that this function can be used to look up individuals in the present
and all ancestral generations, although a parameter (ancGen
) can be used to
limit search to a specific generation if you know in advance which generation
the individual locates.
Example: Access individuals of a population
>>> import simuPOP as sim
>>> # create a sim.population with two generations. The current generation has values
>>> # 0-9 at information field x, the parental generation has values 10-19.
>>> pop = sim.Population(size=[5, 5], loci=[2, 3], infoFields='x', ancGen=1)
>>> pop.setIndInfo(range(10, 20), 'x')
>>> pop1 = pop.clone()
>>> pop1.setIndInfo(range(10), 'x')
>>> pop.push(pop1)
>>> #
>>> ind = pop.individual(5) # using absolute index
>>> ind.x
5.0
>>> ind.x # the same as ind.x
5.0
>>> # use a for loop, and relative index
>>> for idx in range(pop.subPopSize(1)):
... print(pop.individual(idx, 1).x)
...
5.0
6.0
7.0
8.0
9.0
>>> # It is usually easier to use an iterator
>>> for ind in pop.individuals(1):
... print(ind.x)
...
5.0
6.0
7.0
8.0
9.0
>>> # Access individuals in VSPs
>>> pop.setVirtualSplitter(sim.InfoSplitter(cutoff=[3, 7, 17], field='x'))
>>> for ind in pop.individuals([1, 1]):
... print(ind.x)
...
5.0
6.0
>>> # Access all individuals in all ancestral generations
>>> print([ind.x for ind in pop.allIndividuals()])
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
>>> # or only specified subpopulations or ancestral generations
>>> print([ind.x for ind in pop.allIndividuals(subPops=[(0,2), (1,3)], ancGens=1)])
[10.0, 11.0, 12.0, 13.0, 14.0, 17.0, 18.0, 19.0]
>>>
>>> # Access individuals in ancetral generations
>>> pop.ancestor(5, 1).x # absolute index
15.0
>>> pop.ancestor(0, 1, 1).x # relative index
15.0
>>> # Or make ancestral generation the current generation and use 'individual'
>>> pop.useAncestralGen(1)
>>> pop.individual(5).x # absolute index
15.0
>>> pop.individual(0, 1).x # relative index
15.0
>>> # 'ancestor' can still access the 'present' (generation 0) generation
>>> pop.ancestor(5, 0).x
5.0
>>> # access individual by ID
>>> pop.addInfoFields('ind_id')
>>> sim.tagID(pop)
>>> [int(ind.ind_id) for ind in pop.individuals()]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> # access individual by ID. Note that individual 12 is in the parental generation
>>> pop.indByID(12).x
1.0
now exiting runScriptInteractively...
Although it is easy to access individuals in a population, it is often more
efficient to access genotypes and information fields in batch mode. For example,
functions genotype()
andsetGenotype()
can read/write genotype of all
individuals in a population or (virtual) subpopulation, functions indInfo()
and setIndInfo()
can read/write certain information fields in a population
or (virtual) subpopulation. The write functions work in a circular manner in the
sense that provided values are reused if they are not enough to fill all
genotypes or information fields. Example batchAccess
demonstrates the use of such functions.
Example: Access Individual properties in batch mode
>>> import simuPOP as sim
>>> import random
>>> pop = sim.Population(size=[4, 6], loci=2, infoFields='x')
>>> pop.setIndInfo([random.randint(0, 10) for x in range(10)], 'x')
>>> pop.indInfo('x')
(7.0, 5.0, 8.0, 10.0, 7.0, 0.0, 8.0, 4.0, 4.0, 10.0)
>>> pop.setGenotype([0, 1, 2, 3], 0)
>>> pop.genotype(0)
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
>>> pop.setVirtualSplitter(sim.InfoSplitter(cutoff=[3], field='x'))
>>> pop.setGenotype([0]) # clear all values
>>> pop.setGenotype([5, 6, 7], [1, 1])
>>> pop.indInfo('x', 1)
(7.0, 0.0, 8.0, 4.0, 4.0, 10.0)
>>> pop.genotype(1)
[5, 6, 7, 5, 0, 0, 0, 0, 6, 7, 5, 6, 7, 5, 6, 7, 5, 6, 7, 5, 6, 7, 5, 6]
now exiting runScriptInteractively...
Attach arbitrary auxillary information using information fields¶
Information fields are usually set during population creation, using the
infoFields
parameter of the population constructor. It can also be set or
added using functions setInfoFields, addInfoField
and addInfoFields
.
Example popInfo demonstrates how to read and write information
fields from an individual, or from a population in batch mode. Note that
functions Population.indInfo
and Population.setIndInfo
can be
applied to (virtual) subpopulation using a optional parameter subPop.
Example: Add and use of information fields in a population
>>> import simuPOP as sim
>>> pop = sim.Population(10)
>>> pop.setInfoFields(['a', 'b'])
>>> pop.addInfoFields('c')
>>> pop.addInfoFields(['d', 'e'])
>>> pop.infoFields()
('a', 'b', 'c', 'd', 'e')
>>> #
>>> # information fields can be accessed in batch mode
>>> pop.setIndInfo([1], 'c')
>>> # as well as individually.
>>> for ind in pop.individuals():
... ind.e = ind.c + 1
...
>>> print(pop.indInfo('e'))
(2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0)
now exiting runScriptInteractively...
Keep track of ancestral generations¶
A simuPOP population usually holds individuals in one generation. During evolution, an offspring generation will replace the parental generation and become the present generation (population), after it is populated from a parental population. The parental generation is discarded.
This is usually enough when only the present generation is of interest. However, parental generations can provide useful information on how genotype and other information are passed from parental to offspring generations. simuPOP provides a mechanism to store and access arbitrary number of ancestral generations in a population object. Applications of this feature include pedigree tracking, reconstruction, and pedigree ascertainments.
A parameter ancGen
is used to specify how many generations a population
object can store (which is usually called the ancestral depth of a
population). This parameter is default to 0
, meaning keeping no ancestral
population. You can specify a positive number n
to store n most recent
generations; or -1
to store all generations. Of course, storing all
generations during an evolutionary process is likely to exhaust the RAM of your
computer quickly.
Several member functions can be used to manipulate ancestral generations:
ancestralGens()
returns the number of ancestral generations stored in a population.setAncestralDepth(depth)
resets the number of generations a population can store.push(pop)
will push populationpop
into the current population.pop
will become the current generation, and the current generation will either be removed (if ancGen == 0), or become the parental generation of pop. The greatest ancestral generation may be removed. This function is rarely used because populations with ancestral generations are usually created during an evolutionary process.useAncestralGen(idx)
set the present generation toidx
generation.idx
= 1
for the parental generation,2
for grand-parental, …, and0
for the present generation. This is useful because most population functions act on the present generation. You should always callsetAncestralPop(0)
after you examined the ancestral generations.
If a population has several ancestral generations, they are referred by their
indexes 0 (the latest generation), 1 (parental generation), … and \(k\)
(top-most ancestral generation) where \(k\) equals to ancestralGens()
.
In many cases, you can retrieve the properties of ancestral generations
directly, using functions such as
popSize(ancGen=-1), subPopSizes(ancGen=-1), subPopSize(subPop, ancGen=-1)
: population and subpopulation sizes of ancestral generationancGen
.ancestor(index, ancGen)
: Get a reference to theindex
individual of ancestral generationancGen
.
However, most population member functions work at the current generation so you
will need to switch to an ancestral generation using function
useAncestralGen()
if you would like to manipulate an ancestral generation.
For example, you can remove the second subpopulation of the parental generation
using functions:
pop.useAncestralGen(1)
pop.removeSubPops(1)
A typical use of ancestral generations is demonstrated in example extract. In this example, a population is created and is initialized with
allele frequency 0.5. Its ancestral depth is set to 2 at the beginning of
generation 18 so that it can hold parental generations at generation 18 and 19.
The allele frequency at each generation is calculated and displayed, both during
evolution using a Stat
operator, and after evolution using the function
form this operator. Note that setting the ancestral depth at the end of an
evolutionary process is a common practice because we are usually only interested
in the last few generations.
Example: Ancestral populations
>>> import simuPOP as sim
>>> pop = sim.Population(500, loci=1, ancGen=2)
>>> pop.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(freq=[0.5, 0.5])
... ],
... matingScheme = sim.RandomMating(),
... postOps=[
... sim.Stat(alleleFreq=0, begin=-3),
... sim.PyEval(r"'%.3f\n' % alleleFreq[0][0]", begin=-3)
... ],
... gen = 20
... )
0.495
0.510
0.506
20
>>> # information
>>> pop.ancestralGens()
2
>>> pop.popSize(ancGen=1)
500
>>> pop.setVirtualSplitter(sim.SexSplitter())
>>> # number of males in the current and parental generation
>>> pop.subPopSize((0,0)), pop.subPopSize((0,0), ancGen=1)
(254, 249)
>>> # start from current generation
>>> for i in range(pop.ancestralGens(), -1, -1):
... pop.useAncestralGen(i)
... sim.stat(pop, alleleFreq=0)
... print('%d %.3f' % (i, pop.dvars().alleleFreq[0][0]))
...
2 0.495
1 0.510
0 0.506
>>> # restore to the current generation
>>> pop.useAncestralGen(0)
now exiting runScriptInteractively...
Change genotypic structure of a population¶
Several functions are provided to remove, add empty loci or chromosomes, and to
merge loci or chromosomes from another population. They can be used to trim
unneeded loci, expand existing population or merge two populations. Example
extract demonstrates how to use these populations. Note that
function Population.addLociFrom
by default merges chromosomes one by one
according to chromosome index. If byName
is set to True, it will try to
match chromosomes by name and merge them. This example also demonstrates the use
of DBG_WARNING
flag, which will trigger a warning message when chromosomes
with different names are merged.
Example: Add and remove loci and chromosomes
>>> import simuOpt
>>> simuOpt.setOptions(debug='DBG_WARNING')
>>> import simuPOP as sim
Turn on debug 'DBG_WARNING'
>>> pop = sim.Population(10, loci=3, chromNames=['chr1'])
>>> # 1 1 1,
>>> pop.setGenotype([1])
>>> # 1 1 1, 0 0 0
>>> pop.addChrom(lociPos=[0.5, 1, 2], lociNames=['rs1', 'rs2', 'rs3'],
... chromName='chr2')
>>> pop1 = sim.Population(10, loci=3, chromNames=['chr3'],
... lociNames=['rs4', 'rs5', 'rs6'])
>>> # 2 2 2,
>>> pop1.setGenotype([2])
>>> # 1 1 1, 0 0 0, 2 2 2
>>> pop.addChromFrom(pop1)
>>> # 1 1 1, 0 0 0, 2 0 2 2 0
>>> pop.addLoci(chrom=[2, 2], pos=[1.5, 3.5], lociNames=['rs7', 'rs8'])
(7, 10)
>>> # 1 1 1, 0 0 0, 2 0 2 0
>>> pop.removeLoci(8)
>>> # loci names can also be used.
>>> pop.removeLoci(['rs1', 'rs7'])
>>> sim.dump(pop)
Ploidy: 2 (diploid)
Chromosomes:
1: chr1 (AUTOSOME, 3 loci)
(1), (2), (3)
2: chr2 (AUTOSOME, 2 loci)
rs2 (1), rs3 (2)
3: chr3 (AUTOSOME, 3 loci)
rs4 (1), rs6 (3), rs8 (3.5)
population size: 10 (1 subpopulations with 10 Individuals)
Number of ancestral populations: 0
SubPopulation 0 (), 10 Individuals:
0: MU 111 00 220 | 111 00 220
1: MU 111 00 220 | 111 00 220
2: MU 111 00 220 | 111 00 220
3: MU 111 00 220 | 111 00 220
4: MU 111 00 220 | 111 00 220
5: MU 111 00 220 | 111 00 220
6: MU 111 00 220 | 111 00 220
7: MU 111 00 220 | 111 00 220
8: MU 111 00 220 | 111 00 220
9: MU 111 00 220 | 111 00 220
>>> # add loci from another population
>>> pop2 = sim.Population(10, loci=2, lociPos=[0.1, 2.2], chromNames='chr3')
>>> pop.addLociFrom(pop2)
WARNING: Chromosome 'chr3' is merged to chromosome 'chr1'.
>>> pop.addLociFrom(pop2, byName=2)
>>> sim.dump(pop, genotype=False)
Ploidy: 2 (diploid)
Chromosomes:
1: chr1 (AUTOSOME, 5 loci)
(0.1), (1), (2), (2.2), (3)
2: chr2 (AUTOSOME, 2 loci)
rs2 (1), rs3 (2)
3: chr3 (AUTOSOME, 5 loci)
(0.1), rs4 (1), (2.2), rs6 (3), rs8 (3.5)
population size: 10 (1 subpopulations with 10 Individuals)
Number of ancestral populations: 0
now exiting runScriptInteractively...
Remove or extract individuals and subpopulations from a population¶
Functions Population.removeIndividuals
and
Population.removeSubPops
remove selected individuals or groups of
individuals from a population. Functions Population.extractIndividuals
and Population.extractSubPops
extract individuals and subpopulations
from an existing population and form a new one.
Functions removeIndividauls
and extractIndividuals
could be used to
remove or extract individuals from the present generation by indexes or from all
ancestral generations by IDs or a Python filter function. This function should
accept parameter ind
or one or more information fields. simuPOP will pass
individual for parameter ind
, and values at specified information fields
(age
in this example) of each individual to this function. The present
population structure will be kept, even if some subpopulations are left empty.
For example, you could remove the first thirty individuals of a population using
pop.removeIndividuals(indexes=range(30))
or remove all individuals at age 20 or 30 using
pop.removeIndividuals(IDs=(20, 30), idField='age')
or remove all individuals with age between 20 and 30 using
pop.removeIndividuals(filter=lambda age: age >=20 and age <=30)
. In the last example, a Python lambda function is defined to avoid the definition of a named function.
Functions removeSubPops
or extractSubPops
could be used to remove or
extract subpopulations, or goups of individuals defined by virtual
subpopulations from a population. The latter case is very interesting because it
could be used to remove or extract individuals with similar properties, such as
all individuals between the ages 40 and 60, as demonstrated in Example
extract.
Example: Extract individuals
>>> import simuPOP as sim
>>> import random
>>> pop = sim.Population(size=[200, 200], loci=[5, 5], infoFields='age')
>>> sim.initGenotype(pop, genotype=range(10))
>>> sim.initInfo(pop, lambda: random.randint(0,75), infoFields='age')
>>> pop.setVirtualSplitter(sim.InfoSplitter(field='age', cutoff=[20, 60]))
>>> # remove individuals
>>> pop.removeIndividuals(indexes=range(0, 300, 10))
>>> print(pop.subPopSizes())
(180, 190)
>>> # remove individuals using IDs
>>> pop.setIndInfo([1, 2, 3, 4], field='age')
>>> pop.removeIndividuals(IDs=[2, 4], idField='age')
>>> # remove indiviuals using a filter function
>>> sim.initSex(pop)
>>> pop.removeIndividuals(filter=lambda ind: ind.sex() == sim.MALE)
>>> print([pop.individual(x).sex() for x in range(8)])
[2, 2, 2, 2, 2, 2, 2, 2]
>>> #
>>> # remove subpopulation
>>> pop.removeSubPops(1)
>>> print(pop.subPopSizes())
(56,)
>>> # remove virtual subpopulation (people with age between 20 and 60)
>>> pop.removeSubPops([(0, 1)])
>>> print(pop.subPopSizes())
(56,)
>>> # extract another virtual subpopulation (people with age greater than 60)
>>> pop1 = pop.extractSubPops([(0,2)])
>>> sim.dump(pop1, structure=False, max=10)
SubPopulation 0 (), 0 Individuals:
now exiting runScriptInteractively...
Store arbitrary population information as population variables¶
Each simuPOP population has a Python dictionary that can be used to store
arbitrary Python variables. These variables are usually used by various
operators to share information between them. For example, the Stat
operator
calculates population statistics and stores the results in this Python
dictionary. Other operators such as the PyEval
and TerminateIf
read from this dictionary and act upon its information.
simuPOP provides two functions, namely Population.vars()
and
Population.dvars()
to access a population dictionary. These functions return
the same dictionary object but dvars
() returns a wrapper class so that
you can access this dictionary as attributes. For example,
pop.vars()['alleleFreq'][0]
is equivalent to pop.dvars().alleleFreq[0]
.
Because dictionary subPop[spID]
is frequently used by operators to store
variables related to a particular (virtual) subpopulation, function
pop.vars(subPop)
is provided as a shortcut to
pop.vars()['subPop'][spID]
. Example popVars demonstrates
how to set and access population variables.
Example: population variables
>>> import simuPOP as sim
>>> from pprint import pprint
>>> pop = sim.Population(100, loci=2)
>>> sim.initGenotype(pop, freq=[0.3, 0.7])
>>> print(pop.vars()) # No variable now
{}
>>> pop.dvars().myVar = 21
>>> print(pop.vars())
{'myVar': 21}
>>> sim.stat(pop, popSize=1, alleleFreq=0)
>>> # pprint prints in a less messy format
>>> pprint(pop.vars())
{'alleleFreq': {0: defdict({0: 0.275, 1: 0.725})},
'alleleNum': {0: defdict({0: 55.0, 1: 145.0})},
'myVar': 21,
'popSize': 100,
'subPopSize': [100]}
>>> # print number of allele 1 at locus 0
>>> print(pop.vars()['alleleNum'][0][1])
145.0
>>> # use the dvars() function to access dictionary keys as attributes
>>> print(pop.dvars().alleleNum[0][1])
145.0
>>> print(pop.dvars().alleleFreq[0])
defdict({0: 0.275, 1: 0.725})
now exiting runScriptInteractively...
It is important to understand that this dictionary forms a local namespace
in which Python expressions can be evaluated. This is the basis of how
expression-based operators work. For example, the PyEval
operator in
example simple_example evaluates expression
``'%.2f\\t' % LD[0][1]''
in each population’s local namespace when it is
applied to that population. This yields different results for different
population because their LD values are different. In addition to Python
expressions, Python statements can also be executed in the local namespace of a
population, using the stmts
parameter of the PyEval
or
PyExec
operator. Example expression demonstrates
the use of a simuPOP terminator, which terminates the evolution of a population
when its expression is evaluated as True
. Note that The evolve()
function of this example does not specify how many generations to evolve so it
will stop only after all replicates stop. The return value of this function
indicates how many generations each replicate has evolved. This example also
demonstrates how to run multiple replicates of an evolutionary process, which we
will discuss in detail latter.
Example: Expression evaluation in the local namespace of a population
>>> import simuPOP as sim
>>> simu = sim.Simulator(sim.Population(100, loci=1), rep=5)
>>> simu.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(freq=[0.5, 0.5])
... ],
... matingScheme = sim.RandomMating(),
... postOps=[
... sim.Stat(alleleFreq=0),
... sim.TerminateIf('len(alleleFreq[0]) == 1')
... ]
... )
(129, 1540, 180, 247, 242)
now exiting runScriptInteractively...
Save and load a population¶
simuPOP populations can be saved to and loaded from disk files using
Population.save(file)
member function and global function
loadPopulation
. Virtual splitters are not saved because they are
considered as runtime definitions. Although files in any extension can be used,
extension .pop
is recommended. Example savePop demonstrates
how to save and load a population in the native simuPOP format.
Example: Save and load a population
>>> import simuPOP as sim
>>> pop = sim.Population(100, loci=5, chromNames=['chrom1'])
>>> pop.dvars().name = 'my sim.Population'
>>> pop.save('sample.pop')
>>> pop1 = sim.loadPopulation('sample.pop')
>>> pop1.chromName(0)
'chrom1'
>>> pop1.dvars().name
'my sim.Population'
now exiting runScriptInteractively...
The native simuPOP format is portable across different platforms but is not
human readable and is not recognized by other applications. If you need to save
a simuPOP population in a format that is recognizable by a particular software,
you can use functions importPopulation
, export
, and operator
Exporter
if you would like to export populations during evolution. These
functions are defined in module simuPOP.utils
.
Import and export datasets in unsupported formats *¶
simuPOP provides a few utility functions to import and export populations in
common formats such as GENEPOP, Phylip, and STRUCTURE (see chapter utility
modules for details). If you need to import data from a file in a format that is
not currently supported, you generally need to first scan the file for
information such as number and names of chromosomes, loci, alleles,
subpopulation, and individuals. After you create a population without genotype
information from these parameters, you can scan the file for the second time and
fill the population with genotypes and other information. Example
importData demonstrates how to define a function to import
from a file that is saved by function saveCSV
.
Example: Import a population from another file format
>>> import simuPOP as sim
>>> def importData(filename):
... 'Read data from ``filename`` and create a population'
... data = open(filename)
... header = data.readline()
... fields = header.split(',')
... # columns 1, 3, 5, ..., without trailing '_1'
... names = [fields[x].strip()[:-2] for x in range(1, len(fields), 2)]
... popSize = 0
... alleleNames = set()
... for line in data.readlines():
... # get all allele names
... alleleNames |= set([x.strip() for x in line.split(',')[1:]])
... popSize += 1
... # create a population
... alleleNames = list(alleleNames)
... pop = sim.Population(size=popSize, loci=len(names), lociNames=names,
... alleleNames=alleleNames)
... # start from beginning of the file again
... data.seek(0)
... # discard the first line
... data.readline()
... for ind, line in zip(pop.individuals(), data.readlines()):
... fields = [x.strip() for x in line.split(',')]
... sex = sim.MALE if fields[0] == '1' else sim.FEMALE
... ploidy0 = [alleleNames.index(fields[x]) for x in range(1, len(fields), 2)]
... ploidy1 = [alleleNames.index(fields[x]) for x in range(2, len(fields), 2)]
... ind.setGenotype(ploidy0, 0)
... ind.setGenotype(ploidy1, 1)
... ind.setSex(sex)
... # close the file
... data.close()
... return pop
...
>>> from simuPOP.utils import saveCSV
>>> pop = sim.Population(size=[10], loci=[3, 2], lociNames=['rs1', 'rs2', 'rs3', 'rs4', 'rs5'],
... alleleNames=['A', 'B'])
>>> sim.initSex(pop)
>>> sim.initGenotype(pop, freq=[0.5, 0.5])
>>> # output sex but not affection status.
>>> saveCSV(pop, filename='sample.csv', affectionFormatter=None,
... sexFormatter={sim.MALE:1, sim.FEMALE:2})
>>> # have a look at the file
>>> print(open('sample.csv').read())
sex, rs1_1, rs1_2, rs2_1, rs2_2, rs3_1, rs3_2, rs4_1, rs4_2, rs5_1, rs5_2
2, B, B, B, B, B, A, A, B, B, A
2, B, A, B, A, B, A, A, A, A, B
1, B, B, B, B, B, B, B, B, B, A
1, B, A, B, A, B, B, B, A, A, A
1, B, B, B, B, B, B, A, A, B, A
1, A, B, B, A, B, B, B, A, B, B
1, B, B, B, B, B, B, B, B, A, A
2, B, B, A, A, B, A, A, A, B, A
2, A, B, B, B, A, B, B, A, A, B
2, B, A, A, B, A, A, B, B, B, A
>>> pop1 = importData('sample.csv')
>>> sim.dump(pop1)
Ploidy: 2 (diploid)
Chromosomes:
1: (AUTOSOME, 5 loci)
rs1 (1), rs2 (2), rs3 (3), rs4 (4), rs5 (5)
population size: 10 (1 subpopulations with 10 Individuals)
Number of ancestral populations: 0
SubPopulation 0 (), 10 Individuals:
0: FU BBBAB | BBABA
1: FU BBBAA | AAAAB
2: MU BBBBB | BBBBA
3: MU BBBBA | AABAA
4: MU BBBAB | BBBAA
5: MU ABBBB | BABAB
6: MU BBBBA | BBBBA
7: FU BABAB | BAAAA
8: FU ABABA | BBBAB
9: FU BAABB | ABABA
now exiting runScriptInteractively...
Unless there are specific requirements in the order and labeling of individuals,
exporting a simuPOP population is usually straightforward. Functions that are
useful in such occasions include structural functions
Population.numSubPop
(), Population.subPopName
,
Population.popSize()
and Population.subPopSizes
(), and
individual access functions Population.individual
() and
Population.individuals
() and individual population access functions
such as Individual.allele
() and Individual.info
(). Function
saveFSTAT
in the cookbook module fstatUtil
or saveCSV
in module
simuPOP.utils
are good examples you can follow.