Hybrid and Python operators¶
Hybrid operators¶
Despite the large number of built-in operators, it is obviously not possible to implement every genetics models available. For example, although simuPOP provides several penetrance models, a user may want to try a customized one. In this case, one can use a hybrid operator.
A hybrid operator is an operator that calls a user-defined function when its applied to a population. The number and meaning of input parameters and return values vary from operator to operator. For example, a hybrid mutator sends a to- be-mutated allele to a user-defined function and use its return value as a mutant allele. A hybrid selector uses the return value of a user defined function as individual fitness. Such an operator handles the routine part of the work (e.g. scan through a chromosome and determine which allele needs to be mutated), and leave the creative part to users. Such a mutator can be used to implement complicated genetic models such as an asymmetric stepwise mutation model for microsatellite markers.
simuPOP operators use parameter names to determine which information should be
passed to a user-defined function. For example, a hybrid quantitative trait
operator recognizes parameters ind
, geno
, gen
and names of
information fields such as smoking
. If your model depends on genotype, you
could provide a function with parameter geno (e.g. func(geno)
); if your
model depends on smoking and genotype, you could provide a function with
parameters geno and smoking (e.g. func(geno, smoking)
); if you model depends
on individual sex, you can use a function that passes the whole individual (e.g.
func(ind)
) so that you could check individual sex. When a hybrid operator is
applied to a population, it will check the parameter names of provided Python
function and send requested information automatically.
For example, Example hybridOperator defines a three-
locus heterogeneity penetrance model Risch1990 that yields positive penetrance
only when at least two disease susceptibility alleles are available. The
underlying mechanism of this operator is that for each individual, simuPOP will
collect genotype at specified loci (parameter loci
) and send them to
function myPenetrance
and evaluate. The return values are used as the
penetrance value of the individual, which is then interpreted as the probability
that this individual will become affected.
Example: Use a hybrid operator
>>> import simuPOP as sim
>>> def myPenetrance(geno):
... 'A three-locus heterogeneity penetrance model'
... if sum(geno) < 2:
... return 0
... else:
... return sum(geno)*0.1
...
>>> pop = sim.Population(1000, loci=[20]*3)
>>> pop.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(freq=[0.8, 0.2])
... ],
... matingScheme=sim.RandomMating(),
... postOps=[
... sim.PyPenetrance(func=myPenetrance, loci=[10, 30, 50]),
... sim.Stat(numOfAffected=True),
... sim.PyEval(r"'%d: %d\n' % (gen, numOfAffected)")
... ],
... gen = 5
... )
0: 97
1: 96
2: 78
3: 95
4: 80
5
now exiting runScriptInteractively...
Python operator PyOperator
*¶
If hybrid operators are still not flexible enough, you can always resort to a
pure-Python operator PyOperator
. This operator has full access to the
evolving population (or parents and offspring when aplied during-mating), and
can therefore perform arbitrary operations.
A PyOperator
that is applied pre- or post- mating expects a function
with one or both parameters pop
and param
, wherepop
is the
population being applied, and param
is optional, depending on whether or not
a parameter is passed to the PyOperator
() constructor. Function
func
can perform arbitrary action to pop
and must return True
or
False
. The evolution of pop will be stopped if this function returns
False. This is essentially how operator TerminateIf
works.
Alternatively, this callback function can accept ind
as one of the
parameters. In this case, the function will be called for all individuals or
individuals in specified (virtual) subpopulations. Individuals will be removed
from the populaton if this function returns False.
Example PyOperator defines such a function. It accepts a
cutoff value and two mutation rates as parameters. It then calculate the
frequency of allele 1 at each locus and apply a two-allele model at high
mutation rate if the frequency is lower than the cutoff and a low mutation rate
otherwise. The kAlleleMutate
function is the function form of a mutator
KAlleleMutator
(see Section subsec_Function_form for details).
Example: A frequency dependent mutation operator
import simuPOP as sim
def dynaMutator(pop, param):
'''This mutator mutates commom loci with low mutation rate and rare
loci with high mutation rate, as an attempt to raise allele frequency
of rare loci to an higher level.'''
# unpack parameter
(cutoff, mu1, mu2) = param;
sim.stat(pop, alleleFreq=range(pop.totNumLoci()))
for i in range(pop.totNumLoci()):
# Get the frequency of allele 1 (disease allele)
if pop.dvars().alleleFreq[i][1] < cutoff:
sim.kAlleleMutate(pop, k=2, rates=mu1, loci=[i])
else:
sim.kAlleleMutate(pop, k=2, rates=mu2, loci=[i])
return True
Example usePyOperator demonstrates how to use this
operator. It first initializes the population using two InitGenotype
operators that initialize loci with different allele frequencies. It applies a
PyOperator
with function dynaMutator
and a tuple of parameters. Allele
frequencies at all loci are printed at generation 0
, 10
, 20
, and
30
. Note that this PyOperator
is applied at to the parental
generation so allele frequencies have to be recalculated to be used by post-
mating operator PyEval
.
Example: Use a PyOperator during evolution
>>> pop = sim.Population(size=10000, loci=[2, 3])
>>> pop.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(freq=[.99, .01], loci=[0, 2, 4]),
... sim.InitGenotype(freq=[.8, .2], loci=[1, 3])
... ],
... preOps=sim.PyOperator(func=dynaMutator, param=(.2, 1e-2, 1e-5)),
... matingScheme=sim.RandomMating(),
... postOps=[
... sim.Stat(alleleFreq=range(5), step=10),
... sim.PyEval(r"' '.join(['%.2f' % alleleFreq[x][1] for x in range(5)]) + '\n'",
... step=10),
... ],
... gen = 31
... )
0.02 0.20 0.02 0.20 0.02
0.11 0.22 0.11 0.20 0.11
0.19 0.21 0.20 0.20 0.18
0.21 0.21 0.22 0.21 0.21
31
now exiting runScriptInteractively...
During-mating Python operator *¶
A PyOperator
can also be applied during-mating. They can be used to
filter out unwanted offspring (by returning False
in a user-defined
function), modify offspring, calculate statistics, or pass additional
information from parents to offspring. Depending the names of parameters of your
function, the Python operator will pass offspring (parameter off
), his or
her parents (parameter dad
and mom
), the whole population (parameter
pop
) and an optional parameter (parameter param
) to this function. For
example, function func(off)
will accept references to an offspring, and
func(off, mom, dad)
will accept references to both offspring and his or her
parents.
Example duringMatingPyOperator demonstrates the use of a during-mating Python operator. This operator rejects an offspring if it has allele 1 at the first locus of the first homologous chromosome, and results in an offspring population without such individuals.
Example: Use a during-mating PyOperator
>>> import simuPOP as sim
>>> def rejectInd(off):
... 'reject an individual if it off.allele(0) == 1'
... return off.allele(0) == 0
...
>>> pop = sim.Population(size=100, loci=1)
>>> pop.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(freq=[0.5, 0.5])
... ],
... matingScheme=sim.RandomMating(
... ops=[
... sim.MendelianGenoTransmitter(),
... sim.PyOperator(func=rejectInd)
... ]),
... gen = 1
... )
1
>>> # You should see no individual with allele 1 at locus 0, ploidy 0.
>>> pop.genotype()[:20]
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
now exiting runScriptInteractively...
Download pyDuringMatingOperator.py
PyOperator
is the most powerful operator in simuPOP and has been widely
used, for example, to calculate statistics and is not supported by the
Stat
() operator, to examine population property during evolution, or
prepare populations for a special mating scheme. However, because
PyOperator
works in the Python interpreter, it is expected that it runs
slower than operators that are implemented at the C/C++ level. If performance
becomes an issue, you can re-implement part or all the operator in C++. Section
subsec_Using_C++ describes how to do this.
Define your own operators *¶
PyOperator
is a Python class so you can derive your own operator from
this operator. The tricky part is that the constructor of the derived operator
needs to call the __init__
function of PyOperator
will proper
functions. This technique has been used by simuPOP in a number of occasions. For
example, the VarPlotter
operator defined in plotter.py
is derived from
PyOperator
. This class encapsulates several different plot class that
uses rpy
to plot python expressions. One of the plotters is passed to the
func parameter of PyOperator.__init__
so that it can be called when this
operator is applied.
Example sequentialSelfing rewrites the
dynaMutator
defined in Example PyOperator into a derived
operator. The parameters are now passed to the constructor of dynaMutator
and are saved as member variables. A member function mutate
is defined and
is passed to the constructor of PyOperator
. Other than making
dynaMutator
look like a real simuPOP operator, this example does not show a
lot of advantage over defining a function. However, when the operator gets
complicated (as in the case for VarPlotter
), the object oriented
implementation will prevail.
Example: Define a new Python operator
>>> import simuPOP as sim
>>> class dynaMutator(sim.PyOperator):
... '''This mutator mutates commom loci with low mutation rate and rare
... loci with high mutation rate, as an attempt to raise allele frequency
... of rare loci to an higher level.'''
... def __init__(self, cutoff, mu1, mu2, *args, **kwargs):
... self.cutoff = cutoff
... self.mu1 = mu1
... self.mu2 = mu2
... sim.PyOperator.__init__(self, func=self.mutate, *args, **kwargs)
... #
... def mutate(self, pop):
... sim.stat(pop, alleleFreq=range(pop.totNumLoci()))
... for i in range(pop.totNumLoci()):
... # Get the frequency of allele 1 (disease allele)
... if pop.dvars().alleleFreq[i][1] < self.cutoff:
... sim.kAlleleMutate(pop, k=2, rates=self.mu1, loci=[i])
... else:
... sim.kAlleleMutate(pop, k=2, rates=self.mu2, loci=[i])
... return True
...
>>> pop = sim.Population(size=10000, loci=[2, 3])
>>> pop.evolve(
... initOps=[
... sim.InitSex(),
... sim.InitGenotype(freq=[.99, .01], loci=[0, 2, 4]),
... sim.InitGenotype(freq=[.8, .2], loci=[1, 3])
... ],
... preOps=dynaMutator(cutoff=.2, mu1=1e-2, mu2=1e-5),
... matingScheme=sim.RandomMating(),
... postOps=[
... sim.Stat(alleleFreq=range(5), step=10),
... sim.PyEval(r"' '.join(['%.2f' % alleleFreq[x][1] for x in range(5)]) + '\n'",
... step=10),
... ],
... gen = 31
... )
0.02 0.20 0.02 0.20 0.02
0.11 0.22 0.11 0.20 0.11
0.19 0.21 0.20 0.20 0.18
0.21 0.21 0.22 0.21 0.21
31
now exiting runScriptInteractively...
New during-mating operators can be defined similarly. They are usually used to define customized genotype transmitters. Section subsec_Customized_genotype_transmitter will describe this feature in detail.