Individual, Population, pedigree and Simulator¶
class GenoStruTrait¶
-
class
GenoStruTrait
¶ All individuals in a population share the same genotypic properties such as number of chromosomes, number and position of loci, names of markers, chromosomes, and information fields. These properties are stored in this
GenoStruTrait
class and are accessible from bothIndividual
andPopulation
classes. Currently, a genotypic structure consists of- Ploidy, namely the number of homologous sets of chromosomes, of a population. Haplodiploid population is also supported.
- Number of chromosomes and number of loci on each chromosome.
- Positions of loci, which determine the relative distance between
loci on the same chromosome. No unit is assumed so these
positions can be ordinal (
1
,2
,3
, …, the default), in physical distance (bp
,kb
ormb
), or in map distance (e.g.centiMorgan
) depending on applications. - Names of alleles, which can either be shared by all loci or be specified for each locus.
- Names of loci and chromosomes.
- Names of information fields attached to each individual.
In addition to basic property access functions, this class provides some utility functions such as
locusByName
, which looks up a locus by its name.-
GenoStruTrait
()¶ A
GenoStruTrait
object is created with the construction of aPopulation
object and cannot be initialized directly.
-
absLocusIndex
(chrom, locus)¶ return the absolute index of locus locus on chromosome chrom. c.f.
chromLocusPair
.
-
alleleName
(allele, locus=0)¶ return the name of allele allele at lcous specified by the alleleNames parameter of the
Population
function. locus could be ignored if alleles at all loci share the same names. If the name of an allele is unspecified, its value ('0'
,'1'
,'2'
, etc) is returned.
-
alleleNames
(locus=0)¶ return a list of allele names at locus given by the alleleNames parameter of the
Population
function. locus could be ignored if alleles at all loci share the same names. This list does not have to cover all possible allele states of a population soalleleNames()[``*allele*
]`` might fail (usealleleNames(``*allele*
)`` instead).
-
chromBegin
(chrom)¶ return the index of the first locus on chromosome chrom.
-
chromByName
(name)¶ return the index of a chromosome by its name.
-
chromEnd
(chrom)¶ return the index of the last locus on chromosome chrom plus 1.
-
chromLocusPair
(locus)¶ return the chromosome and relative index of a locus using its absolute index locus. c.f.
absLocusIndex
.
-
chromName
(chrom)¶ return the name of a chromosome chrom.
-
chromNames
()¶ return a list of the names of all chromosomes.
-
chromType
(chrom)¶ return the type of a chromosome chrom (
CUSTOMIZED
,AUTOSOME
,CHROMOSOME_X
,CHROMOSOME_Y
orMITOCHONDRIAL
.
-
chromTypes
()¶ return the type of all chromosomes (
CUSTOMIZED
,AUTOSOME
,CHROMOSOME_X
,CHROMOSOME_Y
, orMITOCHONDRIAL
).
-
indexesOfLoci
(loci=ALL_AVAIL)¶ return the indexes of loci with positions positions (list of (chr, pos) pairs). Raise a
ValueError
if any of the loci cannot be found.
-
infoField
(idx)¶ return the name of information field idx.
-
infoFields
()¶ return a list of the names of all information fields of the population.
-
infoIdx
(name)¶ return the index of information field name. Raise an
IndexError
if name is not one of the information fields.
-
lociByNames
(names)¶ return the indexes of loci with names names. Raise a
ValueError
if any of the loci cannot be found.
-
lociDist
(locus1, locus2)¶ Return the distance between loci locus1 and locus2 on the same chromosome. A negative value will be returned if locus1 is after locus2.
-
lociNames
()¶ return the names of all loci specified by the lociNames parameter of the
Population
function. An empty list will be returned if lociNames was not specified.
-
lociPos
()¶ return the positions of all loci, specified by the lociPos prameter of the
Population
function. The default positions are 1, 2, 3, 4, … on each chromosome.
-
locusByName
(name)¶ return the index of a locus with name name. Raise a
ValueError
if no locus is found. Note that empty strings are used for loci without name but you cannot lookup such loci using this function.
-
locusName
(locus)¶ return the name of locus locus specified by the lociNames parameter of the
Population
function. An empty string will be returned if no name has been given to locus locus.
-
locusPos
(locus)¶ return the position of locus locus specified by the lociPos parameter of the
Population
function.
-
numChrom
()¶ return the number of chromosomes.
-
numLoci
(chrom)¶ return the number of loci on chromosome chrom.
-
numLoci
() return a list of the number of loci on all chromosomes.
-
ploidy
()¶ return the number of homologous sets of chromosomes, specified by the ploidy parameter of the
Population
function. Return 2 for a haplodiploid population because two sets of chromosomes are stored for both males and females in such a population.
-
ploidyName
()¶ return the ploidy name of this population, can be one of
haploid
,diploid
,haplodiploid
,triploid
,tetraploid
or#-ploid
where#
is the ploidy number.
-
totNumLoci
()¶ return the total number of loci on all chromosomes.
class Individual¶
-
class
Individual
¶ A
Population
consists of individuals with the same genotypic structure. AnIndividual
object cannot be created independently, but refences to inidividuals can be retrieved using member functions of aPopulation
object. In addition to structural information shared by all individuals in a population (provided by classGenoStruTrait
), theIndividual
class provides member functions to get and set genotype, sex, affection status and information fields of an individual.Genotypes of an individual are stored sequentially and can be accessed locus by locus, or in batch. The alleles are arranged by position, chromosome and ploidy. That is to say, the first allele on the first chromosome of the first homologous set is followed by alleles at other loci on the same chromsome, then markers on the second and later chromosomes, followed by alleles on the second homologous set of the chromosomes for a diploid individual. A consequence of this memory layout is that alleles at the same locus of a non-haploid individual are separated by
Individual::totNumLoci()
loci. It is worth noting that access to invalid chromosomes, such as the Y chromosomes of female individuals, is not restricted.-
Individual
()¶ An
Individual
object cannot be created directly. It has to be accessed from aPopulation
object using functions such asPopulation::Individual(idx)
.
-
affected
()¶ Return
True
if this individual is affected.
-
allele
(idx, ploidy=-1, chrom=-1)¶ return the current allele at a locus, using its absolute index idx. If a ploidy ploidy and/or a chromosome indexes is given, idx is relative to the beginning of specified homologous copy of chromosomes (if chrom=-1) or the beginning of the specified homologous copy of specified chromosome (if chrom >= 0).
-
alleleChar
(idx, ploidy=-1, chrom=-1)¶ return the name of
allele(idx, ploidy, chrom)
. If idx is invalid (e.g. second homologus copy of chromosome Y), ‘_’ is returned.
-
alleleLineage
(idx, ploidy=-1, chrom=-1)¶ return the lineage of the allele at a locus, using its absolute index idx. If a ploidy ploidy and/or a chromosome indexes is given, idx is relative to the beginning of specified homologous copy of chromosomes (if chrom=-1) or the beginning of the specified homologous copy of specified chromosome (if chrom >= 0). This function returns 0 for modules without lineage information.
-
__cmp__
(rhs)¶ a python function used to compare the individual objects
-
genotype
(ploidy=ALL_AVAIL, chroms=ALL_AVAIL)¶ return an editable array (a
carray
object) that represents all alleles of an individual. If ploidy or chroms is given, only alleles on the specified chromosomes and homologous copy of chromosomes will be returned. If multiple chromosomes are specified, there should not be gaps between chromosomes. This function ignores type of chromosomes so it will return unused alleles for sex and mitochondrial chromosomes.
-
info
(field)¶ Return the value of an information field filed (by index or name).
ind.info(name)
is equivalent toind.name
although the function form allows the use of indexes of information fieldes.
-
lineage
(ploidy=ALL_AVAIL, chroms=ALL_AVAIL)¶ return an editable array (a
carray_lineage
object) that represents the lineages of all alleles of an individual. If ploidy or chroms is given, only lineages on the specified chromosomes and homologous copy of chromosomes will be returned. If multiple chromosomes are specified, there should not be gaps between chromosomes. This function ignores type of chromosomes so it will return lineage of unused alleles for sex and mitochondrial chromosomes. ANone
object will be returned for modules without lineage information.
-
mutants
(ploidy=ALL_AVAIL, chroms=ALL_AVAIL)¶ return an itertor that iterate through all mutants (non-zero alleles) of an individual. Each mutant is presented as a tuple of (index, value) where index is the index of mutant ranging from zero to totNumLoci() * ploidy() - 1, so you will have to adjust indexes to check multiple alleles at a locus. If ploidy or chroms is given, only alleles on the specified chromosomes and homologous copy of chromosomes will be iterated. If multiple chromosomes are specified, there should not be gaps between chromosomes. This function ignores type of chromosomes so it will return unused alleles for sex and mitochondrial chromosomes.
-
setAffected
(affected)¶ set affection status to affected (
True
orFalse
).
-
setAllele
(allele, idx, ploidy=-1, chrom=-1)¶ set allele allele to a locus, using its absolute index idx. If a ploidy ploidy and/or a chromosome indexes are given, idx is relative to the beginning of specified homologous copy of chromosomes (if chrom=-1) or the beginning of the specified homologous copy of specified chromosome (if chrom >= 0).
-
setAlleleLineage
(lineage, idx, ploidy=-1, chrom=-1)¶ set lineage lineage to an allele, using its absolute index idx. If a ploidy ploidy and/or a chromosome indexes are given, idx is relative to the beginning of specified homologous copy of chromosomes (if chrom=-1) or the beginning of the specified homologous copy of specified chromosome (if chrom >= 0). This function does nothing for modules without lineage information.
-
setGenotype
(geno, ploidy=ALL_AVAIL, chroms=ALL_AVAIL)¶ Fill the genotype of an individual using a list of alleles geno. If parameters ploidy and/or chroms are specified, alleles will be copied to only all or specified chromosomes on selected homologous copies of chromosomes.
geno
will be reused if its length is less than number of alleles to be filled. This function ignores type of chromosomes so it will set genotype for unused alleles for sex and mitochondrial chromosomes.
-
setInfo
(value, field)¶ set the value of an information field field (by index or name) to value.
ind.setInfo(value, field)
is equivalent toind.field = value
although the function form allows the use of indexes of information fieldes.
-
setLineage
(lineage, ploidy=ALL_AVAIL, chroms=ALL_AVAIL)¶ Fill the lineage of an individual using a list of IDs lineage. If parameters ploidy and/or chroms are specified, lineages will be copied to only all or specified chromosomes on selected homologous copies of chromosomes.
lineage
will be reused if its length is less than number of allelic lineage to be filled. This function ignores type of chromosomes so it will set lineage to unused alleles for sex and mitochondrial chromosomes. It does nothing for modules without lineage information.
-
setSex
(sex)¶ set individual sex to
MALE
orFEMALE
.
-
sex
()¶ return the sex of an individual,
1
for male and2
for female.
-
class Population¶
-
class
Population
¶ A simuPOP population consists of individuals of the same genotypic structure, organized by generations, subpopulations and virtual subpopulations. It also contains a Python dictionary that is used to store arbitrary population variables.
In addition to genotypic structured related functions provided by the
GenoStruTrait
class, the population class provides a large number of member functions that can be used to- Create, copy and compare populations.
- Manipulate subpopulations. A population can be divided into several subpopulations. Because individuals only mate with individuals within the same subpopulation, exchange of genetic information across subpopulations can only be done through migration. A number of functions are provided to access subpopulation structure information, and to merge and split subpopulations.
- Define and access virtual subpopulations. A virtual subpopulation splitter can be assigned to a population, which defines groups of individuals called virtual subpopulations (VSP) within each subpopulation.
- Access individuals individually, or through iterators that iterate through individuals in (virtual) subpopulations.
- Access genotype and information fields of individuals at the
population level. From a population point of view, all genotypes
are arranged sequentially individual by individual. Please refer
to class
Individual
for an introduction to genotype arragement of each individual. - Store and access ancestral generations. A population can save arbitrary number of ancestral generations. It is possible to directly access an ancestor, or make an ancestral generation the current generation for more efficient access.
- Insert or remove loci, resize (shrink or expand) a population, sample from a population, or merge with other populations.
- Manipulate population variables and evaluate expressions in this local namespace.
- Save and load a population.
-
Population
(size=[], ploidy=2, loci=[], chromTypes=[], lociPos=[], ancGen=0, chromNames=[], alleleNames=[], lociNames=[], subPopNames=[], infoFields=[])¶ The following parameters are used to create a population object:
- size
- A list of subpopulation sizes. The length of this list determines the
number of subpopulations of this population. If there is no
subpopulation,
size=[popSize]
can be written assize=popSize
. - ploidy
- Number of homologous sets of chromosomes. Default to
2
(diploid). For efficiency considerations, all chromosomes have the same number of homologous sets, even if some customized chromosomes or some individuals (e.g. males in a haplodiploid population) have different numbers of homologous sets. The first case is handled by setting chromTypes of each chromosome. Only the haplodiploid populations are handled for the second case, for whichploidy=HAPLODIPLOID
should be used. - loci
- A list of numbers of loci on each chromosome. The length of this
parameter determines the number of chromosomes. If there is
only one chromosome,
numLoci
instead of[numLoci]
can be used. - chromTypes
- A list that specifies the type of each chromosome, which can be
AUTOSOME
,CHROMOSOME_X
,CHROMOSOME_Y
, orCUSTOMIZED
. All chromosomes are assumed to be autosomes if this parameter is ignored. Sex chromosome can only be specified in a diploid population where the sex of an individual is determined by the existence of these chromosomes using theXX
(FEMALE
) andXY
(MALE
) convention. Both sex chromosomes have to be available and be specified only once. Because chromosomesX
andY
are treated as two chromosomes, recombination on the pseudo-autosomal regions of the sex chromsomes is not supported.CUSTOMIZED
chromosomes are special chromosomes whose inheritance patterns are undefined. They rely on user- defined functions and operators to be passed from parents to offspring. Multiple customized chromosomes have to be arranged consecutively. - lociPos
- Positions of all loci on all chromosome, as a list of float numbers.
Default to
1
,2
, … etc on each chromosome. lociPos should be arranged chromosome by chromosome. IflociPos
are not in order within a chromosome, they will be re-arranged along with corresponding lociNames (if specified). - ancGen
- Number of the most recent ancestral generations to keep during
evolution. Default to
0
, which means only the current generation will be kept. If it is set to-1
, all ancestral generations will be kept in this population (and exhaust your computer RAM quickly). - chromNames
- A list of chromosome names. Default to
''
for all chromosomes. - alleleNames
- A list or a nested list of allele names. If a list of alleles is
given, it will be used for all loci in this population. For
example,
alleleNames=('A','C','T','G')
gives namesA
,C
,T
, andG
to alleles0
,1
,2
, and3
respectively. If a nested list of names is given, it should specify alleles names for all loci. - lociNames
- A list of names for each locus. It can be empty or a list of unique names for each locus. If loci are not specified in order, loci names will be rearranged according to their position on the chromosome.
- subPopNames
- A list of subpopulation names. All subpopulations will have name
''
if this parameter is not specified. - infoFields
- Names of information fields (named float number) that will be attached to each individual.
-
absIndIndex
(idx, subPop)¶ return the absolute index of an individual idx in subpopulation subPop.
-
addChrom
(lociPos, lociNames=[], chromName="", alleleNames=[], chromType=AUTOSOME)¶ Add chromosome chromName with given type chromType to a population, with loci lociNames inserted at position lociPos. lociPos should be ordered. lociNames and chromName should not exist in the current population. Allele names could be specified for all loci (a list of names) or differently for each locus (a nested list of names), using parameter alleleNames. Empty loci names will be used if lociNames is not specified. The newly added alleles will have zero lineage in modules wiht lineage information.
-
addChromFrom
(pop)¶ Add chromosomes in population pop to the current population. population pop should have the same number of individuals as the current population in the current and all ancestral generations. Chromosomes of pop, if named, should not conflict with names of existing chromosome. This function merges genotypes on the new chromosomes from population
pop
individual by individual.
-
addIndFrom
(pop)¶ Add all individuals, including ancestors, in pop to the current population. Two populations should have the same genotypic structures and number of ancestral generations. Subpopulations in population pop are kept.
-
addInfoFields
(fields, init=0)¶ Add a list of information fields fields to a population and initialize their values to init. If an information field alreay exists, it will be re-initialized.
-
addLoci
(chrom, pos, lociNames=[], alleleNames=[])¶ Insert loci lociNames at positions pos on chromosome chrom. These parameters should be lists of the same length, although names may be ignored, in which case empty strings will be assumed. Single-value input is allowed for parameter chrom and pos if only one locus is added. Alleles at inserted loci are initialized with zero alleles. Note that loci have to be added to existing chromosomes. If loci on a new chromosome need to be added, function
addChrom
should be used. Optionally, allele names could be specified either for all loci (a single list) or each loci (a nested list). This function returns indexes of the inserted loci. Newly inserted alleles will have zero lineage in modules with lineage information.
-
addLociFrom
(pop, byName=False)¶ Add loci from population pop. By default, chromosomes are merged by index and names of merged chromosomes of population pop will be ignored (merge of two chromosomes with different names will yield a warning). If byName is set to
True
, chromosomes in pop will be merged to chromosomes with identical names. Added loci will be inserted according to their position. Their position and names should not overlap with any locus in the current population. population pop should have the same number of individuals as the current population in the current and all ancestral generations. Allele lineages are also copied from pop in modules with lineage information.
-
ancestor
(idx, gen, subPop=[])¶ Return a reference to individual
idx
in ancestral generationgen
. The correct individual will be returned even if the current generation is not the present one (see alsouseAncestralGen
). If a valid subPop is specified, index is relative to that subPop. Virtual subpopulation is not supported. Note that a float idx is acceptable as long as it rounds closely to an integer.
-
ancestralGens
()¶ Return the actual number of ancestral generations stored in a population, which does not necessarily equal to the number set by
setAncestralDepth()
.
-
clone
()¶ Create a cloned copy of a population. Note that Python statement
pop1 = pop
only creates a reference to an existing populationpop
.
-
__cmp__
(rhs)¶ a python function used to compare the population objects
-
dvars
(subPop=[])¶ Return a wrapper of Python dictionary returned by
vars(subPop)
so that dictionary keys can be accessed as attributes.
-
extractIndividuals
(indexes=[], IDs=[], idField="ind_id", filter=None)¶ Extract individuals with given absolute indexes (parameter indexes), IDs (parameter IDs, stored in information field idField, default to
ind_id
), or a filter function (parameter filter). If a list of absolute indexes are specified, the present generation will be extracted and form a one-generational population. If a list of IDs are specified, this function will look through all ancestral generations and extract individuals with given ID. Individuals with shared IDs are allowed. In the last case, a user-defined Python function should be provided. This function should accept parameter"ind"
or one or more of the information fields. All individuals, including ancestors if there are multiple ancestral generations, will be passed to this function. Individuals that returnsTrue
will be extracted. Extracted individuals will be in their original ancestral generations and subpopulations, even if some subpopulations or generations are empty. AnIndexError
will be raised if an index is out of bound but no error will be given if an invalid ID is encountered.
-
extractSubPops
(subPops=ALL_AVAIL, rearrange=False)¶ Extract a list of (virtual) subpopulations from a population and create a new population. If rearrange is
False
(default), structure and names of extracted subpopulations are kept although extracted subpopulations can have fewer individuals if they are created from extracted virtual subpopulations. (e.g. it is possible to extract all male individuals from a subpopulation using aSexSplitter()
). If rearrange isTrue
, each (virtual) subpopulation in subPops becomes a new subpopulation in the extracted population in the order at which they are specified. Because each virtual subpopulation becomes a subpopulation, this function could be used, for example, to separate male and female individuals to two subpopulations (subPops=[(0,0), (0,1)]
). If overlapping (virtual) subpopulations are specified, individuals will be copied multiple times. This function only extract individuals from the present generation.
-
genotype
(subPop=[])¶ Return an editable array of the genotype of all individuals in a population (if
subPop=[]
, default), or individuals in a subpopulation subPop. Virtual subpopulation is unsupported.
-
indByID
(id, ancGens=ALL_AVAIL, idField="ind_id")¶ Return a reference to individual with id stored in information field idField (default to
ind_id
). This function by default search the present and all ancestral generations (ancGen=ALL_AVAIL
), but you can limit the search in specific generations if you know which generations to search (ancGens=[0,1]
for present and parental generations) orUNSPECIFIED
to search only the current generation. If no individual with id is found, anIndexError
will be raised. A float id is acceptable as long as it rounds closely to an integer. Note that this function uses a dynamic searching algorithm which tends to be slow. If you need to look for multiple individuals from a static population, you might want to convert a population object to a pedigree object and use functionPedigree.indByID
.
-
indInfo
(field, subPop=[])¶ Return the values (as a list) of information field
field
(by index or name) of all individuals (ifsubPop=[]
, default), or individuals in a (virtual) subpopulation (ifsubPop=sp
or(sp, vsp)
).
-
individual
(idx, subPop=[])¶ Return a refernce to individual idx in the population (if
subPop=[]
, default) or a subpopulation (ifsubPop=sp
). Virtual subpopulation is not supported. Note that a float idx is acceptable as long as it rounds closely to an integer.
-
individuals
(subPop=[])¶ Return an iterator that can be used to iterate through all individuals in a population (if
subPop=[]
, default), or a (virtual) subpopulation (subPop=spID
or(spID, vspID)
). If you would like to iterate through multiple subpopulations in multiple ancestral generations, please use functionPopulation.allIndividuals()
.
-
lineage
(subPop=[])¶ Return an editable array of the lineage of alleles for all individuals in a population (if
subPop=[]
, default), or individuals in a subpopulation subPop. Virtual subpopulation is unsupported. This function returns ``None`` for modules without lineage information.
-
mergeSubPops
(subPops=ALL_AVAIL, name="", toSubPop=-1)¶ Merge subpopulations subPops. If subPops is
ALL_AVAIL
(default), all subpopulations will be merged. subPops do not have to be adjacent to each other. They will all be merged to the subpopulation with the smallest subpopulation ID, unless a subpopulation ID is specified using parametertoSubPop
. Indexes of the rest of the subpopulation may be changed. A new name can be assigned to the merged subpopulation through parameter name (an empty name will be ignored). This function returns the ID of the merged subpopulation.
-
mutants
(subPop=[])¶ Return an iterator that iterate through mutants of all individuals in a population (if
subPop=[]
, default), or individuals in a subpopulation subPop. Virtual subpopulation is unsupported. Each mutant is presented as a tuple of (index, value) where index is the index of mutant (from 0 to totNumLoci()*ploidy()) so you will have to adjust its value to check multiple alleles at a locus. This function ignores type of chromosomes so non-zero alleles in unused alleles of sex and mitochondrial chromosomes are also iterated.
-
numSubPop
()¶ Return the number of subpopulations in a population. Return 1 if there is no subpopulation structure.
-
numVirtualSubPop
()¶ Return the number of virtual subpopulations (VSP) defined by a VSP splitter. Return
0
if no VSP is defined.
-
popSize
(ancGen=-1, sex=ANY_SEX)¶ Return the total number of individuals in all subpopulations of the current generation (default) or the an ancestral generation ancGen. This function by default returns number of all individuals (
sex=ANY_SEX
), but it will return number of males (ifsex=MALE_ONLY
), number of females (ifsex=MALE_ONLY
), and number of male/female pairs (ifsex=PAIR_ONLY
) which is essentially less of the number of males and females.
-
push
(pop)¶ Push population pop into the current population. Both populations should have the same genotypic structure. The current population is discarded if ancestralDepth (maximum number of ancestral generations to hold) is zero so no ancestral generation can be kept. Otherise, the current population will become the parental generation of pop. If ancGen of a population is positive and there are already ancGen ancestral generations (c.f.
ancestralGens()
), the greatest ancestral generation will be discarded. In any case, Population*pop* becomes invalid as all its individuals are absorbed by the current population.
-
recodeAlleles
(alleles, loci=ALL_AVAIL, alleleNames=[])¶ Recode alleles at loci (can be a list of loci indexes or names, or all loci in a population (
ALL_AVAIL
)) to other values according to parameter alleles. This parameter can a list of new allele numbers for alleles0
,1
,2
, … (allelex
will be recoded tonewAlleles[x]
,x
outside of the range of newAlleles will not be recoded, although a warning will be given ifDBG_WARNING
is defined) or a Python function, which should accept one or both parametersallele
(existing allele) andlocus
(index of locus). The return value will become the new allele. This function is intended to recode some alleles without listing all alleles in a list. It will be called once for each existing allele so it is not possible to recode an allele to different alleles. A new list of allele names could be specified for these loci. Different sets of names could be specified for each locus if a nested list of names are given. This function recode alleles for all subpopulations in all ancestral generations.
-
removeIndividuals
(indexes=[], IDs=[], idField="ind_id", filter=None)¶ remove individual(s) by absolute indexes (parameter index) or their IDs (parameter IDs), or using a filter function (paramter filter). If indexes are used, only individuals at the current generation will be removed. If IDs are used, all individuals with one of the IDs at information field idField (default to
"ind_id"
) will be removed. Although"ind_id"
usually stores unique IDs of individuals, this function is frequently used to remove groups of individuals with the same value at an information field. AnIndexError
will be raised if an index is out of bound, but no error will be given if an invalid ID is specified. In the last case, a user-defined function should be provided. This function should accept parameter"ind"
or one or more of the information fields. All individuals, including ancestors if there are multiple ancestral generations, will be passed to this function. Individuals that returnsTrue
will be removed. This function does not affect subpopulation structure in the sense that a subpopulation will be kept even if all individuals from it are removed.
-
removeInfoFields
(fields)¶ Remove information fields fields from a population.
-
removeLoci
(loci=UNSPECIFIED, keep=UNSPECIFIED)¶ Remove loci (absolute indexes or names) and genotypes at these loci from the current population. Alternatively, a parameter keep can be used to specify loci that will not be removed.
-
removeSubPops
(subPops)¶ Remove (virtual) subpopulation(s) subPops and all their individuals. This function can be used to remove complete subpopulations (with shifted subpopulation indexes) or individuals belonging to virtual subpopulations of a subpopulation. In the latter case, the subpopulations are kept even if all individuals have been removed. This function only handles the present generation.
-
resize
(sizes, propagate=False)¶ Resize population by giving new subpopulation sizes sizes. individuals at the end of some subpopulations will be removed if the new subpopulation size is smaller than the old one. New individuals will be appended to a subpopulation if the new size is larger. Their genotypes will be set to zero (default), or be copied from existing individuals if propagate is set to
True
. More specifically, if a subpopulation with3
individuals is expanded to7
, the added individuals will copy genotypes from individual1
,2
,3
, and1
respectively. Note that this function only resizes the current generation.
-
save
(filename)¶ Save population to a file filename, which can be loaded by a global function
loadPopulation(filename)
.
-
setAncestralDepth
(depth)¶ set the intended ancestral depth of a population to depth, which can be
0
(does not store any ancestral generation),-1
(store all ancestral generations), and a positive number (store depth ancestral generations. If there exists more than depth ancestral generations (if depth > 0), extra ancestral generations are removed.
-
setGenotype
(geno, subPop=[])¶ Fill the genotype of all individuals in a population (if
subPop=[]
) or in a (virtual) subpopulation subPop (ifsubPop=sp
or(sp, vsp)
) using a list of alleles geno. geno will be reused if its length is less thansubPopSize(subPop)*totNumLoci()*ploidy()
.
-
setIndInfo
(values, field, subPop=[])¶ Set information field
field
(specified by index or name) of all individuals (ifsubPop=[]
, default), or individuals in a (virtual) subpopulation (subPop=sp
or(sp, vsp)
) to values. values will be reused if its length is smaller than the size of the population or (virtual) subpopulation.
-
setInfoFields
(fields, init=0)¶ Set information fields fields to a population and initialize them with value init. All existing information fields will be removed.
-
setLineage
(geno, subPop=[])¶ Fill the lineage of all individuals in a population (if
subPop=[]
) or in a (virtual) subpopulation subPop (ifsubPop=sp
or(sp, vsp)
) using a list of IDs lineage. lineage will be reused if its length is less thansubPopSize(subPop)*totNumLoci()*ploidy()
. This function returns directly for modules without lineage information.
-
setSubPopByIndInfo
(field)¶ Rearrange individuals to their new subpopulations according to their integer values at information field field (value returned by
Individual::info(field)
). individuals with negative values at this field will be removed. Existing subpopulation names are kept. New subpopulations will have empty names.
-
setSubPopName
(name, subPop)¶ Assign a name name to subpopulation subPop. Note that subpopulation names do not have to be unique.
-
setVirtualSplitter
(splitter)¶ Set a VSP splitter to the population, which defines the same VSPs for all subpopulations. If different VSPs are needed for different subpopulations, a
CombinedSplitter
can be used to make these VSPs available to all subpopulations.
-
sortIndividuals
(infoFields, reverse=False)¶ Sort individuals according to values at specified information fields (infoFields). Individuals will be sorted at an increasing order unless reverse is set to
true
.
-
splitSubPop
(subPop, sizes, names=[])¶ Split subpopulation subPop into subpopulations of given sizes, which should add up to the size of subpopulation subPop or 1, in which case sizes are treated as proportions. If subPop is not the last subpopulation, indexes of subpopulations after subPop are shifted. If subPop is named, the same name will be given to all new subpopulations unless a new set of names are specified for these subpopulations. This function returns the IDs of split subpopulations.
-
subPopBegin
(subPop)¶ Return the index of the first individual in subpopulation subPop.
-
subPopByName
(name)¶ Return the index of the first subpopulation with name name. An
IndexError
will be raised if subpopulations are not named, or if no subpopulation with name name is found. Virtual subpopulation name is not supported.
-
subPopEnd
(subPop)¶ Return the index of the last individual in subpopulation subPop plus
1
, so thatrange(subPopBegin(subPop)
,subPopEnd(subPop)
can iterate through the index of all individuals in subpopulation subPop.
-
subPopIndPair
(idx)¶ Return the subpopulation ID and relative index of an individual, given its absolute index
idx
.
-
subPopName
(subPop)¶ Return the “spName - vspName” (virtual named subpopulation), “” (unnamed non-virtual subpopulation), “spName” (named subpopulation) or “vspName” (unnamed virtual subpopulation), depending on whether subpopulation is named or if subPop is virtual.
-
subPopNames
()¶ Return the names of all subpopulations (excluding virtual subpopulations). An empty string will be returned for unnamed subpopulations.
-
subPopSizes
(ancGen=-1)¶ Return the sizes of all subpopulations at the current generation (default) or specified ancestral generation ancGen. Virtual subpopulations are not considered.
-
swap
(rhs)¶ Swap the content of two population objects, which can be handy in some particular circumstances. For example, you could swap out a population in a simulator.
-
updateInfoFieldsFrom
(fields, pop, fromFields=[], ancGens=ALL_AVAIL)¶ Update information fields fields from fromFields of another population (or Pedigree) pop. Two populations should have the same number of individuals. If fromFields is not specified, it is assumed to be the same as fields. If ancGens is not
ALL_AVAIL
, only the specified ancestral generations are updated.
-
useAncestralGen
(idx)¶ Making ancestral generation idx (
0
for current generation,1
for parental generation,2
for grand-parental generation, etc) the current generation. This is an efficient way to access Population properties of an ancestral generation.useAncestralGen(0)
should always be called afterward to restore the correct order of ancestral generations.
-
vars
(subPop=[])¶ return variables of a population as a Python dictionary. If a valid subpopulation subPop is specified, a dictionary
vars()["subPop"][subPop]
is returned. AValueError
will be raised if key subPop does not exist invars()
, or if key subPop does not exist invars()["subPop"]
.
-
virtualSplitter
()¶ Return the virtual splitter associated with the population,
None
will be returned if there is no splitter.
-
asPedigree
(idField='ind_id', fatherField='father_id', motherField='mother_id')¶ Convert the existing population object to a pedigree. After this function pedigree function should magically be usable for this function.
-
subPopSize
(subPop=[], ancGen=-1, sex=ANY_SEX)¶ Return the size of a subpopulation (
subPopSize(sp)
) or a virtual subpopulation (subPopSize([sp, vsp])
) in the current generation (default) or a specified ancestral generation ancGen. If no subpop is given, it is the same aspopSize(ancGen, sex)
. Population and virtual subpopulation names can be used. This function by default returns number of all individuals (sex=ANY_SEX
), but it will return number of males (ifsex=MALE_ONLY
), number of females (ifsex=MALE_ONLY
), and number of male/female pairs (ifsex=PAIR_ONLY
) which is essentially less of the number of males and females. <group>2-subpopsize</grouplociList()>
-
allIndividuals
(subPops=ALL_AVAIL, ancGens=True)¶ Return an iterator that iterat through all (virtual) subpopulations in all ancestral generations. A list of (virtual) subpopulations (subPops) and a list of ancestral generations (ancGens, can be a single number) could be specified to iterate through only selected subpopulation and generations. Value
ALL_AVAIL
is acceptable in the specification ofsp
and/orvsp
in specifying a virtual subpopulation(sp, vsp)
for the iteration through all or specific virtual subpopulation in all or specific subpopulations.
-
evolve
(initOps=[], preOps=[], matingScheme=MatingScheme(), postOps=[], finalOps=[], gen=-1, dryrun=False)¶ Evolve the current population gen generations using mating scheme matingScheme and operators initOps (applied before evolution), preOps (applied to the parental population at the beginning of each life cycle), postOps (applied to the offspring population at the end of each life cycle) and finalOps (applied at the end of evolution). More specifically, this function creates a Simulator using the current population, call its evolve function using passed parameters and then replace the current population with the evolved population. Please refer to function
Simulator.evolve
for more details about each parameter.
class Pedigree¶
-
class
Pedigree
¶ The pedigree class is derived from the population class. Unlike a population class that emphasizes on individual properties, the pedigree class emphasizes on relationship between individuals. An unique ID for all individuals is needed to create a pedigree object from a population object. Compared to the
Population
class, aPedigree
object is optimized for access individuals by their IDs, regardless of population structure and ancestral generations. Note that the implementation of some algorithms rely on the fact that parental IDs are smaller than their offspring because individual IDs are assigned sequentially during evolution. Pedigrees with manually assigned IDs should try to obey such a rule.-
Pedigree
(pop, loci=[], infoFields=[], ancGens=ALL_AVAIL, idField="ind_id", fatherField="father_id", motherField="mother_id", stealPop=False)¶ Create a pedigree object from a population, using a subset of loci (parameter loci, can be a list of loci indexes, names, or
ALL_AVAIL
, default to no locus), information fields (parameter infoFields, default to no information field besides idField, fatherField and motherField), and ancestral generations (parameter ancGens, default to all ancestral generations). By default, information fieldfather_id
(parameter fatherField) andmother_id
(parameter motherField) are used to locate parents identified byind_id
(parameter idField), which should store an unique ID for all individuals. Multiple individuls with the same ID are allowed and will be considered as the same individual, but a warning will be given if they actually differ in genotype or information fields. OperatorsIdTagger
andPedigreeTagger
are usually used to assign such IDs, although functionsampling.indexToID
could be used to assign unique IDs and construct parental IDs from index based relationship recorded by operatorParentsTagger
. A pedigree object could be constructed with one or no parent but certain functions such as relative tracking will not be available for such pedigrees. In case that your are no longer using your population object, you could steal the content from the population by setting stealPop toTrue
.
-
clone
()¶ Create a cloned copy of a Pedigree.
-
identifyAncestors
(IDs=ALL_AVAIL, subPops=ALL_AVAIL, ancGens=ALL_AVAIL)¶ If a list of individuals (IDs) is given, this function traces backward in time and find all ancestors of these individuals. If IDs is
ALL_AVAIL
, ancestors of all individuals in the present generation will be located. If a list of (virtual) subpopulations (subPops) or ancestral geneartions (ancGens) is given, the search will be limited to individuals in these subpopulations and generations. This could be used to, for example, find all fathers of IDs. This function returns a list of IDs, which includes valid specified IDs. Invalid IDs will be silently ignored. Note that parameters subPops and ancGens will limit starting IDs ifIDs
is set toALL_AVAIL
, but specified IDs will not be trimmed according to these parameters.
-
identifyFamilies
(pedField="", subPops=ALL_AVAIL, ancGens=ALL_AVAIL)¶ This function goes through all individuals in a pedigree and group related individuals into families. If an information field pedField is given, indexes of families will be assigned to this field of each family member. The return value is a list of family sizes corresponding to families 0, 1, 2, … etc. If a list of (virtual) subpopulations (parameter subPops) or ancestral generations are specified (parameter ancGens), the search will be limited to individuals in these subpopulations and generations.
-
identifyOffspring
(IDs=[], subPops=ALL_AVAIL, ancGens=ALL_AVAIL)¶ This function traces forward in time and find all offspring of individuals specified in parameter IDs. If a list of (virtual) subpopulations (subPops) or ancestral geneartions (ancGens) is given, the search will be limited to individuals in these subpopulations and generations. This could be used to, for example, find all male offspring of IDs. This function returns a list of IDs, which includes valid starting IDs. Invalid IDs are silently ignored. Note that parameters subPops and ancGens will limit search result but will not be used to trim specified IDs.
-
indByID
(id)¶ Return a reference to individual with id. An
IndexError
will be raised if no individual with id is found. An float id is acceptable as long as it rounds closely to an integer.
-
individualsWithRelatives
(infoFields, sex=[], affectionStatus=[], subPops=ALL_AVAIL, ancGens=ALL_AVAIL)¶ Return a list of IDs of individuals who have non-negative values at information fields infoFields. Additional requirements could be specified by parameters sex and affectionStatus. sex can be
ANY_SEX
(default),MALE_ONLY
,FEMALE_ONLY
,SAME_SEX
orOPPOSITE_SEX
, and affectionStatus can beAFFECTED
,UNAFFECTED
orANY_AFFECTION_STATUS
(default). This function by default check all individuals in all ancestral generations, but you could limit the search using parameter subPops (a list of (virtual) subpopulations) and ancestral generations ancGens. Relatives fall out of specified subpopulations and ancestral generaions will be considered invalid.
-
locateRelatives
(relType, resultFields=[], sex=ANY_SEX, affectionStatus=ANY_AFFECTION_STATUS, ancGens=ALL_AVAIL)¶ This function locates relatives (of type relType) of each individual and store their IDs in information fields relFields. The length of relFields determines how many relatives an individual can have.
Parameter relType specifies what type of relative to locate, which can be
SPOUSE
locate spouses with whom an individual has at least one common offspring.OUTBRED_SPOUSE
locate non-slibling spouses, namely spouses with no shared parent.OFFSPRING
all offspring of each individual.COMMON_OFFSPRING
common offspring between each individual and its spouse (located bySPOUSE
orOUTBRED_SPOUSE
). relFields should consist of an information field for spouse andm-1
fields for offspring wherem
is the number of fields.FULLSIBLING
siblings with common father and mother,SIBLING
siblings with at least one common parent.
Optionally, you can specify the sex and affection status of relatives you would like to locate, using parameters sex and affectionStatus. sex can be
ANY_SEX
(default),MALE_ONLY
,FEMALE_ONLY
,SAME_SEX
orOPPOSITE_SEX
, and affectionStatus can beAFFECTED
,UNAFFECTED
orANY_AFFECTION_STATUS
(default). Only relatives with specified properties will be located.This function will by default go through all ancestral generations and locate relatives for all individuals. This can be changed by setting parameter ancGens to certain ancestral generations you would like to process.
-
save
(filename, infoFields=[], loci=[])¶ Save a pedigree to file filename. This function goes through all individuals of a pedigree and outputs in each line the ID of individual, IDs of his or her parents, sex (
'M'
or'F'
), affection status ('A'
or'U'
), values of specified information fields infoFields and genotypes at specified loci (parameterloci
, which can be a list of loci indexes, names, orALL_AVAIL
). Allele numbers, instead of their names are outputed. Two columns are used for each locus if the population is diploid. This file can be loaded using functionloadPedigree
although additional information such as names of information fields need to be specified. This format differs from a ````.ped file used in some genetic analysis software in that there is no family ID and IDs of all individuals have to be unique. Note that parental IDs will be set to zero if the parent is not in the pedigree object. Therefore, the parents of individuals in the top-most ancestral generation will always be zero.
-
traceRelatives
(fieldPath, sex=[], affectionStatus=[], resultFields=[], ancGens=ALL_AVAIL)¶ Trace a relative path in a population and record the result in the given information fields resultFields. This function is used to locate more distant relatives based on the relatives located by function
locateRelatives
. For example, after siblings and offspring of all individuals are located, you can locate mother’s sibling’s offspring using a relative path, and save their indexes in each individuals information fields resultFields.A relative path consits of a fieldPath that specifies which information fields to look for at each step, a sex specifies sex choices at each generation, and a affectionStatus that specifies affection status at each generation. fieldPath should be a list of information fields, sex and affectionStatus are optional. If specified, they should be a list of
ANY_SEX
,MALE_ONLY
,FEMALE_ONLY
,SAME_SEX
andOppsiteSex
for parameter sex, and a list ofUNAFFECTED
,AFFECTED
andANY_AFFECTION_STATUS
for parameter affectionStatus.For example, if
fieldPath = [['father_id', 'mother_id'], ['sib1', 'sib2'], ['off1', 'off2']]
, andsex = [ANY_SEX, MALE_ONLY, FEMALE_ONLY]
, this function will locatefather_id
andmother_id
for each individual, find all individuals referred byfather_id
andmother_id
, find informaton fieldssib1
andsib2
from these parents and locate male individuals referred by these two information fields. Finally, the information fieldsoff1
andoff2
from these siblings are located and are used to locate their female offspring. The results are father or mother’s brother’s daughters. Their indexes will be saved in each individuals information fields resultFields. If a list of ancestral generations is given in parameter ancGens is given, only individuals in these ancestral generations will be processed.
-
asPopulation
()¶ Convert the existing pedigree object to a population. This function will behave like a regular population after this function call.
-
class Simulator¶
-
class
Simulator
¶ A simuPOP simulator is responsible for evolving one or more populations forward in time, subject to various operators. Populations in a simulator are created from one or more replicates of specified populations. A number of functions are provided to access and manipulate populations, and most importantly, to evolve them.
-
Simulator
(pops, rep=1, stealPops=True)¶ Create a simulator with rep (default to
1
) replicates of populations pops, which is a list of populations although a single population object is also acceptable. Contents of passed populations are by default moved to the simulator to avoid duplication of potentially large population objects, leaving empty populations behind. This behavior can be changed by setting stealPops toFalse
, in which case populations are copied to the simulator.
-
add
(pop, stealPop=True)¶ Add a population pop to the end of an existing simulator. This function by default moves pop to the simulator, leaving an empty population for passed population object. If steal is set to
False
, the population will be copied to the simulator, and thus unchanged.
-
clone
()¶ Clone a simulator, along with all its populations. Note that Python assign statement
simu1 = simu
only creates a symbolic link to an existing simulator.
-
__cmp__
(rhs)¶ a Pyton function used to compare the simulator objects Note that mating schemes are not tested.
-
dvars
(rep, subPop=[])¶ Return a wrapper of Python dictionary returned by
vars(rep, subPop)
so that dictionary keys can be accessed as attributes.
-
evolve
(initOps=[], preOps=[], matingScheme=MatingScheme, postOps=[], finalOps=[], gen=-1, dryrun=False)¶ Evolve all populations gen generations, subject to several lists of operators which are applied at different stages of an evolutionary process. Operators initOps are applied to all populations (subject to applicability restrictions of the operators, imposed by the rep parameter of these operators) before evolution. They are used to initialize populations before evolution. Operators finalOps are applied to all populations after the evolution.
Operators preOps, and postOps are applied during the life cycle of each generation. These operators can be applied at all or some of the generations, to all or some of the evolving populations, depending the begin, end, step, at and reps parameters of these operators. These operators are applied in the order at which they are specified. populations in a simulator are evolved one by one. At each generation, operators preOps are applied to the parental generations. A mating scheme is then used to populate an offspring generation. For each offspring, his or her sex is determined before during- mating operators of the mating scheme are used to transmit parental genotypes. After an offspring generation is successfully generated and becomes the current generation, operators postOps are applied to the offspring generation. If any of the preOps and postOps fails (return
False
), the evolution of a population will be stopped. The generation number of a population, which is the variable"gen"
in each populations local namespace, is increased by one if an offspring generation has been successfully populated even if a post-mating operator fails. Another variable"rep"
will also be set to indicate the index of each population in the simulator. Note that populations in a simulator does not have to have the same generation number. You could reset a population’s generation number by changing this variable.Parameter gen can be set to a non-negative number, which is the number of generations to evolve. If a simulator starts at the beginning of a generation
g
(for example 0), a simulator will stop at the beginning (instead of the end) of generationg + gen
(for example gen). If gen is negative (default), the evolution will continue indefinitely, until all replicates are stopped by operators that returnFalse
at some point (these operators are called terminators). At the end of the evolution, the generations that each replicates have evolved are returned. Note that finalOps are applied to all applicable population, including those that have stopped before others.If parameter dryrun is set to
True
, this function will print a description of the evolutionary process generated by functiondescribeEvolProcess()
and exits.
-
extract
(rep)¶ Extract the rep-th population from a simulator. This will reduce the number of populations in this simulator by one.
-
numRep
()¶ Return the number of replicates.
-
population
(rep)¶ Return a reference to the rep-th population of a simulator. The reference will become invalid once the simulator starts evolving or becomes invalid (removed). If an independent copy of the population is needed, you can use
population.clone()
to create a cloned copy orsimulator.extract()
to remove the population from the simulator.
-
populations
()¶ Return a Python iterator that can be used to iterate through all populations in a simulator.
-
vars
(rep, subPop=[])¶ Return the local namespace of the rep-th population, equivalent to
x.Population(rep).vars(subPop)
.
-