Module simuPOP.sampling
This module provides classes and functions that can be used to draw samples from a simuPOP population. These functions accept a list of parameters such as subPops ((virtual) subpopulations from which samples will be drawn) and numOfSamples (number of samples to draw) and return a list of populations. Both independent individuals and dependent individuals (Pedigrees) are supported.
Independent individuals can be drawn from any Population. Pedigree information is not necessary and is usually ignored. Unique IDs are not needed either, although such IDs can help you identify samples in the parent Population.
Pedigrees can be drawn from multi-generational populations or age-structured populations. All individuals are required to have a unique ID (usually tracked by operator IdTagger and stored in information field ind_id). Parents of individuals are usually tracked by operator PedigreeTagger and are stored in information fields father_id and mother_id. If parental information is tracked using operator ParentsTagger and information fields father_idx and mother_idx, the function sampling.indexToID can be used to convert an index-based pedigree to an ID-based Pedigree. Note that ParentsTagger cannot be used to track Pedigrees in age-structured populations because it requires that the parents of each individual reside in a parental generation.
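The difference between index-based and ID-based parental information can be pictured with a small plain-Python model. This is only a sketch of the idea and not the implementation of sampling.indexToID: simuPOP is not used, the dictionary layout and the helper name index_to_id_model are hypothetical, and using 0 for an unknown parent is an assumption of the model.

```python
def index_to_id_model(generations):
    """Convert index-based parent links into ID-based links.

    `generations` is ordered from oldest to youngest. Each individual is a
    dict with 'ind_id' and, except in the oldest generation, 'father_idx'
    and 'mother_idx', which are positions in the previous generation.
    """
    converted = []
    for g, gen in enumerate(generations):
        new_gen = []
        for ind in gen:
            # 0 stands for "parent unknown" in this model (an assumption).
            rec = {'ind_id': ind['ind_id'], 'father_id': 0, 'mother_id': 0}
            if g > 0:
                # An index is only meaningful relative to the parental generation.
                parents = generations[g - 1]
                rec['father_id'] = parents[ind['father_idx']]['ind_id']
                rec['mother_id'] = parents[ind['mother_idx']]['ind_id']
            new_gen.append(rec)
        converted.append(new_gen)
    return converted


founders = [{'ind_id': 1}, {'ind_id': 2}]
children = [{'ind_id': 3, 'father_idx': 0, 'mother_idx': 1}]
result = index_to_id_model([founders, children])
```

Because each index is resolved against the immediately preceding generation, the model also shows why index-based tracking does not fit age-structured populations, where parents and offspring can sit in the same generation.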
All sampling functions support virtual subpopulations through parameter subPops, although sample size specification might vary. This feature allows you to draw samples with specified properties. For example, you could select only female individuals for cases of a female-only disease, or select individuals within a certain age range. If you specify a list of (virtual) subpopulations, you are usually allowed to draw a certain number of individuals from each subpopulation.
class BaseSampler
class simuPOP.sampling.BaseSampler
A sampler extracts individuals from a simuPOP population and returns them as separate populations. This base class defines the common interface of all sampling classes, including how samples are prepared and returned.
BaseSampler(subPops=ALL_AVAIL)
Create a sampler with parameter subPops, which will be used to prepare the population for sampling. subPops should be a list of (virtual) subpopulations from which samples are drawn. The default value is ALL_AVAIL, which means all available subpopulations of a Population.
drawSample(pop)
Draw and return a sample.
drawSamples(pop, numOfSamples)
Draw multiple samples and return a list of populations.
prepareSample(pop, rearrange)
Prepare the passed population object for sampling according to parameter subPops. If samples are drawn from the whole population, a Population will be trimmed if only selected (virtual) subpopulations are used. If samples are drawn separately from specified subpopulations, Population pop will be rearranged (if rearrange==True) so that each subpopulation corresponds to one element in parameter subPops.
class RandomSampler
class simuPOP.sampling.RandomSampler
A sampler that draws individuals randomly.
RandomSampler(sizes, subPops=ALL_AVAIL)
Creates a random sampler with the specified number of individuals.
drawSample(input_pop)
Draw a random sample from the passed population.
drawSamples(pop, numOfSamples)
Draw multiple samples and return a list of populations.
prepareSample(pop, rearrange)
Prepare the passed population object for sampling according to parameter subPops. If samples are drawn from the whole population, a Population will be trimmed if only selected (virtual) subpopulations are used. If samples are drawn separately from specified subpopulations, Population pop will be rearranged (if rearrange==True) so that each subpopulation corresponds to one element in parameter subPops.
Function drawRandomSample
simuPOP.sampling.drawRandomSample(pop, sizes, subPops=ALL_AVAIL)
Draw sizes random individuals from a population. If a single sizes is given, individuals are drawn randomly from the whole population or from specified (virtual) subpopulations (parameter subPops). Otherwise, a list of numbers should be used to specify the number of individuals drawn from each subpopulation, which can be all subpopulations if subPops=ALL_AVAIL (default), or from each of the specified (virtual) subpopulations. This function returns a population with all extracted individuals.
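The two ways of specifying sizes can be illustrated with a plain-Python model that works on lists instead of simuPOP populations. The helper name draw_random_model and the list-of-lists layout are hypothetical; this sketch only mirrors the rule described above and is not how simuPOP implements it.

```python
import random


def draw_random_model(subpops, sizes, rng=None):
    """Model of the `sizes` rule on plain lists of individuals.

    subpops: list of lists, one per (virtual) subpopulation.
    sizes:   an int (draw from all individuals pooled together) or a list
             of ints (draw that many from each subpopulation in turn).
    """
    rng = rng or random.Random()
    if isinstance(sizes, int):
        pooled = [ind for sp in subpops for ind in sp]
        return rng.sample(pooled, sizes)
    if len(sizes) != len(subpops):
        raise ValueError('one size is needed per subpopulation')
    sample = []
    for sp, n in zip(subpops, sizes):
        sample.extend(rng.sample(sp, n))
    return sample


subpops = [list(range(0, 100)), list(range(100, 150))]
# A single number: 10 individuals from the pooled subpopulations.
pooled_sample = draw_random_model(subpops, 10, random.Random(1))
# A list: 4 individuals from the first subpopulation and 6 from the second.
per_subpop_sample = draw_random_model(subpops, [4, 6], random.Random(1))
```

With a single number the composition of the sample across subpopulations is left to chance, whereas a list fixes the contribution of each subpopulation.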
Function drawRandomSamples
simuPOP.sampling.drawRandomSamples(pop, sizes, numOfSamples=1, subPops=ALL_AVAIL)
Draw numOfSamples random samples from a population and return a list of populations. Please refer to function drawRandomSample for more details about parameters sizes and subPops.
class CaseControlSampler
class simuPOP.sampling.CaseControlSampler
A sampler that draws affected and unaffected individuals randomly.
CaseControlSampler(cases, controls, subPops=ALL_AVAIL)
Creates a case-control sampler with the specified number of cases and controls.
drawSample(input_pop)
Draw a case-control sample.
drawSamples(pop, numOfSamples)
Draw multiple samples and return a list of populations.
prepareSample(input_pop)
Find the indexes of all affected and unaffected individuals.
Function drawCaseControlSample
simuPOP.sampling.drawCaseControlSample(pop, cases, controls, subPops=ALL_AVAIL)
Draw a case-control sample from a population with cases affected and controls unaffected individuals. If single cases and controls are given, individuals are drawn randomly from the whole Population or from specified (virtual) subpopulations (parameter subPops). Otherwise, a list of numbers should be used to specify the number of cases and controls from each subpopulation, which can be all subpopulations if subPops=ALL_AVAIL (default), or from each of the specified (virtual) subpopulations. This function returns a population with all extracted individuals.
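A case-control draw with single cases and controls values can be modelled in plain Python as two independent random draws, one among affected and one among unaffected individuals. The helper draw_case_control_model and the dictionary layout are hypothetical, and raising an error when too few individuals are available is a choice made for this sketch; simuPOP may handle that situation differently.

```python
import random


def draw_case_control_model(individuals, cases, controls, rng=None):
    """Pick `cases` affected and `controls` unaffected individuals at random."""
    rng = rng or random.Random()
    # First locate the indexes of affected and unaffected individuals.
    affected = [i for i, ind in enumerate(individuals) if ind['affected']]
    unaffected = [i for i, ind in enumerate(individuals) if not ind['affected']]
    if cases > len(affected) or controls > len(unaffected):
        # Behaviour chosen for this sketch only.
        raise ValueError('not enough cases or controls')
    chosen = rng.sample(affected, cases) + rng.sample(unaffected, controls)
    return [individuals[i] for i in chosen]


# 50 individuals, every fifth one affected (10 cases in total).
population = [{'ind_id': k + 1, 'affected': k % 5 == 0} for k in range(50)]
sample = draw_case_control_model(population, cases=3, controls=7,
                                 rng=random.Random(0))
```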
Function drawCaseControlSamples
simuPOP.sampling.drawCaseControlSamples(pop, cases, controls, numOfSamples=1, subPops=ALL_AVAIL)
Draw numOfSamples case-control samples from a population with cases affected and controls unaffected individuals and return a list of populations. Please refer to function drawCaseControlSample for a detailed description of the parameters.
class PedigreeSampler
class simuPOP.sampling.PedigreeSampler
The base class of all pedigree-based samplers.
PedigreeSampler(families, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Creates a pedigree sampler with parameters
- families: number of families. This can be a number or a list of numbers. In the latter case, the specified numbers of families are drawn from each subpopulation.
- subPops: A list of (virtual) subpopulations from which samples are drawn. The default value is ALL_AVAIL, which means all available subpopulations of a population.
drawSample(input_pop)
Randomly select Pedigrees.
drawSamples(pop, numOfSamples)
Draw multiple samples and return a list of populations.
family(id)
Get the family of the individual with the given id.
prepareSample(pop, loci=[], infoFields=[], ancGens=True)
Prepare self.pedigree. Some pedigree samplers might need additional loci and information fields.
class AffectedSibpairSampler
class simuPOP.sampling.AffectedSibpairSampler
A sampler that draws a nuclear family with two affected offspring.
AffectedSibpairSampler(families, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Initialize an affected sibpair sampler.
drawSample(input_pop)
Randomly select Pedigrees.
drawSamples(pop, numOfSamples)
Draw multiple samples and return a list of populations.
family(id)
Return id, its spouse and their children.
prepareSample(input_pop)
Find the fathers of all affected sibpair families.
Function drawAffectedSibpairSample
simuPOP.sampling.drawAffectedSibpairSample(pop, families, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Draw affected sibpair samples from a population. If a single families is given, affected sibpairs and their parents are drawn randomly from the whole population or from specified (virtual) subpopulations (parameter subPops). Otherwise, a list of numbers should be used to specify the number of families from each subpopulation, which can be all subpopulations if subPops=ALL_AVAIL (default), or from each of the specified (virtual) subpopulations. This function returns a population that contains the extracted individuals.
Function drawAffectedSibpairSamples
simuPOP.sampling.drawAffectedSibpairSamples(pop, families, numOfSamples=1, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Draw numOfSamples affected sibpair samples from population pop and return a list of populations. Please refer to function drawAffectedSibpairSample for a description of other parameters.
class NuclearFamilySampler
class simuPOP.sampling.NuclearFamilySampler
A sampler that draws nuclear families with the specified number of affected parents and offspring.
NuclearFamilySampler(families, numOffspring, affectedParents=0, affectedOffspring=0, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Creates a nuclear family sampler with parameters
- families: number of families. This can be a number or a list of numbers. In the latter case, the specified numbers of families are drawn from each subpopulation.
- numOffspring: number of offspring. This can be a fixed number or a range [min, max].
- affectedParents: number of affected parents. This can be a fixed number or a range [min, max].
- affectedOffspring: number of affected offspring. This can be a fixed number or a range [min, max].
- subPops: A list of (virtual) subpopulations from which samples are drawn. The default value is ALL_AVAIL, which means all available subpopulations of a population.
drawSample(input_pop)
Randomly select Pedigrees.
drawSamples(pop, numOfSamples)
Draw multiple samples and return a list of populations.
family(id)
Return id, its spouse and their children.
prepareSample(input_pop)
Prepare self.pedigree. Some pedigree samplers might need additional loci and information fields.
Function drawNuclearFamilySample
simuPOP.sampling.drawNuclearFamilySample(pop, families, numOffspring, affectedParents=0, affectedOffspring=0, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Draw nuclear families from a population. The number of offspring, number of affected parents and number of affected offspring should be specified using parameters numOffspring, affectedParents and affectedOffspring, which can each be a single number or a range [a, b] (b is included). If a single families is given, Pedigrees are drawn randomly from the whole population or from specified (virtual) subpopulations (parameter subPops). Otherwise, a list of numbers should be used to specify the number of families from each subpopulation, which can be all subpopulations if subPops=ALL_AVAIL (default), or from each of the specified (virtual) subpopulations. This function returns a population that contains the extracted individuals.
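The "single number or inclusive range [a, b]" convention used by numOffspring, affectedParents and affectedOffspring can be sketched as a small predicate. The helper names matches and family_qualifies and the dictionary layout are hypothetical, and treating a single number as an exact count is an assumption of this sketch rather than documented simuPOP behaviour.

```python
def matches(value, spec):
    """True if value satisfies spec: a single number or an inclusive [a, b]."""
    if isinstance(spec, int):
        # Assumption of this sketch: a single number means an exact count.
        return value == spec
    low, high = spec
    return low <= value <= high  # both ends are included


def family_qualifies(family, numOffspring, affectedParents=0, affectedOffspring=0):
    """Check a nuclear family against the three count specifications."""
    n_aff_parents = sum(1 for p in family['parents'] if p['affected'])
    n_aff_offspring = sum(1 for c in family['offspring'] if c['affected'])
    return (matches(len(family['offspring']), numOffspring)
            and matches(n_aff_parents, affectedParents)
            and matches(n_aff_offspring, affectedOffspring))


family = {
    'parents': [{'affected': False}, {'affected': True}],
    'offspring': [{'affected': True}, {'affected': False}],
}
# Two offspring lies on the upper boundary of [1, 2], which is included.
accepted = family_qualifies(family, [1, 2], affectedParents=1,
                            affectedOffspring=[1, 2])
# Exactly three offspring are required here, so the family is rejected.
rejected = family_qualifies(family, 3, affectedParents=1, affectedOffspring=1)
```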
Function drawNuclearFamilySamples
simuPOP.sampling.drawNuclearFamilySamples(pop, families, numOffspring, affectedParents=0, affectedOffspring=0, numOfSamples=1, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Draw numOfSamples nuclear family samples from population pop and return a list of populations. Please refer to function drawNuclearFamilySample for a description of other parameters.
class ThreeGenFamilySampler
class simuPOP.sampling.ThreeGenFamilySampler
A sampler that draws three-generation families with the specified pedigree size and number of affected individuals.
ThreeGenFamilySampler(families, numOffspring, pedSize, numOfAffected=0, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
- families: number of families. This can be a number or a list of numbers. In the latter case, the specified numbers of families are drawn from each subpopulation.
- numOffspring: number of offspring. This can be a fixed number or a range [min, max].
- pedSize: number of individuals in the Pedigree. This can be a fixed number or a range [min, max].
- numOfAffected: number of affected individuals in the Pedigree. This can be a fixed number or a range [min, max].
- subPops: A list of (virtual) subpopulations from which samples are drawn. The default value is ALL_AVAIL, which means all available subpopulations of a population.
drawSample(input_pop)
Randomly select Pedigrees.
drawSamples(pop, numOfSamples)
Draw multiple samples and return a list of populations.
family(id)
Return id, its spouse, their children, children’s spouses and grandchildren.
prepareSample(input_pop)
Prepare self.pedigree. Some pedigree samplers might need additional loci and information fields.
Function drawThreeGenFamilySample
simuPOP.sampling.drawThreeGenFamilySample(pop, families, numOffspring, pedSize, numOfAffected=0, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Draw three-generation families from a population. Such families consist of grandparents, their children, spouses of these children, and grandchildren. The number of offspring, total number of individuals, and total number of affected individuals in a pedigree should be specified using parameters numOffspring, pedSize and numOfAffected, which can each be a single number or a range [a, b] (b is included). If a single families is given, Pedigrees are drawn randomly from the whole Population or from specified (virtual) subpopulations (parameter subPops). Otherwise, a list of numbers should be used to specify the number of families from each subpopulation, which can be all subpopulations if subPops=ALL_AVAIL (default), or from each of the specified (virtual) subpopulations. This function returns a population that contains the extracted individuals.
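The composition of a three-generation family (grandparents, their children, spouses of these children, and grandchildren) can be traced in a plain-Python pedigree model. The helper three_gen_family_model and the records layout are hypothetical and use 0 for an unknown parent as an assumption; simuPOP locates relatives through its own Pedigree machinery, which this sketch does not reproduce.

```python
def three_gen_family_model(founder_id, records):
    """Collect a three-generation family starting from one grandparent.

    records maps ind_id -> (father_id, mother_id); 0 means unknown.
    """
    def children_of(pid):
        return sorted(i for i, (f, m) in records.items() if pid in (f, m))

    def spouses_of(pid):
        found = set()
        for f, m in records.values():
            if pid == f and m:
                found.add(m)
            elif pid == m and f:
                found.add(f)
        return sorted(found)

    # Generation 1: the grandparent and spouse.
    members = [founder_id] + spouses_of(founder_id)
    # Generation 2: their children.
    children = children_of(founder_id)
    members += children
    for child in children:
        # Spouses of the children, then generation 3: the grandchildren.
        members += [s for s in spouses_of(child) if s not in members]
        members += [g for g in children_of(child) if g not in members]
    return members


records = {
    1: (0, 0), 2: (0, 0),   # grandparents
    3: (1, 2), 4: (1, 2),   # their children
    5: (0, 0),              # spouse of individual 3
    6: (3, 5), 7: (3, 5),   # grandchildren
}
family = three_gen_family_model(1, records)
```

The length of the returned list plays the role of pedSize in this model, and counting affected members of it would correspond to numOfAffected.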
Function drawThreeGenFamilySamples
simuPOP.sampling.drawThreeGenFamilySamples(pop, families, numOffspring, pedSize, numOfAffected=0, numOfSamples=1, subPops=ALL_AVAIL, idField='ind_id', fatherField='father_id', motherField='mother_id')
Draw numOfSamples three-generation pedigree samples from population pop and return a list of populations. Please refer to function drawThreeGenFamilySample for a description of other parameters.
class CombinedSampler
class simuPOP.sampling.CombinedSampler
A combined sampler accepts a list of sampler objects, draws samples and combines the returned samples into a single population. An ID field is required to use this sampler; it is used to remove extra copies of individuals who have been drawn by different samplers.
CombinedSampler(samplers=[], idField='ind_id')
- samplers: A list of samplers
drawSample(pop)
Draw and return a sample.
drawSamples(pop, numOfSamples)
Draw multiple samples and return a list of populations.
prepareSample(pop, rearrange)
Prepare the passed population object for sampling according to parameter subPops. If samples are drawn from the whole population, a Population will be trimmed if only selected (virtual) subpopulations are used. If samples are drawn separately from specified subpopulations, Population pop will be rearranged (if rearrange==True) so that each subpopulation corresponds to one element in parameter subPops.
Function drawCombinedSample
simuPOP.sampling.drawCombinedSample(pop, samplers, idField='ind_id')
Draw different types of samples using a list of samplers. A Population consisting of all individuals from these samples will be returned. An idField that stores a unique ID for all individuals is needed to remove duplicated individuals who are drawn multiple times by these samplers.
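The role of the ID field when combining samples can be sketched in plain Python: individuals are merged in order and a copy is skipped once its ID has been seen. The helper combine_samples_model and the dictionary layout are hypothetical; this models only the de-duplication idea and not simuPOP's own merging of populations.

```python
def combine_samples_model(samples, idField='ind_id'):
    """Merge several samples, keeping one copy of each individual by ID."""
    seen = set()
    combined = []
    for sample in samples:
        for ind in sample:
            if ind[idField] not in seen:
                seen.add(ind[idField])
                combined.append(ind)
    return combined


random_sample = [{'ind_id': 4}, {'ind_id': 9}]
case_control_sample = [{'ind_id': 9}, {'ind_id': 12}]
# Individual 9 was drawn by both samplers and is kept only once.
combined = combine_samples_model([random_sample, case_control_sample])
```

Without a unique ID per individual there would be no reliable way to tell that two sampled records refer to the same individual, which is why the field is required.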