Module simuPOP.utils
¶
This module provides some commonly used operators and format conversion utilities.
class Trajectory¶
-
class
simuPOP.utils.
Trajectory
¶ A
Trajectory
object contains frequencies of one or more loci in one or more subpopulations over several generations. It is usually returned by member functions of classTrajectorySimulator
or equivalent global functionssimulateForwardTrajectory
andsimulateBackwardTrajectory
.The
Trajectory
object provides several member functions to facilitate the use of Trajectory-simulation techiniques. For example,Trajectory.func()
returns a trajectory function that can be provided directly to aControlledOffspringGenerator
;Trajectory.mutators()
provides a list ofPointMutator
that insert mutants at the right generations to initialize a trajectory.For more information about Trajectory simulation techniques and related controlled random mating scheme, please refer to the simuPOP user’s guide, and Peng et al (PLoS Genetics 3(3), 2007).
-
Trajectory
(endGen, nLoci)¶ Create a
Trajectory
object of alleles at nLoci loci with ending generation endGen. endGen is the generation when expected allele frequencies are reached after mating. Therefore, a trajectory for 1000 generations should haveendGen=999
.
-
freq
(gen, subPop)¶ Return frequencies of all loci in subpopulation subPop at generation gen of the simulated Trajectory. Allele frequencies are assumed to be zero if gen is out of range of the simulated Trajectory.
-
func
()¶ Return a Python function that returns allele frequencies for each locus at specified loci. If there are multiple subpopulations, allele frequencies are arranged in the order of
loc0_sp0
,loc1_sp0
, …,loc0_sp1
,loc1_sp1
, … and so on. The returned function can be supplied directly to thefreqFunc
parameter of a controlled random mating scheme (ControlledRandomMating
) or a homogeneous mating scheme that uses a controlled offspring generator (ControlledOffspringGenerator
).
-
mutants
()¶ Return a list of mutants in the form of (loc, gen, subPop)
-
mutators
(loci, inds=0, allele=1, *args, **kwargs)¶ Return a list of
PointMutator
operators that introduce mutants at the beginning of simulated trajectories. These mutators should be added to thepreOps
parameter ofSimulator.evolve
function to introduce a mutant at the beginning of a generation with zero allele frequency before mating, and a positive allele frequency after mating. A parameterloci
is needed to specify actual loci indexes in the real forward simulation. Other than default parametersinds=0
andallele=1
, additional parameters could be passed to point mutator as keyward parameters.
-
class TrajectorySimulator¶
-
class
simuPOP.utils.
TrajectorySimulator
¶ A Trajectory Simulator takes basic demographic and genetic (natural selection) information of an evolutionary process of a diploid population and allow the simulation of Trajectory of allele frequencies of one or more loci. Trajectories could be simulated in two ways: forward-time and backward-time. In a forward-time simulation, the simulation starts from certain allele frequency and simulate the frequency at the next generation using given demographic and genetic information. The simulation continues until an ending generation is reached. A Trajectory is successfully simulated if the allele frequency at the ending generation falls into a specified range. In a backward-time simulation, the simulation starts from the ending generation with a desired allele frequency and simulate the allele frequency at previous generations one by one until the allele gets lost (allele frequency equals zero).
The result of a trajectory simulation is a trajectory object which can be used to direct the simulation of a special random mating process that controls the evolution of one or more disease alleles so that allele frequencies are consistent across replicate simulations. For more information about Trajectory simulation techniques and related controlled random mating scheme, please refer to the simuPOP user’s guide, and Peng et al (PLoS Genetics 3(3), 2007).
-
TrajectorySimulator
(N, nLoci=1, fitness=None, logger=None)¶ Create a trajectory Simulator using provided demographic and genetic (natural selection) parameters. Member functions simuForward and simuBackward can then be used to simulate trajectories within certain range of generations. This class accepts the following parameters
- N
- Parameter N accepts either a constant number for population size (e.g. N=1000), a list of subpopulation sizes (e.g. N=[1000, 2000]), or a demographic function that returns population or subpopulation sizes at each generation. During the evolution, multiple subpopulations can be merged into one, and one population can be split into several subpopulations. The number of subpopulation is determined by the return value of the demographic function. Note that N should be considered as the population size at the end of specified generation.
- nLoci
- Number of unlinked loci for which trajectories of allele frequencies are simulated. We assume a diploid population with diallelic loci. The Trajectory represents frequencies of a
- fitness
- Parameter fitness can be
None
(no selection), a list of fitness values for genotype with 0, 1, and 2 disease alleles (AA, Aa, and aa) at one or more loci; or a function that returns fitness values at each generation. When multiple loci are involved (nLoci), fitness can be a list of 3 (the same fitness values for all loci), a list of 3*nLoci (different fitness values for each locus) or a list of 3**nLoci (fitness value for each combination of genotype). The fitness function should accept generation number and a subpopulation index. The latter parameter allows, and is the only way to specify different fitness in each subpopulation. - logger
- A logging object (see Python module
logging
) that can be used to output intermediate results with debug information.
-
simuBackward
(endGen, endFreq, minMutAge=None, maxMutAge=None, maxAttempts=1000)¶ Simulate trajectories of multiple disease susceptibility loci using a forward time approach. This function accepts allele frequencies of alleles of multiple unlinked loci (endFreq) at the end of generation endGen. Depending on the number of loci and subpopulations, parameter beginFreq can be a number (same frequency for all loci in all subpopulations), or a list of frequencies for each locus (same frequency in all subpopulations), or a list of frequencies for each locus in each subpopulation in the order of
loc0_sp0
,loc1_sp0
, …,loc0_sp1
,loc1_sp1
, … and so on.This simulator will simulate a trajectory generation by generation and restart if the disease allele got fixed (instead of lost), or if the length simulated Trajectory does not fall into minMutAge and maxMutAge (ignored if
None
is given). This simulator will returnNone
if no valid Trajectory is found aftermaxAttempts
attemps.
-
simuForward
(beginGen, endGen, beginFreq, endFreq, maxAttempts=10000)¶ Simulate trajectories of multiple disease susceptibility loci using a forward time approach. This function accepts allele frequencies of alleles of multiple unlinked loci at the beginning generation (
freq
) at generationbeginGen
, and expected range of allele frequencies of these alleles (endFreq
) at the end of generationendGen
. Depending on the number of loci and subpopulations, these parameters accept the following inputs:- beginGen
- Starting generation. The initial frequecies are considered as frequencies at the beginning of this generation.
- endGen
- Ending generation. The ending frequencies are considerd as frequencies at the end of this generation.
- beginFreq
- The initial allele frequency of involved loci in all subpopulations.
It can be a number (same frequency for all loci in all
subpopulations), or a list of frequencies for each locus (same
frequency in all subpopulations), or a list of frequencies for each
locus in each subpopulation in the order of
loc0_sp0
,loc1_sp0
, …,loc0_sp1
,loc1_sp1
, … and so on. - endFreq
- The range of acceptable allele frequencies at the ending generation.
The ranges can be specified for all loci in all subpopulations,
for all loci (allele frequency in the whole population is
considered), or for all loci in all subpopulations, in the order
of
loc0_sp0
,loc1_sp0
, ….loc0_sp1
, … and so on.
This simulator will simulate a trajectory generation by generation and restart if the resulting frequencies do not fall into specified range of frequencies. This simulator will return
None
if no valid Trajectory is found aftermaxAttempts
attemps.
-
Function simulateForwardTrajectory¶
-
simuPOP.utils.
simulateForwardTrajectory
(N, beginGen, endGen, beginFreq, endFreq, nLoci=1, fitness=None, maxAttempts=10000, logger=None)¶ Given a demographic model (N) and the fitness of genotype at one or more loci (fitness), this function simulates a trajectory of one or more unlinked loci (nLoci) from allele frequency freq at generation beginGen forward in time, until it reaches generation endGen. A
Trajectory
object will be returned if the allele frequency falls into specified ranges (endFreq).None
will be returned if no valid Trajectory is simulated aftermaxAttempts
attempts. Please refer to classTrajectory
,TrajectorySimulator
and their member functions for more details about allowed input for these parameters. If a logger object is given, it will send detailed debug information atDEBUG
level and ending allele frequencies at theINFO
level. The latter can be used to adjust your fitness model and/or ending allele frequency if a trajectory is difficult to obtain because of parameter mismatch.
Function simulateBackwardTrajectory¶
-
simuPOP.utils.
simulateBackwardTrajectory
(N, endGen, endFreq, nLoci=1, fitness=None, minMutAge=None, maxMutAge=None, maxAttempts=1000, logger=None)¶ Given a demographic model (N) and the fitness of genotype at one or more loci (fitness), this function simulates a trajectory of one or more unlinked loci (nLoci) from allele frequency freq at generation endGen backward in time, until all alleles get lost. A
Trajectory
object will be returned if the length of simulated Trajectory withminMutAge
andmaxMutAge
(if specified).None
will be returned if no valid Trajectory is simulated aftermaxAttempts
attempts. Please refer to classTrajectory
,TrajectorySimulator
and their member functions for more details about allowed input for these parameters. If a logger object is given, it will send detailed debug information atDEBUG
level and ending generation and frequency at theINFO
level. The latter can be used to adjust your fitness model and/or ending allele frequency if a trajectory is difficult to obtain because of parameter mismatch.
class ProgressBar¶
-
class
simuPOP.utils.
ProgressBar
¶ The
ProgressBar
class defines a progress bar. This class will use a text-based progress bar that outputs progressing dots (.) with intermediate numbers (e.g. 5 for 50%) under a non-GUI mode (gui=False
) or not displaying any progress bar ifgui='batch'
. In the GUI mode, a Tkinter or wxPython progress dialog will be used (gui=Tkinter
orgui=wxPython
). The default mode is determined by the global gui mode of simuPOP (see alsosimuOpt.setOptions
).This class is usually used as follows:
progress = ProgressBar("Start simulation", 500) for i in range(500): # i+1 can be ignored if the progress bar is updated by 1 step progress.update(i+1) # if you would like to make sure the done message is displayed. progress.done()
-
ProgressBar
(message, totalCount, progressChar='.', block=2, done=' Done.n', gui=None)¶ Create a progress bar with
message
, which will be the title of a progress dialog or a message for textbased progress bar. ParametertotalCount
specifies total expected steps. If a text-based progress bar is used, you could specified progress character and intervals at which progresses will be displayed using parametersprogressChar
andblock
. A ending message will also be displayed in text mode.
-
done
()¶ Finish progressbar, print ‘done’ message if in text-mode.
-
update
(count=None)¶ Update the progreebar with
count
steps done. The dialog or textbar may not be updated if it is updated by full percent(s). Ifcount
isNone
, the progressbar increases by one step (not percent).
-
Function viewVars¶
-
simuPOP.utils.
viewVars
(var, gui=None)¶ - list a variable in tree format, either in text format or in a
- wxPython window.
- var
- A dictionary variable to be viewed. Dictionary wrapper objects returned
by
Population.dvars()
andSimulator.dvars()
are also acceptable. - gui
- If gui is
False
or'Tkinter'
, a text presentation (use the pprint module) of the variable will be printed to the screen. If gui is'wxPython'
and wxPython is available, a wxPython windows will be used. The default mode is determined by the global gui mode (see alsosimuOpt.setOptions
).
Function saveCSV¶
-
simuPOP.utils.
saveCSV
(pop, filename='', infoFields=[], loci=True, header=True, subPops=ALL_AVAIL, genoFormatter=None, infoFormatter=None, sexFormatter={1: 'M', 2: 'F'}, affectionFormatter={True: 'A', False: 'U'}, sep=', ', **kwargs)¶ This function is deprecated. Please use
export(format='csv')
instead. Save a simuPOP populationpop
in csv format. Columns of this file is arranged in the order of information fields (infoFields
), sex (ifsexFormatter
is notNone
), affection status (ifaffectionFormatter
is notNone
), and genotype (ifgenoFormatter
is notNone
). This function only output individuals in the present generation of populationpop
. This function accepts the following parameters:- pop
- A simuPOP population object.
- filename
- Output filename. Leading ‘>’ characters are ignored. However, if the first
character of this filename is ‘!’, the rest of the name will be evalulated
in the population’s local namespace. If
filename
is empty, the content will be written to the standard output. - infoFileds
- Information fields to be outputted. Default to none.
- loci
- If a list of loci is given, only genotype at these loci will be
written. Default to
ALL_AVAIL
, meaning all available loci. You can set this parameter to[]
if you do not want to output any genotype. - header
- Whether or not a header should be written. These headers will include
information fields, sex (if
sexFormatter
is notNone
), affection status (ifaffectionFormatter
is notNone
) and loci names. If genotype at a locus needs more than one column,_1
,_2
etc will be appended to loci names. Alternatively, a complete header (a string) or a list of column names could be specified directly. - subPops
- A list of (virtual) subpopulations. If specified, only individuals from these subpopulations will be outputed.
- infoFormatter
- A format string that is used to format all information fields. If
unspecified,
str(value)
will be used for each information field. - genoFormatter
- How to output genotype at specified loci. Acceptable values include
None
(output allele names), a dictionary with genotype as keys, (e.g.genoFormatter={(0,0):1, (0,1):2, (1,0):2, (1,1):3}
, or a function with genotype (as a tuple of integers) as inputs. The dictionary value or the return value of this function can be a single or a list of number or strings. - sexFormatter
- How to output individual sex. Acceptable values include
None
(no output) or a dictionary with keysMALE
andFEMALE
. - affectionFormatter
- How to output individual affection status. Acceptable values include
None
(no output) or a dictionary with keysTrue
andFalse
.
Parameters
genoCode
,sexCode
, andaffectionCode
from version 1.0.0 have been renamed togenoFormatter
,sexFormatter
andaffectionFormatter
but can still be used.
class Exporter¶
-
class
simuPOP.utils.
Exporter
¶ An operator to export the current population in specified format. Currently supported file formats include:
STRUCTURE (http://pritch.bsd.uchicago.edu/structure.html). This format accepts the following parameters:
- markerNames
- If set to True (default), output names of loci that are specified by parameter
lociNames of the
Population
class. No names will be outputted if loci are anonymous. A list of loci names are acceptable which will be outputted directly. - recessiveAlleles
- If specified, value of this parameter will be outputted after the marker names line.
- interMarkerDistances
- If set to True (default), output distances between markers. The first marker of each chromosome has distance -1, as required by this format.
- phaseInformation
- If specified, output the value (0 or 1) of this parameter after the inter marker distances line. Note that simuPOP populations always have phase information.
- label
- Output 1-based indexes of individuals if this parameter is true (default)
- popData
- Output 1-based index of subpopulation if this parameter is set to true (default).
- popFlag
- Output value of this parameter (0 or 1) after popData if this parameter specified.
- locData
- Name of an information field with location information of each individual. Default to None (no location data)
- phenotype
- Name of an information field with phenotype information of each individual. Default to None (no phenotype)
Genotype information are always outputted. Alleles are coded the same way (0, 1, 2, etc) as they are stored in simuPOP.
GENEPOP (http://genepop.curtin.edu.au/). The genepop format accepts the following parameters:
- title
- The tile line. If unspecified, a line similar to ‘produced by simuPOP on XXX’ will be outputted.
- adjust
- Adjust values of alleles by specified value (1 as default). This adjustment is necessary in many cases because GENEPOP treats allele 0 as missing values, and simuPOP treats allele 0 as a valid allele. Exporting alleles 0 and 1 as 1 and 2 will allow GENEPOP to analyze simuPOP-exported files correctly.
Because 0 is reserved as missing data in this format, allele A is outputted as A+adjust. simuPOP will use subpopulation names (if available) and 1-based individual index to output individual label (e.g. SubPop2-3). If parameter subPops is used to output selected individuals, each subpop will be outputted as a separate subpopulation even if there are multiple virtual subpopulations from the same subpopulation. simuPOP currently only export diploid populations to this format.
FSTAT (http://www2.unil.ch/popgen/softwares/fstat.htm). The fstat format accepts the following parameters:
- lociNames
- Names of loci that will be outputted. If unspecified, simuPOP will try to use
names of loci that are specified by parameter lociNames of the
Population
class, or names in the form of chrX-Y. - adjust
- Adjust values of alleles by specified value (1 as default). This adjustment is necessary in many cases because FSTAT treats allele 0 as missing values, and simuPOP treats allele 0 as a valid allele. Exporting alleles 0 and 1 as 1 and 2 will allow FSTAT to analyze simuPOP-exported files correctly.
MAP (marker information format) output information about each loci. Each line of the map file describes a single marker and contains chromosome name, locus name, and position. Chromosome and loci names will be the names specified by parameters
chromNames
andlociNames
of thePopulation
object, and will be chromosome index + 1, and ‘.’ if these parameters are not specified. This format output loci position to the third column. If the unit assumed in your population does not match the intended unit in the MAP file, (e.g. you would like to output position in basepair while the population uses Mbp), you can use parameterposMultiplier
to adjust it. This format accepts the following parameters:- posMultiplier
- A number that will be multiplied to loci positions (default to 1). The result will be outputted in the third column of the output.
PED (Linkage Pedigree pre MAKEPED format), with columns of family, individual, father mother, gender, affection status and genotypes. The output should be acceptable by HaploView or plink, which provides more details of this format in their documentation. If a population does not have
ind_id
,father_id
ormother_id
, this format will output individuals in specified (virtual) subpopulations in the current generation (parental generations are ignored) as unrelated individuals with 0, 0 as parent IDs. An incremental family ID will be assigned for each individual. If a population haveind_id
,father_id
andmother_id
, parents will be recursively traced to separate all individuals in a (multigenerational) population into families of related individuals. father and mother id will be set to zero if one of them does not exist. This format uses 1 for MALE, 2 for FEMALE. If phenoField isNone
, individual affection status will be outputted with 1 for Unaffected and 2 for affected. Otherwise, values of an information field will be outputted as phenotype. Because 0 value indicates missing value, values of alleles will be adjusted by 1 by default, which should be avoided if you are using non-zero alleles to model ACTG alleles in simuPOP. This format will ignore subpopulation structure because parents might belong to different subpopulations. This format accepts the following parameters:- idField
- A field for individual id, default to
ind_id
. Value at this field will be individual ID inside a pedigree. - fatherField
- A field for father id, default to
father_id
. Value at this field will be used to output father of an individual, if an individual with this ID exists in the population. - motherField
- A field for mother id, default to
mother_id
. Value at this field will be used to output mother of an individual, if an individual with this ID exists in the population. - phenoField
- A field for individual phenotype that will be outputted as the sixth column of
the PED file. If
None
is specified (default), individual affection status will be outputted (1 for unaffected and 2 for affected). - adjust
- Adjust values of alleles by specified value (1 as default). This adjustment is necessary in many cases because LINKAGE/PED format treats allele 0 as missing values, and simuPOP treats allele 0 as a valid allele. You should set this paremter to zero if you have already used alleles 1, 2, 3, 4 to model A, C, T, and G alleles.
Phylip (Joseph Felsenstein’s Phylip format). Phylip is generally used for nuclotide sequences and protein sequences. This makes this format suitable for simulations of haploid populations (ploidy=1) with nucleotide or protein sequences (number of alleles = 4 or 24 with alleleNames as nucleotide or amino acid names). If your population does satisfy these conditions, you can still export it, with homologous chromosomes in a diploid population as two sequences, and with specified allele names for allele 0, 1, 2, …. This function outputs sequence name as SXXX where XXX is the 1-based index of individual and SXXX_Y (Y=1 or 2) for diploid individuals, unless names of sequences are provided by parameter seqNames. This format supports the following parameters:
- alleleNames
- Names of alleles 0, 1, 2, … as a single string (e.g. ‘ACTG’) or a list of single-character strings (e.g. [‘A’, ‘C’, ‘T’, ‘G’]). If this parameter is unspecified (default), this program will try to use names of alleles specified in alleleNames parameter of a Population, and raise an error if no name could be found.
- seqNames
- Names of each sequence outputted, for each individual, or for each sequences for non-haploid population. If unspecified, default names such as SXXX or SXXX_Y will be used.
- style
- Output style, can be ‘sequential’ (default) or ‘interleaved’. For sequential output, each sequence consists of for the first line a name and 90 symbols starting from column 11, and subsequent lines of 100 symbols. The interleaved style have subsequent lines as separate blocks.
MS (output from Richard R. Hudson’s MS or msHOT program). This format records genotypes of SNP markers at segregating site so all non-zero genotypes are recorded as 1. simuPOP by default outputs a single block of genotypes at all loci on the first chromosome, and for all individuals, unless parameter
splitBy
is specified to separate genotypes by chromosome or subpopulations.- splitBy:
- simuPOP by default output segregating sites at all loci on the first
chromosome for all individuals. If
splitBy
is set to'subPop'
, genotypes for individuals in all or specified (parametersubPops
) subpopulations are outputted in separate blocks. The subpopulations should have the same number of individuals to produce blocks of the same number of sequences. Alternatively,splitBy
can be set tochrom
, for which genotypes on different chromosomes will be outputted separately.
CSV (comma separated values). This is a general format that output genotypes in comma (or tab etc) separated formats. The function form of this operator
export(format='csv')
is similar to the now-deprecatedsaveCSV
function, but its interface has been adjusted to match other formats supported by this operator. This format outputs a header (optiona), and one line for each individual with values of specified information fields, sex, affection status, and genotypes. All fields except for genotypes are optional. The output format is controlled by the following parameters:- infoFileds
- Information fields to be outputted. Default to none.
- header
- Whether or not a header should be written. These headers will include
information fields, sex (if
sexFormatter
is notNone
), affection status (ifaffectionFormatter
is notNone
) and loci names. If genotype at a locus needs more than one column,_1
,_2
etc will be appended to loci names. Alternatively, a complete header (a string) or a list of column names could be specified directly. - infoFormatter
- A format string that is used to format all information fields. If
unspecified,
str(value)
will be used for each information field. - genoFormatter
- How to output genotype at specified loci. Acceptable values include
None
(output allele values), a dictionary with genotype as keys, (e.g.genoFormatter={(0,0):1, (0,1):2, (1,0):2, (1,1):3}
, or a function with genotype (as a tuple of integers) as inputs. The dictionary value or the return value of this function can be a single or a list of number or strings. - sexFormatter
- How to output individual sex. Acceptable values include
None
(no output) or a dictionary with keysMALE
andFEMALE
. - affectionFormatter
- How to output individual affection status. Acceptable values include
None
(no output) or a dictionary with keysTrue
andFalse
. - delimiter
- Delimiter used to separate values, default to ‘,’.
- subPopFormatter
- How to output population membership. Acceptable values include
None
(no output), a string that will be used for the column name, orTrue
which uses ‘pop’ as the column name. If present, the column is written with the string represenation of the (virtual) subpopulation.
This operator supports the usual applicability parameters such as begin, end, step, at, reps, and subPops. If subPops are specified, only individuals from specified (virtual) subPops are exported. Similar to other operators, parameter
output
can be an output specification string (filename
,>>filename
,!expr
), filehandle (or any Python object with awrite
function), any python function. Unless explicitly stated for a particular format, this operator exports individuals from the current generation if there are multiple ancestral generations in the population.The Exporter class will make use of a progress bar to show the progress. The interface of the progress bar is by default determined by the global GUI status but you can also set it to, for example,
gui=False
to forcefully use a text-based progress bar, orgui='batch'
to suppress the progress bar.-
Exporter
(format, output, begin=0, end=-1, step=1, at=[], reps=True, subPops=ALL_AVAIL, infoFields=[], gui=None, *args, **kwargs)¶ Usage:
- PyOperator(func, param=None, begin=0, end=-1, step=1, at=[],
- reps=ALL_AVAIL, subPops=ALL_AVAIL, infoFields=[])
Details:
Create a pure-Python operator that calls a user-defined function when it is applied. If this operator is applied before or after mating, your function should have form func(pop) or func(pop, param) where pop is the population to which the operator is applied, param is the value specified in parameter param. param will be ignored if your function only accepts one parameter. Althernatively, the function should have form func(ind) with optional parameters pop and param. In this case, the function will be called for all individuals, or individuals in subpopulations subPops. Individuals for which the function returns False will be removed from the population. This operator can therefore perform similar functions as operator DiscardIf. If this operator is applied during mating, your function should accept parameters pop, off (or ind), dad, mom and param where pop is the parental population, and off or ind, dad, and mom are offspring and their parents for each mating event, and param is an optional parameter. If subPops are provided, only offspring in specified (virtual) subpopulations are acceptable. This operator does not support parameters output, and infoFields. If certain output is needed, it should be handled in the user defined function func. Because the status of files used by other operators through parameter output is undetermined during evolution, they should not be open or closed in this Python operator.
Function importPopulation¶
-
simuPOP.utils.
importPopulation
(format, filename, *args, **kwargs)¶ This function import and return a population from a file filename in specified format. Format-specific parameters can be used to define how the input should be interpreted and imported. This function supports the following file format.
GENEPOP (http://genepop.curtin.edu.au/). For input file of this format, this function ignores the first title line, load the second line as loci names, and import genotypes of different POP sections as different subpopulations. This format accepts the following parameters:
- adjust
- Adjust alleles by specified value (default to 0 for no adjustment). This parameter is mostly used to convert alleles 1 and 2 in a GenePop file to alleles 0 and 1 (with adjust=-1) in simuPOP. Negative allele (e.g. missing value 0) will be imported as regular allele with module-dependent values (e.g. -1 imported as 255 for standard module).
FSTAT (http://www2.unil.ch/popgen/softwares/fstat.htm). This format accepts the following parameters:
- adjust
- Adjust alleles by specified value (default to 0 for no adjustment). This parameter is mostly used to convert alleles 1 and 2 in a GenePop file to alleles 0 and 1 (with adjust=-1) in simuPOP. Negative allele (e.g. missing value 0) will be imported as regular allele with module-dependent values (e.g. -1 imported as 255 for standard module).
Phylip (Joseph Felsenstein’s Phylip format). This function ignores sequence names and import sequences in a haploid (default) or diploid population (if there are even number of sequences). An list of allele names are required to translate symbols to allele names. This format accepts the following parameters:
- alleleNames
- Names of alleles 0, 1, 2, … as a single string (e.g. ‘ACTG’) or a list of single-character strings (e.g. [‘A’, ‘C’, ‘T’, ‘G’]). This will be used to translate symbols into numeric alleles in simuPOP. Allele names will continue to be used as allele names of the returned population.
- ploidy
- Ploidy of the returned population, default to 1 (haploid). There should be even number of sequences if ploidy=2 (haploid) is specified.
MS (output from Richard R. Hudson’s MS or msHOT program). The ms program generates npop blocks of nseq haploid chromosomes for command starting with
ms nsample nrepeat
. By default, the result is imported as a haploid population of size nsample. The population will have nrepeat subpopulations each with the same number of loci but different number of segregating sites. This behavior could be changed by the following parameters:- ploidy
- If
ploidy
is set to 2, the sequenences will be paired so the population will havenseq/2
individuals. An error will be raised if an odd number of sequences are simulated. - mergeBy
- By default, replicate samples will be presented as subpopulations. All
individuals have the same number of loci but individuals in different
subpopulations have different segregating sites. If
mergeBy
is set to"chrom"
, the replicates will be presented as separate chromosomes, each with a different set of loci determined by segregating sites.