Pedigrees

Create a pedigree object

A Pedigree object is basically a static population object that is used to track relationship between individuals. An unique ID is required for all individuals so that individuals could be identified easily using their IDs. Individuals in a pedigree usually have one or two information fields to record the IDs of their parents. Operators IdTagger and PedigreeTagger are usually used to maintain these information fields which are, although customizable, almost always ind_id, father_id and mother_id. After pedigrees are identified, population operations could be applied, for example, to extracted identified pedigrees from an existing population. This is basically how module simuPOP.sampling works.

A new pedigree can be created from a population object with an ID field (default to ind_id), and two optional parental ID fields (default to father_id and mother_id). For example,

ped = Pedigree(pop, infoFields=ALL_AVAIL)

will create a pedigree object from population pop with information fields ind_id, father_id and mother_id, copying all available information fields. The ID field should have an unique ID for each individual and the parental ID fields should record the ID of his or her parents. Genotype information and additional information fields can be copied to a pedigree object if needed. The population object is unchanged.

Another method is to directly convert a population object to a pedigree object, using member function asPedigree of a population class. For example,

pop.asPedigree()

will convert the existing population to a pedigree object. Object pop can then be able to call all pedigree member functions. Once your task is done, you can convert the object back to a population using the Pedigree.asPopulation() member function of the object.

A pedigree object can also be created from a file saved by function Pedigree.save() or operator PedigreeTagger using function loadPedigree. Please refer to section save and load pedigrees in details.

Locate close and remote relatives of each individual

A pedigree object provides several functions for you to identify spouse, sibling and more distant relatives of each individual. The results are stored to additional information fields of each individual. For example, if you would like to know the offspring of all individuals, you can call function Pedigree.locateRelatives as follows:

offFields = ['off1', 'off2', 'off3']
ped.addInfoFields(offFields)
ped.locateRelatives(OFFSPRING, resultFields=offFields)

This function will locate up to 3 (determined by the length of resultFields) offspring of each individual and put their IDs in specified informaton fields. This function allows you to identify spouses (it is common to have multiple spouses when random mating is used), outbred spouse (exclude spouses who share at least one of the parents), offspring (all offspring) and common offspring with a specified spouse, siblings (share at least one parent) and full siblings (share two parents). It also allows you to limit the result by sex and affection status (e.g. find only affected female offspring).

More distant relationship can be derived from these relationship using function Pedigree.traceRelatives. This function accepts a path of information fields and follows the path to identify relatives. For example

sibFields = ['sib1', 'sib2']
offFields = ['off1', 'off2', 'off3']
cousinFields = ['cousin1', 'cousin2', 'cousin3']
ped.addInfoFields(sibFields + offFields + cousinFields)
ped.locateRelatives(FULLSIBLING, resultFields=sibFields)
ped.locateRelatives(OFFSPRING, resultFields=offFields)
ped.traceRelatives([['father_id', 'mother_id'], sibFields, offFields],
    sex=[ANY_SEX, MALE_ONLY, FEMALE_ONLY],
    resultField=cousinFields)

would first identify full siblings and offspring of all individuals and then locate father or mother’s male sibling’s daughters. As you can imagine, this function can be used to track very complicated relationships.

This function also provides a function for you to identify individuals with specified relatives. Example locateRelative gives an example how to locate a grandfather with at least five grandchildren. With such information, functions such as Population.extractIndividuals() could be used to extract Pedigrees from a population. This is basically how simuPOP.sampling module works.

Example: Locate close and distant relatives of individuals

>>> import simuPOP as sim
>>> pop = sim.Population(1000, ancGen=2, infoFields=['ind_id', 'father_id', 'mother_id'])
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.IdTagger(),
...     ],
...     matingScheme=sim.RandomMating(
...         numOffspring=(sim.UNIFORM_DISTRIBUTION, 2, 4),
...         ops=[
...             sim.MendelianGenoTransmitter(),
...             sim.IdTagger(),
...             sim.PedigreeTagger()
...         ],
...     ),
...     gen = 5
... )
5
>>> ped = sim.Pedigree(pop)
>>> offFields = ['off%d' % x for x in range(4)]
>>> grandOffFields = ['grandOff%d' % x for x in range(5)]
>>> ped.addInfoFields(['spouse'] + offFields + grandOffFields)
>>> # only look spouse for fathers...
>>> ped.locateRelatives(sim.OUTBRED_SPOUSE, ['spouse'], sex=sim.FEMALE_ONLY)
>>> ped.locateRelatives(sim.COMMON_OFFSPRING, ['spouse'] + offFields)
>>> # trace offspring of offspring
>>> ped.traceRelatives([offFields, offFields], resultFields=grandOffFields)
True
>>> #
>>> IDs = ped.individualsWithRelatives(grandOffFields)
>>> # check on ID.
>>> grandFather = IDs[0]
>>> grandMother = ped.indByID(grandFather).spouse
>>> # some ID might be invalid.
>>> children = [ped.indByID(grandFather).info(x) for x in offFields]
>>> childrenSpouse = [ped.indByID(x).spouse for x in children if x >= 1]
>>> childrenParents = [ped.indByID(x).father_id for x in children if x >= 1] \
...     + [ped.indByID(x).mother_id for x in children if x >= 1]
>>> grandChildren = [ped.indByID(grandFather).info(x) for x in grandOffFields]
>>> grandChildrenParents = [ped.indByID(x).father_id for x in grandChildren if x >= 1] \
...     + [ped.indByID(x).mother_id for x in grandChildren if x >= 1]
>>>
>>> def idString(IDs):
...     uniqueIDs = list(set(IDs))
...     uniqueIDs.sort()
...     return ', '.join(['%d' % x for x in uniqueIDs if x >= 1])
...
>>> print('''GrandParents: %d, %d
... Children: %s
... Spouses of children: %s
... Parents of children: %s
... GrandChildren: %s
... Parents of grandChildren: %s ''' % \
... (grandFather, grandMother, idString(children), idString(childrenSpouse),
...     idString(childrenParents), idString(grandChildren), idString(grandChildrenParents)))
GrandParents: 3040, 3847
Children: 4078, 4079, 4080
Spouses of children: 4446, 4797
Parents of children: 3040, 3847
GrandChildren: 5188, 5189, 5879, 5880, 5881
Parents of grandChildren: 4078, 4079, 4446, 4797
>>>
>>> # let us look at the structure of this complete pedigree using another method
>>> famSz = ped.identifyFamilies()
>>> # it is amazing that there is a huge family that connects almost everyone
>>> len(famSz), max(famSz)
(533, 2383)
>>> # if we only look at the last two generations, things are much better
>>> ped.addInfoFields('ped_id')
>>> famSz = ped.identifyFamilies(pedField='ped_id', ancGens=[0,1])
>>> len(famSz), max(famSz)
(664, 114)

now exiting runScriptInteractively...

Download locateRelative.py

Save and load pedigrees

A complete pedigree, including ID, sex and affection status of each individual, IDs of their parents, and optionally values of some information fields and genotypes at some loci could be saved to a file, and be loaded using function loadPedigree. The loaded pedigree could be analyzed using pedigree functions, or be used to direct the evolution of another evolutionary process using a pedigree mating scheme.

A pedigree could be saved in two ways. In the first method, a pedigree could be created using the methods described above and be saved using function Pedigree.save(). However, if the population is large, recording all ancestral generations may not be feasible. If this is the case, you can use a PedigreeTagger operator to save individual information during the evolution. If you do not care about details of the top-most ancestral generation, a PedigreeTagger used in a mating scheme should be enough to record pedigree information of all offspring. Individual in the top-most generation who have offspring in the next generation will be constructed in loadPedigree. If you would like to include detailed information about all individuals in the top-most ancestral generation, you can use a PedigreeTagger in the initOps parameter of the Simulator.evolve() or Population.evolve() function.

Example saveLoadPedigree demonstrates how to use these functions to analyze the structure of a complete pedigree.

Example: Save and load a complete pedigree

>>> import simuPOP as sim
>>> pop = sim.Population(4, loci=1, infoFields=['ind_id', 'father_id', 'mother_id'],
...     ancGen=-1)
>>> pop.evolve(
...     initOps=[
...         sim.InitSex(),
...         sim.IdTagger(),
...         sim.InitGenotype(freq=[0.5, 0.5]),
...         sim.PedigreeTagger(output='>>pedigree.ped', outputLoci=0)
...     ],
...     matingScheme=sim.RandomMating(
...         ops=[
...             sim.MendelianGenoTransmitter(),
...             sim.IdTagger(),
...             sim.PedigreeTagger(output='>>pedigree.ped', outputLoci=0)
...         ],
...     ),
...     gen = 2
... )
2
>>> #
>>> print(open('pedigree.ped').read())
1 0 0 F U 0 0
2 0 0 F U 0 1
3 0 0 M U 1 1
4 0 0 M U 1 1
5 4 1 M U 0 1
6 4 2 F U 1 1
7 3 2 F U 0 1
8 3 2 M U 1 1
9 8 7 F U 1 1
10 5 6 M U 1 1
11 5 6 M U 1 1
12 5 7 F U 0 1

>>> pop.asPedigree()
>>> pop.save('pedigree1.ped', loci=0)
>>> print(open('pedigree1.ped').read())
1 0 0 F U 0 0
2 0 0 F U 0 1
3 0 0 M U 1 1
4 0 0 M U 1 1
5 4 1 M U 0 1
6 4 2 F U 1 1
7 3 2 F U 0 1
8 3 2 M U 1 1
9 8 7 F U 1 1
10 5 6 M U 1 1
11 5 6 M U 1 1
12 5 7 F U 0 1

>>> #
>>> ped = sim.loadPedigree('pedigree1.ped')
>>> sim.dump(ped, ancGens=range(3))
Ploidy: 2 (diploid)
Chromosomes:
1:  (AUTOSOME, 1 loci)
   (1)
Information fields:
ind_id father_id mother_id
population size: 4 (1 subpopulations with 4 Individuals)
Number of ancestral populations: 2

SubPopulation 0 (), 4 Individuals:
   0: FU 1 | 1 |  9 8 7
   1: MU 1 | 1 |  10 5 6
   2: MU 1 | 1 |  11 5 6
   3: FU 0 | 1 |  12 5 7

Ancestral population 1
SubPopulation 0 (), 4 Individuals:
   0: MU 0 | 1 |  5 4 1
   1: FU 1 | 1 |  6 4 2
   2: FU 0 | 1 |  7 3 2
   3: MU 1 | 1 |  8 3 2

Ancestral population 2
SubPopulation 0 (), 4 Individuals:
   0: FU 0 | 0 |  1 0 0
   1: FU 0 | 1 |  2 0 0
   2: MU 1 | 1 |  3 0 0
   3: MU 1 | 1 |  4 0 0

Download saveLoadPedigree.py