Many breeding programmes have a more complex structure compared to what we have seen up to now in this course. Here we will look into an animal breeding programme that is still quite simple, but shows more complexity. Specifically, in animal breeding, selection intensity often differs substantially between males and females; its often higher in males. This is driven by the need to have a substantial number of dams (mothers) to generate selection candidates and dam replacements. Also, sires (fathers) can generate many progeny, either through natural mating or artificial insemination. Another complexity in real breeding programmes is that parents can be used across multiple generations, which generates a breeding programme with overlapping generations.
In this vignette, we will show how to simulate an animal breeding programme with overlapping generations and different selection intensities in males and females. To this end, we will simulate a population with 1000 cows of different ages mated with 50 bulls. The key trait of interest will be weaning weight of calves. We will also show how to record data across generations for a later analysis. We will achieve all this by:
We will start the simulation by simulating founder genomes, allocating sex of individuals, defining a trait, and initiating a base population. We will simulate a cattle genome with 30 chromosomes for 2000 founders. The trait under selection will be weaning weight of calves, with a mean of 250 kg, phenotype variance of 400 kg\(^2\), and heritability of 0.3. The trait will have a simple additive genetic architecture with 100 QTL on each chromosome. In cattle, males are on average heavier than females, due to sexual dimorphism. While such an effect can also be added in AlphaSimR, we will here ignore this effect for the sake of simplicity.
# Clean the working environment
rm(list = ls())
# Set the default plot layout
par(mfrow = c(1, 1))
# Load AlphaSimR
library(AlphaSimR)
## Loading required package: R6
# Simulate founder genomes
# ... this runMacs() call will take quite a bit of time!
# founderGenomes = runMacs(nInd = 2000,
# nChr = 30,
# segSites = 100,
# species = "CATTLE")
# ... therefore, we will speed up the demonstration with the quickHaplo()
# (we recommend use of runMacs() for research purposes!)
founderGenomes = quickHaplo(nInd = 2000,
nChr = 30,
segSites = 100)
# Global simulation parameters
SP = SimParam$new(founderGenomes)
SP$setSexes("yes_sys")
phenoVar = 400
heritability = 0.3
genVar = phenoVar * heritability
SP$addTraitA(nQtlPerChr = 100, mean = 250, var = genVar)
# Base population
founders = newPop(founderGenomes)
# Phenotype the base population
founders = setPheno(pop = founders, h2 = heritability)
In this simulation we will work with individuals born in different
years. To keep track of these individuals, we will save them in
different R objects. But we will also combine the individuals with
different birth years into one object. To keep track of the different
years or birth for these individuals, we will store this additional
information in the misc
(=miscellaneous) slot of AlphaSimR
populations. This slot is a list
(type of R object) of
length equal to the number of individuals. We can store in the
misc
slot any information we want. There are two helper
functions, setMisc()
and getMisc()
that you
can use to work with the misc
slot. See
help(setMisc)
and help(getMisc)
to learn more
about these two functions. You can also use standard R list code to work
with the misc
object.
# Assign year of birth for the founders
year = 0
founders = setMisc(x = founders,
node = "yearOfBirth",
value = year)
head(getMisc(x = founders, node = "yearOfBirth"))
## [[1]]
## [1] 0
##
## [[2]]
## [1] 0
##
## [[3]]
## [1] 0
##
## [[4]]
## [1] 0
##
## [[5]]
## [1] 0
##
## [[6]]
## [1] 0
Structure of the simulated breeding programme is shown in Figure 1. In the following we explain this breeding structure and show how to simulate it.
Figure 1: Simulated beef breeding programme with 1000 cows (dams) and 50 bulls (sires) of different ages and different selection intensity.
Now we will take founder males and select 50 superior males as sires (fathers) of the next generation. These 50 sires will be represented by 40 young sires (1 year old) and 10 older sires (2 years old). The rationale here is that better sires will be used for 2 years (2 generations), while we will replace most of the sires with a new generation of genetically improved males. This “male selection path” is shown in the right part of Figure 1.
males = selectInd(pop = founders, nInd = 50, use = "pheno", sex = "M")
sires2 = males[ 1:10]
sires2 = setMisc(x = sires2,
node = "yearOfBirth",
value = -1)
sires1 = males[11:50]
sires1 = setMisc(x = sires1,
node = "yearOfBirth",
value = 0)
sires = c(sires2, sires1)
nInd(sires)
## [1] 50
Let’s inspect year of birth for the sires.
table(unlist(getMisc(x = sires, node = "yearOfBirth")))
##
## -1 0
## 10 40
For generating the selection candidates and future production females
we require a sufficient number of dams (mothers). Specifically, to
generate 1000 progeny every year, we require 1000 dams. In this
simulation we used option SP$setSexes("yes_sys")
that will
systematically assign sexes to newly created individuals, so we will get
exactly 50% male and 50% female progeny. We will take the founder
females and use most of them as dams. These dams will be assumed to have
different ages because beef cows stay in the herd for multiple years and
have multiple calves through their lifetime. In this simulation, we will
assume that dams stay in the herd for up to 5 years, but that every year
we only keep a certain number of phenotypically best dams. This “female
selection path” is shown in the left part of Figure 1.
In reality the realised sex ratio is not strictly 50:50, and you
might consider using option SP$setSexes("yes_rand")
that
will randomly assign a sex to each individual. In that case, the number
of females will fluctuate from year to year and you will also have to
modify the code to ensure a stable number of dams over the years. For
simplicity, in this example we use the systematic assignment of sexes
and a constant number of dams.
First, we need to select only female founders and set the number of dams kept in each year.
cat("Founder females\n")
## Founder females
(nFemales = sum(founders@sex == "F"))
## [1] 1000
females = selectInd(pop = founders, nInd = nFemales, use = "pheno", sex = "F")
# Here we define how many dams are kept in each year
nDams1 = 500
nDams2 = 250
nDams3 = 150
nDams4 = 75
nDams5 = 25
sum(nDams1, nDams2, nDams3, nDams4, nDams5)
## [1] 1000
Now we will select the oldest group of dams.
cat("Dams5\n")
## Dams5
(start = 1)
## [1] 1
(end = nDams5)
## [1] 25
dams5 = females[start:end]
dams5 = setMisc(x = dams5,
node = "yearOfBirth",
value = -4)
nInd(dams5)
## [1] 25
And second oldest group of dams.
cat("Dams4\n")
## Dams4
(start = end + 1)
## [1] 26
(end = start - 1 + nDams4)
## [1] 100
dams4 = females[start:end]
dams4 = setMisc(x = dams4,
node = "yearOfBirth",
value = -3)
nInd(dams4)
## [1] 75
And the other group of dams.
cat("Dams3\n")
## Dams3
(start = end + 1)
## [1] 101
(end = start - 1 + nDams3)
## [1] 250
dams3 = females[start:end]
dams3 = setMisc(x = dams3,
node = "yearOfBirth",
value = -2)
nInd(dams3)
## [1] 150
cat("Dams2\n")
## Dams2
(start = end + 1)
## [1] 251
(end = start - 1 + nDams2)
## [1] 500
dams2 = females[start:end]
dams2 = setMisc(x = dams2,
node = "yearOfBirth",
value = -1)
nInd(dams2)
## [1] 250
cat("Dams1\n")
## Dams1
(start = end + 1)
## [1] 501
(end = start - 1 + nDams1)
## [1] 1000
dams1 = females[start:end]
dams1 = setMisc(x = dams1,
node = "yearOfBirth",
value = 0)
nInd(dams1)
## [1] 500
dams = c(dams5, dams4, dams3, dams2, dams1)
nInd(dams)
## [1] 1000
Let’s inspect year of birth for the dams
table(unlist(getMisc(x = dams, node = "yearOfBirth")))
##
## -4 -3 -2 -1 0
## 25 75 150 250 500
To record data from multiple populations, we will define a data
recording function recordData()
. As an input, the function
will accept: 1) a data frame (data
argument) that will
collate the information from multiple AlphaSimR populations, 2) an
AlphaSimR population (pop
argument) whose data we will
save, and 3) a year of use (yearOfUse
argument) to denote
parent usage.
In this example, we will be storing animal identification
(id
), parents’ identification’s (father
and
mother
), sex (sex
), genetic value
(gv
), phenotype value (pheno
), year of birth
(yearOfBirth
), and year of use for parents
(yearOfUse
).
# Function to record and collate data
recordData <- function(data = NULL, pop, yearOfUse = NA) {
popData = data.frame(id = pop@id,
father = pop@father,
mother = pop@mother,
sex = pop@sex,
gv = pop@gv[, "Trait1"],
pheno = pop@pheno[, "Trait1"],
yearOfBirth = unlist(getMisc(x = pop, node ="yearOfBirth")),
yearOfUse = yearOfUse)
# Manage first instance of calling this function, when data is NULL
if (is.null(data)) {
ret = popData
} else {
ret = rbind(data, popData)
}
return(ret)
}
We will create two data frames. The first one will be called
data4AllAnimals
, where we will store the data for all
animals. The second one will be called data4Parents
, where
we will the store data for parents.
data4AllAnimals = recordData(pop = founders)
head(data4AllAnimals)
## id father mother sex gv pheno yearOfBirth yearOfUse
## 1 1 0 0 M 260.9848 245.9394 0 NA
## 2 2 0 0 F 237.3948 247.9825 0 NA
## 3 3 0 0 M 245.2977 265.6804 0 NA
## 4 4 0 0 F 264.7087 275.0198 0 NA
## 5 5 0 0 M 277.2140 239.7615 0 NA
## 6 6 0 0 F 253.9641 257.6694 0 NA
data4AllParents = recordData(pop = c(sires, dams), yearOfUse = year) # year is 0 at this stage in the script
head(data4AllParents)
## id father mother sex gv pheno yearOfBirth yearOfUse
## 1 1073 0 0 M 262.6554 314.8278 -1 0
## 2 1691 0 0 M 257.0333 303.2053 -1 0
## 3 1049 0 0 M 271.6861 303.1341 -1 0
## 4 1153 0 0 M 272.3277 302.5384 -1 0
## 5 571 0 0 M 273.6652 299.2463 -1 0
## 6 445 0 0 M 265.5089 295.5297 -1 0
data4NewParents = recordData(pop = c(sires1, dams1))
head(data4NewParents)
## id father mother sex gv pheno yearOfBirth yearOfUse
## 1 1215 0 0 M 259.9152 289.7896 0 NA
## 2 777 0 0 M 273.8545 289.4477 0 NA
## 3 737 0 0 M 275.7357 289.0295 0 NA
## 4 1755 0 0 M 240.0897 288.8603 0 NA
## 5 917 0 0 M 253.4514 288.8023 0 NA
## 6 297 0 0 M 275.9568 288.5819 0 NA
We will simulate 20 years of the breeding programme. As mentioned before, we will select 50 phenotypically best males as sires, 40 young (1 year old) and 10 old (2 years old). We will be generating 1000 progeny. To generate these progeny we will require 1000 dams, that will be used up to 5 years, with only phenotypically best dams staying in the herd for longer.
During this simulation we will record the data, as before, for all the newly generated individuals, all parents, and newly selected parents.
for (year in 1:20) {
cat("Working on the year:", year, "\n")
# Generate progeny from current dams and sires
candidates = randCross2(males = sires, females = dams, nCrosses = nInd(dams))
candidates = setMisc(x = candidates, node = "yearOfBirth", value = year)
candidates = setPheno(candidates, h2 = heritability)
# Record data for all newborn animals
data4AllAnimals = recordData(data = data4AllAnimals,
pop = candidates)
# Record data for the used sires and dams (young and old)
data4AllParents = recordData(data = data4AllParents,
pop = c(sires, dams),
yearOfUse = year)
# Update and select sires
sires2 = selectInd(pop = sires1, nInd = 10, use = "pheno")
sires1 = selectInd(pop = candidates, nInd = 40, use = "pheno", sex = "M")
sires = c(sires2, sires1)
# Update and select dams
dams5 = selectInd(pop = dams4, nInd = nDams5, use = "pheno")
dams4 = selectInd(pop = dams3, nInd = nDams4, use = "pheno")
dams3 = selectInd(pop = dams2, nInd = nDams3, use = "pheno")
dams2 = selectInd(pop = dams1, nInd = nDams2, use = "pheno")
dams1 = selectInd(pop = candidates, nInd = nDams1, use = "pheno", sex = "F")
dams = c(dams5, dams4, dams3, dams2, dams1)
# Record data for the newly selected sires and dams (just the new ones)
data4NewParents = recordData(data = data4NewParents,
pop = c(sires1, dams1))
}
## Working on the year: 1
## Working on the year: 2
## Working on the year: 3
## Working on the year: 4
## Working on the year: 5
## Working on the year: 6
## Working on the year: 7
## Working on the year: 8
## Working on the year: 9
## Working on the year: 10
## Working on the year: 11
## Working on the year: 12
## Working on the year: 13
## Working on the year: 14
## Working on the year: 15
## Working on the year: 16
## Working on the year: 17
## Working on the year: 18
## Working on the year: 19
## Working on the year: 20
To summarise response to selection over the years, we will show
distributions of phenotype and genetic values for males and females
separately because selection intensity is different between the two
sexes in this breeding programme. Furthermore, we will show it for
newborn animals (selection candidates), all parents, and for selected
animals (parents). To do this, we will use two additional R packages:
ggplot2
and ggridges
because they enable a
quick generation of quite sophisticated plots. The ggplot2
package and its wider ecosystem is very powerful and versatile. We
warmly recommend you study it further. Here we will simply use the
following plotting code, without delving into details.
# Install additional packages for plotting
install.packages(pkg = c("ggplot2", "ggridges"), repos = "https://cloud.r-project.org")
##
## The downloaded binary packages are in
## /var/folders/vv/pqxjdcs111b2_mv0jc5kn_8r0000gt/T//RtmposIK78/downloaded_packages
# Load the packages
library(ggplot2)
library(ggridges)
## Warning: package 'ggridges' was built under R version 4.3.2
# Range of values
phenoRange = range(c(data4AllAnimals$pheno, data4AllAnimals$gv))
# Plot phenotype values for all newborn animals per year and sex
p = ggplot(data4AllAnimals, aes(x = pheno, y = as.factor(yearOfBirth))) +
geom_density_ridges(aes(fill = sex, linetype = sex), alpha = .4, rel_min_height = 0.01) +
xlim(phenoRange) +
ylab("Year of birth") +
xlab("Phenotype value (kg)") +
ggtitle("Newborn animals") +
theme(legend.position = "top")
print(p)
## Picking joint bandwidth of 4.97
# Plot phenotype values for all parents per year and sex
p = ggplot(data4AllParents, aes(x = pheno, y = as.factor(yearOfUse))) +
geom_density_ridges(aes(fill = sex, linetype = sex), alpha = .4, rel_min_height = 0.01) +
xlim(phenoRange) +
ylab("Year of birth") +
xlab("Phenotype value (kg)") +
ggtitle("Parents") +
theme(legend.position = "top")
print(p)
## Picking joint bandwidth of 3.13
# Plot genetic values for all parents per year and sex
p = ggplot(data4AllParents, aes(x = gv, y = as.factor(yearOfUse))) +
geom_density_ridges(aes(fill = sex, linetype = sex), alpha = .4, rel_min_height = 0.01) +
xlim(phenoRange) +
ylab("Year of birth") +
xlab("Genetic value (kg)") +
ggtitle("Parents") +
theme(legend.position = "top")
print(p)
## Picking joint bandwidth of 2.89
# Plot genetic values for newly selected parents per year and sex
p = ggplot(data4NewParents, aes(x = gv, y = as.factor(yearOfBirth))) +
geom_density_ridges(aes(fill = sex, linetype = sex), alpha = .4, rel_min_height = 0.01) +
xlim(phenoRange) +
ylab("Year of birth") +
xlab("Genetic value (kg)") +
ggtitle("New parents") +
theme(legend.position = "top")
print(p)
## Picking joint bandwidth of 3.06
The first plot showed distribution of phenotype values of newborn animals for each year, separated by sex. We clearly saw a response to selection across years and no difference between sexes. This is expected, because, in the absence of sexual dimorphism, DNA lottery and environment generate similar distributions of genetic and phenotype values for each sex.
The second plot showed distribution of phenotype values of parents for each year, separated by sex. As before, response to selection was clearly seen across years, but we also saw a large difference between distributions for females and males. Males had larger values due to higher selection intensity in males.
The third plot showed distribution of genetic values of parents for each year, separated by sex. Compared to phenotype values, we saw narrower distribution because phenotype values are more dispersed due to environmental effects. Also, the difference between sexes was not as large as we might have thought based on phenotype values.
The fourth plot showed distribution of genetic values of newly selected parents for each year, separated by sex. Compared to all parents, we saw that newly selected parents had higher genetic values due to genetic progress year on year.