One of the purposes of simulations is to test multiple alternative breeding scenarios before deploying a breeding programme in the real-life. In this independent exercise, you will simulate two alternative animal breeding programmes with overlapping generations and different selection intensities in males and females. In the first scenario you will repeat the breeding programme from the screencast, while in the second scenario you will reduce generation interval (and to maintain the size of the breeding programme change the selection intensity) on the male side. You will keep the same overall structure as in the screencast, but mate 1000 cows of different ages mated with 25 bulls instead of 50. To enable a fair comparison of the alternative breeding scenarios, they should start from the same point. Hence you will keep the initial base population of sires and dams for the two scenarios. Finally, you will compare the response to selection between simulated breeding scenarios.
You will achieve all this by:
Start the simulation by simulating founder genomes, defining a trait, and initiating a base population. You will simulate a cattle genome with 30 chromosomes for 2000 founders. The trait under selection will be weaning weight of calves, with a mean of 250 kg, phenotype variance of 400 kg\(^2\), and heritability of 0.3. The trait will have a simple additive genetic architecture with 100 QTL on each chromosome.
# Clean the working environment
rm(list = ls())
# Set the default plot layout
par(mfrow = c(1, 1))
# Load AlphaSimR
library(AlphaSimR)
## Loading required package: R6
# Simulate founder genomes
# ... this runMacs() call will take quite a bit of time!
# founderGenomes = runMacs(nInd = 2000,
# nChr = 30,
# segSites = 100,
# species = "CATTLE")
# ... therefore, we will speed up the demonstration with the quickHaplo()
# (we recommend use of runMacs() for research purposes!)
founderGenomes = quickHaplo(nInd = 2000,
nChr = 30,
segSites = 100)
# Global simulation parameters
SP = SimParam$new(founderGenomes)
SP$setSexes("yes_sys")
phenoVar = 400
heritability = 0.3
genVar = phenoVar * heritability
SP$addTraitA(nQtlPerChr = 100, mean = 250, var = genVar)
# Base population
founders = newPop(founderGenomes)
# Phenotype the base population
founders = setPheno(pop = founders, h2 = heritability)
As in the screencast, you will work with individuals born in
different years. You will store these individuals in different objects
and you will also combine the individuals with different birth years and
from different scenarios into one object. To keep track of their years
or birth, you will store this information in the misc
(=miscellaneous) slot of AlphaSimR populations.
# Assign year of birth for the founders
year = 0
founders = setMisc(x = founders,
node = "yearOfBirth",
value = year)
head(getMisc(x = founders, node = "yearOfBirth"))
## [[1]]
## [1] 0
##
## [[2]]
## [1] 0
##
## [[3]]
## [1] 0
##
## [[4]]
## [1] 0
##
## [[5]]
## [1] 0
##
## [[6]]
## [1] 0
Structure of the simulated breeding programme is shown in Figure 1. In the following we explain this breeding structure and show how to simulate it.
Figure 1: Simulated beef breeding programme with 1000 cows (dams) and 50 bulls (sires) of different ages and different selection intensity.
You will take founder males and select 50 superior males as sires (fathers) of the initial year of each breeding programme. These 50 sires will be represented by 40 young sires (1 year old) and 10 older sires (2 years old). In further years, the two breeding scenarios will differ by the number of new sires selected and used.
males = selectInd(pop = founders, nInd = 50, use = "pheno", sex = "M")
baseSires2 = males[ 1:10]
baseSires2 = setMisc(x = baseSires2,
node = "yearOfBirth",
value = -1)
baseSires1 = males[11:50]
baseSires1 = setMisc(x = baseSires1,
node = "yearOfBirth",
value = 0)
baseSires = c(baseSires2, baseSires1)
nInd(baseSires)
## [1] 50
Inspect year of birth for the sires.
table(unlist(getMisc(x = baseSires, node = "yearOfBirth")))
##
## -1 0
## 10 40
For generating the selection candidates and future production females you need a sufficient number of dams (mothers). Specifically, to generate 1000 progeny every year, you will need 1000 dams. These dams will be assumed to have different ages because cows stay in the herd for multiple years and have multiple calves through their lifetime. You will assume that dams stay in the herd for up to 5 years, but that every year we only keep a predefined number of the phenotypically best dams.
First, we need to select only female founders.
cat("Founder females\n")
## Founder females
(nFemales = sum(founders@sex == "F"))
## [1] 1000
females = selectInd(pop = founders, nInd = nFemales, use = "pheno", sex = "F")
# Here we define how many dams are kept per age group
nDams1 = 500
nDams2 = 250
nDams3 = 150
nDams4 = 75
nDams5 = 25
sum(nDams1, nDams2, nDams3, nDams4, nDams5)
## [1] 1000
Now we will select the oldest group of dams.
cat("Dams5\n")
## Dams5
(start = 1)
## [1] 1
(end = nDams5)
## [1] 25
baseDams5 = females[start:end]
baseDams5 = setMisc(x = baseDams5,
node = "yearOfBirth",
value = -4)
nInd(baseDams5)
## [1] 25
And second oldest group of dams.
cat("Dams4\n")
## Dams4
(start = end + 1)
## [1] 26
(end = start - 1 + nDams4)
## [1] 100
baseDams4 = females[start:end]
baseDams4 = setMisc(x = baseDams4,
node = "yearOfBirth",
value = -3)
nInd(baseDams4)
## [1] 75
And the other group of dams.
cat("Dams3\n")
## Dams3
(start = end + 1)
## [1] 101
(end = start - 1 + nDams3)
## [1] 250
baseDams3 = females[start:end]
baseDams3 = setMisc(x = baseDams3,
node = "yearOfBirth",
value = -2)
nInd(baseDams3)
## [1] 150
cat("Dams2\n")
## Dams2
(start = end + 1)
## [1] 251
(end = start - 1 + nDams2)
## [1] 500
baseDams2 = females[start:end]
baseDams2 = setMisc(x = baseDams2,
node = "yearOfBirth",
value = -1)
nInd(baseDams2)
## [1] 250
cat("Dams1\n")
## Dams1
(start = end + 1)
## [1] 501
(end = start - 1 + nDams1)
## [1] 1000
baseDams1 = females[start:end]
baseDams1 = setMisc(x = baseDams1,
node = "yearOfBirth",
value = 0)
nInd(baseDams1)
## [1] 500
baseDams = c(baseDams5, baseDams4, baseDams3, baseDams2, baseDams1)
nInd(baseDams)
## [1] 1000
Inspect year of birth for the dams
table(unlist(getMisc(x = baseDams, node = "yearOfBirth")))
##
## -4 -3 -2 -1 0
## 25 75 150 250 500
To record data from multiple populations, you will define a data
recording function recordData()
. As an input, the function
will accept: 1) a data frame (data
argument) that will
collate the information from multiple AlphaSimR populations, 2) an
AlphaSimR population (pop
argument) whose data we will
save, 3) a year of use (yearOfUse
argument) to denote
parent usage, and 4) a scenario name (scenario
argument) to
distinguish different scenarios.
In this exercise, you will store animal identification
(id
), parents’ identification’s (father
and
mother
), sex (sex
), genetic value
(gv
), phenotype value (pheno
), year of birth
(yearOfBirth
), year of use for parents
(yearOfUse
), and scenario name (scenario
).
# Function to record and collate data
recordData <- function(data = NULL, pop, yearOfUse = NA, scenario = NA) {
popData = data.frame(id = pop@id,
father = pop@father,
mother = pop@mother,
sex = pop@sex,
gv = pop@gv[, "Trait1"],
pheno = pop@pheno[, "Trait1"],
yearOfBirth = unlist(getMisc(x = pop, node ="yearOfBirth")),
yearOfUse = yearOfUse,
scenario = scenario)
# Manage first instance of calling this function, when data is NULL
if (is.null(data)) {
ret = popData
} else {
ret = rbind(data, popData)
}
return(ret)
}
You will create two data frames. The first one will be called
data4AllAnimals
, where we will store the data for all
animals. The second one will be called data4Parents
, where
you will store data for all and new parents separately.
data4AllAnimals = recordData(pop = founders,
scenario = "Founders")
head(data4AllAnimals)
## id father mother sex gv pheno yearOfBirth yearOfUse scenario
## 1 1 0 0 M 254.3827 245.7703 0 NA Founders
## 2 2 0 0 F 248.2791 271.3962 0 NA Founders
## 3 3 0 0 M 238.5237 243.0706 0 NA Founders
## 4 4 0 0 F 271.3040 303.2933 0 NA Founders
## 5 5 0 0 M 232.2281 229.8609 0 NA Founders
## 6 6 0 0 F 256.8072 263.6807 0 NA Founders
data4AllParents = recordData(pop = c(baseSires, baseDams),
yearOfUse = year,
scenario = "Founders") # year is 0 at this stage in the script
head(data4AllParents)
## id father mother sex gv pheno yearOfBirth yearOfUse scenario
## 1 1435 0 0 M 269.3276 318.7530 -1 0 Founders
## 2 269 0 0 M 269.7317 312.7793 -1 0 Founders
## 3 1633 0 0 M 266.5060 303.8398 -1 0 Founders
## 4 1577 0 0 M 242.9681 301.0434 -1 0 Founders
## 5 1263 0 0 M 262.8483 301.0213 -1 0 Founders
## 6 1375 0 0 M 271.5745 298.9256 -1 0 Founders
data4NewParents = recordData(pop = c(baseSires1, baseDams1),
scenario = "Founders")
head(data4NewParents)
## id father mother sex gv pheno yearOfBirth yearOfUse scenario
## 1 729 0 0 M 271.2515 292.4549 0 NA Founders
## 2 339 0 0 M 263.7107 292.3453 0 NA Founders
## 3 543 0 0 M 270.3776 292.2624 0 NA Founders
## 4 85 0 0 M 255.8700 292.1312 0 NA Founders
## 5 1443 0 0 M 269.7013 290.8817 0 NA Founders
## 6 1721 0 0 M 258.0400 290.5248 0 NA Founders
You will simulate 20 years of each of the two breeding scenarios. In both scenarios you will generate 1000 progeny from 1000 dams, that will be used up to 5 years, with only phenotypically best dams staying in the herd for longer. The two scenarios will differ only on the male side. In scenario 1, you will select 40 young (1 year old) and 10 old (2 years old) as shown on Figure 1 - the first scenario is therefore continuation of the initiated baseline above. Let’s do this, before discussing Scenario 2.
During this simulation you will record the data, as before, for all the newly generated individuals, all parents, and newly selected parents.
# Initiate the sires and dams for the Scenario 1
sires = baseSires
sires1 = baseSires1
dams = baseDams
dams1 = baseDams1
dams2 = baseDams2
dams3 = baseDams3
dams4 = baseDams4
# Set the current scenario name
currentScenario = "Scenario_1"
for (year in 1:20) {
cat("Working on the year:", year, "\n")
# Generate progeny from current dams and sires
candidates = randCross2(males = sires, females = dams, nCrosses = nInd(dams))
candidates = setMisc(x = candidates, node = "yearOfBirth", value = year)
candidates = setPheno(candidates, h2 = heritability)
# Record data for all newborn animals
data4AllAnimals = recordData(data = data4AllAnimals,
pop = candidates,
scenario = currentScenario)
# Record data for the used sires and dams (young and old)
data4AllParents = recordData(data = data4AllParents,
pop = c(sires, dams),
yearOfUse = year,
scenario = currentScenario)
# Update and select sires
sires2 = selectInd(pop = sires1, nInd = 10, use = "pheno")
sires1 = selectInd(pop = candidates, nInd = 40, use = "pheno", sex = "M")
sires = c(sires2, sires1)
# Update and select dams
dams5 = selectInd(pop = dams4, nInd = nDams5, use = "pheno")
dams4 = selectInd(pop = dams3, nInd = nDams4, use = "pheno")
dams3 = selectInd(pop = dams2, nInd = nDams3, use = "pheno")
dams2 = selectInd(pop = dams1, nInd = nDams2, use = "pheno")
dams1 = selectInd(pop = candidates, nInd = nDams1, use = "pheno", sex = "F")
dams = c(dams5, dams4, dams3, dams2, dams1)
# Record data for the newly selected sires and dams (just the new ones)
data4NewParents = recordData(data = data4NewParents,
pop = c(sires1, dams1),
scenario = currentScenario)
}
## Working on the year: 1
## Working on the year: 2
## Working on the year: 3
## Working on the year: 4
## Working on the year: 5
## Working on the year: 6
## Working on the year: 7
## Working on the year: 8
## Working on the year: 9
## Working on the year: 10
## Working on the year: 11
## Working on the year: 12
## Working on the year: 13
## Working on the year: 14
## Working on the year: 15
## Working on the year: 16
## Working on the year: 17
## Working on the year: 18
## Working on the year: 19
## Working on the year: 20
In scenario 2, you will select only 25 young (1 year old), that is, no old sires, as shown on Figure 2.
Figure 2: Simulated beef breeding programme with 1000 cows (dams) and 25 bulls (sires) of different ages and different selection intensity.
# Initiate the sires and dams for the Scenario 2
# To enable a fair comparison, both scenarios have the same starting point
sires = baseSires
dams = baseDams
dams1 = baseDams1
dams2 = baseDams2
dams3 = baseDams3
dams4 = baseDams4
# Set the current scenario name
currentScenario = "Scenario_2"
for (year in 1:20) {
cat("Working on the year:", year, "\n")
# Generate progeny from current dams and sires
candidates = randCross2(males = sires, females = dams, nCrosses = nInd(dams))
candidates = setMisc(x = candidates, node = "yearOfBirth", value = year)
candidates = setPheno(candidates, h2 = heritability)
# Record data for all newborn animals
data4AllAnimals = recordData(data = data4AllAnimals,
pop = candidates,
scenario = currentScenario)
# Record data for the used sires and dams (young and old)
data4AllParents = recordData(data = data4AllParents,
pop = c(sires, dams),
yearOfUse = year,
scenario = currentScenario)
# Update and select sires
sires = selectInd(pop = candidates, nInd = 25, use = "pheno", sex = "M")
# Update and select dams
dams5 = selectInd(pop = dams4, nInd = nDams5, use = "pheno")
dams4 = selectInd(pop = dams3, nInd = nDams4, use = "pheno")
dams3 = selectInd(pop = dams2, nInd = nDams3, use = "pheno")
dams2 = selectInd(pop = dams1, nInd = nDams2, use = "pheno")
dams1 = selectInd(pop = candidates, nInd = nDams1, use = "pheno", sex = "F")
dams = c(dams5, dams4, dams3, dams2, dams1)
# Record data for the newly selected sires and dams (just the new ones)
data4NewParents = recordData(data = data4NewParents,
pop = c(sires, dams1),
scenario = currentScenario)
}
## Working on the year: 1
## Working on the year: 2
## Working on the year: 3
## Working on the year: 4
## Working on the year: 5
## Working on the year: 6
## Working on the year: 7
## Working on the year: 8
## Working on the year: 9
## Working on the year: 10
## Working on the year: 11
## Working on the year: 12
## Working on the year: 13
## Working on the year: 14
## Working on the year: 15
## Working on the year: 16
## Working on the year: 17
## Working on the year: 18
## Working on the year: 19
## Working on the year: 20
To summarise response to selection over the years, you will show mean
of phenotype and genetic values for males and females separately because
selection intensity is different between the two sexes in this breeding
programme. Furthermore, you will show it for newborn animals (selection
candidates), all parents, and for selected animals (parents). To do
this, you will use ggplot2
R package introduced in the
screencast. The ggplot2
package is part of the larger
collection of R packages designed for tidy data science known as
tidyverse
. Here you will use the following data preparation
and plotting code, without delving into details, but we warmly recommend
you study tidyverse further (you can start at; https://www.tidyverse.org).
# Install additional package for tidy data science
install.packages(pkg = c("tidyverse"), repos = "https://cloud.r-project.org")
##
## The downloaded binary packages are in
## /var/folders/vv/pqxjdcs111b2_mv0jc5kn_8r0000gt/T//RtmpcVhVmG/downloaded_packages
# Load the package
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::mutate() masks AlphaSimR::mutate()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Range of values
phenoRange = range(c(data4AllAnimals$pheno, data4AllAnimals$gv))
# Plot mean phenotype value for all newborn animals per year and sex
summaryAll = data4AllAnimals %>%
filter(scenario != "Founders") %>%
group_by(scenario, yearOfBirth, sex) %>%
summarise(meanPheno = mean(pheno))
## `summarise()` has grouped output by 'scenario', 'yearOfBirth'. You can override
## using the `.groups` argument.
p = ggplot(summaryAll,
aes(x = yearOfBirth, y = meanPheno,
colour = scenario,
linetype = sex)) +
geom_line() +
ylim(phenoRange) +
xlab("Year of birth") +
ylab("Phenotype value (kg)") +
ggtitle("Newborn animals") +
labs(colour = "Scenario", linetype = "Sex") +
theme(legend.position = "top")
print(p)
# Plot mean phenotype value for all parents per year and sex
summaryAllParents = data4AllParents %>%
filter(scenario != "Founders") %>%
group_by(scenario, yearOfUse, sex) %>%
summarise(meanPheno = mean(pheno),
meanGv = mean(gv))
## `summarise()` has grouped output by 'scenario', 'yearOfUse'. You can override
## using the `.groups` argument.
p = ggplot(summaryAllParents,
aes(x = yearOfUse, y = meanPheno,
colour = scenario,
linetype = sex)) +
geom_line() +
ylim(phenoRange) +
xlab("Year of birth") +
ylab("Phenotype value (kg)") +
ggtitle("Parents") +
labs(colour = "Scenario", linetype = "Sex") +
theme(legend.position = "top")
print(p)
# Plot mean genetic value for all parents per year and sex
p = ggplot(summaryAllParents,
aes(x = yearOfUse, y = meanGv,
colour = scenario,
linetype = sex)) +
geom_line() +
ylim(phenoRange) +
xlab("Year of birth") +
ylab("Genetic value (kg)") +
ggtitle("Parents") +
labs(colour = "Scenario", linetype = "Sex") +
theme(legend.position = "top")
print(p)
# Plot mean genetic value for newly selected parents per year and sex
summaryNewParents = data4NewParents %>%
filter(scenario != "Founders") %>%
group_by(scenario, yearOfBirth, sex) %>%
summarise(meanPheno = mean(pheno),
meanGv = mean(gv))
## `summarise()` has grouped output by 'scenario', 'yearOfBirth'. You can override
## using the `.groups` argument.
p = ggplot(summaryNewParents,
aes(x = yearOfBirth, y = meanGv,
colour = scenario,
linetype = sex)) +
geom_line() +
ylim(phenoRange) +
xlab("Year of birth") +
ylab("Genetic value (kg)") +
ggtitle("New parents") +
labs(colour = "Scenario", linetype = "Sex") +
theme(legend.position = "top")
print(p)
What do the results show you about these two breeding scenarios?
The first plot showed mean phenotype value of newborn animals for each year, separated by sex and scenario. We clearly saw a response to selection across years and no difference between sexes within each scenario. This is expected, because, in the absence of sexual dimorphism, DNA lottery generates similar distributions of genetic and phenotype values for each sex. While both scenarios started from the same point, the second scenario had higher response to selection.
The second plot showed mean phenotype value of parents for each year, separated by sex and scenario. As before, response to selection was clearly seen across years, but now we also saw a large difference between means for females and males. Males had larger values due to higher selection intensity in males. This difference was apparent in both scenarios. Difference between the males from two scenarios was larger compared to difference between the females, because in the second scenario males were selected with higher selection intensity compared to the males in the first scenario, while the female selection was the same in both scenarios.
The third plot showed mean genetic value of parents for each year, separated by sex and scenario. The difference between sexes was not as large as we might have thought based on phenotype values.
The fourth plot showed mean genetic value of newly selected parents for each year, separated by sex and scenario. Comparing this plot with the third plot shows difference between all and new parents - the new parents have higher genetic value due to response to selection.
As an overall conclusion, we saw that increasing selection intensity and reducing generation interval on the male side increased response to selection. To further optimise response to selection, additional scenarios with modifications on the female sides could be tested, but reproductive reality is that a beef breeder requires sufficient number of cows to produce the calves and this simulation has already assumed substantial selection on the female side (we only kept the best half of dams between age groups).