The objective of this tutorial is to simulate plot data using FieldSimR which captures spatial variation with realistic structure and complexity. Genetic values will be simulated using AlphaSimR based on an additive genetic trait representing grain yield.
Summary
The simulation can be summarised by the following steps:
field_trial_error()
and plot_effects()
functionsmake_phenotypes()
function
Clean the working environment.
rm(list = ls())
Load required packages.
library(FieldSimR)
library(AlphaSimR)
#> Loading required package: R6
library(ggplot2)
The exact results in this tutorial can be reproduced by initially setting the following seed.
set.seed(123)
Consider a scenario in which 100 inbred wheat genotypes are evaluated for grain yield (t/ha) in a single field trial. The genotypes are simulated with additive genetic relatedness using AlphaSimR. The trial comprises 10 columns and 20 rows for 200 plots in total. There are two complete blocks of all genotypes aligned in the column direction (side-by-side). The trial is 80 m long in the column direction and 40 m wide in the row direction, with plots 8m long by 2m wide. The trait mean is 4 t/ha and the trait variance is 0.2 t2/ha2. The plot-level heritability is 0.3, which reflects the ratio of genetic variance to total phenotypic variance. The error variance is given by 0.2/0.3-0.2 = 0.47 t2/ha2.
The simulation parameters are:
n_genos <- 100 # no. genotypes
n_reps <- n_blocks <- 2 # no. replicates/blocks
block_dir <- "col" # block direction
n_cols <- 10 # no. columns
n_rows <- 20 # no. rows
n_plots <- n_cols * n_rows # no. plots == n_genos * n_reps
plot_length <- 8
plot_width <- 2
mu <- 4 # trait mean
sigma2 <- 0.2 # trait variance
H2 <- 0.3 # plot-level heritability
sigma2e <- sigma2/H2 - sigma2 # error variance
round(sigma2e, digits = 2)
#> [1] 0.47
The blank field trial layout is:
The phenotypes are generated based on the linear mixed model:
\(\mathbf{y} = \mathbf{Z}\mathbf{gv} + \mathbf{e}\),
where \(\mathbf{gv}\) is the vector of genetic values with design matrix \(\mathbf{Z}\) and \(\mathbf{e}\) is the vector of plot errors. The inclusion of additional non-genetic effects is straightforward.
The simulation involves three steps:
The plot errors are simulated to capture spatial variation, which can be broadly categorised as either global and local trend (smooth spatial trend), random error (noise) or extraneous variation (systematic error).
The plot errors are constructed as the sum of three terms:
\(\mathbf{e} = \mathbf{se} + \mathbf{re} + \mathbf{ee}\),
where \(\mathbf{se}\) is a vector of spatial errors that capture global and local trend, \(\mathbf{re}\) is a vector of random errors that capture random measurement error/noise and \(\mathbf{ee}\) is a vector of extraneous errors that capture extraneous variation.
Simulating the plot errors involves two parts:
field_trial_error()
function and displaying them with the plot_effects()
function.The plot errors are generated based on a proportion of spatial error variance of 0.4 and a proportion of extraneous error variance of 0.2. FieldSimR will then allocate the remaining 0.4 to the random error variance by default.
The initial simulation parameters are:
round(sigma2e, digits = 2) # error variance
#> [1] 0.47
prop_spatial <- 0.4 # proportion of spatial error variance
prop_ext <- 0.2 # proportion of extraneous error variance
1 - (prop_spatial + prop_ext) # proportion of random error variance
#> [1] 0.4
The simulation parameters for the spatial and extraneous errors are defined in the following. No further parameters are required for the random errors.
FieldSimR has the capacity to simulate spatial errors based on a
separable first order autoregressive (AR1) process or bivariate
interpolation. These options are set with
spatial_model = "AR1:AR1"
or
spatial_model = "Bivariate"
.
FieldSimR’s functionality will be demonstrated using a separable AR1 process with column autocorrelation of 0.5 and row autocorrelation of 0.7. The simulation parameters are:
spatial_model1 <- "AR1:AR1"
prop_spatial # proportion of spatial error variance
#> [1] 0.4
col_cor <- 0.5 # column autocorrelation
row_cor <- 0.7 # row autocorrelation
Note that the plot lengths and widths are not required when
spatial_model = "AR1:AR1"
.
FieldSimR’s functionality will also be demonstrated using bivariate
interpolation with 100 additional knots points. This option is set with
complexity = 100
. The simulation parameters are:
spatial_model2 <- "Bivariate"
prop_spatial # proportion of spatial error variance
#> [1] 0.4
complexity <- 100 # number of additional knot points
plot_length
#> [1] 8
plot_width
#> [1] 2
Note that increasing the complexity will generate more local trend relative to global trend, up to a point where the spatial errors capture minimal or no trend.
FieldSimR has the capacity to simulate extraneous errors based on
random or zig-zag ordering. These options are set with
ext_ord = "random"
or ext_ord = "zig-zag"
. The
zig-zag ordering is achieved by alternating the positive and negative
simulated values between neighbouring columns and/or rows.
Extraneous errors can be simulated in both directions, or either the
column or row directions individually. These options are set with
ext_dir = "both"
, ext_dir = "col"
or
ext_dir = "row"
.
The functionality will be demonstrated using zig-zag ordering between neighbouring rows. The simulation parameters are:
prop_ext # proportion of extraneous error variance
#> [1] 0.2
ext_ord <- "zig-zag" # ordering of extraneous variation
ext_dir <- "row" # direction of extraneous variation
The plot errors are simulated using FieldSimR’s function
field_trial_error()
. The functionality will be demonstrated
separately for the simulations involving the AR1 process and bivariate
interpolation.
The simulation is achieved by populating the
field_trial_error()
function with terms appropriate to the
AR1 process, i.e. col_cor
and row_cor
. A data
frame is returned containing the environment, block, column and row
numbers as well as the simulated plot error. When
return_effects = TRUE
, a list is returned with an
additional data frame containing the spatial, random and extraneous
error terms.
error_df1 <- field_trial_error(n_traits = 1,
n_envs = 1,
n_blocks = n_blocks, # 2
block_dir = block_dir, # "col"
n_cols = n_cols, # 10
n_rows = n_rows, # 20
var_R = sigma2e, # 0.47
prop_spatial = prop_spatial, # 0.4
spatial_model = spatial_model1, # "AR1:AR1"
col_cor = col_cor, # 0.5
row_cor = row_cor, # 0.7
prop_ext = prop_ext, # 0.2
ext_ord = ext_ord, # "zig-zag"
ext_dir = ext_dir, # "row"
return_effects = TRUE)
head(error_df1$plot_df) # data frame with simulated plot errors
#> env block col row e.Trait1
#> 1 1 1 1 1 0.58491308
#> 2 1 1 1 2 0.27219805
#> 3 1 1 1 3 -0.02512364
#> 4 1 1 1 4 0.52557053
#> 5 1 1 1 5 -0.31935601
#> 6 1 1 1 6 0.88407277
head(error_df1$Trait1) # data frame with simulated spatial, random and extraneous errors
#> env block col row e.spat e.rand e.ext.col e.ext.row
#> 1 1 1 1 1 -0.3258259 0.9647753 0 -0.05403635
#> 2 1 1 1 2 -0.3218923 0.5682535 0 0.02583686
#> 3 1 1 1 3 0.3680727 -0.1374527 0 -0.25574365
#> 4 1 1 1 4 0.2793471 0.2241504 0 0.02207304
#> 5 1 1 1 5 0.2398195 -0.2041937 0 -0.35498184
#> 6 1 1 1 6 0.8213364 -0.2318872 0 0.29462355
The separate error terms can be displayed using FieldSimR’s function
plot_effects()
. For example, the spatial errors are given
by:
plot_effects(error_df1$Trait1, effect = "e.spat", labels = TRUE)
The total plot errors are given by:
plot_effects(error_df1$plot_df, effect = "e.Trait1", labels = TRUE)
The theoretical and sample variograms can be displayed using
FieldSimR’s functions theoretical_variogram()
and
sample_variogram()
. The sample variogram for the total plot
errors is given by:
sample_variogram(error_df1$plot_df, effect = "e.Trait1")
The simulation is achieved by populating the
field_trial_error()
function with terms appropriate to
bivariate interpolation, i.e. plot_length
,
plot_width
and complexity
.
error_df2 <- field_trial_error(n_traits = 1,
n_envs = 1,
n_blocks = n_blocks, # 2
block_dir = block_dir, # "col"
n_cols = n_cols, # 10
n_rows = n_rows, # 20
var_R = sigma2e, # 0.47
prop_spatial = prop_spatial, # 0.4
spatial_model = spatial_model2, # "Bivariate"
plot_length = plot_length, # 8
plot_width = plot_width, # 2
complexity = complexity, # 100
prop_ext = prop_ext, # 0.2
ext_dir = ext_dir, # "row"
ext_ord = ext_ord, # "zig-zag"
return_effects = TRUE)
head(error_df2$plot_df) # data frame with simulated plot errors
#> env block col row e.Trait1
#> 1 1 1 1 1 -0.27516294
#> 2 1 1 1 2 1.47347874
#> 3 1 1 1 3 0.00652826
#> 4 1 1 1 4 0.26436917
#> 5 1 1 1 5 -0.40152695
#> 6 1 1 1 6 0.28554393
head(error_df2$Trait1) # data frame with simulated spatial, random and extraneous errors
#> env block col row e.spat e.rand e.ext.col e.ext.row
#> 1 1 1 1 1 0.1367561 -0.20053901 0 -0.2113800
#> 2 1 1 1 2 0.2289058 0.55481648 0 0.6897565
#> 3 1 1 1 3 0.3017271 0.09412504 0 -0.3893239
#> 4 1 1 1 4 0.3724624 -0.69437935 0 0.5862861
#> 5 1 1 1 5 0.3942169 -0.45212400 0 -0.3436198
#> 6 1 1 1 6 0.3646449 -0.27283687 0 0.1937359
The spatial errors are given by:
plot_effects(error_df2$Trait1, effect = "e.spat", labels = TRUE)
The total plot errors are given by:
plot_effects(error_df2$plot_df, effect = "e.Trait1", labels = TRUE)
Simulating the genetic values involves two parts:
The number of homogeneous genotypes in the founder population is set to 20, the number of chromosomes is set to 21 and the number of segregating sites (biallelic QTN) per chromosome is set to 300.
n_founders <- 20 # Number of genotypes in the founder population
n_chr <- 21 # Number of chromosomes
n_seg_sites <- 300 # Number of QTN per chromosome
The trait mean and variance are:
mu # trait mean
#> [1] 4
sigma2 # trait variance
#> [1] 0.2
The founder population is simulated using the runMacs()
function.
# Simulating a founder population using AlphaSimR's "WHEAT" presets to mimic it's evolutionary history
founders <- runMacs(nInd = n_founders,
nChr = n_chr,
segSites = n_seg_sites,
inbred = TRUE,
species = "WHEAT",
nThreads = 2)
founders # MapPop object
#> An object of class "MapPop"
#> Ploidy: 2
#> Individuals: 20
#> Chromosomes: 21
#> Loci: 6300
SP <- SimParam$new(founders)
An additive genetic trait representing grain yield is then simulated
using the addTraitA()
function.
SP$addTraitA(nQtlPerChr = n_seg_sites,
mean = mu,
var = sigma2)
founders <- newPop(founders) # create pop-class object
The inbred wheat genotypes are generated by randomly crossing the
founders using the randCross()
function, and then made into
doubled haploid (DH) lines using the makeDH()
function.
f1s <- randCross(pop = founders,
nCrosses = 20, # no. crosses
nProgeny = 5) # no. progeny per cross
DHs <- makeDH(pop = f1s, nDH = 1)
DHs
#> An object of class "Pop"
#> Ploidy: 2
#> Individuals: 100
#> Chromosomes: 21
#> Loci: 6300
#> Traits: 1
The genetic values are obtained for the DH population.
gv <- c(DHs@gv)
round(mean(gv), digits = 2) # trait mean in the DH population
#> [1] 3.93
round(var(gv), digits = 2) # trait variance in the DH population
#> [1] 0.16
gv_df <- data.frame(env = factor(1),
rep = factor(rep(1:n_reps, each = n_genos)),
id = factor(DHs@id),
gv = gv)
head(gv_df)
#> env rep id gv
#> 1 1 1 121 3.497497
#> 2 1 1 122 3.333328
#> 3 1 1 123 4.185656
#> 4 1 1 124 3.970236
#> 5 1 1 125 3.545663
#> 6 1 1 126 3.763558
The phenotypes are generated using FieldSimR’s function
make_phenotypes()
. When randomise = TRUE
,
genotypes are allocated to plots using a randomised complete block
design.
trial_df <- make_phenotypes(gv_df = gv_df, error_df = error_df2, randomise = TRUE)
head(trial_df)
#> env block col row id phe.Trait1
#> 1 1 1 1 1 191 3.581255
#> 2 1 1 1 2 190 5.754405
#> 3 1 1 1 3 154 3.946641
#> 4 1 1 1 4 193 3.924392
#> 5 1 1 1 5 220 3.680489
#> 6 1 1 1 6 206 4.610504
The true genetic values are also added to aid with interpretation.
Z <- model.matrix(~id - 1, data = trial_df)
trial_df$gv.Trait1 <- c(Z %*% gv)
head(trial_df)
#> env block col row id phe.Trait1 gv.Trait1
#> 1 1 1 1 1 191 3.581255 3.856418
#> 2 1 1 1 2 190 5.754405 4.280926
#> 3 1 1 1 3 154 3.946641 3.940112
#> 4 1 1 1 4 193 3.924392 3.660022
#> 5 1 1 1 5 220 3.680489 4.082016
#> 6 1 1 1 6 206 4.610504 4.324960
The phenotypes are given by:
plot_effects(trial_df, effect = "phe.Trait1", labels = TRUE)
The simulated plot-level and line-mean heritabilities can be formally quantified to aid with interpretation. The square-root of the latter represents the expected prediction accuracy for the genetic values.
A quick look at the phenotypes and true genetic values.
#> env block col row id phe.Trait1 gv.Trait1
#> 1 1 1 1 1 191 3.581255 3.856418
#> 2 1 1 1 2 190 5.754405 4.280926
#> 3 1 1 1 3 154 3.946641 3.940112
#> 4 1 1 1 4 193 3.924392 3.660022
#> 5 1 1 1 5 220 3.680489 4.082016
#> 6 1 1 1 6 206 4.610504 4.324960
The plot-level heritability is:
round(cor(trial_df$phe.Trait1, trial_df$gv.Trait1)^2, digits = 2) # simulated plot-level heritability
#> [1] 0.2
round(sigma2/(sigma2 + sigma2e), digits = 2) # expected plot-level heritability
#> [1] 0.3
The minor discrepancies here are a result of the sampling process.
A quick look at the average phenotypes and true genetic values.
index <- trimws(trial_df$id[trial_df$block == 1]) # obtain genotype ordering
ave_yield <- with(trial_df, tapply(phe.Trait1, id, mean))[index]
The line-mean heritability in the DH population is:
round(cor(ave_yield, trial_df$gv.Trait1[is.element(trial_df$block, 1)])^2, digits = 2) # simulated line-mean heritability
#> [1] 0.34
round(sigma2/(sigma2 + sigma2e/n_reps), digits = 2) # expected line-mean heritability
#> [1] 0.46
round(cor(ave_yield, trial_df$gv.Trait1[is.element(trial_df$block, 1)]), digits = 2) # simulated prediction accuracy
#> [1] 0.58
round(sqrt(sigma2/(sigma2 + sigma2e/n_reps)), digits = 2) # expected prediction accuracy
#> [1] 0.68