The objective of this tutorial is to simulate a MET dataset using FieldSimR which captures GxE interaction and spatial variation with realistic structure and complexity. Genetic values will be simulated using AlphaSimR based on an additive genetic trait representing grain yield.
Summary
The simulation can be summarised by the following steps:
field_trial_error()
and plot_effects()
functionsmake_phenotypes()
function
Clean the working environment.
rm(list = ls())
Load required packages.
library(FieldSimR)
library(AlphaSimR)
#> Loading required package: R6
library(ggplot2)
library(mbend)
The exact results in this tutorial can be reproduced by initially setting the following seed.
set.seed(123)
Consider a scenario in which 200 inbred wheat genotypes are evaluated for grain yield (t/ha) in field trials across 10 growing environments. The genotypes are simulated with additive genetic relatedness using AlphaSimR. Each trial comprises 20 columns and 20 rows for 400 plots in total, with two complete blocks of all genotypes aligned in the column direction (side-by-side). The overall trait mean is 4 t/ha and the overall trait variance is 0.2 t2/ha2. The average plot-level heritability is 0.3, which reflects the average ratio of genetic variance to total phenotypic variance within environments. The error variance is given by 0.2/0.3-0.2 = 0.47 t2/ha2.
The simulation parameters are:
n_genos <- 200 # no. genotypes
n_envs <- 10 # no. environments
n_reps <- n_blocks <- 2 # no. replicates/blocks per environment
block_dir <- "col" # block direction
n_cols <- 20 # no. columns per environment
n_rows <- 20 # no. rows per environment
n_plots <- n_cols * n_rows # no. plots per environment == n_genos * n_reps
mu <- 4 # overall trait mean
sigma2 <- 0.2 # overall trait variance
H2 <- 0.3 # average plot-level heritability
sigma2e <- sigma2/H2 - sigma2 # error variance
round(sigma2e, digits = 2)
#> [1] 0.47
The field trial layout for environment 1 is:
The phenotypes are generated based on the linear mixed model:
\(\mathbf{y} = \mathbf{Z}\mathbf{gv} + \mathbf{e}\),
where \(\mathbf{gv}\) is a vector of genetic values with design matrix \(\mathbf{Z}\) and \(\mathbf{e}\) is a vector of plot errors. The inclusion of additional non-genetic effects is straightforward.
The simulation involves three steps:
The genetic values are simulated to capture GxE interaction, which can be broadly categorised as either heterogeneity of genetic variance (changes in genotype scale) or lack of genetic correlation (changes in genotype rank).
Simulating the genetic values involves two parts:
FieldSimR provides wrapper functions for compound symmetry and unstructured models for GxE interaction.
The number of homogeneous genotypes in the founder population is set to 20, the number of chromosomes is set to 20 and the number of segregating sites (biallelic QTN) per chromosome is set to 300.
n_founders <- 20 # Number of genotypes in the founder population.
n_chr <- 21 # Number of chromosomes.
n_seg_sites <- 300 # Number of QTN per chromosome.
The overall trait mean and variance are:
mu # trait mean
#> [1] 4
sigma2 # trait variance
#> [1] 0.2
The founder population is simulated using the runMacs()
function.
# Simulating a founder population using AlphaSimR's "WHEAT" presets to mimic the species' evolutionary history
founders <- runMacs(nInd = n_founders,
nChr = n_chr,
segSites = n_seg_sites,
inbred = TRUE,
species = "WHEAT",
nThreads = 2)
founders # MapPop object
#> An object of class "MapPop"
#> Ploidy: 2
#> Individuals: 20
#> Chromosomes: 21
#> Loci: 6300
SP <- SimParam$new(founders)
An additive genetic trait representing grain yield is then simulated with AlphaSimR, using FieldSimR’s wrapper functions. The functionality will be demonstrated separately for the simulations involving the compound symmetry and unstructured models.
The genetic values for the compound symmetry model are constructed as the sum of two terms:
\(\mathbf{gv} = \mathbf{1} \otimes \mathbf{g} + \mathbf{{ge}}\),
where \(\mathbf{g}\) is the vector of genotype main effects and \(\mathbf{{ge}}\) is the vector of GxE interaction effects. The relative magnitude of main effect variance to total genetic variance (main effect + interaction variance) is set to 0.5. This allocates half of the overall trait variance to the genotype main effects and half to the GxE interaction effects.
rel_main_eff <- 0.5
FieldSimR’s wrapper function compsym_asr_input()
is then
used to populate a list of input parameters for AlphaSimR. The list
contains vectors for the mean and variance of the genotype main effects
and interaction effects, as well as the correlation matrix between all
effects.
input_asr1 <- compsym_asr_input(n_traits = 1,
n_envs = n_envs,
mean = mu,
var = sigma2,
rel_main_eff_A = rel_main_eff)
input_asr1
#> $mean
#> [1] 4 0 0 0 0 0 0 0 0 0 0
#>
#> $var
#> [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#>
#> $cor_A
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#> [1,] 1 0 0 0 0 0 0 0 0 0 0
#> [2,] 0 1 0 0 0 0 0 0 0 0 0
#> [3,] 0 0 1 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 1 0 0 0 0 0 0 0
#> [5,] 0 0 0 0 1 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 1 0 0 0 0 0
#> [7,] 0 0 0 0 0 0 1 0 0 0 0
#> [8,] 0 0 0 0 0 0 0 1 0 0 0
#> [9,] 0 0 0 0 0 0 0 0 1 0 0
#> [10,] 0 0 0 0 0 0 0 0 0 1 0
#> [11,] 0 0 0 0 0 0 0 0 0 0 1
Note that the overall trait mean is assigned to the genotype main effects.
An additive genetic trait representing grain yield is then simulated, which produces 11 columns in the founder population. The first column contains the genotype main effects, while the remaining columns contain the GxE interaction effects.
SP$addTraitA(nQtlPerChr = n_seg_sites,
mean = input_asr1$mean,
var = input_asr1$var,
corA = input_asr1$cor_A)
founders1 <- newPop(founders) # create pop-class object
head(founders1@gv)
#> Trait1 Trait2 Trait3 Trait4 Trait5 Trait6
#> [1,] 4.507921 0.22091428 -0.21247781 0.266862496 -0.2116897 -0.26634128
#> [2,] 4.146366 -0.13610030 0.23203094 0.019463379 -0.1325774 -0.61241904
#> [3,] 4.034316 0.36390890 -0.04087896 0.006152697 0.2078321 0.29064333
#> [4,] 3.809426 -0.02633595 -0.03638150 -0.576911660 -0.2299918 0.24457708
#> [5,] 3.660717 0.06424753 0.85861671 0.011598320 -0.1612934 0.28337313
#> [6,] 4.282772 -0.20359990 -0.02687326 0.160932637 -0.3852944 0.02875901
#> Trait7 Trait8 Trait9 Trait10 Trait11
#> [1,] 0.3174922 0.006956874 0.49571203 0.02901406 -0.06409046
#> [2,] 0.3149393 -0.595855023 0.30833434 0.30345226 0.14039230
#> [3,] 0.1458506 -0.053083925 0.09321479 -0.27890150 0.47069988
#> [4,] -0.2799334 -0.270474467 -0.40806311 0.44517086 0.02288598
#> [5,] 0.3694393 -0.138224661 0.30919559 0.26697985 -0.53057050
#> [6,] -0.2768191 0.122751983 0.08698575 -0.31565054 -0.60453500
The inbred wheat genotypes are generated by randomly crossing the
founders using the randCross()
function, and then made into
DH lines using the makeDH()
function.
f1s <- randCross(pop = founders1,
nCrosses = 20, # no. crosses
nProgeny = 10) # no. progeny per cross
DHs <- makeDH(pop = f1s, nDH = 1)
DHs
#> An object of class "Pop"
#> Ploidy: 2
#> Individuals: 200
#> Chromosomes: 21
#> Loci: 6300
#> Traits: 11
FieldSimR’s wrapper function compsym_asr_output()
is
then used to obtain the genetic values for the DH population. This
function constructs the genetic values for each environment as the sum
of the genotype main effects and GxE interaction effects. When
return_effects = TRUE
, a list is returned with an
additional data frame containing the genotype main effects and GxE
interaction effects for each environment.
gv_df1 <- compsym_asr_output(pop = DHs,
n_traits = 1,
n_envs = n_envs,
n_reps = n_reps,
return_effects = TRUE)
head(gv_df1$Trial_df) # data frame with simulated genetic values
#> env rep id gv.Trait1
#> 1 1 1 221 4.423045
#> 2 1 1 222 4.251034
#> 3 1 1 223 4.311441
#> 4 1 1 224 4.318957
#> 5 1 1 225 3.994904
#> 6 1 1 226 4.558925
head(gv_df1$Trait1) # data frame with simulated genotype main effects and GxE interaction effects
#> id main int.Env1 int.Env2 int.Env3 int.Env4 int.Env5
#> 1 221 4.329882 0.09316353 0.55278333 0.051959381 0.34292847 -0.2371918
#> 2 222 4.167223 0.08381139 0.23097697 -0.006207443 -0.37549467 -0.1712432
#> 3 223 3.882204 0.42923625 0.05902127 0.070045022 -0.07337969 -0.3267089
#> 4 224 4.210671 0.10828578 0.27992990 -0.041946891 -0.01063246 -0.2483420
#> 5 225 4.020083 -0.02517896 0.27184636 0.092397850 -0.22388424 0.1297582
#> 6 226 4.299234 0.25969087 0.27005091 0.091363176 -0.03158187 0.2050466
#> int.Env6 int.Env7 int.Env8 int.Env9 int.Env10
#> 1 0.04775242 -0.4558783 -0.04258169 -0.024428552 0.34731671
#> 2 -0.04731661 -0.4288389 -0.29031517 -0.349867791 0.37160639
#> 3 0.06965547 -0.4838334 0.40857187 0.041897964 0.16619257
#> 4 -0.04322254 -0.1048843 0.01752054 -0.005105602 0.30526549
#> 5 0.36985120 -0.0376870 -0.16460878 0.256049882 0.93213711
#> 6 -0.08617849 -0.5931950 0.02606397 0.191390095 -0.06574682
GV <- matrix(gv_df1$Trial_df$gv.Trait1, ncol = n_envs)
round(mean(GV), digits = 2) # overall trait mean in the DH population
#> [1] 3.9
round(var(GV), digits = 2) # simulated between-environment genetic covariance matrix in the DH population
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 0.28 0.19 0.16 0.16 0.14 0.13 0.16 0.18 0.16 0.19
#> [2,] 0.19 0.26 0.16 0.14 0.15 0.17 0.16 0.18 0.15 0.16
#> [3,] 0.16 0.16 0.22 0.16 0.11 0.13 0.15 0.17 0.11 0.14
#> [4,] 0.16 0.14 0.16 0.24 0.11 0.11 0.14 0.17 0.12 0.16
#> [5,] 0.14 0.15 0.11 0.11 0.21 0.14 0.17 0.14 0.14 0.16
#> [6,] 0.13 0.17 0.13 0.11 0.14 0.23 0.16 0.16 0.13 0.16
#> [7,] 0.16 0.16 0.15 0.14 0.17 0.16 0.25 0.16 0.15 0.17
#> [8,] 0.18 0.18 0.17 0.17 0.14 0.16 0.16 0.28 0.14 0.17
#> [9,] 0.16 0.15 0.11 0.12 0.14 0.13 0.15 0.14 0.20 0.15
#> [10,] 0.19 0.16 0.14 0.16 0.16 0.16 0.17 0.17 0.15 0.26
round(mean(diag(var(GV))), digits = 2) # overall trait variance in the DH population
#> [1] 0.24
The genetic values for the unstructured model are simulated as:
\(\mathbf{gv} \sim \mbox{N}(\mu\mathbf{1}, \mathbf{G_e} \otimes \mathbf{I})\),
where \(\mu\) is the overall trait mean and \(\mathbf{G_e}\) is the between-environment genetic covariance matrix, which is constructed as:
\(\mathbf{G_e} = \mathbf{D}^{1/2}_\mathbf{e}\mathbf{C_e}\mathbf{D}^{1/2}_\mathbf{e}\)
where \(\mathbf{D_e}\) is a diagonal genetic variance matrix and \(\mathbf{C_e}\) is a between-environment genetic correlation matrix.
The genetic variances are sampled from an inverse gamma distribution, with shape parameter of 5 and scale parameter of 1. The genetic variances are then shifted to give an overall trait variance of 0.2.
de <- 1/rgamma(n_envs, shape = 5, rate = 1) # sample
de <- c(scale(de, center = TRUE, scale = FALSE) + sigma2) # shift
round(de, digits = 2)
#> [1] 0.06 0.11 0.23 0.18 0.30 -0.02 0.20 0.07 0.43 0.44
mean(de) # overall trait variance
#> [1] 0.2
De <- diag(de)
The genetic correlations are sampled from a continuous uniform
distribution ranging from 0.3 to 0.7, which produces a mean genetic
correlation of 0.5. This is achieved using FieldSimR’s function
rand_cor_mat()
. When pos_def = TRUE
, the
correlation matrix is bent to ensure it is positive definite.
Ce <- rand_cor_mat(p = n_envs, min_cor = 0.3, max_cor = 0.7, pos_def = TRUE)
#> Unweighted bending
#> max.iter = 10000
#> small.positive = 1e-04
#> method = hj
#> Found a correlation matrix.
#> Convergence met after 5 iterations.
round(Ce, digits = 2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1.00 0.37 0.58 0.62 0.66 0.66 0.36 0.65 0.33 0.70
#> [2,] 0.37 1.00 0.32 0.44 0.61 0.59 0.56 0.44 0.35 0.31
#> [3,] 0.58 0.32 1.00 0.50 0.43 0.54 0.58 0.68 0.69 0.45
#> [4,] 0.62 0.44 0.50 1.00 0.58 0.43 0.43 0.45 0.48 0.62
#> [5,] 0.66 0.61 0.43 0.58 1.00 0.67 0.65 0.42 0.64 0.51
#> [6,] 0.66 0.59 0.54 0.43 0.67 1.00 0.58 0.39 0.58 0.68
#> [7,] 0.36 0.56 0.58 0.43 0.65 0.58 1.00 0.52 0.60 0.36
#> [8,] 0.65 0.44 0.68 0.45 0.42 0.39 0.52 1.00 0.52 0.54
#> [9,] 0.33 0.35 0.69 0.48 0.64 0.58 0.60 0.52 1.00 0.44
#> [10,] 0.70 0.31 0.45 0.62 0.51 0.68 0.36 0.54 0.44 1.00
FieldSimR’s wrapper function unstr_asr_input()
is then
used to populate a list of input parameters for AlphaSimR. The list
contains vectors for the mean and variance of the genetic values in each
environment, as well as the between-environment genetic correlation
matrix.
input_asr2 <- unstr_asr_input(n_traits = 1,
n_envs = n_envs,
mean = mu,
var = de,
cor_A = Ce)
input_asr2
#> $mean
#> [1] 4 4 4 4 4 4 4 4 4 4
#>
#> $var
#> [1] 0.05790685 0.10823111 0.22666303 0.18322584 0.30279583 -0.01706116
#> [7] 0.19531678 0.06666998 0.43365466 0.44259708
#>
#> $cor_A
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1.0000000 0.3718899 0.5831631 0.6197509 0.6620683 0.6564778 0.3613163
#> [2,] 0.3718899 1.0000000 0.3215893 0.4418798 0.6064223 0.5850056 0.5556100
#> [3,] 0.5831631 0.3215893 1.0000000 0.4975177 0.4308969 0.5431501 0.5798006
#> [4,] 0.6197509 0.4418798 0.4975177 1.0000000 0.5765267 0.4288828 0.4266768
#> [5,] 0.6620683 0.6064223 0.4308969 0.5765267 1.0000000 0.6735068 0.6489137
#> [6,] 0.6564778 0.5850056 0.5431501 0.4288828 0.6735068 1.0000000 0.5781115
#> [7,] 0.3613163 0.5556100 0.5798006 0.4266768 0.6489137 0.5781115 1.0000000
#> [8,] 0.6529008 0.4447530 0.6760477 0.4536838 0.4182234 0.3895078 0.5152486
#> [9,] 0.3315370 0.3540169 0.6850190 0.4790590 0.6448676 0.5750719 0.5994941
#> [10,] 0.7010280 0.3144170 0.4533120 0.6155981 0.5085389 0.6789134 0.3644169
#> [,8] [,9] [,10]
#> [1,] 0.6529008 0.3315370 0.7010280
#> [2,] 0.4447530 0.3540169 0.3144170
#> [3,] 0.6760477 0.6850190 0.4533120
#> [4,] 0.4536838 0.4790590 0.6155981
#> [5,] 0.4182234 0.6448676 0.5085389
#> [6,] 0.3895078 0.5750719 0.6789134
#> [7,] 0.5152486 0.5994941 0.3644169
#> [8,] 1.0000000 0.5228378 0.5401210
#> [9,] 0.5228378 1.0000000 0.4390177
#> [10,] 0.5401210 0.4390177 1.0000000
An additive genetic trait representing grain yield is then simulated, which produces 10 columns in the founder population. The columns contain the genetic values for each environment.
SP <- SimParam$new(founders)
SP$addTraitA(nQtlPerChr = n_seg_sites,
mean = input_asr2$mean,
var = input_asr2$var,
corA = input_asr2$cor_A)
#> Warning in sqrt(var[i]): NaNs produced
founders2 <- newPop(founders) # create pop-class object
head(founders2@gv)
#> Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7 Trait8
#> [1,] 3.957456 3.466188 3.891599 3.691421 3.341357 NaN 3.350102 3.892580
#> [2,] 3.887408 4.110552 2.874634 3.719378 3.856488 NaN 3.708284 3.610557
#> [3,] 4.216900 4.329390 4.354999 3.706663 4.069446 NaN 4.908793 4.399781
#> [4,] 3.531674 3.936215 3.858938 3.316243 3.782335 NaN 4.331822 3.694656
#> [5,] 3.749395 4.509515 3.219285 3.914530 3.927330 NaN 3.969030 3.705926
#> [6,] 4.123313 3.662853 3.871763 3.133405 4.239831 NaN 3.213352 3.902217
#> Trait9 Trait10
#> [1,] 3.324880 2.796227
#> [2,] 3.021443 3.191348
#> [3,] 3.759965 4.732117
#> [4,] 4.768531 3.061471
#> [5,] 3.088558 3.058519
#> [6,] 3.930163 4.791642
The wheat genotypes are generated by randomly crossing the founders
using the randCross()
function, and then made into doubled
haploid (DH) lines using the makeDH()
function.
f1s <- randCross(pop = founders2,
nCrosses = 20, # no. crosses
nProgeny = 10) # no. progeny per cross
DHs <- makeDH(pop = f1s, nDH = 1)
DHs
#> An object of class "Pop"
#> Ploidy: 2
#> Individuals: 200
#> Chromosomes: 21
#> Loci: 6300
#> Traits: 10
FieldSimR’s wrapper function unstr_asr_output()
is then
used to obtain the genetic values for the DH population.
gv_df2 <- unstr_asr_output(pop = DHs,
n_traits = 1,
n_envs = n_envs,
n_reps = n_reps)
head(gv_df2)
#> env rep id gv.Trait1
#> 1 1 1 221 4.376070
#> 2 1 1 222 4.041503
#> 3 1 1 223 4.243742
#> 4 1 1 224 3.952004
#> 5 1 1 225 4.186191
#> 6 1 1 226 3.719800
GV <- matrix(gv_df2$gv.Trait1[gv_df2$rep == 1], ncol = n_envs)
round(mean(GV), digits = 2) # overall trait mean in the DH population
#> [1] NaN
round(cov2cor(var(GV)), digits = 2) # simulated between-environment genetic correlation matrix in the DH population
#> Warning in cov2cor(var(GV)): diag(.) had 0 or NA entries; non-finite result is
#> doubtful
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1.00 0.32 0.52 0.26 0.72 NA 0.27 0.62 0.20 0.58
#> [2,] 0.32 1.00 0.31 0.29 0.55 NA 0.60 0.37 0.28 0.21
#> [3,] 0.52 0.31 1.00 0.44 0.38 NA 0.53 0.65 0.71 0.37
#> [4,] 0.26 0.29 0.44 1.00 0.23 NA 0.20 0.26 0.49 0.42
#> [5,] 0.72 0.55 0.38 0.23 1.00 NA 0.52 0.36 0.46 0.39
#> [6,] NA NA NA NA NA 1 NA NA NA NA
#> [7,] 0.27 0.60 0.53 0.20 0.52 NA 1.00 0.49 0.59 0.33
#> [8,] 0.62 0.37 0.65 0.26 0.36 NA 0.49 1.00 0.45 0.39
#> [9,] 0.20 0.28 0.71 0.49 0.46 NA 0.59 0.45 1.00 0.33
#> [10,] 0.58 0.21 0.37 0.42 0.39 NA 0.33 0.39 0.33 1.00
round(Ce, digits = 2) # expected between-environment genetic correlation matrix
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 1.00 0.37 0.58 0.62 0.66 0.66 0.36 0.65 0.33 0.70
#> [2,] 0.37 1.00 0.32 0.44 0.61 0.59 0.56 0.44 0.35 0.31
#> [3,] 0.58 0.32 1.00 0.50 0.43 0.54 0.58 0.68 0.69 0.45
#> [4,] 0.62 0.44 0.50 1.00 0.58 0.43 0.43 0.45 0.48 0.62
#> [5,] 0.66 0.61 0.43 0.58 1.00 0.67 0.65 0.42 0.64 0.51
#> [6,] 0.66 0.59 0.54 0.43 0.67 1.00 0.58 0.39 0.58 0.68
#> [7,] 0.36 0.56 0.58 0.43 0.65 0.58 1.00 0.52 0.60 0.36
#> [8,] 0.65 0.44 0.68 0.45 0.42 0.39 0.52 1.00 0.52 0.54
#> [9,] 0.33 0.35 0.69 0.48 0.64 0.58 0.60 0.52 1.00 0.44
#> [10,] 0.70 0.31 0.45 0.62 0.51 0.68 0.36 0.54 0.44 1.00
round(mean(diag(var(GV))), digits = 2) # overall trait variance in the DH population
#> [1] NA
It is possible to obtain genotype main effects and GxE interaction effects for this approach, although these components were not explicitly simulated. The proportion of main effect and interaction variance can also be formally quantified to aid with interpretation.
The genotype main effects are obtained as simple averages across environments.
g_main <- rowMeans(GV)
round(var(g_main), digits = 2) # simulated main effect variance
#> [1] NA
Ge <- sqrt(De) %*% Ce %*% sqrt(De)
#> Warning in sqrt(De): NaNs produced
#> Warning in sqrt(De): NaNs produced
round(mean(Ge), digits = 2) # expected main effect variance
#> [1] NaN
The GxE interaction effects are then obtained after adjusting for the genotype main effects.
ge_int <- c(GV - g_main)
GE_int <- matrix(ge_int, ncol = n_envs)
round(var(GE_int), digits = 2) # simulated interaction variance matrix
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] NA NA NA NA NA NA NA NA NA NA
#> [2,] NA NA NA NA NA NA NA NA NA NA
#> [3,] NA NA NA NA NA NA NA NA NA NA
#> [4,] NA NA NA NA NA NA NA NA NA NA
#> [5,] NA NA NA NA NA NA NA NA NA NA
#> [6,] NA NA NA NA NA NA NA NA NA NA
#> [7,] NA NA NA NA NA NA NA NA NA NA
#> [8,] NA NA NA NA NA NA NA NA NA NA
#> [9,] NA NA NA NA NA NA NA NA NA NA
#> [10,] NA NA NA NA NA NA NA NA NA NA
round(mean(diag(var(GE_int))), digits = 2) # simulated pooled interaction variance
#> [1] NA
round(mean(diag(Ge - mean(Ge))), digits = 2) # expected pooled interaction variance
#> [1] NaN
The pooled interaction variance can be further split into heterogeneity of genetic variance and lack of genetic correlation (Cooper et al., 1994).
round(sum((sqrt(diag(var(GV))) - mean(sqrt(diag(var(GV)))))^2)/n_envs, digits = 2) # heterogeneity of genetic variance
#> [1] NA
round(sum(outer(sqrt(diag(var(GV))), sqrt(diag(var(GV))), "*") * (1 - cov2cor(var(GV))))/n_envs^2, digits = 2) # lack of genetic correlation
#> Warning in cov2cor(var(GV)): diag(.) had 0 or NA entries; non-finite result is
#> doubtful
#> [1] NA
This indicates that 0.1 of the interaction variance corresponds to changes in scale between environments and 0.9 corresponds to changes in rank.
The plot errors are obtained as the sum of spatial, random and extraneous error terms. The proportions of spatial and extraneous error variance are sampled from a continuous uniform distribution ranging from 0.2 to 0.6 and from 0 to 0.2, respectively. This produces a mean proportion of spatial error variance of 0.5 and a mean proportion of extraneous error variance of 0.1. FieldSimR will then allocate the remaining error variance to the random errors by default.
The initial simulation parameters are:
spatial_model <- "AR1:AR1"
round(sigma2e, digits = 2) # error variance
#> [1] 0.47
prop_spatial <- runif(n_envs, min = 0.2, max = 0.6) # proportions of spatial error variance
round(prop_spatial, digits = 2)
#> [1] 0.56 0.58 0.22 0.37 0.39 0.55 0.22 0.52 0.40 0.43
prop_ext <- runif(n_envs, min = 0, max = 0.2) # proportions of extraneous error variance
round(prop_ext, digits = 2)
#> [1] 0.04 0.00 0.06 0.07 0.15 0.09 0.10 0.05 0.12 0.16
round(1 - (prop_spatial + prop_ext), digits = 2) # proportions of random error variance
#> [1] 0.40 0.42 0.72 0.55 0.46 0.36 0.69 0.43 0.48 0.42
The column and row autocorrelations are sampled from a continuous uniform distribution ranging from 0.3 to 0.7 and from 0.5 to 0.9, respectively. This produces a mean column autocorrelation of 0.5 and a mean row autocorrelation of 0.7.
col_cor <- runif(n_envs, min = 0.3, max = 0.7) # column autocorrelations
row_cor <- runif(n_envs, min = 0.5, max = 0.9) # row autocorrelations
The extraneous errors are simulated based on zig-zag ordering between neighbouring rows.
ext_ord <- "zig-zag" # ordering of extraneous variation
ext_dir <- "row"
The simulation is achieved by populating the
field_trial_error()
function with the terms above. A data
frame is returned containing the environment, block, column and row
numbers as well as the simulated plot error. When
return_effects = TRUE
, a list is returned with an
additional data frame containing the spatial, random and extraneous
error terms.
error_df <- field_trial_error(n_traits = 1,
n_envs = n_envs,
n_blocks = n_blocks,
block_dir = block_dir,
n_cols = n_cols,
n_rows = n_rows,
var_R = sigma2e,
prop_spatial = prop_spatial,
spatial_model = spatial_model,
col_cor = col_cor,
row_cor = row_cor,
prop_ext = prop_ext,
ext_ord = ext_ord,
ext_dir = ext_dir,
return_effects = TRUE)
head(error_df$plot_df) # data frame with simulated plot errors
#> env block col row e.Trait1
#> 1 1 1 1 1 0.99816574
#> 2 1 1 1 2 0.01815270
#> 3 1 1 1 3 1.00706780
#> 4 1 1 1 4 0.01946961
#> 5 1 1 1 5 0.07762564
#> 6 1 1 1 6 0.62174521
head(error_df$Trait1) # data frame with simulated spatial, random and extraneous errors <--
#> env block col row e.spat e.rand e.ext.col e.ext.row
#> 1 1 1 1 1 0.283384372 0.81137172 0 -0.09659036
#> 2 1 1 1 2 0.001659233 -0.06918546 0 0.08567893
#> 3 1 1 1 3 0.213431803 0.86967955 0 -0.07604355
#> 4 1 1 1 4 -0.052119742 0.02349163 0 0.04809772
#> 5 1 1 1 5 -0.349301824 0.46173516 0 -0.03480769
#> 6 1 1 1 6 0.345313365 0.22253975 0 0.05389209
The separate error terms can be displayed using FieldSimR’s function
plot_effects()
. The spatial errors for environment 1 are
given by:
plot_effects(error_df$Trait1[error_df$Trait1$env == 1,], effect = "e.spat", labels = TRUE)
The total plot errors for environment 1 are given by:
plot_effects(error_df$plot_df[error_df$Trait1$env == 1,], effect = "e.Trait1", labels = TRUE)
The theoretical and sample variograms can be displayed using
FieldSimR’s functions theoretical_variogram()
and
sample_variogram()
. The sample variogram for environment 1
for the total plot errors is given by:
sample_variogram(error_df$plot_df[error_df$Trait1$env == 1,], effect = "e.Trait1")
The phenotypes are generated using FieldSimR’s function
make_phenotypes()
. When randomise = TRUE
,
genotypes are allocated to plots using a randomised complete block
design.
met_df <- make_phenotypes(gv_df = gv_df2, error_df = error_df, randomise = TRUE)
head(met_df)
#> env block col row id phe.Trait1
#> 1 1 1 1 1 224 4.950170
#> 2 1 1 1 2 330 3.980112
#> 3 1 1 1 3 385 5.127561
#> 4 1 1 1 4 335 4.021688
#> 5 1 1 1 5 286 3.702615
#> 6 1 1 1 6 284 4.530832
The true genetic values are also added to aid with interpretation.
Z <- model.matrix(~id:env - 1, data = met_df)
gv <- c(GV)
met_df$gv.Trait1 <- c(Z %*% gv)
head(met_df)
#> env block col row id phe.Trait1 gv.Trait1
#> 1 1 1 1 1 224 4.950170 NaN
#> 2 1 1 1 2 330 3.980112 NaN
#> 3 1 1 1 3 385 5.127561 NaN
#> 4 1 1 1 4 335 4.021688 NaN
#> 5 1 1 1 5 286 3.702615 NaN
#> 6 1 1 1 6 284 4.530832 NaN
The phenotypes for environment 1 are given by:
plot_effects(met_df[met_df$env == 1,], effect = "phe.Trait1", labels = TRUE)
The simulated plot-level and line-mean heritabilities can be formally quantified to aid with interpretation.
The plot-level heritabilities within environments are:
round(c(by(met_df[,c("phe.Trait1","gv.Trait1")], met_df$env, function(x) {cor(x$phe.Trait1, x$gv.Trait1)}))^2, digits = 2) # simulated plot-level heritabilities
#> 1 2 3 4 5 6 7 8 9 10
#> NA NA NA NA NA NA NA NA NA NA
round(de/(de + sigma2e), digits = 2) # expected plot-level heritabilities
#> [1] 0.11 0.19 0.33 0.28 0.39 -0.04 0.30 0.13 0.48 0.49
The minor discrepancies here are a result of the sampling process.
The line-mean heritabilities within environments are:
ave_yield <- with(met_df, tapply(phe.Trait1, list(id, env), mean))
tmp <- met_df[met_df$block == 1,]
tmp <- tmp[order(tmp$env, tmp$id),]
round(c(by(tmp[,c("phe.Trait1","gv.Trait1")], tmp$env, function(x) {cor(x$phe.Trait1, x$gv.Trait1)}))^2, digits = 2) # simulated line-mean heritabilities
#> 1 2 3 4 5 6 7 8 9 10
#> NA NA NA NA NA NA NA NA NA NA
round(de/(de + sigma2e/n_reps), digits = 2) # expected line-mean heritabilities
#> [1] 0.20 0.32 0.49 0.44 0.56 -0.08 0.46 0.22 0.65 0.65
round(c(by(tmp[,c("phe.Trait1","gv.Trait1")], tmp$env, function(x) {cor(x$phe.Trait1, x$gv.Trait1)})), digits = 2) # simulated prediction accuracies
#> 1 2 3 4 5 6 7 8 9 10
#> NA NA NA NA NA NA NA NA NA NA
round(sqrt(de/(de + sigma2e/n_reps)), digits = 2) # expected prediction accuracies
#> Warning in sqrt(de/(de + sigma2e/n_reps)): NaNs produced
#> [1] 0.45 0.56 0.70 0.66 0.75 NaN 0.68 0.47 0.81 0.81
The line-mean heritability across all environments is:
ave_yield <- with(met_df, tapply(phe.Trait1, id, mean))
round(cor(ave_yield, g_main)^2, digits = 2) # simulated line-mean heritability
#> [1] NA
round(var(g_main)/(var(g_main) + mean(diag(var(GE_int)))/n_envs + sigma2e/(n_envs*n_reps)), digits = 2) # expected line-mean heritability
#> [1] NA
round(cor(ave_yield, g_main)^2, digits = 2) # simulated prediction accuracy
#> [1] NA
round(sqrt(var(g_main)/(var(g_main) + mean(diag(var(GE_int)))/n_envs + sigma2e/(n_envs*n_reps))), digits = 2) # expected prediction accuracy
#> [1] NA