The objective of this tutorial is to simulate a MET dataset using FieldSimR which captures GxE interaction and spatial variation with realistic structure and complexity. Genetic values will be simulated using AlphaSimR based on an additive genetic trait representing grain yield.


Summary

The simulation can be summarised by the following steps:

  1. Simulating genetic values with AlphaSimR
    1. Simulation parameters
    2. FieldSimR’s wrapper functions for simulating GxE interaction
      • Measures of variance explained

  2. Simulating plot errors with FieldSimR
    1. Simulation parameters
    2. field_trial_error() and plot_effects() functions

  1. Generating phenotypes by combining the plot errors with the genetic values
    1. make_phenotypes() function
      • Measures of heritability and expected accuracy


Preliminaries

Clean the working environment.

rm(list = ls())

Load required packages.

library(FieldSimR)
library(AlphaSimR)
#> Loading required package: R6
library(ggplot2)
library(mbend)

The exact results in this tutorial can be reproduced by initially setting the following seed.

set.seed(123)


Simulation details

Consider a scenario in which 200 inbred wheat genotypes are evaluated for grain yield (t/ha) in field trials across 10 growing environments. The genotypes are simulated with additive genetic relatedness using AlphaSimR. Each trial comprises 20 columns and 20 rows for 400 plots in total, with two complete blocks of all genotypes aligned in the column direction (side-by-side). The overall trait mean is 4 t/ha and the overall trait variance is 0.2 t2/ha2. The average plot-level heritability is 0.3, which reflects the average ratio of genetic variance to total phenotypic variance within environments. The error variance is given by 0.2/0.3-0.2 = 0.47 t2/ha2.


The simulation parameters are:

n_genos <- 200 # no. genotypes
n_envs <- 10   # no. environments
n_reps <- n_blocks <- 2 # no. replicates/blocks per environment
block_dir <- "col" # block direction
n_cols <- 20   # no. columns per environment
n_rows <- 20   # no. rows per environment
n_plots <- n_cols * n_rows # no. plots per environment == n_genos * n_reps
mu <- 4        # overall trait mean
sigma2 <- 0.2  # overall trait variance 
H2 <- 0.3      # average plot-level heritability
sigma2e <- sigma2/H2 - sigma2 # error variance
round(sigma2e, digits = 2)
#> [1] 0.47


The field trial layout for environment 1 is:


The phenotypes are generated based on the linear mixed model:

\(\mathbf{y} = \mathbf{Z}\mathbf{gv} + \mathbf{e}\),

where \(\mathbf{gv}\) is a vector of genetic values with design matrix \(\mathbf{Z}\) and \(\mathbf{e}\) is a vector of plot errors. The inclusion of additional non-genetic effects is straightforward.

The simulation involves three steps:

  1. Simulating the genetic values with AlphaSimR, using FieldSimR’s wrapper functions
  2. Simulating the plot errors with FieldSimR
  3. Generating the phenotypes by combining the genetic values with the non-genetic effects and plot errors.


1. Simulating the genetic values with AlphaSimR

The genetic values are simulated to capture GxE interaction, which can be broadly categorised as either heterogeneity of genetic variance (changes in genotype scale) or lack of genetic correlation (changes in genotype rank).

Simulating the genetic values involves two parts:

  1. Setting the simulation parameters for the founder population and trait architecture
  2. Simulating an additive genetic trait using FieldSimR’s wrapper functions for GxE interaction.

FieldSimR provides wrapper functions for compound symmetry and unstructured models for GxE interaction.


a. Setting the simulation parameters

The number of homogeneous genotypes in the founder population is set to 20, the number of chromosomes is set to 20 and the number of segregating sites (biallelic QTN) per chromosome is set to 300.

n_founders <- 20 # Number of genotypes in the founder population.
n_chr <- 21 # Number of chromosomes.
n_seg_sites <- 300 # Number of QTN per chromosome.


The overall trait mean and variance are:

mu # trait mean
#> [1] 4
sigma2 # trait variance
#> [1] 0.2


b. Simulating an additive genetic trait

The founder population is simulated using the runMacs() function.

# Simulating a founder population using AlphaSimR's "WHEAT" presets to mimic the species' evolutionary history
founders <- runMacs(nInd = n_founders,
                    nChr = n_chr,
                    segSites = n_seg_sites,
                    inbred = TRUE,
                    species = "WHEAT",
                    nThreads = 2)
founders # MapPop object
#> An object of class "MapPop" 
#> Ploidy: 2 
#> Individuals: 20 
#> Chromosomes: 21 
#> Loci: 6300
SP <- SimParam$new(founders)


An additive genetic trait representing grain yield is then simulated with AlphaSimR, using FieldSimR’s wrapper functions. The functionality will be demonstrated separately for the simulations involving the compound symmetry and unstructured models.


Compound symmetry model

The genetic values for the compound symmetry model are constructed as the sum of two terms:

\(\mathbf{gv} = \mathbf{1} \otimes \mathbf{g} + \mathbf{{ge}}\),

where \(\mathbf{g}\) is the vector of genotype main effects and \(\mathbf{{ge}}\) is the vector of GxE interaction effects. The relative magnitude of main effect variance to total genetic variance (main effect + interaction variance) is set to 0.5. This allocates half of the overall trait variance to the genotype main effects and half to the GxE interaction effects.

rel_main_eff <- 0.5


FieldSimR’s wrapper function compsym_asr_input() is then used to populate a list of input parameters for AlphaSimR. The list contains vectors for the mean and variance of the genotype main effects and interaction effects, as well as the correlation matrix between all effects.

input_asr1 <- compsym_asr_input(n_traits = 1,
                                n_envs = n_envs,
                                mean = mu,
                                var = sigma2,
                                rel_main_eff_A = rel_main_eff)
input_asr1
#> $mean
#>  [1] 4 0 0 0 0 0 0 0 0 0 0
#> 
#> $var
#>  [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
#> 
#> $cor_A
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#>  [1,]    1    0    0    0    0    0    0    0    0     0     0
#>  [2,]    0    1    0    0    0    0    0    0    0     0     0
#>  [3,]    0    0    1    0    0    0    0    0    0     0     0
#>  [4,]    0    0    0    1    0    0    0    0    0     0     0
#>  [5,]    0    0    0    0    1    0    0    0    0     0     0
#>  [6,]    0    0    0    0    0    1    0    0    0     0     0
#>  [7,]    0    0    0    0    0    0    1    0    0     0     0
#>  [8,]    0    0    0    0    0    0    0    1    0     0     0
#>  [9,]    0    0    0    0    0    0    0    0    1     0     0
#> [10,]    0    0    0    0    0    0    0    0    0     1     0
#> [11,]    0    0    0    0    0    0    0    0    0     0     1

Note that the overall trait mean is assigned to the genotype main effects.


An additive genetic trait representing grain yield is then simulated, which produces 11 columns in the founder population. The first column contains the genotype main effects, while the remaining columns contain the GxE interaction effects.

SP$addTraitA(nQtlPerChr = n_seg_sites,
             mean = input_asr1$mean,
             var = input_asr1$var,
             corA = input_asr1$cor_A)
founders1 <- newPop(founders) # create pop-class object
head(founders1@gv)
#>        Trait1      Trait2      Trait3       Trait4     Trait5      Trait6
#> [1,] 4.507921  0.22091428 -0.21247781  0.266862496 -0.2116897 -0.26634128
#> [2,] 4.146366 -0.13610030  0.23203094  0.019463379 -0.1325774 -0.61241904
#> [3,] 4.034316  0.36390890 -0.04087896  0.006152697  0.2078321  0.29064333
#> [4,] 3.809426 -0.02633595 -0.03638150 -0.576911660 -0.2299918  0.24457708
#> [5,] 3.660717  0.06424753  0.85861671  0.011598320 -0.1612934  0.28337313
#> [6,] 4.282772 -0.20359990 -0.02687326  0.160932637 -0.3852944  0.02875901
#>          Trait7       Trait8      Trait9     Trait10     Trait11
#> [1,]  0.3174922  0.006956874  0.49571203  0.02901406 -0.06409046
#> [2,]  0.3149393 -0.595855023  0.30833434  0.30345226  0.14039230
#> [3,]  0.1458506 -0.053083925  0.09321479 -0.27890150  0.47069988
#> [4,] -0.2799334 -0.270474467 -0.40806311  0.44517086  0.02288598
#> [5,]  0.3694393 -0.138224661  0.30919559  0.26697985 -0.53057050
#> [6,] -0.2768191  0.122751983  0.08698575 -0.31565054 -0.60453500


The inbred wheat genotypes are generated by randomly crossing the founders using the randCross() function, and then made into DH lines using the makeDH() function.

f1s <- randCross(pop = founders1, 
                 nCrosses = 20, # no. crosses
                 nProgeny = 10) # no. progeny per cross
DHs <- makeDH(pop = f1s, nDH = 1)
DHs
#> An object of class "Pop" 
#> Ploidy: 2 
#> Individuals: 200 
#> Chromosomes: 21 
#> Loci: 6300 
#> Traits: 11


FieldSimR’s wrapper function compsym_asr_output() is then used to obtain the genetic values for the DH population. This function constructs the genetic values for each environment as the sum of the genotype main effects and GxE interaction effects. When return_effects = TRUE, a list is returned with an additional data frame containing the genotype main effects and GxE interaction effects for each environment.

gv_df1 <- compsym_asr_output(pop = DHs,
                             n_traits = 1,
                             n_envs = n_envs,
                             n_reps = n_reps,
                             return_effects = TRUE)
head(gv_df1$Trial_df) # data frame with simulated genetic values
#>   env rep  id gv.Trait1
#> 1   1   1 221  4.423045
#> 2   1   1 222  4.251034
#> 3   1   1 223  4.311441
#> 4   1   1 224  4.318957
#> 5   1   1 225  3.994904
#> 6   1   1 226  4.558925
head(gv_df1$Trait1) # data frame with simulated genotype main effects and GxE interaction effects
#>    id     main    int.Env1   int.Env2     int.Env3    int.Env4   int.Env5
#> 1 221 4.329882  0.09316353 0.55278333  0.051959381  0.34292847 -0.2371918
#> 2 222 4.167223  0.08381139 0.23097697 -0.006207443 -0.37549467 -0.1712432
#> 3 223 3.882204  0.42923625 0.05902127  0.070045022 -0.07337969 -0.3267089
#> 4 224 4.210671  0.10828578 0.27992990 -0.041946891 -0.01063246 -0.2483420
#> 5 225 4.020083 -0.02517896 0.27184636  0.092397850 -0.22388424  0.1297582
#> 6 226 4.299234  0.25969087 0.27005091  0.091363176 -0.03158187  0.2050466
#>      int.Env6   int.Env7    int.Env8     int.Env9   int.Env10
#> 1  0.04775242 -0.4558783 -0.04258169 -0.024428552  0.34731671
#> 2 -0.04731661 -0.4288389 -0.29031517 -0.349867791  0.37160639
#> 3  0.06965547 -0.4838334  0.40857187  0.041897964  0.16619257
#> 4 -0.04322254 -0.1048843  0.01752054 -0.005105602  0.30526549
#> 5  0.36985120 -0.0376870 -0.16460878  0.256049882  0.93213711
#> 6 -0.08617849 -0.5931950  0.02606397  0.191390095 -0.06574682
GV <- matrix(gv_df1$Trial_df$gv.Trait1, ncol = n_envs)
round(mean(GV), digits = 2) # overall trait mean in the DH population
#> [1] 3.9
round(var(GV), digits = 2) # simulated between-environment genetic covariance matrix in the DH population
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,] 0.28 0.19 0.16 0.16 0.14 0.13 0.16 0.18 0.16  0.19
#>  [2,] 0.19 0.26 0.16 0.14 0.15 0.17 0.16 0.18 0.15  0.16
#>  [3,] 0.16 0.16 0.22 0.16 0.11 0.13 0.15 0.17 0.11  0.14
#>  [4,] 0.16 0.14 0.16 0.24 0.11 0.11 0.14 0.17 0.12  0.16
#>  [5,] 0.14 0.15 0.11 0.11 0.21 0.14 0.17 0.14 0.14  0.16
#>  [6,] 0.13 0.17 0.13 0.11 0.14 0.23 0.16 0.16 0.13  0.16
#>  [7,] 0.16 0.16 0.15 0.14 0.17 0.16 0.25 0.16 0.15  0.17
#>  [8,] 0.18 0.18 0.17 0.17 0.14 0.16 0.16 0.28 0.14  0.17
#>  [9,] 0.16 0.15 0.11 0.12 0.14 0.13 0.15 0.14 0.20  0.15
#> [10,] 0.19 0.16 0.14 0.16 0.16 0.16 0.17 0.17 0.15  0.26
round(mean(diag(var(GV))), digits = 2) # overall trait variance in the DH population
#> [1] 0.24


Unstructured model

The genetic values for the unstructured model are simulated as:

\(\mathbf{gv} \sim \mbox{N}(\mu\mathbf{1}, \mathbf{G_e} \otimes \mathbf{I})\),

where \(\mu\) is the overall trait mean and \(\mathbf{G_e}\) is the between-environment genetic covariance matrix, which is constructed as:

\(\mathbf{G_e} = \mathbf{D}^{1/2}_\mathbf{e}\mathbf{C_e}\mathbf{D}^{1/2}_\mathbf{e}\)

where \(\mathbf{D_e}\) is a diagonal genetic variance matrix and \(\mathbf{C_e}\) is a between-environment genetic correlation matrix.

The genetic variances are sampled from an inverse gamma distribution, with shape parameter of 5 and scale parameter of 1. The genetic variances are then shifted to give an overall trait variance of 0.2.

de <- 1/rgamma(n_envs, shape = 5, rate = 1) # sample
de <- c(scale(de, center = TRUE, scale = FALSE) + sigma2) # shift
round(de, digits = 2)
#>  [1]  0.06  0.11  0.23  0.18  0.30 -0.02  0.20  0.07  0.43  0.44
mean(de) # overall trait variance
#> [1] 0.2
De <- diag(de)


The genetic correlations are sampled from a continuous uniform distribution ranging from 0.3 to 0.7, which produces a mean genetic correlation of 0.5. This is achieved using FieldSimR’s function rand_cor_mat(). When pos_def = TRUE, the correlation matrix is bent to ensure it is positive definite.

Ce <- rand_cor_mat(p = n_envs, min_cor = 0.3, max_cor = 0.7, pos_def = TRUE)
#> Unweighted bending
#> max.iter = 10000
#> small.positive = 1e-04
#> method = hj
#> Found a correlation matrix.
#> Convergence met after 5 iterations.
round(Ce, digits = 2)
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,] 1.00 0.37 0.58 0.62 0.66 0.66 0.36 0.65 0.33  0.70
#>  [2,] 0.37 1.00 0.32 0.44 0.61 0.59 0.56 0.44 0.35  0.31
#>  [3,] 0.58 0.32 1.00 0.50 0.43 0.54 0.58 0.68 0.69  0.45
#>  [4,] 0.62 0.44 0.50 1.00 0.58 0.43 0.43 0.45 0.48  0.62
#>  [5,] 0.66 0.61 0.43 0.58 1.00 0.67 0.65 0.42 0.64  0.51
#>  [6,] 0.66 0.59 0.54 0.43 0.67 1.00 0.58 0.39 0.58  0.68
#>  [7,] 0.36 0.56 0.58 0.43 0.65 0.58 1.00 0.52 0.60  0.36
#>  [8,] 0.65 0.44 0.68 0.45 0.42 0.39 0.52 1.00 0.52  0.54
#>  [9,] 0.33 0.35 0.69 0.48 0.64 0.58 0.60 0.52 1.00  0.44
#> [10,] 0.70 0.31 0.45 0.62 0.51 0.68 0.36 0.54 0.44  1.00

FieldSimR’s wrapper function unstr_asr_input() is then used to populate a list of input parameters for AlphaSimR. The list contains vectors for the mean and variance of the genetic values in each environment, as well as the between-environment genetic correlation matrix.

input_asr2 <- unstr_asr_input(n_traits = 1,
                              n_envs = n_envs,
                              mean = mu,
                              var = de,
                              cor_A = Ce)
input_asr2
#> $mean
#>  [1] 4 4 4 4 4 4 4 4 4 4
#> 
#> $var
#>  [1]  0.05790685  0.10823111  0.22666303  0.18322584  0.30279583 -0.01706116
#>  [7]  0.19531678  0.06666998  0.43365466  0.44259708
#> 
#> $cor_A
#>            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
#>  [1,] 1.0000000 0.3718899 0.5831631 0.6197509 0.6620683 0.6564778 0.3613163
#>  [2,] 0.3718899 1.0000000 0.3215893 0.4418798 0.6064223 0.5850056 0.5556100
#>  [3,] 0.5831631 0.3215893 1.0000000 0.4975177 0.4308969 0.5431501 0.5798006
#>  [4,] 0.6197509 0.4418798 0.4975177 1.0000000 0.5765267 0.4288828 0.4266768
#>  [5,] 0.6620683 0.6064223 0.4308969 0.5765267 1.0000000 0.6735068 0.6489137
#>  [6,] 0.6564778 0.5850056 0.5431501 0.4288828 0.6735068 1.0000000 0.5781115
#>  [7,] 0.3613163 0.5556100 0.5798006 0.4266768 0.6489137 0.5781115 1.0000000
#>  [8,] 0.6529008 0.4447530 0.6760477 0.4536838 0.4182234 0.3895078 0.5152486
#>  [9,] 0.3315370 0.3540169 0.6850190 0.4790590 0.6448676 0.5750719 0.5994941
#> [10,] 0.7010280 0.3144170 0.4533120 0.6155981 0.5085389 0.6789134 0.3644169
#>            [,8]      [,9]     [,10]
#>  [1,] 0.6529008 0.3315370 0.7010280
#>  [2,] 0.4447530 0.3540169 0.3144170
#>  [3,] 0.6760477 0.6850190 0.4533120
#>  [4,] 0.4536838 0.4790590 0.6155981
#>  [5,] 0.4182234 0.6448676 0.5085389
#>  [6,] 0.3895078 0.5750719 0.6789134
#>  [7,] 0.5152486 0.5994941 0.3644169
#>  [8,] 1.0000000 0.5228378 0.5401210
#>  [9,] 0.5228378 1.0000000 0.4390177
#> [10,] 0.5401210 0.4390177 1.0000000


An additive genetic trait representing grain yield is then simulated, which produces 10 columns in the founder population. The columns contain the genetic values for each environment.

SP <- SimParam$new(founders)
SP$addTraitA(nQtlPerChr = n_seg_sites,
             mean = input_asr2$mean,
             var = input_asr2$var,
             corA = input_asr2$cor_A)
#> Warning in sqrt(var[i]): NaNs produced
founders2 <- newPop(founders) # create pop-class object
head(founders2@gv)
#>        Trait1   Trait2   Trait3   Trait4   Trait5 Trait6   Trait7   Trait8
#> [1,] 3.957456 3.466188 3.891599 3.691421 3.341357    NaN 3.350102 3.892580
#> [2,] 3.887408 4.110552 2.874634 3.719378 3.856488    NaN 3.708284 3.610557
#> [3,] 4.216900 4.329390 4.354999 3.706663 4.069446    NaN 4.908793 4.399781
#> [4,] 3.531674 3.936215 3.858938 3.316243 3.782335    NaN 4.331822 3.694656
#> [5,] 3.749395 4.509515 3.219285 3.914530 3.927330    NaN 3.969030 3.705926
#> [6,] 4.123313 3.662853 3.871763 3.133405 4.239831    NaN 3.213352 3.902217
#>        Trait9  Trait10
#> [1,] 3.324880 2.796227
#> [2,] 3.021443 3.191348
#> [3,] 3.759965 4.732117
#> [4,] 4.768531 3.061471
#> [5,] 3.088558 3.058519
#> [6,] 3.930163 4.791642


The wheat genotypes are generated by randomly crossing the founders using the randCross() function, and then made into doubled haploid (DH) lines using the makeDH() function.

f1s <- randCross(pop = founders2, 
                 nCrosses = 20, # no. crosses
                 nProgeny = 10) # no. progeny per cross
DHs <- makeDH(pop = f1s, nDH = 1)
DHs
#> An object of class "Pop" 
#> Ploidy: 2 
#> Individuals: 200 
#> Chromosomes: 21 
#> Loci: 6300 
#> Traits: 10


FieldSimR’s wrapper function unstr_asr_output() is then used to obtain the genetic values for the DH population.

gv_df2 <- unstr_asr_output(pop = DHs,
                           n_traits = 1,
                           n_envs = n_envs,
                           n_reps = n_reps)
head(gv_df2)
#>   env rep  id gv.Trait1
#> 1   1   1 221  4.376070
#> 2   1   1 222  4.041503
#> 3   1   1 223  4.243742
#> 4   1   1 224  3.952004
#> 5   1   1 225  4.186191
#> 6   1   1 226  3.719800
GV <- matrix(gv_df2$gv.Trait1[gv_df2$rep == 1], ncol = n_envs)
round(mean(GV), digits = 2) # overall trait mean in the DH population
#> [1] NaN
round(cov2cor(var(GV)), digits = 2) # simulated between-environment genetic correlation matrix in the DH population
#> Warning in cov2cor(var(GV)): diag(.) had 0 or NA entries; non-finite result is
#> doubtful
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,] 1.00 0.32 0.52 0.26 0.72   NA 0.27 0.62 0.20  0.58
#>  [2,] 0.32 1.00 0.31 0.29 0.55   NA 0.60 0.37 0.28  0.21
#>  [3,] 0.52 0.31 1.00 0.44 0.38   NA 0.53 0.65 0.71  0.37
#>  [4,] 0.26 0.29 0.44 1.00 0.23   NA 0.20 0.26 0.49  0.42
#>  [5,] 0.72 0.55 0.38 0.23 1.00   NA 0.52 0.36 0.46  0.39
#>  [6,]   NA   NA   NA   NA   NA    1   NA   NA   NA    NA
#>  [7,] 0.27 0.60 0.53 0.20 0.52   NA 1.00 0.49 0.59  0.33
#>  [8,] 0.62 0.37 0.65 0.26 0.36   NA 0.49 1.00 0.45  0.39
#>  [9,] 0.20 0.28 0.71 0.49 0.46   NA 0.59 0.45 1.00  0.33
#> [10,] 0.58 0.21 0.37 0.42 0.39   NA 0.33 0.39 0.33  1.00
round(Ce, digits = 2) # expected between-environment genetic correlation matrix
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,] 1.00 0.37 0.58 0.62 0.66 0.66 0.36 0.65 0.33  0.70
#>  [2,] 0.37 1.00 0.32 0.44 0.61 0.59 0.56 0.44 0.35  0.31
#>  [3,] 0.58 0.32 1.00 0.50 0.43 0.54 0.58 0.68 0.69  0.45
#>  [4,] 0.62 0.44 0.50 1.00 0.58 0.43 0.43 0.45 0.48  0.62
#>  [5,] 0.66 0.61 0.43 0.58 1.00 0.67 0.65 0.42 0.64  0.51
#>  [6,] 0.66 0.59 0.54 0.43 0.67 1.00 0.58 0.39 0.58  0.68
#>  [7,] 0.36 0.56 0.58 0.43 0.65 0.58 1.00 0.52 0.60  0.36
#>  [8,] 0.65 0.44 0.68 0.45 0.42 0.39 0.52 1.00 0.52  0.54
#>  [9,] 0.33 0.35 0.69 0.48 0.64 0.58 0.60 0.52 1.00  0.44
#> [10,] 0.70 0.31 0.45 0.62 0.51 0.68 0.36 0.54 0.44  1.00
round(mean(diag(var(GV))), digits = 2) # overall trait variance in the DH population
#> [1] NA


Measures of variance explained

It is possible to obtain genotype main effects and GxE interaction effects for this approach, although these components were not explicitly simulated. The proportion of main effect and interaction variance can also be formally quantified to aid with interpretation.

The genotype main effects are obtained as simple averages across environments.

g_main <- rowMeans(GV)
round(var(g_main), digits = 2) # simulated main effect variance
#> [1] NA
Ge <- sqrt(De) %*% Ce %*% sqrt(De)
#> Warning in sqrt(De): NaNs produced

#> Warning in sqrt(De): NaNs produced
round(mean(Ge), digits = 2) # expected main effect variance
#> [1] NaN


The GxE interaction effects are then obtained after adjusting for the genotype main effects.

ge_int <- c(GV - g_main)
GE_int <- matrix(ge_int, ncol = n_envs)
round(var(GE_int), digits = 2) # simulated interaction variance matrix
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [7,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [8,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#>  [9,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
#> [10,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
round(mean(diag(var(GE_int))), digits = 2) # simulated pooled interaction variance
#> [1] NA
round(mean(diag(Ge - mean(Ge))), digits = 2) # expected pooled interaction variance
#> [1] NaN


The pooled interaction variance can be further split into heterogeneity of genetic variance and lack of genetic correlation (Cooper et al., 1994).

round(sum((sqrt(diag(var(GV))) - mean(sqrt(diag(var(GV)))))^2)/n_envs, digits = 2) # heterogeneity of genetic variance
#> [1] NA
round(sum(outer(sqrt(diag(var(GV))), sqrt(diag(var(GV))), "*") * (1 - cov2cor(var(GV))))/n_envs^2, digits = 2) # lack of genetic correlation
#> Warning in cov2cor(var(GV)): diag(.) had 0 or NA entries; non-finite result is
#> doubtful
#> [1] NA

This indicates that 0.1 of the interaction variance corresponds to changes in scale between environments and 0.9 corresponds to changes in rank.


2. Simulating the plot errors with FieldSimR

The plot errors are obtained as the sum of spatial, random and extraneous error terms. The proportions of spatial and extraneous error variance are sampled from a continuous uniform distribution ranging from 0.2 to 0.6 and from 0 to 0.2, respectively. This produces a mean proportion of spatial error variance of 0.5 and a mean proportion of extraneous error variance of 0.1. FieldSimR will then allocate the remaining error variance to the random errors by default.

The initial simulation parameters are:

spatial_model <- "AR1:AR1"
round(sigma2e, digits = 2) # error variance
#> [1] 0.47
prop_spatial <- runif(n_envs, min = 0.2, max = 0.6) # proportions of spatial error variance
round(prop_spatial, digits = 2)
#>  [1] 0.56 0.58 0.22 0.37 0.39 0.55 0.22 0.52 0.40 0.43
prop_ext <- runif(n_envs, min = 0, max = 0.2) # proportions of extraneous error variance
round(prop_ext, digits = 2)
#>  [1] 0.04 0.00 0.06 0.07 0.15 0.09 0.10 0.05 0.12 0.16
round(1 - (prop_spatial + prop_ext), digits = 2) # proportions of random error variance
#>  [1] 0.40 0.42 0.72 0.55 0.46 0.36 0.69 0.43 0.48 0.42

The column and row autocorrelations are sampled from a continuous uniform distribution ranging from 0.3 to 0.7 and from 0.5 to 0.9, respectively. This produces a mean column autocorrelation of 0.5 and a mean row autocorrelation of 0.7.

col_cor <- runif(n_envs, min = 0.3, max = 0.7) # column autocorrelations
row_cor <- runif(n_envs, min = 0.5, max = 0.9) # row autocorrelations

The extraneous errors are simulated based on zig-zag ordering between neighbouring rows.

ext_ord <- "zig-zag" # ordering of extraneous variation
ext_dir <- "row"

The simulation is achieved by populating the field_trial_error() function with the terms above. A data frame is returned containing the environment, block, column and row numbers as well as the simulated plot error. When return_effects = TRUE, a list is returned with an additional data frame containing the spatial, random and extraneous error terms.

error_df <- field_trial_error(n_traits = 1,
                              n_envs = n_envs,
                              n_blocks = n_blocks,
                              block_dir = block_dir,
                              n_cols = n_cols,
                              n_rows = n_rows,
                              var_R = sigma2e,
                              prop_spatial = prop_spatial,
                              spatial_model = spatial_model,
                              col_cor = col_cor,
                              row_cor = row_cor,
                              prop_ext = prop_ext,
                              ext_ord = ext_ord,
                              ext_dir = ext_dir,
                              return_effects = TRUE)
head(error_df$plot_df) # data frame with simulated plot errors
#>   env block col row   e.Trait1
#> 1   1     1   1   1 0.99816574
#> 2   1     1   1   2 0.01815270
#> 3   1     1   1   3 1.00706780
#> 4   1     1   1   4 0.01946961
#> 5   1     1   1   5 0.07762564
#> 6   1     1   1   6 0.62174521
head(error_df$Trait1) # data frame with simulated spatial, random and extraneous errors <--
#>   env block col row       e.spat      e.rand e.ext.col   e.ext.row
#> 1   1     1   1   1  0.283384372  0.81137172         0 -0.09659036
#> 2   1     1   1   2  0.001659233 -0.06918546         0  0.08567893
#> 3   1     1   1   3  0.213431803  0.86967955         0 -0.07604355
#> 4   1     1   1   4 -0.052119742  0.02349163         0  0.04809772
#> 5   1     1   1   5 -0.349301824  0.46173516         0 -0.03480769
#> 6   1     1   1   6  0.345313365  0.22253975         0  0.05389209


The separate error terms can be displayed using FieldSimR’s function plot_effects(). The spatial errors for environment 1 are given by:

plot_effects(error_df$Trait1[error_df$Trait1$env == 1,], effect = "e.spat", labels = TRUE)


The total plot errors for environment 1 are given by:

plot_effects(error_df$plot_df[error_df$Trait1$env == 1,], effect = "e.Trait1", labels = TRUE)


The theoretical and sample variograms can be displayed using FieldSimR’s functions theoretical_variogram() and sample_variogram(). The sample variogram for environment 1 for the total plot errors is given by:

sample_variogram(error_df$plot_df[error_df$Trait1$env == 1,], effect = "e.Trait1")


3. Generating the phenotypes

The phenotypes are generated using FieldSimR’s function make_phenotypes(). When randomise = TRUE, genotypes are allocated to plots using a randomised complete block design.

met_df <- make_phenotypes(gv_df = gv_df2, error_df = error_df, randomise = TRUE)
head(met_df)
#>   env block col row  id phe.Trait1
#> 1   1     1   1   1 224   4.950170
#> 2   1     1   1   2 330   3.980112
#> 3   1     1   1   3 385   5.127561
#> 4   1     1   1   4 335   4.021688
#> 5   1     1   1   5 286   3.702615
#> 6   1     1   1   6 284   4.530832

The true genetic values are also added to aid with interpretation.

Z <- model.matrix(~id:env - 1, data = met_df)
gv <- c(GV)
met_df$gv.Trait1 <- c(Z %*% gv)
head(met_df)
#>   env block col row  id phe.Trait1 gv.Trait1
#> 1   1     1   1   1 224   4.950170       NaN
#> 2   1     1   1   2 330   3.980112       NaN
#> 3   1     1   1   3 385   5.127561       NaN
#> 4   1     1   1   4 335   4.021688       NaN
#> 5   1     1   1   5 286   3.702615       NaN
#> 6   1     1   1   6 284   4.530832       NaN


The phenotypes for environment 1 are given by:

plot_effects(met_df[met_df$env == 1,], effect = "phe.Trait1", labels = TRUE)


Measures of heritability and expected expected accuracy

The simulated plot-level and line-mean heritabilities can be formally quantified to aid with interpretation.

The plot-level heritabilities within environments are:

round(c(by(met_df[,c("phe.Trait1","gv.Trait1")], met_df$env, function(x) {cor(x$phe.Trait1, x$gv.Trait1)}))^2, digits = 2) # simulated plot-level heritabilities
#>  1  2  3  4  5  6  7  8  9 10 
#> NA NA NA NA NA NA NA NA NA NA
round(de/(de + sigma2e), digits = 2) # expected plot-level heritabilities
#>  [1]  0.11  0.19  0.33  0.28  0.39 -0.04  0.30  0.13  0.48  0.49

The minor discrepancies here are a result of the sampling process.


The line-mean heritabilities within environments are:

ave_yield <- with(met_df, tapply(phe.Trait1, list(id, env), mean))
tmp <- met_df[met_df$block == 1,]
tmp <- tmp[order(tmp$env, tmp$id),]
round(c(by(tmp[,c("phe.Trait1","gv.Trait1")], tmp$env, function(x) {cor(x$phe.Trait1, x$gv.Trait1)}))^2, digits = 2) # simulated line-mean heritabilities
#>  1  2  3  4  5  6  7  8  9 10 
#> NA NA NA NA NA NA NA NA NA NA
round(de/(de + sigma2e/n_reps), digits = 2) # expected line-mean heritabilities
#>  [1]  0.20  0.32  0.49  0.44  0.56 -0.08  0.46  0.22  0.65  0.65
round(c(by(tmp[,c("phe.Trait1","gv.Trait1")], tmp$env, function(x) {cor(x$phe.Trait1, x$gv.Trait1)})), digits = 2) # simulated prediction accuracies
#>  1  2  3  4  5  6  7  8  9 10 
#> NA NA NA NA NA NA NA NA NA NA
round(sqrt(de/(de + sigma2e/n_reps)), digits = 2) # expected prediction accuracies
#> Warning in sqrt(de/(de + sigma2e/n_reps)): NaNs produced
#>  [1] 0.45 0.56 0.70 0.66 0.75  NaN 0.68 0.47 0.81 0.81


The line-mean heritability across all environments is:

ave_yield <- with(met_df, tapply(phe.Trait1, id, mean))
round(cor(ave_yield, g_main)^2, digits = 2) # simulated line-mean heritability
#> [1] NA
round(var(g_main)/(var(g_main) + mean(diag(var(GE_int)))/n_envs + sigma2e/(n_envs*n_reps)), digits = 2) # expected line-mean heritability
#> [1] NA
round(cor(ave_yield, g_main)^2, digits = 2) # simulated prediction accuracy
#> [1] NA
round(sqrt(var(g_main)/(var(g_main) + mean(diag(var(GE_int)))/n_envs + sigma2e/(n_envs*n_reps))), digits = 2) # expected prediction accuracy
#> [1] NA