The objective of this tutorial is to simulate plot data using FieldSimR which captures spatial variation with realistic structure and complexity. Genetic values will be simulated using AlphaSimR based on an additive genetic trait representing grain yield.


Summary

The simulation can be summarised by the following steps:

  1. Simulating plot errors with FieldSimR
    1. Simulation parameters
    2. field_trial_error() and plot_effects() functions

  1. Simulating genetic values with AlphaSimR
    1. Simulation parameters
    2. Simulating an additive genetic trait

  1. Generating phenotypes by combining the plot errors with the genetic values
    1. make_phenotypes() function
      • Measures of heritability and expected accuracy


Preliminaries

Clean the working environment.

rm(list = ls())

Load required packages.

library(FieldSimR)
library(AlphaSimR)
#> Loading required package: R6
library(ggplot2)

The exact results in this tutorial can be reproduced by initially setting the following seed.

set.seed(123)


Simulation details

Consider a scenario in which 100 inbred wheat genotypes are evaluated for grain yield (t/ha) in a single field trial. The genotypes are simulated with additive genetic relatedness using AlphaSimR. The trial comprises 10 columns and 20 rows for 200 plots in total. There are two complete blocks of all genotypes aligned in the column direction (side-by-side). The trial is 80 m long in the column direction and 40 m wide in the row direction, with plots 8m long by 2m wide. The trait mean is 4 t/ha and the trait variance is 0.2 t2/ha2. The plot-level heritability is 0.3, which reflects the ratio of genetic variance to total phenotypic variance. The error variance is given by 0.2/0.3-0.2 = 0.47 t2/ha2.


The simulation parameters are:

n_genos <- 100 # no. genotypes
n_reps <- n_blocks <- 2 # no. replicates/blocks
block_dir <- "col" # block direction
n_cols <- 10   # no. columns
n_rows <- 20   # no. rows
n_plots <- n_cols * n_rows # no. plots == n_genos * n_reps
plot_length <- 8
plot_width <- 2 
mu <- 4        # trait mean
sigma2 <- 0.2  # trait variance 
H2 <- 0.3      # plot-level heritability
sigma2e <- sigma2/H2 - sigma2 # error variance
round(sigma2e, digits = 2)
#> [1] 0.47


The blank field trial layout is:


The phenotypes are generated based on the linear mixed model:

\(\mathbf{y} = \mathbf{Z}\mathbf{gv} + \mathbf{e}\),

where \(\mathbf{gv}\) is the vector of genetic values with design matrix \(\mathbf{Z}\) and \(\mathbf{e}\) is the vector of plot errors. The inclusion of additional non-genetic effects is straightforward.

The simulation involves three steps:

  1. Simulating the plot errors with FieldSimR
  2. Simulating the genetic values with AlphaSimR
  3. Generating the phenotypes by combining the plot errors with the genetic values.


1. Simulating the plot errors with FieldSimR

The plot errors are simulated to capture spatial variation, which can be broadly categorised as either global and local trend (smooth spatial trend), random error (noise) or extraneous variation (systematic error).

The plot errors are constructed as the sum of three terms:

\(\mathbf{e} = \mathbf{se} + \mathbf{re} + \mathbf{ee}\),

where \(\mathbf{se}\) is a vector of spatial errors that capture global and local trend, \(\mathbf{re}\) is a vector of random errors that capture random measurement error/noise and \(\mathbf{ee}\) is a vector of extraneous errors that capture extraneous variation.

Simulating the plot errors involves two parts:

  1. Setting the simulation parameters for the spatial, extraneous and random error terms
  2. Simulating the plot errors with the field_trial_error() function and displaying them with the plot_effects() function.


a. Setting the simulation parameters

The plot errors are generated based on a proportion of spatial error variance of 0.4 and a proportion of extraneous error variance of 0.2. FieldSimR will then allocate the remaining 0.4 to the random error variance by default.

The initial simulation parameters are:

round(sigma2e, digits = 2) # error variance
#> [1] 0.47
prop_spatial <- 0.4 # proportion of spatial error variance
prop_ext <- 0.2 # proportion of extraneous error variance
1 - (prop_spatial + prop_ext) # proportion of random error variance
#> [1] 0.4

The simulation parameters for the spatial and extraneous errors are defined in the following. No further parameters are required for the random errors.


Spatial errors

FieldSimR has the capacity to simulate spatial errors based on a separable first order autoregressive (AR1) process or bivariate interpolation. These options are set with spatial_model = "AR1:AR1" or spatial_model = "Bivariate".

FieldSimR’s functionality will be demonstrated using a separable AR1 process with column autocorrelation of 0.5 and row autocorrelation of 0.7. The simulation parameters are:

spatial_model1 <- "AR1:AR1"
prop_spatial # proportion of spatial error variance
#> [1] 0.4
col_cor <- 0.5 # column autocorrelation
row_cor <- 0.7 # row autocorrelation

Note that the plot lengths and widths are not required when spatial_model = "AR1:AR1".


FieldSimR’s functionality will also be demonstrated using bivariate interpolation with 100 additional knots points. This option is set with complexity = 100. The simulation parameters are:

spatial_model2 <- "Bivariate"
prop_spatial # proportion of spatial error variance
#> [1] 0.4
complexity <- 100 # number of additional knot points
plot_length
#> [1] 8
plot_width
#> [1] 2

Note that increasing the complexity will generate more local trend relative to global trend, up to a point where the spatial errors capture minimal or no trend.


Extraneous errors

FieldSimR has the capacity to simulate extraneous errors based on random or zig-zag ordering. These options are set with ext_ord = "random" or ext_ord = "zig-zag". The zig-zag ordering is achieved by alternating the positive and negative simulated values between neighbouring columns and/or rows.

Extraneous errors can be simulated in both directions, or either the column or row directions individually. These options are set with ext_dir = "both", ext_dir = "col" or ext_dir = "row".

The functionality will be demonstrated using zig-zag ordering between neighbouring rows. The simulation parameters are:

prop_ext # proportion of extraneous error variance
#> [1] 0.2
ext_ord <- "zig-zag" # ordering of extraneous variation
ext_dir <- "row" # direction of extraneous variation


b. Simulating the plot errors

The plot errors are simulated using FieldSimR’s function field_trial_error(). The functionality will be demonstrated separately for the simulations involving the AR1 process and bivariate interpolation.


Autoregressive process

The simulation is achieved by populating the field_trial_error() function with terms appropriate to the AR1 process, i.e. col_cor and row_cor. A data frame is returned containing the environment, block, column and row numbers as well as the simulated plot error. When return_effects = TRUE, a list is returned with an additional data frame containing the spatial, random and extraneous error terms.

error_df1 <- field_trial_error(n_traits = 1,
                               n_envs = 1,
                               n_blocks = n_blocks, # 2
                               block_dir = block_dir, # "col"
                               n_cols = n_cols, # 10
                               n_rows = n_rows, # 20
                               var_R = sigma2e, # 0.47
                               prop_spatial = prop_spatial, # 0.4
                               spatial_model = spatial_model1, # "AR1:AR1"
                               col_cor = col_cor, # 0.5
                               row_cor = row_cor, # 0.7
                               prop_ext = prop_ext, # 0.2
                               ext_ord = ext_ord, # "zig-zag"
                               ext_dir = ext_dir, # "row"
                               return_effects = TRUE)
head(error_df1$plot_df) # data frame with simulated plot errors
#>   env block col row    e.Trait1
#> 1   1     1   1   1  0.58491308
#> 2   1     1   1   2  0.27219805
#> 3   1     1   1   3 -0.02512364
#> 4   1     1   1   4  0.52557053
#> 5   1     1   1   5 -0.31935601
#> 6   1     1   1   6  0.88407277
head(error_df1$Trait1) # data frame with simulated spatial, random and extraneous errors
#>   env block col row     e.spat     e.rand e.ext.col   e.ext.row
#> 1   1     1   1   1 -0.3258259  0.9647753         0 -0.05403635
#> 2   1     1   1   2 -0.3218923  0.5682535         0  0.02583686
#> 3   1     1   1   3  0.3680727 -0.1374527         0 -0.25574365
#> 4   1     1   1   4  0.2793471  0.2241504         0  0.02207304
#> 5   1     1   1   5  0.2398195 -0.2041937         0 -0.35498184
#> 6   1     1   1   6  0.8213364 -0.2318872         0  0.29462355


The separate error terms can be displayed using FieldSimR’s function plot_effects(). For example, the spatial errors are given by:

plot_effects(error_df1$Trait1, effect = "e.spat", labels = TRUE)


The total plot errors are given by:

plot_effects(error_df1$plot_df, effect = "e.Trait1", labels = TRUE)


The theoretical and sample variograms can be displayed using FieldSimR’s functions theoretical_variogram() and sample_variogram(). The sample variogram for the total plot errors is given by:

sample_variogram(error_df1$plot_df, effect = "e.Trait1")

Bivariate interpolation

The simulation is achieved by populating the field_trial_error() function with terms appropriate to bivariate interpolation, i.e. plot_length, plot_width and complexity.

error_df2 <- field_trial_error(n_traits = 1,
                               n_envs = 1,
                               n_blocks = n_blocks, # 2
                               block_dir = block_dir, # "col"
                               n_cols = n_cols, # 10
                               n_rows = n_rows, # 20
                               var_R = sigma2e, # 0.47
                               prop_spatial = prop_spatial, # 0.4
                               spatial_model = spatial_model2, # "Bivariate"
                               plot_length = plot_length, # 8
                               plot_width = plot_width, # 2
                               complexity = complexity, # 100
                               prop_ext = prop_ext, # 0.2
                               ext_dir = ext_dir, # "row"
                               ext_ord = ext_ord, # "zig-zag"
                               return_effects = TRUE)
head(error_df2$plot_df) # data frame with simulated plot errors
#>   env block col row    e.Trait1
#> 1   1     1   1   1 -0.27516294
#> 2   1     1   1   2  1.47347874
#> 3   1     1   1   3  0.00652826
#> 4   1     1   1   4  0.26436917
#> 5   1     1   1   5 -0.40152695
#> 6   1     1   1   6  0.28554393
head(error_df2$Trait1) # data frame with simulated spatial, random and extraneous errors
#>   env block col row    e.spat      e.rand e.ext.col  e.ext.row
#> 1   1     1   1   1 0.1367561 -0.20053901         0 -0.2113800
#> 2   1     1   1   2 0.2289058  0.55481648         0  0.6897565
#> 3   1     1   1   3 0.3017271  0.09412504         0 -0.3893239
#> 4   1     1   1   4 0.3724624 -0.69437935         0  0.5862861
#> 5   1     1   1   5 0.3942169 -0.45212400         0 -0.3436198
#> 6   1     1   1   6 0.3646449 -0.27283687         0  0.1937359


The spatial errors are given by:

plot_effects(error_df2$Trait1, effect = "e.spat", labels = TRUE)


The total plot errors are given by:

plot_effects(error_df2$plot_df, effect = "e.Trait1", labels = TRUE)


2. Simulating the genetic values with AlphaSimR

Simulating the genetic values involves two parts:

  1. Setting the simulation parameters for the founder population and trait architecture
  2. Simulating an additive genetic trait for a population of inbred wheat genotypes.


a. Setting the simulation parameters

The number of homogeneous genotypes in the founder population is set to 20, the number of chromosomes is set to 21 and the number of segregating sites (biallelic QTN) per chromosome is set to 300.

n_founders <- 20 # Number of genotypes in the founder population
n_chr <- 21 # Number of chromosomes
n_seg_sites <- 300 # Number of QTN per chromosome


The trait mean and variance are:

mu # trait mean
#> [1] 4
sigma2 # trait variance
#> [1] 0.2


b. Simulating an additive genetic trait

The founder population is simulated using the runMacs() function.

# Simulating a founder population using AlphaSimR's "WHEAT" presets to mimic it's evolutionary history
founders <- runMacs(nInd = n_founders,
                    nChr = n_chr,
                    segSites = n_seg_sites,
                    inbred = TRUE,
                    species = "WHEAT",
                    nThreads = 2)
founders # MapPop object
#> An object of class "MapPop" 
#> Ploidy: 2 
#> Individuals: 20 
#> Chromosomes: 21 
#> Loci: 6300
SP <- SimParam$new(founders)


An additive genetic trait representing grain yield is then simulated using the addTraitA() function.

SP$addTraitA(nQtlPerChr = n_seg_sites,
             mean = mu,
             var = sigma2)
founders <- newPop(founders) # create pop-class object


The inbred wheat genotypes are generated by randomly crossing the founders using the randCross() function, and then made into doubled haploid (DH) lines using the makeDH() function.

f1s <- randCross(pop = founders, 
                 nCrosses = 20, # no. crosses
                 nProgeny = 5) # no. progeny per cross
DHs <- makeDH(pop = f1s, nDH = 1)
DHs
#> An object of class "Pop" 
#> Ploidy: 2 
#> Individuals: 100 
#> Chromosomes: 21 
#> Loci: 6300 
#> Traits: 1


The genetic values are obtained for the DH population.

gv <- c(DHs@gv)
round(mean(gv), digits = 2) # trait mean in the DH population
#> [1] 3.93
round(var(gv), digits = 2) # trait variance in the DH population
#> [1] 0.16
gv_df <- data.frame(env = factor(1),
                    rep = factor(rep(1:n_reps, each = n_genos)),
                    id = factor(DHs@id),
                    gv = gv)
head(gv_df)
#>   env rep  id       gv
#> 1   1   1 121 3.497497
#> 2   1   1 122 3.333328
#> 3   1   1 123 4.185656
#> 4   1   1 124 3.970236
#> 5   1   1 125 3.545663
#> 6   1   1 126 3.763558


3. Generating the phenotypes

The phenotypes are generated using FieldSimR’s function make_phenotypes(). When randomise = TRUE, genotypes are allocated to plots using a randomised complete block design.

trial_df <- make_phenotypes(gv_df = gv_df, error_df = error_df2, randomise = TRUE)
head(trial_df)
#>   env block col row  id phe.Trait1
#> 1   1     1   1   1 191   3.581255
#> 2   1     1   1   2 190   5.754405
#> 3   1     1   1   3 154   3.946641
#> 4   1     1   1   4 193   3.924392
#> 5   1     1   1   5 220   3.680489
#> 6   1     1   1   6 206   4.610504

The true genetic values are also added to aid with interpretation.

Z <- model.matrix(~id - 1, data = trial_df)
trial_df$gv.Trait1 <- c(Z %*% gv)
head(trial_df)
#>   env block col row  id phe.Trait1 gv.Trait1
#> 1   1     1   1   1 191   3.581255  3.856418
#> 2   1     1   1   2 190   5.754405  4.280926
#> 3   1     1   1   3 154   3.946641  3.940112
#> 4   1     1   1   4 193   3.924392  3.660022
#> 5   1     1   1   5 220   3.680489  4.082016
#> 6   1     1   1   6 206   4.610504  4.324960


The phenotypes are given by:

plot_effects(trial_df, effect = "phe.Trait1", labels = TRUE)


c. Measures of heritability and expected accuracy

The simulated plot-level and line-mean heritabilities can be formally quantified to aid with interpretation. The square-root of the latter represents the expected prediction accuracy for the genetic values.

A quick look at the phenotypes and true genetic values.

#>   env block col row  id phe.Trait1 gv.Trait1
#> 1   1     1   1   1 191   3.581255  3.856418
#> 2   1     1   1   2 190   5.754405  4.280926
#> 3   1     1   1   3 154   3.946641  3.940112
#> 4   1     1   1   4 193   3.924392  3.660022
#> 5   1     1   1   5 220   3.680489  4.082016
#> 6   1     1   1   6 206   4.610504  4.324960

The plot-level heritability is:

round(cor(trial_df$phe.Trait1, trial_df$gv.Trait1)^2, digits = 2) # simulated plot-level heritability
#> [1] 0.2
round(sigma2/(sigma2 + sigma2e), digits = 2) # expected plot-level heritability
#> [1] 0.3

The minor discrepancies here are a result of the sampling process.


A quick look at the average phenotypes and true genetic values.

index <- trimws(trial_df$id[trial_df$block == 1]) # obtain genotype ordering
ave_yield <- with(trial_df, tapply(phe.Trait1, id, mean))[index]

The line-mean heritability in the DH population is:

round(cor(ave_yield, trial_df$gv.Trait1[is.element(trial_df$block, 1)])^2, digits = 2) # simulated line-mean heritability
#> [1] 0.34
round(sigma2/(sigma2 + sigma2e/n_reps), digits = 2) # expected line-mean heritability
#> [1] 0.46
round(cor(ave_yield, trial_df$gv.Trait1[is.element(trial_df$block, 1)]), digits = 2) # simulated prediction accuracy
#> [1] 0.58
round(sqrt(sigma2/(sigma2 + sigma2e/n_reps)), digits = 2) # expected prediction accuracy
#> [1] 0.68