Genotype Calling


User presentations

Construction of a strawberry breeding core collection to capture and exploit genetic variation

Tim Koorevaar, Johan Willemsen, Paul Arens, Chris Maliepaard, Richard Visser. Wageningen University and Research - Plant Breeding, the Netherlands. 

As genotyping by sequencing (GBS) methods become cheaper, their applications become broader. For genotyping, GBS has become an alternative to SNP arrays, which have limitations that GBS can overcome, such as restricted genome coverage and ascertainment bias. Ideally, all material in a plant breeding program would be screened using high coverage and deep sequencing. However, this is not cost-effective and probably not needed, because high-quality genotypes can also be obtained with more cost-effective GBS tools of lower depth and/or less coverage, whose genotypes are then imputed using a high-quality haplotype reference panel. A reference panel (core collection) of genotypes that represents the full breadth of the breeding program is essential for accurate imputation. In this study, we show a stepwise approach to obtain a representative core collection in a commercial plant breeding program that can be used as a reference panel for cost-effective GBS methods. First, the most important crossing parents of advanced selections and genotypes carrying specific traits are identified and selected because they represent future genetic variation. Then, the core collection is finalized by maximizing its representativeness relative to the current whole breeding program. Representative core collections are commonly constructed using genetic distances, but pedigree-genomic-based relationship coefficients allow accurate relationship estimation without the need to genotype every individual in the breeding program. These pedigree-genomic-based relationship coefficients can identify pedigree errors, correct for missing links, and estimate relationships among founder genotypes. Consequently, this pedigree-genomic-based relationship matrix was used to complement the core collection by maximizing the representativeness of the total core collection.
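The two-step selection described above can be illustrated with a minimal sketch. It assumes that "representativeness" is scored as the average, over all genotypes in the program, of the highest relationship coefficient to any core member, and that the core is completed greedily. Both the score and the greedy strategy are assumptions made for illustration, not the authors' method, and the function names and the toy relationship matrix are hypothetical.

```python
def representativeness(rel, core):
    """Mean, over all genotypes, of the highest relationship to any core member."""
    n = len(rel)
    return sum(max(rel[i][j] for j in core) for i in range(n)) / n


def complete_core(rel, preselected, core_size):
    """Start from pre-selected genotypes (e.g. key crossing parents), then
    greedily add the candidate that raises representativeness the most."""
    core = list(preselected)
    candidates = [i for i in range(len(rel)) if i not in core]
    while len(core) < core_size and candidates:
        best = max(candidates, key=lambda c: representativeness(rel, core + [c]))
        core.append(best)
        candidates.remove(best)
    return core


# Toy relationship matrix (hypothetical): genotypes 0-2 form one family,
# genotypes 3-4 form another, and the two families are unrelated.
rel = [
    [1.0, 0.5, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.5, 0.0, 0.0],
    [0.5, 0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 1.0],
]

# Genotype 0 is pre-selected as an important parent; one more slot is filled.
core = complete_core(rel, preselected=[0], core_size=2)
print(core, representativeness(rel, core))
```

In this toy case the second slot goes to a member of the unrepresented family rather than to another relative of genotype 0, which is the behaviour the completion step is meant to produce.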

Training presentations


Multiploidy support in polyRAD - presented by Lindsay V. Clark, Joyce Njuguna, Alexander E. Lipka, and Erik J. Sacks. Department of Crop Sciences, University of Illinois, Urbana-Champaign, Urbana, IL.

polyRAD is an R package for Bayesian genotype calling from sequence read depth in diploid and polyploid organisms. It can use population structure or mapping population design to inform genotype calls and can export discrete or continuous genotypes. Although the original version of polyRAD allowed the inheritance model to vary across the genome, it still required all individuals to have the same ploidy, limiting its use in staple crops such as banana and yam, in which breeding populations typically consist of a mixture of ploidies. polyRAD 2.0 will support multiploidy, allowing simultaneous genotyping of individuals of different ploidies. The “possiblePloidies” slot will still be used to indicate potential inheritance modes for loci. A new slot called “taxaPloidy” contains one integer for each individual to indicate its ploidy, and acts as a multiplier for the values stored in “possiblePloidies”. Examples of how to code this information in various crops will be presented in the digital poster. We will also present Miscanthus sacchariflorus as a use case, in which introgression has occurred among diploid, triploid, and tetraploid populations. The development version of polyRAD 2.0 can be installed from GitHub.
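As a rough illustration of Bayesian genotype calling from read depth when individuals differ in ploidy, the sketch below computes a posterior over allele dosage from reference and alternative read counts. It is not polyRAD's model or API: the binomial likelihood, the fixed sequencing error rate, the uniform prior, and all function and variable names are assumptions chosen to keep the example small (polyRAD itself is an R package and can use population structure or mapping design as prior information).

```python
from math import comb


def dosage_posterior(ref_reads, alt_reads, ploidy, error=0.01, prior=None):
    """Posterior probability of each alternative-allele dosage 0..ploidy."""
    n = ref_reads + alt_reads
    if prior is None:
        # Uniform prior over dosages (an assumption made for this sketch).
        prior = [1.0 / (ploidy + 1)] * (ploidy + 1)
    weights = []
    for dosage in range(ploidy + 1):
        frac = dosage / ploidy
        # Expected alt-read fraction, allowing for symmetric read errors.
        p_alt = frac * (1 - error) + (1 - frac) * error
        likelihood = comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt)**ref_reads
        weights.append(prior[dosage] * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical per-individual ploidies and (ref, alt) read counts at one locus.
ploidy_by_individual = {"ind_A": 2, "ind_B": 4}
reads = {"ind_A": (10, 10), "ind_B": (15, 5)}

for name, ploidy in ploidy_by_individual.items():
    post = dosage_posterior(*reads[name], ploidy)
    discrete = max(range(ploidy + 1), key=lambda d: post[d])   # most probable dosage
    continuous = sum(d * p for d, p in enumerate(post))        # posterior mean dosage
    print(name, discrete, round(continuous, 3))
```

The discrete call is the most probable dosage, while the posterior mean gives a continuous genotype that retains the uncertainty from low read depth.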



User presentations

Reads2Map: Practical and reproducible workflows to build linkage maps from sequencing data - presented by Cris Taniguti et al.

Cristiane H. Taniguti (1), Lucas M. Taniguti (3), Gabriel S. Gesteira (2), Thiago P. Oliveira (3), Jeekin Lau (1), Getulio C. Ferreira (3), Rodrigo R. Amadeu (3), David Byrne (1), Oscar Riera-Lizarazu (1), Guilherme S. Pereira (2), Marcelo Mollinari (2), and Augusto F. Garcia (3). (1) Texas A&M University, College Station, TX. (2) North Carolina State University, Raleigh, NC. (3) University of Sao Paulo, Sao Paulo, Brazil.

High-throughput sequencing methods produce millions of sequence reads that need to be processed by bioinformatic tools before being applied in genetics research. For each step of the procedure, such as read alignment, SNP identification, and genotype calling, several tools are available, all with different methods and parameters to be selected by users. A change to a single parameter in the pipeline can have downstream consequences for the quality of the analysis. Because the genetic properties of meiotic events are well known, it is possible to identify low-quality markers using linkage analysis. Genotyping errors lead to an overestimation of the number of recombination events, inflated linkage map distances, and problems when grouping and ordering markers. Thus, good-quality genetic maps validate all upstream procedures and help to identify the best combinations of software and parameters. Here, we present the Reads2Map workflows, which build linkage maps from sequencing data of experimental F1 outcrossing populations while testing combinations of upstream tools. The workflows are written in the Workflow Description Language (WDL), which offers a comprehensive structure and metadata for each step, making it easier for users to adapt specific parameters. WDL also allows interfacing with containers to increase reproducibility, facilitate access to diverse software, and enable use in high-performance computing or cloud service environments. The final workflow output is the input for the Reads2MapApp, a Shiny app that allows interactive visualization of the produced genetic maps and selection of the best pipeline. We demonstrate the Reads2Map workflows and Reads2MapApp using both simulated and empirical RADseq data.
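The claim that genotyping errors inflate map distances can be made concrete with a small worked example. The sketch below is a simplification and not part of Reads2Map: it assumes markers with only two genotype classes (as in a backcross-type segregation), independent miscalls with the same probability at every marker, and the Haldane mapping function. The function names are hypothetical.

```python
from math import log


def apparent_rf(true_rf, error_rate):
    """Recombination fraction observed between two markers when each genotype
    is independently miscalled with probability error_rate."""
    # Probability that exactly one of the two genotypes is wrong, which
    # flips the apparent recombinant / non-recombinant status.
    flip = 2 * error_rate * (1 - error_rate)
    return true_rf * (1 - flip) + (1 - true_rf) * flip


def haldane_cm(rf):
    """Haldane map distance in centimorgans for a recombination fraction < 0.5."""
    return -50.0 * log(1 - 2 * rf)


true_rf = 0.01
for error_rate in (0.0, 0.01, 0.05):
    observed = apparent_rf(true_rf, error_rate)
    print(error_rate, round(observed, 4), round(haldane_cm(observed), 2))
```

Under these assumptions, two markers that are truly about 1 cM apart appear more than 11 cM apart at a 5% error rate, which is why total map length is a sensitive indicator of upstream genotyping quality.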

Smooth Descent: a ploidy-agnostic algorithm to improve linkage mapping in the presence of genotyping errors - presented by Alejandro Thérèse Navarro et al.

Alejandro Thérèse Navarro, Peter Bourke, Eric van de Weg, Paul Arens, Richard Finkers, Chris Maliepaard. Wageningen University and Research, Wageningen, the Netherlands.

Linkage mapping is an approach to order markers based on recombination events. Mapping algorithms cannot easily handle genotyping errors, which are common in high-throughput genotyping data. To address this issue, strategies have been developed that aim mostly at identifying and eliminating spurious genotypes. One such strategy is SMOOTH (van Os et al. 2005), an iterative algorithm to detect genotyping errors. Unlike other approaches, SMOOTH can also be used to impute the most probable alternative genotypes, but its application is limited to diploid species and to markers that are heterozygous in only one of the parents. We adapted SMOOTH to expand its use to any marker type and to autopolyploids through the use of identity-by-descent probabilities, naming the updated algorithm Smooth Descent (SD). We applied SD to real and simulated data, showing that in the presence of genotyping errors this method produces better genetic maps in terms of marker order and map length. SD is particularly useful for error rates between 5% and 20% and when error rates are not homogeneous among markers or individuals. Moreover, the simplicity of the algorithm allows hundreds of thousands of markers to be processed efficiently, making it particularly useful for error detection in high-throughput data. We implemented SD in an R package, SmoothDescent, that can perform error detection, genotype imputation, and iterative mapping for diploids and autopolyploids.
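The general idea of detecting an error by comparing a genotype with its neighbours along the map can be sketched as follows. This is a deliberately simplified illustration, not the published SMOOTH or Smooth Descent algorithm and not the SmoothDescent R package: it assumes one individual, markers already in map order, values between 0 and 1 (for example a two-class genotype code or the probability of carrying a given parental homologue), an unweighted neighbour average, and a fixed threshold. The function name and parameters are hypothetical.

```python
def flag_outliers(values, window=3, threshold=0.8):
    """Return indices of markers whose value disagrees strongly with the
    average of up to `window` flanking markers on each side."""
    flagged = []
    for i, observed in enumerate(values):
        neighbours = values[max(0, i - window):i] + values[i + 1:i + 1 + window]
        if not neighbours:
            continue  # a single marker has nothing to be compared with
        predicted = sum(neighbours) / len(neighbours)
        if abs(observed - predicted) >= threshold:
            flagged.append(i)
    return flagged


# One individual along an ordered linkage group: a single isolated "1"
# inside a run of "0" (index 3), followed by a genuine crossover.
genotypes = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print(flag_outliers(genotypes))
```

The isolated value at index 3 would require two crossovers within a very short interval and is flagged, while the markers around the genuine crossover are kept because their neighbours only partly disagree with them.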

Poster presentations 

Identifying a Rose Germplasm Panel to Attain Optimal SNP Array Genotype Calling of Small Samples of Genotyped Individuals

Jeekin Lau, Cristiane H. Taniguti, David Byrne, and Oscar Riera-Lizarazu. Texas A&M University, College Station, TX.

Since genotyping with the Axiom WagRhSNP68K SNP array can be cost-prohibitive, we explored an approach that would permit robust genotyping of samples in one or two 96-well plates. We have observed that genotyping accuracy via SNP arrays increases as the number of individuals used for genotype calling increases. We reasoned that this increased accuracy may be due to greater sample size and allelic diversity. To test this idea, we conducted an experiment in which one biparental mapping population of 94 individuals plus two parents was clustered alone (one plate of genotyping) and in combination with sets of related biparental populations and unrelated germplasm of increasing size and varying levels of genetic diversity. We then compared both the marker statistics and the quality of the linkage maps generated from genotype calls of the target mapping population using the various datasets. As the number of individuals used in clustering increased, the number of useful markers increased nominally. However, the resulting linkage maps revealed that adding other genotypes in the marker clustering step resulted in shorter total map length and smaller gap sizes as the number of individuals and diversity increased. The decreased map lengths and gap sizes indicate that the inclusion of other genotypes improved genotyping accuracy. The output of this study will be a core set of genotyped rose germplasm that may be used to improve genotype calling of small samples of genotyped materials.


Development of a Genotyping by Sequencing Pipeline in Tetraploid Roses (Rosa sp.)

Tessa Hochhaus, Cristiane H. Taniguti, Jeekin Lau, Patricia E. Klein, David H. Byrne, and Oscar Riera-Lizarazu. Department of Horticultural Sciences, Texas A&M University, College Station, TX.

Roses are highly heterozygous and are most commonly diploid, triploid, or tetraploid. Genotyping by sequencing (GBS) has been performed in diploid rose populations; however, it has not been done in populations of higher ploidy because of their increased complexity (autopolyploidy). This complexity is due to the greater number of genotypic classes and the difficulty of accurately calling allele dosage. GBS uses restriction enzymes to reduce genome complexity and adapter barcodes to allow the pooling of multiple samples, which increases efficiency and lowers the cost per sample. In this study, we are optimizing a GBS protocol for tetraploid roses using three populations (Morden Blush x George Vancouver, Stormy Weather x Brite Eyes, and Brite Eyes x My Girl). The optimization will entail varying sequencing read depth and coverage while minimizing missing data, and using in-house workflows to test various combinations of open-source software for quality control, read alignment, SNP identification, and dosage calling. Through the development of this pipeline, we hope to facilitate cost-effective genotyping in polyploid roses and the use of genomics-assisted breeding.