Foreword
Preface
1. Introduction
The Promises and Challenges of Big Data in Biology and Life Sciences
Infrastructure Challenges
Toward a Cloud-Based Ecosystem for Data Sharing and Analysis
Cloud-Hosted Data and Compute
Platforms for Research in the Life Sciences
Standardization and Reuse of Infrastructure
Being FAIR
Wrap-Up and Next Steps
2. Genomics in a Nutshell: A Primer for Newcomers to the Field
Introduction to Genomics
The Gene as a Discrete Unit of Inheritance (Sort Of)
The Central Dogma of Biology: DNA to RNA to Protein
The Origins and Consequences of DNA Mutations
Genomics as an Inventory of Variation in and Among Genomes
The Challenge of Genomic Scale, by the Numbers
Genomic Variation
The Reference Genome as Common Framework
Physical Classification of Variants
Germline Variants Versus Somatic Alterations
High-Throughput Sequencing Data Generation
From Biological Sample to Huge Pile of Read Data
Types of DNA Libraries: Choosing the Right Experimental Design
Data Processing and Analysis
Mapping Reads to the Reference Genome
Variant Calling
Data Quality and Sources of Error
Functional Equivalence Pipeline Specification
Wrap-Up and Next Steps
3. Computing Technology Basics for Life Scientists
Basic Infrastructure Components and Performance Bottlenecks
Types of Processor Hardware: CPU, GPU, TPU, FPGA, OMG
Levels of Compute Organization: Core, Node, Cluster, and Cloud
Addressing Performance Bottlenecks
Parallel Computing
Parallelizing a Simple Analysis
From Cores to Clusters and Clouds: Many Levels of Parallelism
Trade-Offs of Parallelism: Speed, Efficiency, and Cost
Pipelining for Parallelization and Automation
Workflow Languages
Popular Pipelining Languages for Genomics
Workflow Management Systems
Virtualization and the Cloud
VMs and Containers
Introducing the Cloud
Categories of Research Use Cases for Cloud Services
Wrap-Up and Next Steps
4. First Steps in the Cloud
Setting Up Your Google Cloud Account and First Project
Creating a Project
Checking Your Billing Account and Activating Free Credits
Running Basic Commands in Google Cloud Shell
Logging in to the Cloud Shell VM
Using gsutil to Access and Manage Files
Pulling a Docker Image and Spinning Up the Container
Mounting a Volume to Access the Filesystem from Within the Container
Setting Up Your Own Custom VM
Creating and Configuring Your VM Instance
Logging into Your VM by Using SSH
Checking Your Authentication
Copying the Book Materials to Your VM
Installing Docker on Your VM
Setting Up the GATK Container Image
……
6. GATK Best Practices for Germline Short Variant Discovery
7. GATK Best Practices for Somatic Variant Discovery
8. Automating Analysis Execution with Workflows
9. Deciphering Real Genomics Workflows
10. Running Single Workflows at Scale with Pipelines API
11. Running Many Workflows Conveniently in Terra
12. Interactive Analysis in Jupyter Notebook
13. Assembling Your Own Workspace in Terra
14. Making a Fully Reproducible Paper
Glossary
Index