Foreword
Preface

Part Ⅰ. Hadoop Fundamentals

1. Meet Hadoop
  Data!
  Data Storage and Analysis
  Querying All Your Data
  Beyond Batch
  Comparison with Other Systems
  Relational Database Management Systems
  Grid Computing
  Volunteer Computing
  A Brief History of Apache Hadoop
  What's in This Book?

2. MapReduce
  A Weather Dataset
  Data Format
  Analyzing the Data with Unix Tools
  Analyzing the Data with Hadoop
  Map and Reduce
  Java MapReduce
  Scaling Out
  Data Flow
  Combiner Functions
  Running a Distributed MapReduce Job
  Hadoop Streaming
  Ruby
  Python

3. The Hadoop Distributed Filesystem
  The Design of HDFS
  HDFS Concepts
  Blocks
  Namenodes and Datanodes
  Block Caching
  HDFS Federation
  HDFS High Availability
  The Command-Line Interface
  Basic Filesystem Operations
  Hadoop Filesystems
  Interfaces
  The Java Interface
  Reading Data from a Hadoop URL
  Reading Data Using the FileSystem API
  Writing Data
  Directories
  Querying the Filesystem
  Deleting Data
  Data Flow
  Anatomy of a File Read
  Anatomy of a File Write
  Coherency Model
  Parallel Copying with distcp
  Keeping an HDFS Cluster Balanced

4. YARN
  Anatomy of a YARN Application Run
  Resource Requests
  Application Lifespan
  Building YARN Applications
  YARN Compared to MapReduce 1
  Scheduling in YARN
  Scheduler Options
  Capacity Scheduler Configuration
  Fair Scheduler Configuration
  Delay Scheduling
  Dominant Resource Fairness
  Further Reading

5. Hadoop I/O
  Data Integrity
  Data Integrity in HDFS
  LocalFileSystem
  ChecksumFileSystem
  Compression
  Codecs
  Compression and Input Splits
  Using Compression in MapReduce
  Serialization
  The Writable Interface
  Writable Classes
  Implementing a Custom Writable
  Serialization Frameworks
  File-Based Data Structures
  SequenceFile
  MapFile
  Other File Formats and Column-Oriented Formats

Part Ⅱ. MapReduce

6. Developing a MapReduce Application
  The Configuration API
  Combining Resources
  Variable Expansion
  Setting Up the Development Environment
  Managing Configuration
  GenericOptionsParser, Tool, and ToolRunner
  Writing a Unit Test with MRUnit
  Mapper
  Reducer
  Running Locally on Test Data
  Running a Job in a Local Job Runner
  Testing the Driver
  Running on a Cluster
  Packaging a Job
  Launching a Job
  The MapReduce Web UI
  Retrieving the Results
  Debugging a Job
  Hadoop Logs
  Remote Debugging
  Tuning a Job
  Profiling Tasks
  MapReduce Workflows
  Decomposing a Problem into MapReduce Jobs
  JobControl
  Apache Oozie

7. How MapReduce Works
  Anatomy of a MapReduce Job Run
  Job Submission
  Job Initialization
  Task Assignment
  Task Execution
  Progress and Status Updates
  Job Completion
  Failures
  Task Failure
  Application Master Failure
  Node Manager Failure
  Resource Manager Failure
  Shuffle and Sort
  The Map Side
  The Reduce Side
  Configuration Tuning
  Task Execution
  The Task Execution Environment
  Speculative Execution
  Output Committers

8. MapReduce Types and Formats
  MapReduce Types
  The Default MapReduce Job
  Input Formats
  Input Splits and Records
  Text Input
  Binary Input
  Multiple Inputs
  Database Input (and Output)
  Output Formats
  Text Output
  Binary Output
  Multiple Outputs
  Lazy Output
  Database Output

……

9. MapReduce Features

Part Ⅲ. Hadoop Operations

10. Setting Up a Hadoop Cluster
11. Administering Hadoop

Part Ⅳ. Related Projects

12. Avro
13. Parquet
14. Flume
15. Sqoop
16. Pig
17. Hive
18. Crunch
19. Spark
20. HBase
21. ZooKeeper

Part Ⅴ. Case Studies

22. Composable Data at Cerner
23. Biological Data Science: Saving Lives with Software
24. Cascading

A. Installing Apache Hadoop
B. Cloudera's Distribution Including Apache Hadoop
C. Preparing the NCDC Weather Data
D. The Old and New Java MapReduce APIs

Index