Batch Processing with Unix Tools
Simple Log Analysis
The Unix Philosophy
MapReduce and Distributed Filesystems
MapReduce Job Execution
Reduce-Side Joins and Grouping
Map-Side Joins
The Output of Batch Workflows
Comparing Hadoop to Distributed Databases
Beyond MapReduce
Materialization of Intermediate State
Graphs and Iterative Processing
High-Level APIs and Languages
Summary
11. Stream Processing
Transmitting Event Streams
Messaging Systems
Partitioned Logs
Databases and Streams
Keeping Systems in Sync
Change Data Capture
Event Sourcing
State, Streams, and Immutability
Processing Streams
Uses of Stream Processing
Reasoning About Time
Stream Joins
Fault Tolerance
Summary
12. The Future of Data Systems
Data Integration
Combining Specialized Tools by Deriving Data
Batch and Stream Processing
Unbundling Databases
Composing Data Storage Technologies
Designing Applications Around Dataflow
Observing Derived State
Aiming for Correctness
The End-to-End Argument for Databases
Enforcing Constraints
Timeliness and Integrity
Trust, but Verify
Doing the Right Thing
Predictive Analytics
Privacy and Tracking
Summary
Glossary
Index