Biography
Preface

PART 1 INTRODUCTION

CHAPTER 1 Statistical Machine Learning
1.1 Types of Learning
1.2 Examples of Machine Learning Tasks
1.2.1 Supervised Learning
1.2.2 Unsupervised Learning
1.2.3 Further Topics
1.3 Structure of This Textbook

PART 2 STATISTICS AND PROBABILITY

CHAPTER 2 Random Variables and Probability Distributions
2.1 Mathematical Preliminaries
2.2 Probability
2.3 Random Variable and Probability Distribution
2.4 Properties of Probability Distributions
2.4.1 Expectation, Median, and Mode
2.4.2 Variance and Standard Deviation
2.4.3 Skewness, Kurtosis, and Moments
2.5 Transformation of Random Variables

CHAPTER 3 Examples of Discrete Probability Distributions
3.1 Discrete Uniform Distribution
3.2 Binomial Distribution
3.3 Hypergeometric Distribution
3.4 Poisson Distribution
3.5 Negative Binomial Distribution
3.6 Geometric Distribution

CHAPTER 4 Examples of Continuous Probability Distributions
4.1 Continuous Uniform Distribution
4.2 Normal Distribution
4.3 Gamma Distribution, Exponential Distribution, and Chi-Squared Distribution
4.4 Beta Distribution
4.5 Cauchy Distribution and Laplace Distribution
4.6 t-Distribution and F-Distribution

CHAPTER 5 Multidimensional Probability Distributions
5.1 Joint Probability Distribution
5.2 Conditional Probability Distribution
5.3 Contingency Table
5.4 Bayes’ Theorem
5.5 Covariance and Correlation
5.6 Independence

CHAPTER 6 Examples of Multidimensional Probability Distributions
6.1 Multinomial Distribution
6.2 Multivariate Normal Distribution
6.3 Dirichlet Distribution
6.4 Wishart Distribution

CHAPTER 7 Sum of Independent Random Variables
7.1 Convolution
7.2 Reproductive Property
7.3 Law of Large Numbers
7.4 Central Limit Theorem

CHAPTER 8 Probability Inequalities
8.1 Union Bound
8.2 Inequalities for Probabilities
8.2.1 Markov’s Inequality and Chernoff’s Inequality
8.2.2 Cantelli’s Inequality and Chebyshev’s Inequality
8.3 Inequalities for Expectation
8.3.1 Jensen’s Inequality
8.3.2 Hölder’s Inequality and Schwarz’s Inequality
8.3.3 Minkowski’s Inequality
8.3.4 Kantorovich’s Inequality
8.4 Inequalities for the Sum of Independent Random Variables
8.4.1 Chebyshev’s Inequality and Chernoff’s Inequality
8.4.2 Hoeffding’s Inequality and Bernstein’s Inequality
8.4.3 Bennett’s Inequality

CHAPTER 9 Statistical Estimation
9.1 Fundamentals of Statistical Estimation
9.2 Point Estimation
9.2.1 Parametric Density Estimation
9.2.2 Nonparametric Density Estimation
9.2.3 Regression and Classification
9.2.4 Model Selection
9.3 Interval Estimation
9.3.1 Interval Estimation for Expectation of Normal Samples
9.3.2 Bootstrap Confidence Interval
9.3.3 Bayesian Credible Interval

CHAPTER 10 Hypothesis Testing
10.1 Fundamentals of Hypothesis Testing
10.2 Test for Expectation of Normal Samples
10.3 Neyman-Pearson Lemma
10.4 Test for Contingency Tables
10.5 Test for Difference in Expectations of Normal Samples
10.5.1 Two Samples without Correspondence
10.5.2 Two Samples with Correspondence
10.6 Nonparametric Test for Ranks
10.6.1 Two Samples without Correspondence
10.6.2 Two Samples with Correspondence
10.7 Monte Carlo Test

PART 3 GENERATIVE APPROACH TO STATISTICAL PATTERN RECOGNITION

CHAPTER 11 Pattern Recognition via Generative Model Estimation
11.1 Formulation of Pattern Recognition
11.2 Statistical Pattern Recognition
11.3 Criteria for Classifier Training
11.3.1 MAP Rule
11.3.2 Minimum Misclassification Rate Rule
11.3.3 Bayes Decision Rule
11.3.4 Discussion
11.4 Generative and Discriminative Approaches

CHAPTER 12 Maximum Likelihood Estimation
12.1 Definition
12.2 Gaussian Model
12.3 Computing the Class-Posterior Probability
12.4 Fisher’s Linear Discriminant Analysis (FDA)
12.5 Hand-Written Digit Recognition
12.5.1 Preparation
12.5.2 Implementing Linear Discriminant Analysis
12.5.3 Multiclass Classification

CHAPTER 13 Properties of Maximum Likelihood Estimation
13.1 Consistency
13.2 Asymptotic Unbiasedness
13.3 Asymptotic Efficiency
13.3.1 One-Dimensional Case
13.3.2 Multidimensional Cases
13.4 Asymptotic Normality
13.5 Summary

CHAPTER 14 Model Selection for Maximum Likelihood Estimation
14.1 Model Selection
14.2 KL Divergence
14.3 AIC
14.4 Cross Validation
14.5 Discussion

CHAPTER 15 Maximum Likelihood Estimation for Gaussian Mixture Model
15.1 Gaussian Mixture Model
15.2 MLE
15.3 Gradient Ascent Algorithm
15.4 EM Algorithm

CHAPTER 16 Nonparametric Estimation
16.1 Histogram Method
16.2 Problem Formulation
16.3 KDE
16.3.1 Parzen Window Method
16.3.2 Smoothing with Kernels
16.3.3 Bandwidth Selection
16.4 NNDE
16.4.1 Nearest Neighbor Distance
16.4.2 Nearest Neighbor Classifier

CHAPTER 17 Bayesian Inference
17.1 Bayesian Predictive Distribution
17.1.1 Definition
17.1.2 Comparison with MLE
17.1.3 Computational Issues
17.2 Conjugate Prior
17.3 MAP Estimation
17.4 Bayesian Model Selection

CHAPTER 18 Analytic Approximation of Marginal Likelihood
18.1 Laplace Approximation
18.1.1 Approximation with Gaussian Density
18.1.2 Illustration
18.1.3 Application to Marginal Likelihood Approximation
18.1.4 Bayesian Information Criterion (BIC)
18.2 Variational Approximation
18.2.1 Variational Bayesian EM (VBEM) Algorithm
18.2.2 Relation to Ordinary EM Algorithm

CHAPTER 19 Numerical Approximation of Predictive Distribution
19.1 Monte Carlo Integration
19.2 Importance Sampling
19.3 Sampling Algorithms
19.3.1 Inverse Transform Sampling
19.3.2 Rejection Sampling
19.3.3 Markov Chain Monte Carlo (MCMC) Method

CHAPTER 20 Bayesian Mixture Models
20.1 Gaussian Mixture Models
20.1.1 Bayesian Formulation
20.1.2 Variational Inference
20.1.3 Gibbs Sampling
20.2 Latent Dirichlet Allocation (LDA)
20.2.1 Topic Models
20.2.2 Bayesian Formulation
20.2.3 Gibbs Sampling

……

PART 4 DISCRIMINATIVE APPROACH TO STATISTICAL MACHINE LEARNING

PART 5 FURTHER TOPICS

References
Index