Machine Learning: A Bayesian and Optimization Perspective (English Edition, 2nd Edition of the Original)
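This book presents machine learning from a unified perspective built on the two pillars of supervised learning: regression and classification. It begins with the fundamentals, including mean-square, least-squares, and maximum likelihood methods, ridge regression, Bayesian decision theory for classification, logistic regression, and decision trees. It then turns to more recent techniques: sparse modeling methods, learning in reproducing kernel Hilbert spaces and support vector machines, Bayesian inference with a focus on the EM algorithm and its variational approximate-inference versions, Monte Carlo methods, probabilistic graphical models with a focus on Bayesian networks, hidden Markov models, and particle filtering. Dimensionality reduction and latent variable modeling are also treated in depth. The book closes with an extended chapter on neural networks and deep learning architectures. It also covers the basics of statistical parameter estimation, Wiener and Kalman filtering, and convexity and convex optimization, including one chapter on stochastic approximation and the gradient descent family of algorithms, and it presents related concepts, algorithms, and online learning techniques for distributed optimization.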
Preface
Acknowledgments
About the Author
Notation
CHAPTER 1 Introduction
1.1 The Historical Context
1.2 Artificial Intelligence and Machine Learning
1.3 Algorithms Can Learn What Is Hidden in the Data
1.4 Typical Applications of Machine Learning
    Speech Recognition
    Computer Vision
    Multimodal Data
    Natural Language Processing
    Robotics
    Autonomous Cars
    Challenges for the Future
1.5 Machine Learning: Major Directions
    1.5.1 Supervised Learning
1.6 Unsupervised and Semisupervised Learning
1.7 Structure and a Road Map of the Book
References
CHAPTER 2 Probability and Stochastic Processes
2.1 Introduction
2.2 Probability and Random Variables
    2.2.1 Probability
    2.2.2 Discrete Random Variables
    2.2.3 Continuous Random Variables
    2.2.4 Mean and Variance
    2.2.5 Transformation of Random Variables
2.3 Examples of Distributions
    2.3.1 Discrete Variables
    2.3.2 Continuous Variables
2.4 Stochastic Processes
    2.4.1 First- and Second-Order Statistics
    2.4.2 Stationarity and Ergodicity
    2.4.3 Power Spectral Density
    2.4.4 Autoregressive Models
2.5 Information Theory
    2.5.1 Discrete Random Variables
    2.5.2 Continuous Random Variables
2.6 Stochastic Convergence
    Convergence Everywhere
    Convergence Almost Everywhere
    Convergence in the Mean-Square Sense
    Convergence in Probability
    Convergence in Distribution
Problems
References
CHAPTER 3 Learning in Parametric Modeling: Basic Concepts and Directions
3.1 Introduction
3.2 Parameter Estimation: the Deterministic Point of View
3.3 Linear Regression
3.4 Classification
    Generative Versus Discriminative Learning
3.5 Biased Versus Unbiased Estimation
    3.5.1 Biased or Unbiased Estimation?
3.6 The Cramér–Rao Lower Bound
3.7 Sufficient Statistic
3.8 Regularization
    Inverse Problems: Ill-Conditioning and Overfitting
3.9 The Bias–Variance Dilemma
    3.9.1 Mean-Square Error Estimation
    3.9.2 Bias–Variance Tradeoff
3.10 Maximum Likelihood Method
    3.10.1 Linear Regression: the Nonwhite Gaussian Noise Case
3.11 Bayesian Inference
    3.11.1 The Maximum a Posteriori Probability Estimation Method
3.12 Curse of Dimensionality
3.13 Validation
    Cross-Validation
3.14 Expected Loss and Empirical Risk Functions
    Learnability
3.15 Nonparametric Modeling and Estimation
Problems
    MATLAB Exercises
References
CHAPTER 4 Mean-Square Error Linear Estimation
4.1 Introduction
4.2 Mean-Square Error Linear Estimation: the Normal Equations
    4.2.1 The Cost Function Surface
4.3 A Geometric Viewpoint: Orthogonality Condition
4.4 Extension to Complex-Valued Variables
    4.4.1 Widely Linear Complex-Valued Estimation
    4.4.2 Optimizing With Respect to Complex-Valued Variables: Wirtinger Calculus
4.5 Linear Filtering
4.6 MSE Linear Filtering: a Frequency Domain Point of View
    Deconvolution: Image Deblurring
4.7 Some Typical Applications
    4.7.1 Interference Cancelation
    4.7.2 System Identification
    4.7.3 Deconvolution: Channel Equalization
4.8 Algorithmic Aspects: the Levinson and Lattice-Ladder Algorithms
    Forward and Backward MSE Optimal Predictors
    4.8.1 The Lattice-Ladder Scheme
4.9 Mean-Square Error Estimation of Linear Models
    4.9.1 The Gauss–Markov Theorem
    4.9.2 Constrained Linear Estimation: the Beamforming Case
4.10 Time-Varying Statistics: Kalman Filtering
Problems
    MATLAB Exercises
References
CHAPTER 5 Online Learning: the Stochastic Gradient Descent Family of Algorithms
5.1 Introduction
5.2 The Steepest Descent Method
5.3 Application to the Mean-Square Error Cost Function
    Time-Varying Step Sizes
    5.3.1 The Complex-Valued Case
5.4 Stochastic Approximation
    Application to the MSE Linear Estimation
5.5 The Least-Mean-Squares Adaptive Algorithm
    5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments
    5.5.2 Cumulative Loss Bounds
5.6 The Affine Projection Algorithm
    Geometric Interpretation of APA
    Orthogonal Projections
    5.6.1 The Normalized LMS
5.7 The Complex-Valued Case
    The Widely Linear LMS
    The Widely Linear APA
5.8 Relatives of the LMS
    The Sign-Error LMS
    The Least-Mean-Fourth (LMF) Algorithm
    Transform-Domain LMS
5.9 Simulation Examples
5.10 Adaptive Decision Feedback Equalization
5.11 The Linearly Constrained LMS
5.12 Tracking Performance of the LMS in Nonstationary Environments
5.13 Distributed Learning: the Distributed LMS
    5.13.1 Cooperation Strategies
    5.13.2 The Diffusion LMS
    5.13.3 Convergence and Steady-State Performance: Some Highlights
    5.13.4 Consensus-Based Distributed Schemes
5.14 A Case Study: Target Localization
5.15 Some Concluding Remarks: Consensus Matrix
Problems
    MATLAB Exercises
References
CHAPTER 6 The Least-Squares Family
6.1 Introduction
6.2 Least-Squares Linear Regression: a Geometric Perspective
6.3 Statistical Properties of the LS Estimator
    The LS Estimator Is Unbiased
    Covariance Matrix of the LS Estimator
    The LS Estimator Is BLUE in the Presence of White Noise
    The LS Estimator Achieves the Cramér–Rao Bound for White Gaussian Noise
    Asymptotic Distribution of the LS Estimator
6.4 Orthogonalizing the Column Space of the Input Matrix: the SVD Method
    Pseudoinverse Matrix and SVD
6.5 Ridge Regression: a Geometric Point of View
    Principal Components Regression
6.6 The Recursive Least-Squares Algorithm
    Time-Iterative Computations
    Time Updating of the Parameters
6.7 Newton's Iterative Minimization Method
    6.7.1 RLS and Newton's Method
6.8 Steady-State Performance of the RLS
6.9 Complex-Valued Data: the Widely Linear RLS
6.10 Computational Aspects of the LS Solution
    Cholesky Factorization
    QR Factorization
    Fast RLS Versions
6.11 The Coordinate and Cyclic Coordinate Descent Methods
6.12 Simulation Examples
6.13 Total Least-Squares
    Geometric Interpretation of the Total Least-Squares Method
Problems
    MATLAB Exercises
References
CHAPTER 7 Classification: a Tour of the Classics
7.1 Introduction
7.2 Bayesian Classification
    The Bayesian Classifier Minimizes the Misclassification Error
    7.2.1 Average Risk
7.3 Decision (Hyper) Surfaces
    7.3.1 The Gaussian Distribution Case
7.4 The Naive Bayes Classifier
7.5 The Nearest Neighbor Rule
7.6 Logistic Regression
7.7 Fisher's Linear Discriminant
    7.7.1 Scatter Matrices
    7.7.2 Fisher's Discriminant: the Two-Class Case
    7.7.3 Fisher's Discriminant: the Multiclass Case
7.8 Classification Trees
7.9 Combining Classifiers
    No Free Lunch Theorem
    Some Experimental Comparisons
    Schemes for Combining Classifiers
7.10 The Boosting Approach
    The AdaBoost Algorithm
    The Log-Loss Function
7.11 Boosting Trees
Problems
    MATLAB Exercises
References
CHAPTER 8 Parameter Learning: a Convex Analytic Path
8.1 Introduction
8.2 Convex Sets and Functions
    8.2.1 Convex Sets
    8.2.2 Convex Functions
8.3 Projections Onto Convex Sets
    8.3.1 Properties of Projections
8.4 Fundamental Theorem of Projections Onto Convex Sets
8.5 A Parallel Version of POCS
8.6 From Convex Sets to Parameter Estimation and Machine Learning
    8.6.1 Regression
    8.6.2 Classification
8.7 Infinitely Many Closed Convex Sets: the Online Learning Case
    8.7.1 Convergence of APSM
8.8 Constrained Learning
8.9 The Distributed APSM
8.10 Optimizing Nonsmooth Convex Cost Functions
    8.10.1 Subgradients and Subdifferentials
    8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: the Batch Learning Case
    8.10.3 Online Learning for Convex Optimization
8.11 Regret Analysis
    Regret Analysis of the Subgradient Algorithm
8.12 Online Learning and Big Data Applications: a Discussion
    Approximation, Estimation, and Optimization Errors
    Batch Versus Online Learning
8.13 Proximal Operators
    8.13.1 Properties of the Proximal Operator
    8.13.2 Proximal Minimization
8.14 Proximal Splitting Methods for Optimization
    The Proximal Forward-Backward Splitting Operator
    Alternating Direction Method of Multipliers (ADMM)
    Mirror Descent Algorithms
8.15 Distributed Optimization: Some Highlights
Problems
    MATLAB Exercises
References
CHAPTER 9 Sparsity-Aware Learning: Concepts and Theoretical Foundations
9.1 Introduction
9.2 Searching for a Norm
9.3 The Least Absolute Shrinkage and Selection Operator (LASSO)
9.4 Sparse Signal Representation
9.5 In Search of the Sparsest Solution
    The ℓ2 Norm Minimizer
    The ℓ0 Norm Minimizer
    The ℓ1 Norm Minimizer
    Characterization of the ℓ1 Norm Minimizer
    Geometric Interpretation
9.6 Uniqueness of the ℓ0 Minimizer
    9.6.1 Mutual Coherence
9.7 Equivalence of ℓ0 and ℓ1 Minimizers: Sufficiency Conditions
    9.7.1 Condition Implied by the Mutual Coherence Number
    9.7.2 The Restricted Isometry Property (RIP)
9.8 Robust Sparse Signal Recovery From Noisy Measurements
9.9 Compressed Sensing: the Glory of Randomness
    Compressed Sensing
    9.9.1 Dimensionality Reduction and Stable Embeddings
    9.9.2 Sub-Nyquist Sampling: Analog-to-Information Conversion
9.10 A Case Study: Image Denoising
Problems
    MATLAB Exercises
References
CHAPTER 10 Sparsity-Aware Learning: Algorithms and Applications
10.1 Introduction
10.2 Sparsity Promoting Algorithms
    10.2.1 Greedy Algorithms
    10.2.2 Iterative Shrinkage/Thresholding (IST) Algorithms
    10.2.3 Which Algorithm? Some Practical Hints
10.3 Variations on the Sparsity-Aware Theme
10.4 Online Sparsity Promoting Algorithms
    10.4.1 LASSO: Asymptotic Performance
    10.4.2 The Adaptive Norm-Weighted LASSO
    10.4.3 Adaptive CoSaMP Algorithm
    10.4.4 Sparse-Adaptive Projection Subgradient Method
10.5 Learning Sparse Analysis Models
    10.5.1 Compressed Sensing for Sparse Signal Representation in Coherent Dictionaries
    10.5.2 Cosparsity
10.6 A Case Study: Time-Frequency Analysis
    Gabor Transform and Frames
    Time-Frequency Resolution
    Gabor Frames
    Time-Frequency Analysis of Echolocation Signals Emitted by Bats
Problems
    MATLAB Exercises
References
CHAPTER 11 Learning in Reproducing Kernel Hilbert Spaces
11.1 Introduction
11.2 Generalized Linear Models
11.3 Volterra, Wiener, and Hammerstein Models
11.4 Cover's Theorem: Capacity of a Space in Linear Dichotomies
11.5 Reproducing Kernel Hilbert Spaces
    11.5.1 Some Properties and Theoretical Highlights
    11.5.2 Examples of Kernel Functions
11.6 Representer Theorem
    11.6.1 Semiparametric Representer Theorem
    11.6.2 Nonparametric Modeling: a Discussion
11.7 Kernel Ridge Regression
11.8 Support Vector Regression
    11.8.1 The Linear ε-Insensitive Optimal Regression
11.9 Kernel Ridge Regression Revisited
11.10 Optimal Margin Classification: Support Vector Machines
    11.10.1 Linearly Separable Classes: Maximum Margin Classifiers
    11.10.2 Nonseparable Classes
    11.10.3 Performance of SVMs and Applications
    11.10.4 Choice of Hyperparameters
    11.10.5 Multiclass Generalizations
11.11 Computational Considerations
11.12 Random Fourier Features
    11.12.1 Online and Distributed Learning in RKHS
11.13 Multiple Kernel Learning
11.14 Nonparametric Sparsity-Aware Learning: Additive Models
11.15 A Case Study: Authorship Identification
Problems
    MATLAB Exercises
References
CHAPTER 12 Bayesian Learning: Inference and the EM Algorithm
12.1 Introduction
12.2 Regression: a Bayesian Perspective
    12.2.1 The Maximum Likelihood Estimator
    12.2.2 The MAP Estimator
    12.2.3 The Bayesian Approach
12.3 The Evidence Function and Occam's Razor Rule
    Laplacian Approximation and the Evidence Function
12.4 Latent Variables and the EM Algorithm
    12.4.1 The Expectation-Maximization Algorithm
12.5 Linear Regression and the EM Algorithm
12.6 Gaussian Mixture Models
    12.6.1 Gaussian Mixture Modeling and Clustering
12.7 The EM Algorithm: a Lower Bound Maximization View
12.8 Exponential Family of Probability Distributions
    12.8.1 The Exponential Family and the Maximum Entropy Method
12.9 Combining Learning Models: a Probabilistic Point of View
    12.9.1 Mixing Linear Regression Models
    12.9.2 Mixing Logistic Regression Models
Problems
    MATLAB Exercises
References
CHAPTER 13 Bayesian Learning: Approximate Inference and Nonparametric Models
13.1 Introduction
13.2 Variational Approximation in Bayesian Learning
    The Mean Field Approximation
    13.2.1 The Case of the Exponential Family of Probability Distributions
13.3 A Variational Bayesian Approach to Linear Regression
    Computation of the Lower Bound
13.4 A Variational Bayesian Approach to Gaussian Mixture Modeling
13.5 When Bayesian Inference Meets Sparsity
13.6 Sparse Bayesian Learning (SBL)
    13.6.1 The Spike and Slab Method
13.7 The Relevance Vector Machine Framework
    13.7.1 Adopting the Logistic Regression Model for Classification
13.8 Convex Duality and Variational Bounds
13.9 Sparsity-Aware Regression: a Variational Bound Bayesian Path
    Sparsity-Aware Learning: Some Concluding Remarks
13.10 Expectation Propagation
    Minimizing the KL Divergence
    The Expectation Propagation Algorithm
13.11 Nonparametric Bayesian Modeling
    13.11.1 The Chinese Restaurant Process
    13.11.2 Dirichlet Processes
    13.11.3 The Stick Breaking Construction of a DP
    13.11.4 Dirichlet Process Mixture Modeling
    Inference
    13.11.5 The Indian Buffet Process
13.12 Gaussian Processes
    13.12.1 Covariance Functions and Kernels
    13.12.2 Regression
    13.12.3 Classification
13.13 A Case Study: Hyperspectral Image Unmixing
    13.13.1 Hierarchical Bayesian Modeling
    13.13.2 Experimental Results
Problems
    MATLAB Exercises
References
CHAPTER 14 Monte Carlo Methods
14.1 Introduction
14.2 Monte Carlo Methods: the Main Concept
    14.2.1 Random Number Generation
14.3 Random Sampling Based on Function Transformation
14.4 Rejection Sampling
14.5 Importance Sampling
14.6 Monte Carlo Methods and the EM Algorithm
14.7 Markov Chain Monte Carlo Methods
    14.7.1 Ergodic Markov Chains
14.8 The Metropolis Method
    14.8.1 Convergence Issues
14.9 Gibbs Sampling
14.10 In Search of More Efficient Methods: a Discussion
    Variational Inference or Monte Carlo Methods
14.11 A Case Study: Change-Point Detection
Problems
    MATLAB Exercise
References
CHAPTER 15 Probabilistic Graphical Models: Part I
15.1 Introduction
15.2 The Need for Graphical Models
15.3 Bayesian Networks and the Markov Condition
    15.3.1 Graphs: Basic Definitions
    15.3.2 Some Hints on Causality
    15.3.3 d-Separation
    15.3.4 Sigmoidal Bayesian Networks
    15.3.5 Linear Gaussian Models
    15.3.6 Multiple-Cause Networks
    15.3.7 I-Maps, Soundness, Faithfulness, and Completeness
15.4 Undirected Graphical Models
    15.4.1 Independencies and I-Maps in Markov Random Fields
    15.4.2 The Ising Model and Its Variants
    15.4.3 Conditional Random Fields (CRFs)
15.5 Factor Graphs
    15.5.1 Graphical Models for Error Correcting Codes
15.6 Moralization of Directed Graphs
15.7 Exact Inference Methods: Message Passing Algorithms
    15.7.1 Exact Inference in Chains
    15.7.2 Exact Inference in Trees
    15.7.3 The Sum-Product Algorithm
    15.7.4 The Max-Product and Max-Sum Algorithms
Problems
References
CHAPTER 16 Probabilistic Graphical Models: Part II
16.1 Introduction
16.2 Triangulated Graphs and Junction Trees
    16.2.1 Constructing a Join Tree
    16.2.2 Message Passing in Junction Trees
16.3 Approximate Inference Methods
    16.3.1 Variational Methods: Local Approximation
    16.3.2 Block Methods for Variational Approximation
    16.3.3 Loopy Belief Propagation
16.4 Dynamic Graphical Models
16.5 Hidden Markov Models
    16.5.1 Inference
    16.5.2 Learning the Parameters in an HMM
    16.5.3 Discriminative Learning
16.6 Beyond HMMs: a Discussion
    16.6.1 Factorial Hidden Markov Models
    16.6.2 Time-Varying Dynamic Bayesian Networks
16.7 Learning Graphical Models
    16.7.1 Parameter Estimation
    16.7.2 Learning the Structure
Problems
References
CHAPTER 17 Particle Filtering
17.1 Introduction
17.2 Sequential Importance Sampling
    17.2.1 Importance Sampling Revisited
    17.2.2 Resampling
    17.2.3 Sequential Sampling
17.3 Kalman and Particle Filtering
    17.3.1 Kalman Filtering: a Bayesian Point of View
17.4 Particle Filtering
    17.4.1 Degeneracy
    17.4.2 Generic Particle Filtering
    17.4.3 Auxiliary Particle Filtering
Problems
    MATLAB Exercises
References
CHAPTER 18 Neural Networks and Deep Learning
18.1 Introduction
18.2 The Perceptron
18.3 Feed-Forward Multilayer Neural Networks
    18.3.1 Fully Connected Networks
18.4 The Backpropagation Algorithm
Nonconvexity of the Cost Function...........................914 18.4.1 The Gradient Descent Backpropagation Scheme.................916 18.4.2 Variants of the Basic Gradient Descent Scheme.................924 18.4.3 Beyond the Gradient Descent Rationale.......................934 18.5 Selecting a Cos tFunction........................................935 18.6 Vanishing and Exploding Gradients.................................938 18.6.1 The Rectifie Linear Unit..................................939 18.7 Regularizing the Network........................................940 Dropout...............................................943 18.8 Designing Deep Neural Networks: a Summary.........................946 18.9 Universal Approximation Property of Feed-Forward Neural Networks.......947 18.10 Neural Networks: a Bayesian Flavor................................949 18.11 Shallow Versus Deep Architectures.................................950 18.11.1 The Power of Deep Architectures............................951 18.12 Convolutional Neural Networks....................................956 18.12.1 The Need for Convolutions................................956 18.12.2 Convolution Over Volumes.................................965 18.12.3 The Full CNN Architecture................................968 18.12.4 CNNs: the Epilogue......................................971 18.13 Recurrent Neural Networks.......................................976 18.13.1 Backpropagation Through Time.............................978 18.13.2 Attentionand Memory....................................982 18.14 Adversarial Examples...........................................985 Adversarial Training.....................................987 18.15 Deep Generative Models.........................................988 18.15.1 Restricted Boltzmann Machines.............................988 18.15.2 Pretraining Deep Feed-Forward Networks.....................991 18.15.3 Deep Belief Networks....................................992 18.15.4 
Autoencoders...........................................994 18.15.5 Generative Adversarial Networks............................995 18.15.6 Variational Autoencoders..................................1004 18.16 Capsule Networks..............................................1007 Training...............................................1011 18.17 Deep Neural Networks: Some Final Remarks..........................1013 Transfer Learning........................................1013 Multitask Learning.......................................1014 Geometric DeepLearning.................................1015 Open Problems.........................................1016 18.18 A Case Study: Neural Machine Translation...........................1017 18.19 Problems.....................................................1023 Computer Exercises......................................1025 References....................................................1029 CHAPTER19 Dimensionality Reduction and Latent Variable Modeling................1039 19.1 Introduction...................................................1040 19.2 Intrinsic Dimensionality..........................................1041 19.3 Principal Component Analysis.....................................1041 PCA, SVD, and Low Rank Matrix Factorization.................1043 Minimum Error Interpretation..............................1045 PCA and Information Retrieval.............................1045 Orthogonalizing Properties of PCA and Feature Generation........1046 Latent Variables.........................................1047 19.4 Canonical Correlation Analysis....................................1053 19.4.1 Relatives of CCA........................................1056 19.5 Independent Component Analysis..................................1058 19.5.1 ICA and Gaussianity.....................................1058 19.5.2 ICA and Higher-Order Cumulants...........................1059 19.5.3 Non-Gaussianity and Independent Components.................1061 19.5.4 ICA 
Basedon Mutual Information...........................1062 19.5.5 Alternative Paths to ICA..................................1065 The Cocktail Party Problem................................1066 19.6 Dictionary Learning: the k-SVD Algorithm...........................1069 Whythe Namek-SVD?...................................1072 Dictionary Learning and Dictionary Identifiabilit...............1072 19.7 Nonnegative Matrix Factorization..................................1074 19.8 Learning Low-Dimensional Models: a Probabilistic Perspective............1076 19.8.1 Factor Analysis.........................................1077 19.8.2 Probabilistic PCA.......................................1078 19.8.3 Mixture of Factors Analyzers: a Bayesian View to Compressed Sensing.......................1082 19.9 Nonlinear Dimensionality Reduction................................1085 19.9.1 Kernel PCA............................................1085 19.9.2 Graph-Based Methods....................................1087 19.10 Low Rank Matrix Factorization: a Sparse Modeling Path.................1096 19.10.1 Matrix Completion.......................................1096 19.10.2 Robust PCA............................................1100 19.10.3 Applications of Matrix Completion and ROBUSTPCA...........1101 19.11 A Case Study: FMRI Data Analysis.................................1103 Problems.....................................................1107 MATLAB瓻xercises....................................1107 References....................................................1108 Index....................................................................1116