Sparse Linear Models

Speaker: Trevor Hastie

Date: Tue, Sep 17, 2013

Location: PIMS, University of British Columbia

Conference: Statistics

Subject: Methodology, Statistics Theory, Statistics

Class: Scientific

Abstract:

In a statistical world faced with an explosion of data, regularization has become an important ingredient. In many problems we have many more variables than observations, and the lasso penalty and its hybrids have become increasingly useful. This talk presents a general framework for fitting large-scale regularization paths for a variety of problems. We describe the approach and demonstrate it via examples using our R package GLMNET. We then outline a series of related problems using extensions of these ideas. This is joint work with Jerome Friedman, Rob Tibshirani and Noah Simon.
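For a flavor of the GLMNET workflow the talk demonstrates, here is a minimal sketch of fitting and cross-validating a lasso path in R. The simulated data and all settings are illustrative assumptions, not examples from the talk itself:

    library(glmnet)

    set.seed(1)
    n <- 100; p <- 500                    # many more variables than observations
    x <- matrix(rnorm(n * p), n, p)
    beta <- c(rep(2, 5), rep(0, p - 5))   # sparse true coefficients (assumed for illustration)
    y <- x %*% beta + rnorm(n)

    fit <- glmnet(x, y)                   # lasso path over a grid of lambda values (alpha = 1 by default)
    plot(fit, xvar = "lambda")            # coefficient profiles along the path

    cvfit <- cv.glmnet(x, y)              # 10-fold cross-validation over the path
    coef(cvfit, s = "lambda.min")         # sparse coefficients at the CV-chosen lambda

The point of the design is that the entire path of solutions is computed at once, so model selection (here via cv.glmnet) amounts to picking a point on an already-fitted path rather than refitting from scratch.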


Trevor Hastie is noted for his many contributions to the statistician’s toolbox of flexible data analysis methods. Beginning with his PhD thesis, Trevor developed a nonparametric version of principal components analysis, terming the methodology principal curves and surfaces. During the years after his PhD, as a member of the AT&T Bell Laboratories statistics and data analysis research group, Trevor developed techniques for linear, generalized linear, and additive models and worked on the development of S, the precursor of R. Much of this work is contained in the well-known Statistical Models in S (co-edited with John Chambers, 1992). In the book Generalized Additive Models (1990), Trevor and co-author Rob Tibshirani modified techniques like multiple linear regression and logistic regression to allow for smooth modeling while avoiding the usual dimensionality problems. In 1994, Trevor left Bell Labs for Stanford University to become Professor of Statistics and Biostatistics. Trevor has since applied his skills to research in machine learning. His book The Elements of Statistical Learning (with Rob Tibshirani and Jerry Friedman, Springer 2001; second edition 2009) is famous for providing a readable account of flexible techniques for high-dimensional data. This popular book expertly bridges the philosophical and research gap between computer scientists and statisticians.