# Methodology

## The Lasso: A Brief Review and a New Significance Test

Tibshirani will review the lasso method and show an example of its utility in cancer diagnosis via mass spectrometry. He will then consider testing the significance of the terms in a regression fitted via the lasso. He will present a novel test statistic for this problem and show that it has a simple asymptotic null distribution. This work builds on the least angle regression approach for fitting the lasso, and the notion of degrees of freedom for adaptive models (Efron 1986) and for the lasso (Efron et al. 2004, Zou et al. 2007). He will give examples of this procedure, discuss extensions to generalized linear models and the Cox model, and describe an R language package for its computation.

This work is joint with Richard Lockhart (Simon Fraser University), Jonathan Taylor (Stanford) and Ryan Tibshirani (Carnegie Mellon).
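For readers new to the method, the lasso estimate is usually written as the minimizer of a least-squares criterion with an $\ell_1$ penalty. The notation below (response $y$, design matrix $X$ with $p$ columns, coefficient vector $\beta$, tuning parameter $\lambda$) is assumed here for illustration and is not taken from the talk itself:

$$
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta \in \mathbb{R}^{p}} \; \frac{1}{2}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda\,\lVert \beta \rVert_1, \qquad \lambda \ge 0 .
$$

Larger values of $\lambda$ set more coefficients exactly to zero, which is why testing the significance of the terms that do enter the fit is a natural question.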

## Sparse Linear Models

In a statistical world faced with an explosion of data, regularization has become an important ingredient. In many problems, we have many more variables than observations, and the lasso penalty and its hybrids have become increasingly useful. This talk presents a general framework for fitting large-scale regularization paths for a variety of problems. We describe the approach, and demonstrate it via examples using our R package glmnet. We then outline a series of related problems using extensions of these ideas. This is joint work with Jerome Friedman, Rob Tibshirani and Noah Simon.
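To make the idea of a regularization path concrete, here is a minimal pure-Python sketch of cyclical coordinate descent with warm starts over a decreasing grid of penalty values. It is an illustration under stated assumptions, not the glmnet implementation: the function names `soft_threshold` and `lasso_path` are hypothetical, the criterion is assumed to be scaled as $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$, and no intercept is fitted, so the response and columns are assumed to be centered.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_path(X, y, lambdas, n_sweeps=200, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 over a grid.

    X is a list of n rows with p entries each; y has length n.
    Assumption: no intercept, so y and the columns of X are centered.
    Returns (lam, coefficients) pairs, visited from largest to smallest lam.
    """
    n, p = len(X), len(X[0])
    # Mean squared value of each column, used to scale each update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta = 0 at the start
    path = []
    for lam in sorted(lambdas, reverse=True):
        # Warm start: beta and resid carry over from the previous lam.
        for _ in range(n_sweeps):
            max_change = 0.0
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue  # an all-zero column can never enter the fit
                # Inner product of column j with the partial residual
                # (the residual with coordinate j's contribution added back).
                rho = sum(X[i][j] * resid[i] for i in range(n)) / n
                rho += col_sq[j] * beta[j]
                new_bj = soft_threshold(rho, lam) / col_sq[j]
                delta = new_bj - beta[j]
                if delta != 0.0:
                    for i in range(n):
                        resid[i] -= X[i][j] * delta
                    beta[j] = new_bj
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break  # a full sweep changed nothing noticeable
        path.append((lam, list(beta)))
    return path
```

Calling `lasso_path(X, y, [2.0, 1.0, 0.25])` returns one coefficient vector per penalty value. Starting each fit from the previous solution is what makes computing a whole path cheap compared with solving each problem from scratch.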

Trevor Hastie is noted for his many contributions to the statistician’s toolbox of flexible data analysis methods. Beginning with his PhD thesis, Trevor developed a nonparametric version of principal components analysis, terming the methodology principal curves and surfaces. During the years after his PhD, as a member of the AT&T Bell Laboratories statistics and data analysis research group, Trevor developed techniques for linear, generalized linear, and additive models and worked on the development of S, the precursor of R. Much of this work is contained in the well-known Statistical Models in S (co-edited with John Chambers, 1991). In the book Generalized Additive Models (1990), Trevor and co-author Rob Tibshirani modified techniques like multiple linear regression and logistic regression to allow for smooth modeling while avoiding the usual dimensionality problems. In 1994, Trevor left Bell Labs for Stanford University to become Professor in Statistics and Biostatistics. Trevor has applied his skills to research in machine learning. His book The Elements of Statistical Learning (with Rob Tibshirani and Jerry Friedman, Springer 2001; second edition 2009) is famous for providing a readable account of flexible techniques for high-dimensional data. This popular book expertly bridges the philosophical and research gap between computer scientists and statisticians.


## Epidemiologic methods are useless. They can only give you answers

The first duty of any epidemiologist is to ask a relevant question. Learning and applying sophisticated epidemiologic methods is of little help if the methods are used to answer irrelevant questions. This talk will discuss the formulation of research questions in the presence of time-varying treatments and treatments with multiple versions, including pharmacological treatments and lifestyle exposures. Several examples will show that discrepancies between observational studies and randomized trials are often not due to confounding, but to the different questions asked.

**Brief Biography**

Miguel Hernán is Professor in the Departments of Epidemiology and Biostatistics at the Harvard School of Public Health (HSPH). His research focuses on the development and application of causal inference methods to guide policy and clinical interventions. He and his collaborators apply statistical methods to observational studies, under suitable conditions, to emulate hypothetical randomized experiments so that well-formulated causal questions can be investigated properly. His research has been applied to many areas, including the optimal use of antiretroviral therapy in patients infected with HIV and the assessment of various interventions for kidney disease, cardiovascular disease, cancer and central nervous system diseases. He is Associate Director of the HSPH Program on Causal Inference in Epidemiology and Allied Sciences, a member of the Affiliated Faculty of the Harvard-MIT Division of Health Sciences and Technology, and an Editor of the journal EPIDEMIOLOGY. He is the author of the upcoming, highly anticipated textbook "Causal Inference" (Chapman & Hall/CRC, 2013); drafts of selected chapters are available on his website.

## Endogeneity and Discrete Outcomes

This paper studies models for discrete outcomes which permit explanatory variables to be endogenous. In these models there is a single nonadditive latent variate which is restricted to be locally independent of instruments. The models are incomplete; they are silent about the nature of dependence between the latent variate and the endogenous variable and the role of the instrument in this relationship. When the outcome is continuous, these single-equation IV models can have point-identifying power; when the outcome is discrete, they have only set-identifying power. Identification regions vary with the strength and support of instruments and shrink as the support of a discrete outcome grows. The paper extends the analysis of structural quantile functions with endogenous arguments to cases in which there are discrete outcomes.