Lasso paths

fit(LassoPath, X, y, d=Normal(), l=canonicallink(d); ...)

Fits a linear or generalized linear Lasso path given the design matrix X and response y:

\[\underset{\beta}{\operatorname{argmin}} -\frac{1}{N} \mathcal{L}(y|X,\beta) + \lambda\left[(1-\alpha)\frac{1}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1\right]\]

The optional argument d specifies the conditional distribution of the response, while l specifies the link function. Lasso.jl inherits supported distributions and link functions from GLM.jl. The default is to fit a linear Lasso path, i.e., d=Normal(), l=IdentityLink(), or \(\mathcal{L}(y|X,\beta) = -\frac{1}{2}\|y - X\beta\|_2^2 + C\).
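As a concrete reading of the objective in the default Gaussian case, the sketch below evaluates it for a given coefficient vector. It is an illustration only, not Lasso.jl code: `enet_objective` is a hypothetical helper, the constant \(C\) is dropped, and the intercept, weights, and offset are ignored.

```julia
# Elastic net objective for d=Normal(), l=IdentityLink(), with the constant C dropped:
#   (1 / 2N) * ||y - Xβ||² + λ * [(1 - α)/2 * ||β||₂² + α * ||β||₁]
# Hypothetical helper for illustration; not part of Lasso.jl.
function enet_objective(X, y, β, λ; α=1.0)
    N = length(y)
    residual = y - X * β
    loss = sum(abs2, residual) / (2N)
    penalty = λ * ((1 - α) / 2 * sum(abs2, β) + α * sum(abs, β))
    return loss + penalty
end
```

With α = 1 the penalty reduces to the pure Lasso term \(\lambda\|\beta\|_1\), and lowering α shifts weight toward the ridge term.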

Keyword arguments:

name	description	default
wts	Weights for each observation	`ones(length(y))`
offset	Offset of each observation	`zeros(length(y))`
α	Elastic Net parameter in interval [0, 1]. Controls the tradeoff between L1 and L2 regularization. α = 1 fits a pure Lasso model, while α = 0 would fit a pure ridge regression model. Note: Do not set α = 0. There are methods for fitting pure ridge regression models that are substantially more efficient than the coordinate descent procedure used in Lasso.jl.	`1`
λ, nλ, λminratio	Control the values of λ along the path at which models are fit. λ can be used to specify the exact set of λ values at which models should be fit. If λ is unspecified, Lasso.jl selects nλ logarithmically spaced λ values from \(\lambda_{\text{max}}\), the smallest λ value yielding a null model, down to \(\text{λminratio} \cdot \lambda_{\text{max}}\). If the proportion of deviance explained exceeds 0.999 or the difference between the deviance explained by successive λ values falls below \(10^{-5}\), the path stops early.	`nλ = 100`. `λminratio = 1e-4` if there are more observations than predictors; otherwise `λminratio = 0.001`.
standardize	Whether to standardize predictors to unit standard deviation before fitting.	`true`
intercept	Whether to fit an (unpenalized) model intercept.	`true`
algorithm	Algorithm to use. NaiveCoordinateDescent iteratively computes the dot product of the predictors with the residuals, whereas CovarianceCoordinateDescent uses a precomputed Gram matrix. NaiveCoordinateDescent is typically faster when there are many predictors that will not enter the model or when fitting generalized linear models.	NaiveCoordinateDescent if there are more than 5x as many predictors as observations or the model is a GLM; CovarianceCoordinateDescent otherwise.
randomize	Whether to randomize the order in which coefficients are updated by coordinate descent. This can drastically speed convergence if coefficients are highly correlated, but is only supported on Julia 0.4 and later.	`true` (if julia >= 0.4)
maxncoef	The maximum number of coefficients allowed in the model. If exceeded, an error will be thrown.	`min(size(X, 2), 2*size(X, 1))`
dofit	Whether to fit the model upon construction. If false, the model can be fit later by calling fit!(model).	`true`
cd_tol	The tolerance for coordinate descent iterations in the inner loop.	`1e-7`
irls_tol	The tolerance for outer iteratively reweighted least squares iterations. This is ignored unless the model is a generalized linear model.	`1e-7`
criterion	Convergence criterion. Controls how `cd_tol` and `irls_tol` are interpreted. Possible values are: `:coef`: The model is considered to have converged if the maximum absolute squared difference in coefficients between successive iterations drops below the specified tolerance. This is the criterion used by glmnet. `:obj`: The model is considered to have converged if the relative change in the Lasso/Elastic Net objective between successive iterations drops below the specified tolerance. This is the criterion used by GLM.jl.	`:coef`
minStepFac	The minimum step fraction for backtracking line search.	`0.001`
penalty_factor	Separate penalty factor \(\omega_j\) for each coefficient \(j\), i.e., the penalty on coefficient \(j\) becomes \(\lambda\omega_j\) instead of \(\lambda\). Note that the penalty factors are internally rescaled to sum to the number of variables (following the glmnet convention).	`ones(size(X, 2))`
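To make the λ, nλ, and λminratio row more concrete, here is a minimal sketch of how such a grid could be built for the Gaussian case. `lambda_max` and `lambda_grid` are hypothetical helpers, not Lasso.jl functions. The \(\lambda_{\text{max}}\) formula follows from the objective above under the assumptions of an intercept, unit weights, unit penalty factors, and α > 0; when `standardize = true` it would apply to the standardized columns. The actual implementation may differ in these details.

```julia
# Smallest λ at which every coefficient is zero (Gaussian case with an intercept).
# At the null model the residual is y .- mean(y), and coefficient j stays at zero
# as long as |xⱼ' * residual| / N <= λ * α.
# Hypothetical helper for illustration; not part of Lasso.jl.
function lambda_max(X, y; α=1.0)
    N = length(y)
    centered = y .- sum(y) / N
    return maximum(abs, X' * centered) / (N * α)
end

# nλ logarithmically spaced values from λmax down to λminratio * λmax.
function lambda_grid(λmax; nλ=100, λminratio=1e-4)
    return exp.(range(log(λmax), log(λminratio * λmax); length=nλ))
end

X = [1.0 0.0; -1.0 0.0; 0.0 2.0; 0.0 -2.0]
y = [1.0, -1.0, 0.5, -0.5]
grid = lambda_grid(lambda_max(X, y); nλ=5)
```

Because the spacing is logarithmic, successive grid values differ by a constant ratio rather than a constant step.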

fit returns a LassoPath object describing the fit coefficients and values of λ along the Lasso path. The following fields are intended for external use:

field	description
λ	Vector of λ values corresponding to each fit model along the path
coefs	SparseMatrixCSC of model coefficients. Columns correspond to fit models; rows correspond to predictors
b0	Vector of model intercepts for each fit model
pct_dev	Vector of proportion of deviance explained values for each fit model
nulldev	The deviance of the null model (including the intercept, if specified)
nullb0	The intercept of the null model, or 0 if no intercept was fit
niter	Total number of coordinate descent iterations required to fit all models
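A short usage sketch tying the signature, keyword arguments, and fields together. It assumes Lasso.jl is installed and uses only the names documented above. The simulated data and variable names are illustrative.

```julia
using Lasso
using Random

Random.seed!(1)

# Simulated data: 100 observations, 10 predictors, only the first two matter.
X = randn(100, 10)
y = 2 .* X[:, 1] .- X[:, 2] .+ 0.1 .* randn(100)

# Linear Lasso path with the default distribution and link;
# request at most 50 λ values (the path may stop early).
path = fit(LassoPath, X, y; nλ=50)

# One column of coefficients and one intercept per λ value on the path.
for k in eachindex(path.λ)
    nonzero = count(!iszero, path.coefs[:, k])
    println("λ = ", path.λ[k], "  nonzero coefficients = ", nonzero,
            "  intercept = ", path.b0[k], "  pct_dev = ", path.pct_dev[k])
end
```

Since the path can terminate before nλ values are fit, it is safer to size loops by `length(path.λ)` than by the requested nλ.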

For details of the algorithm, see Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1.
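For orientation, the following is a minimal sketch of naive coordinate descent with soft-thresholding for the Gaussian elastic net objective at a single λ, in the spirit of the cited paper. It is not Lasso.jl's implementation: `soft_threshold` and `naive_cd` are hypothetical names, and the sketch omits the intercept, weights, standardization, penalty factors, and warm starts along the path. The stopping rule mirrors the `:coef` criterion described above.

```julia
using LinearAlgebra: dot

# Soft-thresholding operator: shrink z toward zero by γ.
soft_threshold(z, γ) = sign(z) * max(abs(z) - γ, 0)

# Minimizes (1/2N)||y - Xβ||² + λ[(1-α)/2 ||β||₂² + α||β||₁] over β.
# Hypothetical helper for illustration; not part of Lasso.jl.
function naive_cd(X, y, λ; α=1.0, tol=1e-7, maxiter=10_000)
    N, p = size(X)
    β = zeros(p)
    r = float.(y)                            # residual y - Xβ; β starts at zero
    colsq = vec(sum(abs2, X; dims=1)) ./ N   # xⱼ'xⱼ / N for each column
    for _ in 1:maxiter
        maxdelta = 0.0
        for j in 1:p
            colsq[j] == 0 && continue        # skip all-zero columns
            xj = view(X, :, j)
            βold = β[j]
            # Correlation of column j with the partial residual (dot product
            # of the predictor with the residuals, plus its own contribution).
            ρ = dot(xj, r) / N + colsq[j] * βold
            βnew = soft_threshold(ρ, λ * α) / (colsq[j] + λ * (1 - α))
            if βnew != βold
                r .-= xj .* (βnew - βold)    # keep the residual in sync
                β[j] = βnew
                maxdelta = max(maxdelta, abs2(βnew - βold))
            end
        end
        # Stop when the largest squared coefficient change is below tol.
        maxdelta < tol && break
    end
    return β
end
```

The "naive" label refers to recomputing the predictor-residual dot product on every update; the covariance variant would instead work from a precomputed Gram matrix.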