Regularization (mathematics)

Regularization, in mathematics and statistics, and particularly in the fields of machine learning and inverse problems, is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.


Introduction

In general, a regularization term R(f) is introduced to a general loss function:

\min_f \sum_{i=1}^{n} V(f(\hat{x}_i), \hat{y}_i) + \lambda R(f)

for a loss function V that describes the cost of predicting f(x) when the label is y, such as the square loss or hinge loss, and a term λ which controls the importance of the regularization term. R(f) is typically a penalty on the complexity of f, such as restrictions for smoothness or bounds on the vector space norm.[1]
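To make the pieces concrete, here is a minimal numpy sketch of evaluating such an objective for a linear model, using the square loss for V and the squared L2 norm for R (both are just one possible choice, and the toy data is an assumption):

```python
import numpy as np

def regularized_empirical_risk(w, X, y, lam):
    """Square loss plus an L2 penalty: sum_i (x_i . w - y_i)^2 + lam * ||w||_2^2.

    The square loss and the L2 penalty are only one concrete choice of V and R.
    """
    residuals = X @ w - y            # predictions f(x_i) = x_i . w minus labels y_i
    data_term = np.sum(residuals ** 2)
    penalty = lam * np.dot(w, w)     # R(w) = ||w||_2^2
    return data_term + penalty

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=50)
print(regularized_empirical_risk(np.zeros(5), X, y, lam=0.1))
```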

A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution, as depicted in the figure. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters.

Regularization can be used to learn simpler models, induce models to be sparse, introduce group structure into the learning problem, and more.

The same idea arose in many fields of science. For example, the least-squares method can be viewed as a very simple form of regularization[citation needed]. A simple form of regularization applied to integral equations, generally termed Tikhonov regularization after Andrey Nikolayevich Tikhonov, is essentially a trade-off between fitting the data and reducing a norm of the solution. More recently, non-linear regularization methods, including total variation regularization, have become popular.

Generalization

Main article:  Generalization error

Regularization can be motivated as a technique to improve the generalization of a learned model.

The goal of this learning problem is to find a function that fits or predicts the outcome (label) that minimizes the expected error over all possible inputs and labels. The expected error of a function f_n is:

I[f_n] = \int_{X \times Y} V(f_n(x), y) \, \rho(x, y) \, dx \, dy

Typically in learning problems, only a subset of input data and labels are available, measured with some noise. Therefore, the expected error is unmeasurable, and the best surrogate available is the empirical error over the n available samples:

I_S[f_n] = \frac{1}{n} \sum_{i=1}^{n} V(f_n(\hat{x}_i), \hat{y}_i)

Without bounds on the complexity of the function space (formally, the reproducing kernel Hilbert space) available, a model will be learned that incurs zero loss on the surrogate empirical error. If measurements (e.g. of x_i) were made with noise, this model may suffer from overfitting and display poor expected error. Regularization introduces a penalty for exploring certain regions of the function space used to build the model, which can improve generalization.

Tikhonov regularization

Main article:  Tikhonov regularization

When learning a linear function, such that f(x) = w ⋅ x, regularizing with the (squared) L2 norm of w corresponds to Tikhonov regularization. This is one of the most common forms of regularization, is also known as ridge regression, and is expressed as:

\min_w \sum_{i=1}^{n} V(\hat{x}_i \cdot w, \hat{y}_i) + \lambda \|w\|_2^2

In the case of a general function, we take the norm of the function in its reproducing kernel Hilbert space:

\min_f \sum_{i=1}^{n} V(f(\hat{x}_i), \hat{y}_i) + \lambda \|f\|_{\mathcal{H}}^2

As the L2 norm is differentiable, learning problems using Tikhonov regularization can be solved by gradient descent.

Tikhonov regularized least squares

The learning problem with the least squares loss function and Tikhonov regularization can be solved analytically. Written in matrix form, the optimal w will be the one for which the gradient of the loss function with respect to w is 0.

\min_w \frac{1}{n} (\hat{X} w - Y)^T (\hat{X} w - Y) + \lambda \|w\|_2^2
\nabla_w = \frac{2}{n} \hat{X}^T (\hat{X} w - Y) + 2\lambda w \qquad \text{(setting this gradient to zero gives the first-order condition)}
0 = \hat{X}^T (\hat{X} w - Y) + n\lambda w
w = (\hat{X}^T \hat{X} + \lambda n I)^{-1} \hat{X}^T Y

By construction of the optimization problem, other values of w would give larger values for the loss function. This could be verified by examining the second derivative ∇_{ww}.

During training, this algorithm takes O(d^3 + nd^2) time. The terms correspond to the matrix inversion and calculating X^T X, respectively. Testing takes O(nd) time.
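A minimal numpy sketch of this closed-form solution (the function name and toy data are illustrative; in practice the linear system is solved rather than forming the inverse explicitly):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Tikhonov-regularized least squares: w = (X^T X + lam * n * I)^{-1} X^T y."""
    n, d = X.shape
    A = X.T @ X + lam * n * np.eye(d)
    # Solving the linear system is numerically preferable to an explicit inverse.
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)
print(ridge_closed_form(X, y, lam=0.1))
```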

Early stopping

Main article:  Early stopping

Early stopping can be viewed as regularization in time. Intuitively, a training procedure like gradient descent will tend to learn more and more complex functions as the number of iterations increases. By regularizing on time, the complexity of the model can be controlled, improving generalization.

In practice, early stopping is implemented by training on a training set and measuring accuracy on a statistically independent validation set. The model is trained until performance on the validation set no longer improves. The model is then tested on a testing set.
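A schematic sketch of this procedure, assuming plain gradient descent on a squared loss and a simple patience-based stopping rule (all names and hyperparameters here are illustrative assumptions):

```python
import numpy as np

def train_with_early_stopping(X_tr, y_tr, X_val, y_val,
                              lr=0.01, max_iter=10_000, patience=20):
    """Gradient descent on the squared loss, stopped when validation error stops improving."""
    n, d = X_tr.shape
    w = np.zeros(d)
    best_w, best_val, since_best = w.copy(), np.inf, 0
    for _ in range(max_iter):
        grad = (2.0 / n) * X_tr.T @ (X_tr @ w - y_tr)
        w -= lr * grad
        val_err = np.mean((X_val @ w - y_val) ** 2)
        if val_err < best_val:
            best_val, best_w, since_best = val_err, w.copy(), 0
        else:
            since_best += 1
            if since_best >= patience:   # validation error no longer improving
                break
    return best_w
```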

Theoretical motivation in least squares

Consider the finite approximation of the Neumann series for an invertible matrix A where ∥I − A∥ < 1:

\sum_{i=0}^{T-1} (I - A)^i \approx A^{-1}

This can be used to approximate the analytical solution of unregularized least squares, if γ is introduced to ensure the norm is less than one.

w_T = \frac{\gamma}{n} \sum_{i=0}^{T-1} \left(I - \frac{\gamma}{n} \hat{X}^T \hat{X}\right)^i \hat{X}^T \hat{Y}

The exact solution to the unregularized least squares learning problem will minimize the empirical error, but may fail to generalize and minimize the expected error. By limiting T, the only free parameter in the algorithm above, the problem is regularized on time, which may improve its generalization.

The algorithm above is equivalent to restricting the number of gradient descent iterations for the empirical risk

I_s[w] = \frac{1}{2n} \|\hat{X} w - \hat{Y}\|_{\mathbb{R}^n}^2

with the gradient descent update:

w_0 = 0
w_{t+1} = \left(I - \frac{\gamma}{n} \hat{X}^T \hat{X}\right) w_t + \frac{\gamma}{n} \hat{X}^T \hat{Y}

The base case is trivial. The inductive case is proved as follows:

w_T = \left(I - \frac{\gamma}{n} \hat{X}^T \hat{X}\right) \frac{\gamma}{n} \sum_{i=0}^{T-2} \left(I - \frac{\gamma}{n} \hat{X}^T \hat{X}\right)^i \hat{X}^T \hat{Y} + \frac{\gamma}{n} \hat{X}^T \hat{Y}
w_T = \frac{\gamma}{n} \sum_{i=1}^{T-1} \left(I - \frac{\gamma}{n} \hat{X}^T \hat{X}\right)^i \hat{X}^T \hat{Y} + \frac{\gamma}{n} \hat{X}^T \hat{Y}
w_T = \frac{\gamma}{n} \sum_{i=0}^{T-1} \left(I - \frac{\gamma}{n} \hat{X}^T \hat{X}\right)^i \hat{X}^T \hat{Y}
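This equivalence is easy to verify numerically; the following sketch compares T gradient descent steps with the truncated series above on synthetic data (the step size γ is chosen so that ∥I − (γ/n) X^T X∥ < 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 200, 4, 50
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
gamma = 0.5 * n / np.linalg.norm(X, 2) ** 2    # keeps ||I - (gamma/n) X^T X|| < 1

M = np.eye(d) - (gamma / n) * X.T @ X
b = (gamma / n) * X.T @ y

# truncated Neumann series: w_T = sum_{i=0}^{T-1} M^i b
w_series = sum(np.linalg.matrix_power(M, i) @ b for i in range(T))

# T steps of gradient descent on the empirical risk, starting from w_0 = 0
w_gd = np.zeros(d)
for _ in range(T):
    w_gd = M @ w_gd + b

print(np.allclose(w_series, w_gd))   # True: the two computations coincide
```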

Regularizers for sparsity

Assume that a dictionary φ_j with dimension p is given such that a function in the function space can be expressed as:

f(x) = \sum_{j=1}^{p} \phi_j(x) w_j

A comparison between the L1 ball and the L2 ball in two dimensions gives an intuition on how L1 regularization achieves sparsity.

Enforcing a sparsity constraint on ww can lead to simpler and more interpretable models. This is useful in many real-life applications such as computational biology. An example is developing a simple predictive test for a disease in order to minimize the cost of performing medical tests while maximizing predictive power.

A sensible sparsity constraint is the L0 norm ∥w∥_0, defined as the number of non-zero elements in w. Solving an L0-regularized learning problem, however, has been demonstrated to be NP-hard.[2]

The L1 norm can be used to approximate the optimal L0 norm via convex relaxation. It can be shown that the L1 norm induces sparsity. In the case of least squares, this problem is known as LASSO in statistics and basis pursuit in signal processing.

\min_{w \in \mathbb{R}^p} \frac{1}{n} \|\hat{X} w - \hat{Y}\|^2 + \lambda \|w\|_1
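As a usage sketch, scikit-learn's Lasso estimator solves this kind of problem; its objective is scaled as (1/(2n))∥Xw − y∥² + α∥w∥₁, so α plays the role of λ up to a constant factor, and the toy data below is an assumption:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[[0, 3, 7]] = [2.0, -1.5, 1.0]        # only three informative features
y = X @ true_w + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)
print(np.count_nonzero(model.coef_))        # the fitted coefficient vector is typically sparse
```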
Elastic net regularization

L1 regularization can occasionally produce non-unique solutions. A simple example occurs when the space of possible solutions lies on a 45 degree line. This can be problematic for certain applications, and is overcome by combining L1 with L2 regularization in elastic net regularization, which takes the following form:

\min_{w \in \mathbb{R}^p} \frac{1}{n} \|\hat{X} w - \hat{Y}\|^2 + \lambda \left(\alpha \|w\|_1 + (1 - \alpha) \|w\|_2^2\right), \quad \alpha \in [0, 1]

Elastic net regularization tends to have a grouping effect, where correlated input features are assigned equal weights.

Elastic net regularization is commonly used in practice and is implemented in many machine learning libraries.
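For example, scikit-learn's ElasticNet exposes this combined penalty; its l1_ratio parameter plays the role of α above, although its exact scaling of the penalty differs slightly from the formula given here. A sketch on deliberately correlated features:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
# three nearly identical (highly correlated) input features
X = np.hstack([z + 0.01 * rng.normal(size=(200, 1)) for _ in range(3)])
y = z.ravel() + 0.1 * rng.normal(size=200)

model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)   # the correlated features tend to receive similar weights
```

With strongly correlated inputs, a pure L1 penalty would typically keep only one of the three columns, whereas the elastic net spreads the weight across them, illustrating the grouping effect mentioned above.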

Proximal methods

Main article:  Proximal gradient method

While the L1 norm does not result in an NP-hard problem, it is convex but not differentiable, due to the kink at x = 0. Subgradient methods, which rely on the subderivative, can be used to solve L1-regularized learning problems. However, faster convergence can be achieved through proximal methods.

For a problem min_{w ∈ H} F(w) + R(w) such that F is convex, continuous, and differentiable with Lipschitz continuous gradient (such as the least squares loss function), and R is convex, continuous, and proper, the proximal method to solve the problem is as follows. First define the proximal operator

\operatorname{prox}_R(v) = \operatorname{argmin}_{w \in \mathbb{R}^D} \left\{ R(w) + \frac{1}{2} \|w - v\|^2 \right\},

and then iterate

w_{k+1} = \operatorname{prox}_{\gamma, R}(w_k - \gamma \nabla F(w_k))

The proximal method iteratively performs gradient descent and then projects the result back into the space permitted by R.

When R is the L1 regularizer, the proximal operator is equivalent to the soft-thresholding operator,

S_\lambda(v)_i = \begin{cases} v_i - \lambda, & \text{if } v_i > \lambda \\ 0, & \text{if } v_i \in [-\lambda, \lambda] \\ v_i + \lambda, & \text{if } v_i < -\lambda \end{cases}

This allows for efficient computation.
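A compact sketch of this scheme for the lasso objective above, i.e. proximal gradient descent (ISTA) with F the least squares term and R the L1 penalty; the step size and iteration count are illustrative assumptions:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: shrink each coordinate toward zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ista(X, y, lam, n_iter=500):
    """Proximal gradient descent on (1/n)||Xw - y||^2 + lam * ||w||_1."""
    n, d = X.shape
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n    # Lipschitz constant of the gradient of F
    step = 1.0 / L
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient step on F
        w = soft_threshold(w - step * grad, step * lam)   # prox step with threshold step*lam
    return w
```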

Group sparsity without overlaps

Groups of features can be regularized by a sparsity constraint, which can be useful for expressing certain prior knowledge in an optimization problem.

In the case of a linear model with non-overlapping known groups, a regularizer can be defined:

R(w) = \sum_{g=1}^{G} \|w_g\|_g, \quad \text{where } \|w_g\|_g = \sqrt{\sum_{j=1}^{|G_g|} (w_g^j)^2}

This can be viewed as applying an L2 norm over the members of each group, followed by an L1 norm over the groups.

This can be solved by the proximal method, where the proximal operator is a block-wise soft-thresholding function:

(\operatorname{prox}_{\lambda, R, g}(w_g))^j = \begin{cases} w_g^j - \lambda \dfrac{w_g^j}{\|w_g\|_g}, & \text{if } \|w_g\|_g > \lambda \\ 0, & \text{if } \|w_g\|_g \le \lambda \end{cases}
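A sketch of this block-wise operator in numpy, with groups given as lists of index arrays (the function and variable names are illustrative):

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator for the non-overlapping group lasso penalty.

    Each group's sub-vector is either shrunk toward zero by lam (in norm) or set
    to exactly zero, which zeroes out whole groups at once.
    """
    out = np.zeros_like(w)
    for idx in groups:                       # idx: indices belonging to one group
        norm = np.linalg.norm(w[idx])
        if norm > lam:
            out[idx] = w[idx] * (1.0 - lam / norm)
        # else: the whole group stays at zero
    return out

w = np.array([3.0, 0.1, -0.2, 4.0, 0.05])
groups = [np.array([0]), np.array([1, 2]), np.array([3, 4])]
print(group_soft_threshold(w, groups, lam=0.5))
```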

Group sparsity with overlaps

The algorithm described for group sparsity without overlaps can be applied to the case where groups do overlap, in certain situations. Note that this will likely result in some groups with all zero elements, and other groups with some non-zero and some zero elements.

If it is desired to preserve the group structure, a new regularizer can be defined:

R(w) = \inf \left\{ \sum_{g=1}^{G} \|w_g\|_g : w = \sum_{g=1}^{G} \bar{w}_g \right\}

For each w_g, \bar{w}_g is defined as the vector such that the restriction of \bar{w}_g to the group g equals w_g and all other entries of \bar{w}_g are zero. The regularizer finds the optimal decomposition of w into parts. It can be viewed as duplicating all elements that exist in multiple groups. Learning problems with this regularizer can also be solved with the proximal method, with a complication: the proximal operator cannot be computed in closed form, but can be solved effectively by iteration, inducing an inner iteration within the proximal method iteration.

Regularizers for semi-supervised learning

Main article:  Semi-supervised learning

When labels are more expensive to gather than input examples, semi-supervised learning can be useful. Regularizers have been designed to guide learning algorithms to learn models that respect the structure of unsupervised training samples. If a symmetric weight matrix W is given, a regularizer can be defined:

R(f) = \sum_{i,j} w_{ij} (f(x_i) - f(x_j))^2

If W_{ij} encodes the result of some distance metric for points x_i and x_j, it is desirable that f(x_i) ≈ f(x_j). This regularizer captures this intuition, and is equivalent to:

R(f) = \bar{f}^T L \bar{f}, where L = D − W is the Laplacian matrix of the graph induced by W.

The optimization problem min_{f ∈ \mathbb{R}^m} R(f), m = u + l, can be solved analytically if the constraint f(x_i) = y_i is applied to all supervised samples. The labeled part of the vector f is therefore fixed by the constraint. The unlabeled part of f is solved for by:

\min_{f_u \in \mathbb{R}^u} f^T L f = \min_{f_u \in \mathbb{R}^u} \left\{ f_u^T L_{uu} f_u + f_l^T L_{lu} f_u + f_u^T L_{ul} f_l \right\}
\nabla_{f_u} = 2 L_{uu} f_u + 2 L_{ul} Y
f_u = L_{uu}^\dagger (-L_{ul} Y)

Note that the pseudo-inverse can be taken because L_{ul} has the same range as L_{uu}.
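A small numpy sketch of this computation on a toy graph (the weight matrix, the labeled/unlabeled split, and the use of a pseudo-inverse via np.linalg.pinv are illustrative assumptions):

```python
import numpy as np

def propagate_labels(W, y_labeled, labeled_idx, unlabeled_idx):
    """Fix f on labeled points and minimize f^T L f over the unlabeled ones."""
    D = np.diag(W.sum(axis=1))
    L = D - W                                            # graph Laplacian induced by W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    return np.linalg.pinv(L_uu) @ (-L_ul @ y_labeled)    # f_u = L_uu^+ (-L_ul y_l)

# tiny chain graph: node 1 sits between a node labeled 0 and a node labeled 1
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
print(propagate_labels(W, y_labeled=np.array([0.0, 1.0]),
                       labeled_idx=[0, 2], unlabeled_idx=[1]))   # ≈ [0.5]
```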

Regularizers for multitask learning

Main article:  Multi-task learning

In the case of multitask learning, T problems are considered simultaneously, each related in some way. The goal is to learn T functions, ideally borrowing strength from the relatedness of tasks, that have predictive power. This is equivalent to learning the matrix W: T × D.

Sparse regularizer on columns

R(W) = \|W\|_{2,1} = \sum_{i=1}^{D} \sqrt{\sum_{t=1}^{T} W_{ti}^2}

This regularizer defines an L2 norm on each column and an L1 norm over all columns. It can be solved by proximal methods.

Nuclear norm regularization

R(W) = ∥σ(W)∥_1, where σ(W) is the vector of singular values in the singular value decomposition of W.
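As a sketch, this penalty can be evaluated directly from the singular values (numpy; the example matrix is an assumption):

```python
import numpy as np

def nuclear_norm(W):
    """Nuclear (trace) norm: the sum of the singular values of W."""
    return np.linalg.svd(W, compute_uv=False).sum()

W = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])
print(nuclear_norm(W))   # 7.0 for this example
```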

Mean-constrained regularization

R(f_1 \cdots f_T) = \sum_{t=1}^{T} \left\| f_t - \frac{1}{T} \sum_{s=1}^{T} f_s \right\|_{H_k}^2

This regularizer constrains the functions learned for each task to be similar to the overall average of the functions across all tasks. This is useful for expressing prior information that each task is expected to share similarities with each other task. An example is predicting blood iron levels measured at different times of the day, where each task represents a different person.

Clustered mean-constrained regularization

R(f_1 \cdots f_T) = \sum_{r=1}^{C} \sum_{t \in I(r)} \left\| f_t - \frac{1}{|I(r)|} \sum_{s \in I(r)} f_s \right\|_{H_k}^2, where I(r) is a cluster of tasks.

This regularizer is similar to the mean-constrained regularizer, but instead enforces similarity between tasks within the same cluster. This can capture more complex prior information. This technique has been used to predict Netflix recommendations. A cluster would correspond to a group of people who share similar preferences in movies.

Graph-based similarity

More general than above, similarity between tasks can be defined by a function. The regularizer encourages the model to learn similar functions for similar tasks.

R(f_1 \cdots f_T) = \sum_{t, s = 1,\, t \neq s}^{T} \|f_t - f_s\|^2 M_{ts}, for a given symmetric similarity matrix M.

Other uses of regularization in statistics and machine learning

Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting not involving regularization include cross-validation.

Examples of applications of different methods of regularization to the linear model are:

Model                            Fit measure               Entropy measure[1][3]
AIC/BIC                          ∥Y − Xβ∥_2                ∥β∥_0
Ridge regression[4]              ∥Y − Xβ∥_2                ∥β∥_2
Lasso[5]                         ∥Y − Xβ∥_2                ∥β∥_1
Basis pursuit denoising          ∥Y − Xβ∥_2                λ∥β∥_1
Rudin–Osher–Fatemi model (TV)    ∥Y − Xβ∥_2                λ∥∇β∥_1
Potts model                      ∥Y − Xβ∥_2                λ∥∇β∥_0
RLAD[6]                          ∥Y − Xβ∥_1                ∥β∥_1
Dantzig Selector[7]              ∥X^T(Y − Xβ)∥_∞           ∥β∥_1
SLOPE[8]                         ∥Y − Xβ∥_2                Σ_{i=1}^p λ_i |β|_(i)

Notes

  1. Bishop, Christopher M. (2007). Pattern Recognition and Machine Learning (corr. printing ed.). New York: Springer. ISBN 978-0387310732.
  2. Natarajan, B. (1995). "Sparse Approximate Solutions to Linear Systems". SIAM Journal on Computing 24 (2): 227–234. doi:10.1137/S0097539792240406. ISSN 0097-5397.
  3. Duda, Richard O. (2004). Pattern Classification + Computer Manual: Hardcover Set (2nd ed.). New York: Wiley. ISBN 978-0471703501.
  4. Hoerl, Arthur E.; Kennard, Robert W. (1970). "Ridge regression: Biased estimation for nonorthogonal problems". Technometrics 12 (1): 55–67. doi:10.2307/1267351.
  5. Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society, Series B 58 (1): 267–288. MR 1379242.
  6. Wang, Li; Gordon, Michael D.; Zhu, Ji (2006). "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning". Sixth International Conference on Data Mining. pp. 690–700. doi:10.1109/ICDM.2006.134.
  7. Candes, Emmanuel; Tao, Terence (2007). "The Dantzig selector: Statistical estimation when p is much larger than n". Annals of Statistics 35 (6): 2313–2351. arXiv:math/0506081. doi:10.1214/009053606000001523. MR 2382644.
  8. Bogdan, Małgorzata; van den Berg, Ewout; Su, Weijie; Candes, Emmanuel J. (2013). "Statistical estimation and testing via the ordered L1 norm". arXiv:1310.1969.

