Regularization (mathematics)

Regularization, in mathematics and statistics, and particularly in the fields of machine learning and inverse problems, is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.


Introduction

In general, a regularization term $R(f)$ is introduced to a general loss function:

$$\min_{f} \sum_{i=1}^{n} V(f(\hat{x}_i), \hat{y}_i) + \lambda R(f)$$

for a loss function $V$ that describes the cost of predicting $f(x)$ when the label is $y$, such as the square loss or hinge loss, and a term $\lambda$ that controls the importance of the regularization term. $R(f)$ is typically a penalty on the complexity of $f$, such as restrictions for smoothness or bounds on the vector space norm.[1]
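As a concrete sketch (the function and variable names are illustrative, not from the original text), the objective for the square loss with $R(w) = \|w\|_2^2$ can be written in NumPy as:

```python
import numpy as np

def regularized_empirical_loss(w, X, y, lam):
    """Square loss summed over the samples plus a complexity penalty R(w) = ||w||_2^2."""
    residuals = X @ w - y
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)
```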

A theoretical justification for regularization is that it attempts to impose Occam's razor on the solution. From a Bayesian point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters.

Regularization can be used to learn simpler models, induce models to be sparse, introduce group structure into the learning problem, and more.

The same idea arose in many fields of science. For example, the least-squares method can be viewed as a very simple form of regularization[citation needed]. A simple form of regularization applied to integral equations, generally termed Tikhonov regularization after Andrey Nikolayevich Tikhonov, is essentially a trade-off between fitting the data and reducing a norm of the solution. More recently, non-linear regularization methods, including total variation regularization, have become popular.

Generalization

Main article:  Generalization error

Regularization can be motivated as a technique to improve the generalization of a learned model.

The goal of this learning problem is to find a function that fits or predicts the outcome (label) while minimizing the expected error over all possible inputs and labels. The expected error of a function $f_n$ is:

$$I[f_n] = \int_{X\times Y} V(f_n(x), y)\,\rho(x,y)\,dx\,dy$$

Typically in learning problems, only a subset of input data and labels are available, measured with some noise. Therefore, the expected error is unmeasurable, and the best surrogate available is the empirical error over the $n$ available samples:

$$I_S[f_n] = \frac{1}{n}\sum_{i=1}^{n} V(f_n(\hat{x}_i), \hat{y}_i)$$

Without bounds on the complexity of the function space (formally, the reproducing kernel Hilbert space) available, a model will be learned that incurs zero loss on the surrogate empirical error. If measurements (e.g. of $x_i$) were made with noise, this model may suffer from overfitting and display poor expected error. Regularization introduces a penalty for exploring certain regions of the function space used to build the model, which can improve generalization.

Tikhonov regularization

Main article:  Tikhonov regularization

When learning a linear function $f(x) = w \cdot x$, penalizing the squared $L_2$ norm of $w$ corresponds to Tikhonov regularization. This is one of the most common forms of regularization; it is also known as ridge regression and is expressed as:

$$\min_{w} \sum_{i=1}^{n} V(\hat{x}_i \cdot w, \hat{y}_i) + \lambda \|w\|_2^2$$

In the case of a general function, we take the norm of the function in its reproducing kernel Hilbert space:

$$\min_{f} \sum_{i=1}^{n} V(f(\hat{x}_i), \hat{y}_i) + \lambda \|f\|_{\mathcal{H}}^2$$

As the squared $L_2$ norm is differentiable, learning problems using Tikhonov regularization can be solved by gradient descent.

Tikhonov regularized least squares

The learning problem with the least squares loss function and Tikhonov regularization can be solved analytically. Written in matrix form, the optimal $w$ is the one for which the gradient of the loss function with respect to $w$ is 0.

$$\min_{w} \frac{1}{n}(\hat{X}w - Y)^T(\hat{X}w - Y) + \lambda \|w\|_2^2$$
$$\nabla_w = \frac{2}{n}\hat{X}^T(\hat{X}w - Y) + 2\lambda w \qquad \leftarrow \text{the first-order condition for this optimization problem}$$
$$0 = \hat{X}^T(\hat{X}w - Y) + n\lambda w$$
$$w = (\hat{X}^T\hat{X} + \lambda n I)^{-1}\hat{X}^T Y$$

By construction of the optimization problem, other values of $w$ would give larger values for the loss function. This can be verified by examining the second derivative $\nabla_{ww} = \frac{2}{n}\hat{X}^T\hat{X} + 2\lambda I$, which is positive definite for $\lambda > 0$.

During training, this algorithm takes $O(d^3 + nd^2)$ time. The terms correspond to the matrix inversion and calculating $\hat{X}^T\hat{X}$, respectively. Testing takes $O(nd)$ time.
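A minimal NumPy sketch of this closed-form solution, assuming a data matrix `X` of shape (n, d) and a label vector `y` (names illustrative); `np.linalg.solve` is used rather than forming the inverse explicitly:

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Tikhonov/ridge solution w = (X^T X + lambda * n * I)^{-1} X^T y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)
```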

Early stopping

Main article:  Early stopping

Early stopping can be viewed as regularization in time. Intuitively, a training procedure like gradient descent will tend to learn more and more complex functions as the number of iterations increases. By regularizing on time, the complexity of the model can be controlled, improving generalization.

In practice, early stopping is implemented by training on a training set and measuring accuracy on a statistically independent validation set. The model is trained until performance on the validation set no longer improves. The model is then tested on a testing set.
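A hedged sketch of this procedure for the least-squares loss; the learning rate, patience, and iteration budget are illustrative hyperparameters rather than values prescribed by the text:

```python
import numpy as np

def train_with_early_stopping(X_tr, y_tr, X_val, y_val, lr=0.01, patience=10, max_iters=10000):
    """Gradient descent on the least-squares risk, stopped when validation error stops improving."""
    n, d = X_tr.shape
    w = np.zeros(d)
    best_w, best_val, waited = w.copy(), np.inf, 0
    for _ in range(max_iters):
        w -= lr * (2.0 / n) * X_tr.T @ (X_tr @ w - y_tr)   # gradient step on the training risk
        val_err = np.mean((X_val @ w - y_val) ** 2)
        if val_err < best_val:
            best_val, best_w, waited = val_err, w.copy(), 0
        else:
            waited += 1
            if waited >= patience:   # no improvement for `patience` checks: stop
                break
    return best_w
```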

Theoretical motivation in least squares

Consider the finite approximation of the Neumann series for an invertible matrix $A$ with $\|I - A\| < 1$:

$$\sum_{i=0}^{T-1}(I - A)^i \approx A^{-1}$$

This can be used to approximate the analytical solution of unregularized least squares if $\gamma$ is introduced to ensure the norm is less than one:

$$w_T = \frac{\gamma}{n}\sum_{i=0}^{T-1}\left(I - \frac{\gamma}{n}\hat{X}^T\hat{X}\right)^i \hat{X}^T\hat{Y}$$

The exact solution to the unregularized least squares learning problem minimizes the empirical error, but may fail to generalize and minimize the expected error. By limiting $T$, the only free parameter in the algorithm above, the problem is regularized in time, which may improve its generalization.

The algorithm above is equivalent to restricting the number of gradient descent iterations for the empirical risk

$$I_s[w] = \frac{1}{2n}\|\hat{X}w - \hat{Y}\|_{\mathbb{R}^n}^2$$

with the gradient descent update:

$$w_0 = 0$$
$$w_{t+1} = \left(I - \frac{\gamma}{n}\hat{X}^T\hat{X}\right)w_t + \frac{\gamma}{n}\hat{X}^T\hat{Y}$$

The base case is trivial. The inductive case is proved as follows:

$$w_T = \left(I - \frac{\gamma}{n}\hat{X}^T\hat{X}\right)\frac{\gamma}{n}\sum_{i=0}^{T-2}\left(I - \frac{\gamma}{n}\hat{X}^T\hat{X}\right)^i\hat{X}^T\hat{Y} + \frac{\gamma}{n}\hat{X}^T\hat{Y}$$
$$w_T = \frac{\gamma}{n}\sum_{i=1}^{T-1}\left(I - \frac{\gamma}{n}\hat{X}^T\hat{X}\right)^i\hat{X}^T\hat{Y} + \frac{\gamma}{n}\hat{X}^T\hat{Y}$$
$$w_T = \frac{\gamma}{n}\sum_{i=0}^{T-1}\left(I - \frac{\gamma}{n}\hat{X}^T\hat{X}\right)^i\hat{X}^T\hat{Y}$$
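This equivalence can be checked numerically; the problem size, step size $\gamma$, and iteration count below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T, gamma = 50, 5, 200, 0.1
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# T steps of gradient descent on the least-squares risk, starting from w_0 = 0
w_gd = np.zeros(d)
for _ in range(T):
    w_gd = w_gd - (gamma / n) * X.T @ (X @ w_gd - y)

# Truncated Neumann-series expression for w_T
M = np.eye(d) - (gamma / n) * X.T @ X
w_neumann = np.zeros(d)
P = np.eye(d)
for _ in range(T):
    w_neumann += (gamma / n) * P @ X.T @ y
    P = P @ M

print(np.allclose(w_gd, w_neumann))   # True: the two formulations coincide
```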

Regularizers for sparsity

Assume that a dictionary $\phi_j$ with dimension $p$ is given such that a function in the function space can be expressed as:

$$f(x) = \sum_{j=1}^{p} \phi_j(x) w_j$$

(Figure caption: a comparison between the $L_1$ ball and the $L_2$ ball in two dimensions gives an intuition of how $L_1$ regularization achieves sparsity.)

Enforcing a sparsity constraint on ww can lead to simpler and more interpretable models. This is useful in many real-life applications such as computational biology. An example is developing a simple predictive test for a disease in order to minimize the cost of performing medical tests while maximizing predictive power.

A sensible sparsity constraint is the $L_0$ norm $\|w\|_0$, defined as the number of non-zero elements of $w$. Solving an $L_0$-regularized learning problem, however, has been demonstrated to be NP-hard.[2]

The $L_1$ norm can be used to approximate the optimal $L_0$ norm via convex relaxation. It can be shown that the $L_1$ norm induces sparsity. In the case of least squares, this problem is known as LASSO in statistics and basis pursuit in signal processing:

$$\min_{w\in\mathbb{R}^p} \frac{1}{n}\|\hat{X}w - \hat{Y}\|^2 + \lambda\|w\|_1$$
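A sketch using scikit-learn's `Lasso` on synthetic data (scikit-learn minimizes the rescaled objective $\tfrac{1}{2n}\|y - Xw\|^2 + \alpha\|w\|_1$, so its `alpha` plays the role of $\lambda$ up to a constant factor):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustrative data: only the first 3 of 20 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 0.5]
y = X @ w_true + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)
print(np.count_nonzero(model.coef_))   # typically ~3: the L1 penalty zeroes out irrelevant weights
```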
Elastic net regularization

$L_1$ regularization can occasionally produce non-unique solutions; a simple example occurs when the space of possible solutions lies on a 45 degree line. This can be problematic for certain applications, and is overcome by combining $L_1$ with $L_2$ regularization in elastic net regularization, which takes the following form:

$$\min_{w\in\mathbb{R}^p} \frac{1}{n}\|\hat{X}w - \hat{Y}\|^2 + \lambda\left(\alpha\|w\|_1 + (1-\alpha)\|w\|_2^2\right), \qquad \alpha\in[0,1]$$

Elastic net regularization tends to have a grouping effect, where correlated input features are assigned equal weights.

Elastic net regularization is commonly used in practice and is implemented in many machine learning libraries.
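A corresponding scikit-learn sketch on synthetic data with two nearly identical features; `alpha` and `l1_ratio` parameterize the same $L_1$/$L_2$ trade-off as $\lambda$ and $\alpha$ above, up to rescaling:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)   # two highly correlated features
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=100)

# scikit-learn's ElasticNet minimizes
#   (1/(2n))||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||_2^2
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_[:2])   # the L2 part tends to spread weight across the correlated pair
```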

Proximal methods

Main article:  Proximal gradient method

While the $L_1$ norm does not result in an NP-hard problem, it is convex but not differentiable, due to the kink at $w = 0$. Subgradient methods, which rely on the subderivative, can be used to solve $L_1$-regularized learning problems. However, faster convergence can be achieved through proximal methods.

Consider a problem $\min_{w\in H} F(w) + R(w)$ where $F$ is convex, continuous, and differentiable with Lipschitz continuous gradient (such as the least squares loss function), and $R$ is convex, continuous, and proper. The proximal method for solving this problem is as follows. First define the proximal operator

$$\operatorname{prox}_R(v) = \operatorname*{argmin}_{w\in\mathbb{R}^D}\left\{R(w) + \frac{1}{2}\|w - v\|^2\right\},$$

and then iterate

$$w_{k+1} = \operatorname{prox}_{\gamma,R}\left(w_k - \gamma\nabla F(w_k)\right)$$

The proximal method iteratively performs gradient descent and then projects the result back into the space permitted by $R$.

When $R$ is the $L_1$ regularizer, the proximal operator is equivalent to the soft-thresholding operator,

$$S_\lambda(v)_i = \begin{cases} v_i - \lambda, & \text{if } v_i > \lambda \\ 0, & \text{if } v_i \in [-\lambda,\lambda] \\ v_i + \lambda, & \text{if } v_i < -\lambda \end{cases}$$

This allows for efficient computation.
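A minimal proximal-gradient (ISTA) sketch for the lasso objective above, using soft thresholding as the proximal step; the function names and the step-size bound noted in the comment are assumptions for illustration:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam*||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ista(X, y, lam, step, iters=500):
    """Proximal gradient (ISTA) for (1/n)||Xw - y||^2 + lam*||w||_1.
    `step` should be at most n / (2 * largest eigenvalue of X^T X) for convergence."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = (2.0 / n) * X.T @ (X @ w - y)              # gradient of the smooth part F
        w = soft_threshold(w - step * grad, step * lam)   # proximal step on R
    return w
```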

Group sparsity without overlaps

Groups of features can be regularized by a sparsity constraint, which can be useful for expressing certain prior knowledge in an optimization problem.

In the case of a linear model with non-overlapping known groups, a regularizer can be defined:

$$R(w) = \sum_{g=1}^{G}\|w_g\|_g, \qquad \text{where } \|w_g\|_g = \sqrt{\sum_{j=1}^{|G_g|}\left(w_g^j\right)^2}$$

This can be viewed as inducing a regularizer that applies the $L_2$ norm over the members of each group, followed by an $L_1$ norm over groups.

This can be solved by the proximal method, where the proximal operator is a block-wise soft-thresholding function:

$$\left(\operatorname{prox}_{\lambda,R,g}(w_g)\right)^j = \begin{cases} w_g^j - \lambda\dfrac{w_g^j}{\|w_g\|_g}, & \text{if } \|w_g\|_g > \lambda \\ 0, & \text{if } \|w_g\|_g \le \lambda \end{cases}$$
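A hedged NumPy sketch of this block-wise operator, assuming `groups` is a list of index arrays for non-overlapping groups:

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Block-wise soft thresholding: shrink each group's subvector toward zero,
    zeroing it entirely when its L2 norm is at most lam."""
    out = w.copy()
    for g in groups:
        norm_g = np.linalg.norm(w[g])
        out[g] = 0.0 if norm_g <= lam else (1.0 - lam / norm_g) * w[g]
    return out
```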

Group sparsity with overlaps

The algorithm described for group sparsity without overlaps can, in certain situations, be applied to the case where groups do overlap. This will likely result in some groups with all zero elements, and other groups with some non-zero and some zero elements.

If it is desired to preserve the group structure, a new regularizer can be defined:

$$R(w) = \inf\left\{\sum_{g=1}^{G}\|\bar{w}_g\|_g : w = \sum_{g=1}^{G}\bar{w}_g\right\}$$

For each $w_g$, $\bar{w}_g$ is defined as the vector whose restriction to the group $g$ equals $w_g$ and whose other entries are zero. The regularizer finds the optimal decomposition of $w$ into parts; it can be viewed as duplicating all elements that exist in multiple groups. Learning problems with this regularizer can also be solved with the proximal method, with one complication: the proximal operator cannot be computed in closed form, but it can be solved iteratively, inducing an inner iteration within the proximal method iteration.

Regularizers for semi-supervised learning

Main article:  Semi-supervised learning

When labels are more expensive to gather than input examples, semi-supervised learning can be useful. Regularizers have been designed to guide learning algorithms to learn models that respect the structure of unsupervised training samples. If a symmetric weight matrix $W$ is given, a regularizer can be defined:

$$R(f) = \sum_{i,j} w_{ij}\left(f(x_i) - f(x_j)\right)^2$$

If $W_{ij}$ encodes the result of some distance metric for points $x_i$ and $x_j$, it is desirable that $f(x_i) \approx f(x_j)$. This regularizer captures this intuition, and is equivalent to:

$$R(f) = \bar{f}^T L \bar{f}, \qquad \text{where } L = D - W \text{ is the Laplacian matrix of the graph induced by } W.$$

The optimization problem $\min_{f\in\mathbb{R}^m} R(f)$, with $m = u + l$, can be solved analytically if the constraint $f(x_i) = y_i$ is applied for all supervised samples. The labeled part of the vector $f$ is therefore determined directly by the labels; the unlabeled part is solved for by:

$$\min_{f_u\in\mathbb{R}^u} f^T L f = \min_{f_u\in\mathbb{R}^u}\left\{f_u^T L_{uu} f_u + f_l^T L_{lu} f_u + f_u^T L_{ul} f_l\right\}$$
$$\nabla_{f_u} = 2L_{uu}f_u + 2L_{ul}Y$$
$$f_u = -L_{uu}^{\dagger}\left(L_{ul}Y\right)$$

Note that the pseudo-inverse can be taken because $L_{ul}$ has the same range as $L_{uu}$.
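A minimal NumPy sketch of this computation (the function name is illustrative; the sign convention follows from setting the gradient above to zero):

```python
import numpy as np

def propagate_labels(W, y_l, labeled_idx, unlabeled_idx):
    """Laplacian-regularized solution: labeled values are clamped to y_l,
    unlabeled values minimize f^T L f. W is a symmetric similarity matrix."""
    L = np.diag(W.sum(axis=1)) - W                      # graph Laplacian L = D - W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    # f_u = -L_uu^+ (L_ul y_l), from the first-order condition above
    return -np.linalg.pinv(L_uu) @ (L_ul @ y_l)
```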

Regularizers for multitask learning

Main article:  Multi-task learning

In the case of multitask learning, $T$ problems are considered simultaneously, each related in some way. The goal is to learn $T$ functions with predictive power, ideally borrowing strength from the relatedness of the tasks. This is equivalent to learning the matrix $W \in \mathbb{R}^{T\times D}$ whose rows are the task weight vectors.

Sparse regularizer on columns

$$R(W) = \|W\|_{2,1} = \sum_{i=1}^{D}\sqrt{\sum_{t=1}^{T} W_{ti}^2}$$

This regularizer defines an $L_2$ norm on each column and an $L_1$ norm over all columns. It can be solved by proximal methods.
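A small NumPy sketch of this column-wise penalty for a $T \times D$ task-weight matrix, purely for illustration:

```python
import numpy as np

def l21_norm(W):
    """L2 norm of each column of the T x D task matrix W, summed (an L1 norm over columns)."""
    return np.sum(np.linalg.norm(W, axis=0))
```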

Nuclear norm regularization

$$R(W) = \|\sigma(W)\|_1, \qquad \text{where } \sigma(W) \text{ is the vector of singular values in the singular value decomposition of } W.$$

Mean-constrained regularization

$$R(f_1,\dots,f_T) = \sum_{t=1}^{T}\left\|f_t - \frac{1}{T}\sum_{s=1}^{T} f_s\right\|_{H_k}^2$$

This regularizer constrains the functions learned for each task to be similar to the overall average of the functions across all tasks. This is useful for expressing prior information that each task is expected to share similarities with each other task. An example is predicting blood iron levels measured at different times of the day, where each task represents a different person.
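A finite-dimensional sketch of this penalty, treating each task's function as a weight vector stored as a row of `W`; replacing the RKHS norm by a plain Euclidean norm is an illustrative simplification:

```python
import numpy as np

def mean_constrained_penalty(W):
    """Penalize each task's weight vector (row of W) for deviating from the
    average weight vector across all tasks."""
    mean_w = W.mean(axis=0)
    return np.sum((W - mean_w) ** 2)
```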

Clustered mean-constrained regularization

$$R(f_1,\dots,f_T) = \sum_{r=1}^{C}\sum_{t\in I(r)}\left\|f_t - \frac{1}{|I(r)|}\sum_{s\in I(r)} f_s\right\|_{H_k}^2, \qquad \text{where } I(r) \text{ is a cluster of tasks.}$$

This regularizer is similar to the mean-constrained regularizer, but instead enforces similarity between tasks within the same cluster. This can capture more complex prior information. This technique has been used to predict Netflix recommendations. A cluster would correspond to a group of people who share similar preferences in movies.

Graph-based similarity

More generally than above, similarity between tasks can be defined by a function. The regularizer encourages the model to learn similar functions for similar tasks.

$$R(f_1,\dots,f_T) = \sum_{t,s=1,\ t\neq s}^{T}\|f_t - f_s\|^2 M_{ts}, \qquad \text{for a given symmetric similarity matrix } M.$$

Other uses of regularization in statistics and machine learning

Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting not involving regularization include cross-validation.

Examples of applications of different methods of regularization to the linear model are:

Model | Fit measure | Entropy measure[1][3]
AIC/BIC | $\|Y - X\beta\|_2$ | $\|\beta\|_0$
Ridge regression[4] | $\|Y - X\beta\|_2$ | $\|\beta\|_2$
Lasso[5] | $\|Y - X\beta\|_2$ | $\|\beta\|_1$
Basis pursuit denoising | $\|Y - X\beta\|_2$ | $\lambda\|\beta\|_1$
Rudin–Osher–Fatemi model (TV) | $\|Y - X\beta\|_2$ | $\lambda\|\nabla\beta\|_1$
Potts model | $\|Y - X\beta\|_2$ | $\lambda\|\nabla\beta\|_0$
RLAD[6] | $\|Y - X\beta\|_1$ | $\|\beta\|_1$
Dantzig Selector[7] | $\|X^\top(Y - X\beta)\|_\infty$ | $\|\beta\|_1$
SLOPE[8] | $\|Y - X\beta\|_2$ | $\sum_{i=1}^{p}\lambda_i|\beta|_{(i)}$


Notes

1. Bishop, Christopher M. (2007). Pattern Recognition and Machine Learning (corr. printing ed.). New York: Springer. ISBN 978-0387310732.
2. Natarajan, B. (1995). "Sparse Approximate Solutions to Linear Systems". SIAM Journal on Computing 24 (2): 227–234. doi:10.1137/S0097539792240406. ISSN 0097-5397.
3. Duda, Richard O. (2004). Pattern Classification + Computer Manual: Hardcover Set (2nd ed.). New York: Wiley. ISBN 978-0471703501.
4. Hoerl, Arthur E.; Kennard, Robert W. (1970). "Ridge regression: Biased estimation for nonorthogonal problems". Technometrics 12 (1): 55–67. doi:10.2307/1267351.
5. Tibshirani, Robert (1996). "Regression Shrinkage and Selection via the Lasso". Journal of the Royal Statistical Society, Series B 58 (1): 267–288. MR 1379242.
6. Wang, Li; Gordon, Michael D.; Zhu, Ji (2006). "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning". Sixth International Conference on Data Mining. pp. 690–700. doi:10.1109/ICDM.2006.134.
7. Candès, Emmanuel; Tao, Terence (2007). "The Dantzig selector: Statistical estimation when p is much larger than n". Annals of Statistics 35 (6): 2313–2351. arXiv:math/0506081. doi:10.1214/009053606000001523. MR 2382644.
8. Bogdan, Małgorzata; van den Berg, Ewout; Su, Weijie; Candès, Emmanuel J. (2013). "Statistical estimation and testing via the ordered L1 norm". arXiv:1310.1969.


