The Splunk Indexing Process


Terminology

Event: Events are records of activity in log files, stored in Splunk indexes. Simply put, one line or record in a processed log or call detail record (CDR) file is an Event;
Source type: identifies the format of the data. Simply put, a log in a particular format can be defined as a source type. Splunk ships with more than 500 predefined source types for data of known formats, including Apache logs, the logs of common operating systems, and the logs of Cisco and other network devices;
Index: The index is the repository for Splunk Enterprise data. Splunk transforms incoming data into events, which it stores in indexes. The term has two meanings: it denotes the physical storage of the data, and it also denotes a processing action (Splunk indexes your data). This process produces two types of data:
The raw data in compressed form (rawdata)
Indexes that point to the raw data, plus some metadata files (index files)
Indexer: An indexer is a Splunk Enterprise instance that indexes data. This is the usual notion of an index server, and also the name of the specific "Indexer" component in Splunk; it is a type of Splunk Enterprise instance;
Bucket: The two types of data stored in an index are organized into directories by age; these directories are called buckets;
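
As a rough illustration of how buckets look on disk, the layout of the default index (main) typically resembles the sketch below; the exact paths and bucket names shown here are assumptions and vary by version and configuration:

    $SPLUNK_HOME/var/lib/splunk/defaultdb/
        db/                                   # hot and warm buckets
            hot_v1_0/                         # hot bucket, still being written to
            db_1485939071_1485936190_1/       # warm bucket: db_<newest>_<oldest>_<id>
                rawdata/journal.gz            # the raw data in compressed form
                *.tsidx                       # index files that point into the raw data
                Hosts.data  Sources.data  SourceTypes.data   # metadata files
        colddb/                               # cold buckets, rolled here from warm by age
        thaweddb/                             # thawed buckets restored from archived data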

Responsibilities (see the diagrams later in this article for details)

Search Head: the front-end search component;
Deployment Server: serves as a configuration management center that centrally manages the other nodes;

Forwarder: collects, pre-processes, and forwards data to the indexers (consume data and forward it on to indexers), together forming a mechanism similar to Flume's Agent/Collector model; a sample forwarder configuration follows the list below. Its responsibilities include:
· Tagging of metadata (source, sourcetype, and host)
· Configurable buffering
· Data compression
· SSL security
· Use of any available network ports
· Running scripted inputs locally
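
As a minimal sketch of how these responsibilities map onto forwarder configuration (all host names, group names, and paths below are assumptions for illustration, and the SSL setting names are the older-style ones; check your version's outputs.conf/inputs.conf documentation):

    # outputs.conf on the forwarder (sketch): load-balanced, compressed, SSL-secured forwarding
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = indexer1.example.com:9997, indexer2.example.com:9997   # load-balanced targets
    compressed = true                                               # data compression
    sslCertPath = $SPLUNK_HOME/etc/auth/server.pem                  # SSL security
    sslRootCAPath = $SPLUNK_HOME/etc/auth/ca.pem

    # inputs.conf on the forwarder (sketch): a locally run scripted input
    [script://$SPLUNK_HOME/etc/apps/my_app/bin/collect.sh]
    interval = 60
    sourcetype = my_app_log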

Note: A forwarder can transmit three kinds of data: raw, unparsed, or parsed. Which kinds a forwarder can send depends on the forwarder type and on how it is configured. Universal forwarders and light forwarders can send raw or unparsed data; heavy forwarders can send raw or parsed data.
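
Whether the forwarder sends parsed ("cooked") data or the raw stream is controlled in outputs.conf; as a hedged example (sendCookedData is a standard outputs.conf setting, but verify it against your version's documentation):

    # outputs.conf (sketch)
    [tcpout]
    sendCookedData = false   # true (default) = cooked/parsed stream; false = raw, untouched data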

Indexer: responsible for "indexing" the data, i.e. the indexing process, also called event processing; a routing/filtering sketch follows the list below. It includes:
· Separating the datastream into individual, searchable events. (line breaking)
· Creating or identifying timestamps. (timestamp recognition)
· Extracting fields such as host, source, and sourcetype. (extraction of common default fields)
· Performing user-defined actions on the incoming data, such as identifying custom fields, masking sensitive data, writing new or modified keys, applying breaking rules for multi-line events, filtering unwanted events, and routing events to specified indexes or servers.
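
The user-defined actions in the last bullet are typically expressed in props.conf and transforms.conf. The sketch below (the sourcetype name, index name, and regexes are assumptions for illustration) discards debug events and routes everything else to a specific index:

    # props.conf (sketch)
    [my_app_log]
    TRANSFORMS-route = drop_debug_events, send_to_app_index

    # transforms.conf (sketch)
    [drop_debug_events]
    REGEX = level=DEBUG
    DEST_KEY = queue
    FORMAT = nullQueue            # filtering unwanted events

    [send_to_app_index]
    REGEX = .
    DEST_KEY = _MetaData:Index
    FORMAT = app_index            # routing events to a specified index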

 

Parts of an indexer cluster (distributed deployment)

An indexer cluster is a group of Splunk Enterprise instances, or nodes, that, working in concert, provide a redundant indexing and searching capability. Each cluster has three types of nodes:

  •  A single master node to manage the cluster.
  •  Several to many peer nodes to index and maintain multiple copies of the data and to search the data.
  •  One or more search heads to coordinate searches across the set of peer nodes.

The master node manages the cluster. It coordinates the replicating activities of the peer nodes and tells the search head where to find data. It also helps manage the configuration of peer nodes and orchestrates remedial activities if a peer goes down.

The peer nodes receive and index incoming data, just like non-clustered, stand-alone indexers. Unlike stand-alone indexers, however, peer nodes also replicate data from other nodes in the cluster. A peer node can index its own incoming data while simultaneously storing copies of data from other nodes. You must have at least as many peer nodes as the replication factor. That is, to support a replication factor of 3, you need a minimum of three peer nodes.
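
A hedged sketch of how these roles are declared in server.conf (the shared key and master address are placeholders; newer Splunk releases use "manager"/"peer" terminology in place of "master"/"slave"):

    # server.conf on the master node (sketch)
    [clustering]
    mode = master
    replication_factor = 3
    search_factor = 2
    pass4SymmKey = <shared cluster key>

    # server.conf on each peer node (sketch)
    [replication_port://9887]

    [clustering]
    mode = slave
    master_uri = https://cluster-master.example.com:8089
    pass4SymmKey = <shared cluster key>

    # server.conf on the search head (sketch)
    [clustering]
    mode = searchhead
    master_uri = https://cluster-master.example.com:8089
    pass4SymmKey = <shared cluster key>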

The search head runs searches across the set of peer nodes. You must use a search head to manage searches across indexer clusters. (It dispatches search requests to the peer nodes and then merges the returned results.)

For most purposes, it is recommended that you use forwarders to get data into the cluster.

Here is a diagram of a basic, single-site indexer cluster, containing three peer nodes and supporting a replication factor of 3:

[Figure: simplified basic cluster]

This diagram shows a simple deployment, similar to a small-scale non-clustered deployment, with some forwarders sending load-balanced data to a group of indexers (peer nodes), and the indexers sending search results to a search head. There are two additions that you don't find in a non-clustered deployment:

  •  The indexers are streaming copies of their data to other indexers.
  •  The master node, while it doesn't participate in any data streaming, coordinates a range of activities involving the search peers and the search head.

How indexing works

Splunk Enterprise can index any type of time-series data (data with timestamps). When Splunk Enterprise indexes data, it breaks it into events, based on the timestamps.

Event processing

Event processing occurs in two stages, parsing and indexing. All data that comes into Splunk Enterprise enters through the parsing pipeline as large (10,000 bytes) chunks. During parsing, Splunk Enterprise breaks these chunks into events which it hands off to the indexing pipeline, where final processing occurs.

While parsing, Splunk Enterprise performs a number of actions, including:

  •  Extracting a set of default fields for each event, including host, source, and sourcetype.
  •  Configuring character set encoding.
  •  Identifying line termination using linebreaking rules. While many events are short and only take up a line or two, others can be long.
  •  Identifying timestamps or creating them if they don't exist. At the same time that it processes timestamps, Splunk identifies event boundaries.
  •  Splunk can be set up to mask sensitive event data (such as credit card or social security numbers) at this stage. It can also be configured to apply custom metadata to incoming events.
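
Most of these parsing actions are controlled per source type in props.conf. The snippet below is a minimal sketch (the sourcetype name, timestamp format, and masking regex are assumptions for illustration):

    # props.conf (sketch) for a hypothetical sourcetype
    [my_app_log]
    CHARSET = UTF-8                                      # character set encoding
    SHOULD_LINEMERGE = false                             # one event per line
    LINE_BREAKER = ([\r\n]+)                             # line-termination / event-boundary rule
    TIME_PREFIX = ^\[                                    # where the timestamp starts
    TIME_FORMAT = %Y-%m-%d %H:%M:%S                      # how to parse it
    MAX_TIMESTAMP_LOOKAHEAD = 19                         # characters to scan for the timestamp
    SEDCMD-mask_card = s/\d{12}(\d{4})/XXXXXXXXXXXX\1/g  # mask all but the last 4 card digits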

In the indexing pipeline, Splunk Enterprise performs additional processing, including:

  •  Breaking all events into segments that can then be searched upon. You can determine the level of segmentation, which affects indexing and searching speed, search capability, and efficiency of disk compression.
  •  Building the index data structures.
  •  Writing the raw data and index files to disk, where post-indexing compression occurs.
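
The segmentation level mentioned in the first bullet can be tuned per source type; as a hedged sketch (the sourcetype name is an assumption, and the available segmenter names are defined in segmenters.conf):

    # props.conf (sketch)
    [my_app_log]
    SEGMENTATION = inner     # e.g. inner, outer, none, or full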

The breakdown between parsing and indexing pipelines is of relevance mainly when deploying forwarders. Heavy forwarders can parse data and then forward the parsed data on to indexers for final indexing. Some source types - those that reference structured data - require configuration on the forwarder prior to indexing. See "Extract data from files with headers".
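
For such structured source types, the extraction must be configured on the forwarder itself; a hedged sketch (the monitored path and sourcetype name are assumptions):

    # props.conf on the forwarder (sketch) for a structured, header-bearing file
    [my_csv_data]
    INDEXED_EXTRACTIONS = csv         # parse the CSV header into fields before forwarding

    # inputs.conf on the forwarder (sketch)
    [monitor:///var/log/app/metrics.csv]
    sourcetype = my_csv_data
    index = app_index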

For more information about events and what happens to them during the indexing process, see the chapter "Configure event processing" in the Getting Data In Manual.

Note: Indexing is an I/O-intensive process.

This diagram shows the main processes inherent in indexing:

[Figure: the data pipeline]

Note: This diagram represents a simplified view of the indexing architecture. It provides a functional view of the architecture and does not fully describe Splunk Enterprise internals. In particular, the parsing pipeline actually consists of three pipelines: parsing, merging, and typing, which together handle the parsing function. The distinction can matter during troubleshooting, but does not generally affect how you configure or deploy Splunk Enterprise.

How indexer acknowledgment works

In brief, indexer acknowledgment works like this: The forwarder sends data continuously to the receiving peer, in blocks of approximately 64kB. The forwarder maintains a copy of each block in memory until it gets an acknowledgment from the peer. While waiting, it continues to send more data blocks.

If all goes well, the receiving peer:

1. receives the block of data, parses and indexes it, and writes the data (raw data and index data) to the file system.

2. streams copies of the raw data to each of its target peers.

3. sends an acknowledgment back to the forwarder.

The acknowledgment assures the forwarder that the data was successfully written to the cluster. Upon receiving the acknowledgment, the forwarder releases the block from memory.

If the forwarder does not receive the acknowledgment, that means there was a failure along the way. Either the receiving peer went down or that peer was unable to contact its set of target peers. The forwarder then automatically resends the block of data. If the forwarder is using load-balancing, it sends the block to another receiving node in the load-balanced group. If the forwarder is not set up for load-balancing, it attempts to resend data to the same node as before.

Important: To ensure end-to-end data fidelity, you must explicitly enable indexer acknowledgment for each forwarder that's sending data to the cluster, as described earlier in this topic. If end-to-end data fidelity is not a requirement for your deployment, you can skip this step.
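
Enabling acknowledgment is done on the forwarder, in the same outputs.conf target group used for load balancing; a minimal sketch (the group name and host names are assumptions):

    # outputs.conf on the forwarder (sketch): enable indexer acknowledgment
    [tcpout:primary_indexers]
    server = peer1.example.com:9997, peer2.example.com:9997, peer3.example.com:9997
    useACK = true          # hold each ~64 kB block in memory until the peer acknowledges it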

For more information on how indexer acknowledgment works, read "Protect against loss of in-flight data" in the Forwarding Data manual.

 

Reposted from: https://www.cnblogs.com/bonelee/p/6233868.html

