Acronyms
• SVA: Splunk Validated Architecture

Links
• https://www.aplura.com/splunk-best-practices/
• https://www.splunk.com/en_us/blog/learn/splunk-cheat-sheet-query-spl-regex-commands.html?locale=en_us
• https://crontab.guru/

Splunk Enterprise Components

Processing

Search Head: A Splunk component that lets users search, visualize, and analyze indexed data through Splunk Web or the CLI.
• Ex. A DevOps engineer uses the Search Head to search for HTTP 500 errors across multiple web servers.

Forwarders: Lightweight Splunk agents installed on source machines to send data to indexers.
• Forwarders forward data from one Splunk component to another:
  • From source system to indexer
  • From source system to a search head directly
• Universal Forwarder (UF): Light agent for forwarding raw data
• Heavy Forwarder (HF): Capable of parsing and indexing before forwarding
• Ex. A UF on a Linux server forwards syslog data to a centralized Splunk indexer.

Indexer: A core Splunk server component that receives, processes, and stores data so it can be searched.
• Parses and indexes incoming data
• Creates and maintains searchable indexes
• Executes distributed search commands sent from Search Heads
• Ex. An indexer ingests and stores firewall logs received from multiple forwarders for future analysis.
• Clustered indexers are known as Peer Nodes

Management

Indexer Cluster Manager (Master Node): A control node in a Splunk Indexer Cluster responsible for managing and coordinating peer nodes to ensure high availability, data replication, and indexing integrity.
• Oversees the entire indexer cluster
• Assigns data replication and indexing tasks
• Tracks the health/status of peer nodes and indexes
• Handles cluster configuration and bucket fixups
• Ex. In a 5-node indexer cluster, the manager ensures every event is stored on at least 3 nodes (Replication Factor = 3) and searchable from 2 (Search Factor = 2).
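The replication factor / search factor arithmetic from the Cluster Manager example can be sketched in a few lines of Python. This is only an illustrative model, not Splunk code: the function name copy_layout and its result keys are made up for these notes.

```python
def copy_layout(replication_factor: int, search_factor: int) -> dict:
    """Summarize how many copies of each bucket an indexer cluster keeps.

    Hypothetical helper for illustration; not part of any Splunk API.
    """
    if search_factor > replication_factor:
        # Every searchable copy is also one of the replicated copies,
        # so the search factor can never exceed the replication factor.
        raise ValueError("search factor cannot exceed replication factor")
    return {
        "total_copies": replication_factor,
        "searchable_copies": search_factor,
        "non_searchable_copies": replication_factor - search_factor,
        # Data survives as long as at least one copy is left.
        "peer_failures_tolerated": replication_factor - 1,
    }


# The example from the notes: Replication Factor = 3, Search Factor = 2.
print(copy_layout(3, 2))
```

With RF = 3 and SF = 2, one of the three copies holds rawdata only (not immediately searchable), and the cluster can lose two peers without losing data.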
Indexer Cluster Node (Peer Node): An indexer node that participates in an indexer cluster by storing, replicating, and serving indexed data under the coordination of the Cluster Manager.
• Receives and indexes incoming data
• Stores primary or replicated bucket copies
• Responds to search head queries in distributed searches
• Periodically checks in with the Cluster Manager
• Ex. A peer node indexes a copy of web server logs and replicates them to 2 other nodes as assigned by the Cluster Manager.

Monitoring Console (MC): A centralized dashboard to monitor the health, performance, and status of a Splunk deployment.
• Prebuilt dashboards for deployment health
• Tracks indexing rate, latency, and resource usage
• Can be set up on a Search Head or standalone node
• Ex. An admin uses the Monitoring Console to identify that one indexer is lagging in bucket replication.

Deployment Server: A Splunk component used to manage configuration files and app deployments across multiple forwarders.
• Centralized configuration manager
• Manages deployment apps for clients
• Pushes configuration updates to forwarders
• Uses server classes to target specific clients
• Centralized control over distributed agents
• Ex. A Deployment Server pushes a new TA (technology add-on) to 100 UFs for new log collection settings.

License Manager: The component that enforces licensing limits and tracks daily indexed data volume.
• Centralized License Manager
• License Slaves (now called license peers) are its clients
• Manages License Pools & Stacks
• Monitors indexed data volume per day
• Alerts when license limits are exceeded
• Required for all Splunk deployments
• Ex. The License Manager alerts the admin when daily indexing exceeds a 50 GB/day licensed volume.

Deployer: A management node responsible for distributing configurations and apps to Search Head Cluster members.
• Search Head Cluster Deployer: Manages baselines & apps for search head cluster members
• This is how Splunk scales
• Ex.
A deployer pushes updated search macros to all three nodes in a Search Head Cluster.

Deployer Configuration
    # Deployer conf
    # $SPLUNK_HOME/etc/system/local/server.conf
    [shclustering]
    # Security key shared by the deployer and the cluster members
    pass4SymmKey = MySecureClusterKey
    # Label for the cluster (must match the label set on the members)
    shcluster_label = my_sh_cluster

Initialize Cluster Members
    sudo ./splunk init shcluster-config -auth <username>:<password> -mgmt_uri <URI>:<management_port> -replication_port <port> -replication_factor <n> -conf_deploy_fetch_url <URL>:<management_port> -secret <security_key> -shcluster_label <label>

Bootstrap the Captain
    ./splunk bootstrap shcluster-captain -servers_list "<URI>:<management_port>,..." -auth <username>:<password>

Data Pipeline

Input: Forwarders have the data; data = streams
• Input Types: Files & Directories, Network Traffic, Log Files, HEC
• Source: Path of the data, or the method used to collect it. Ex. access.log, secure.log
• Host: Who sent the data. Ex. web1
• Source Type: Data format. Ex. cisco:wsa:squid, streams:http
• Functions: Upload, Monitor, TCP/UDP, Scripted, FIFO

General Input Categories:

File & Directory Inputs:
• Monitor files & directories, locally & remotely
• Can monitor compressed files; Splunk uncompresses the files before processing them.
• Upload to examine using Splunk; used for one-time analysis
• MonitorNoHandle: Available for Windows hosts only; monitors files & directories that the system rotates automatically

Network Inputs
• Data from TCP & UDP (syslog)
• Data from SNMP events

Windows Inputs
• Windows event logs
• Registry
• Active Directory
• WMI (Windows Management Instrumentation): allows access to system management information, such as hardware and operating system data
• Performance Monitoring (perfmon)

Other Data Sources
• Metrics
• FIFO queues
• Scripted inputs: APIs, data message queues
• Modular Inputs: Ingest unique data inputs, such as querying a specific database or handling sensitive or complex data; used for masking data for security purposes
• HTTP Event Collector

Ways to configure inputs:
• Through an app: Many apps have preconfigured inputs
• Splunk Web: Settings > Data inputs, Settings > Add data
• CLI: ./splunk add monitor <path>
• inputs.conf: Add a stanza for each input
• Guided Data Onboarding (GDO): Data input wizard

Parsing: Processing of data
• Source & event typing
• Character set normalization
• Line termination identification
• Timestamp identification
• Regex transforms, parse-time field creation

License Usage: License meter check

Indexing: Data is written to disk; data = compressed
• Segmentation
• Index building

Search

Knowledge Objects
Knowledge Objects are configurations within Splunk that define, categorize, and enhance the way data is searched, interpreted, and visualized. They organize and enrich event data so users can run more meaningful searches and visualizations.
• Ex. Field Extractions, Event Types, Tags, Lookups, Macros, Data Models, Fields

Permissions:
• Private: Only the person who created the object can use or edit it
• This App Only: The object persists in the context of a specific app
• All Apps: The object persists globally across all apps

Splunk Syntax & Colors
• Orange: Command modifiers. Ex. OR, NOT, AND, as, by
• Blue: Commands. Ex.
stats, table, rename, dedup, sort, timechart
• Green: Arguments. Ex. limit, span
• Purple: Functions. Ex. tostring, sum, values, min, max, avg

Transform
• top: Finds the most common values of a field and returns them in a table
  • Returns the top 10 results by default
  • Can be used with arguments
• rare: Finds the least common values of a field; the opposite of top
• stats: Calculates statistics

Transaction
A transaction is a set of events that belong together logically. Transactions in Splunk let you define a set of related events based on the time span and logical grouping of the events.
• Ex. A user logs into a web application and navigates through several pages. You can use the transaction command to group all the events related to the user's session.
• All events in a transaction must be related by one or more fields.
• maxspan: Defines the maximum time span between the first and last event of a transaction. If the time between those events exceeds this threshold, the transaction is ended.
• maxpause: Defines the maximum pause (in seconds) allowed between consecutive events in a transaction. If the gap between two events exceeds this time, the transaction is split.
• startswith: Specifies the starting event for a transaction. This is typically a condition or field that matches the beginning of the transaction.
• endswith: Specifies the ending event for a transaction. This defines the condition or field that marks the completion of the transaction.

Fields created by transaction
• duration: Time difference (in seconds) between the first and last event in the transaction.
• eventcount: Number of events combined into this transaction.
• _time: Timestamp of the first (earliest) event in the transaction. The time of the last event is not stored as its own field; it can be derived as _time + duration.

Optional constraints (options, not fields; applied only when specified):
• maxspan: Maximum allowed duration of a transaction.
• maxpause: Maximum allowed gap between events in the transaction.
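To make the maxspan / maxpause rules concrete, here is a minimal Python sketch of the grouping logic. It is a simplified model written for these notes, not how Splunk implements the transaction command: it ignores startswith / endswith, assumes each event is a dict with a numeric _time in epoch seconds, and the function names are hypothetical.

```python
from collections import defaultdict


def summarize(key, field, evs):
    """Build the summary row for one finished transaction."""
    return {
        field: key,
        "_time": evs[0]["_time"],                          # earliest event
        "duration": evs[-1]["_time"] - evs[0]["_time"],    # last - first
        "eventcount": len(evs),
    }


def group_transactions(events, field, maxspan=None, maxpause=None):
    """Group events that share `field` into transactions.

    A transaction is closed when adding the next event would exceed
    maxpause (gap to the previous event) or maxspan (gap to the first event).
    """
    by_key = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["_time"]):
        by_key[ev[field]].append(ev)

    transactions = []
    for key, evs in by_key.items():
        current = []
        for ev in evs:
            if current:
                pause = ev["_time"] - current[-1]["_time"]
                span = ev["_time"] - current[0]["_time"]
                too_long_pause = maxpause is not None and pause > maxpause
                too_long_span = maxspan is not None and span > maxspan
                if too_long_pause or too_long_span:
                    transactions.append(summarize(key, field, current))
                    current = []
            current.append(ev)
        if current:
            transactions.append(summarize(key, field, current))
    return transactions


sample = [
    {"_time": 0, "user": "alice"},
    {"_time": 10, "user": "bob"},
    {"_time": 20, "user": "alice"},
    {"_time": 500, "user": "alice"},
]
for tx in group_transactions(sample, "user", maxpause=60):
    print(tx)
```

In the sample, alice's third event arrives 480 seconds after her second, so with maxpause=60 it starts a new transaction rather than extending the first one.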
REGEX
• rex: A Splunk command used to extract fields from event data using regular expressions (regex).
    rex field=_raw "src_ip=(?<source_ip>\d+\.\d+\.\d+\.\d+)"
• erex: A Splunk command that automatically generates regex patterns from example values you provide, simplifying field extraction for users who may not be familiar with regex.
    erex source_ip fromfield=_raw examples="192.168.1.1,10.0.0.5"
• Require option: only events with the highlighted string are included in the extraction.

Lookup
Lookups map values in your indexed data (e.g., IP addresses, hostnames, user IDs) to external datasets (e.g., geo-location, user directories, or threat intelligence feeds).

Visualizations
• stats: Aggregates data using statistical operations (like sum, avg, count, etc.) and produces tabular results that can later be visualized.
• timechart: A specialized version of stats designed to aggregate data over time, producing time series visualizations. Plots and trends data over time.
  • _time is always the x-axis
• chart: Similar to stats, but groups the results into columns or bars based on one or more fields, making it suitable for bar or column charts.
  • Can include both an over and a by clause to divide results into sub-groupings
• iplocation: Enriches IP addresses with geolocation data, such as country, city, and latitude/longitude.
    index=web_logs | iplocation src_ip
• geostats: Aggregates data by geographic location (latitude/longitude) so it can be plotted on a map. It needs latitude and longitude fields, which iplocation can supply.
    index=web_logs | iplocation src_ip | geostats count by Country
• addtotals: Calculates row totals or column totals in a table and adds them as new fields.
    index=sales_data | stats sum(amount) as total_sales by region | addtotals
• trendline: Computes moving averages of a field to smooth out short-term fluctuations, making trends and patterns over time easier to see.
    index=web_logs | timechart span=1d count | trendline sma5(count) as smoothed_count
  • sma: Simple Moving Average
  • ema: Exponential Moving Average
  • wma: Weighted Moving Average

Reports
Reports are saved searches. You can run reports on an ad hoc basis, schedule reports to run on a regular interval, or set a scheduled report to generate alerts when the results meet particular conditions. Reports can be added to dashboards as dashboard panels.

Alerts
• Scheduled: Run at intervals
• Rolling Window: Conditions met over a continuous time period. Evaluates the search condition over a continuously moving time window; well suited to detecting thresholds over a specific time period.
• Real Time: Designed to trigger immediately when the search condition is met, providing instant notification for critical events.
• Per-Result: Triggers for every matching event

Tags
A tag in Splunk is a label that you assign to a field-value pair (and, through it, to matching events) to make data easier to find, categorize, and analyze. Tags let you label events with keywords or descriptions so that related events can be searched together.
If you're working with web server logs and want to track failed login attempts, you might label events as "failed_login":
    index=web_logs "login failed" | eval tags="failed_login" | table _time, user, tags
In this example, every event where the login failed gets a search-time field named tags with the value "failed_login". That labels the results of this one search only; a persistent tag is defined on a field-value pair (Settings > Tags) and can then be searched with tag=failed_login.

Types:
• Event Tags: Tags applied to specific events. For example, you can apply the tag "error" to all events related to application errors.
• Field Tags: Tags applied to specific fields within the data. For example, you could tag IP addresses associated with certain network activity.
• Predefined Tags: Splunk comes with a few predefined tags for commonly used conditions, such as "error" or "login_attempt", but you can define custom tags based on your specific needs.
    index=security_logs tag=failed_login
Tags are case-sensitive by default, so capitalization matters when defining and searching tags.
Searching for a tag on a specific field: tag::<field>=<tagname>

Macros
A macro in Splunk is a reusable search snippet that encapsulates part of a search query. Macros let you reuse commonly used search logic, which reduces redundancy and keeps searches consistent.
• Ex. If you often search for events where the status code is 404 and the event type is "error", you can define a macro instead of writing the same search every time.
    Full search: index=web_logs sourcetype=apache_access status=404 event_type="error"
    With a macro named search_error_404 (macros are invoked with backticks): index=web_logs `search_error_404`

Workflows
A workflow in Splunk is a sequence of predefined actions or searches that automate the process of data analysis, visualization, and response, allowing users to streamline recurring tasks or operational processes. Ex.
Suppose you want to monitor failed login attempts and alert the IT team if there are more than 5 failures in 10 minutes:
    index=security_logs sourcetype=login_attempts action=failure earliest=-10m | stats count by user | where count > 5

Steps: Creating a GET Workflow Action
1. Settings > Fields > Workflow actions > New Workflow Action
2. Select the app
3. Name the workflow action with no spaces or special characters
4. Define the label, which will appear in the Event Action menu
5. Determine if your workflow action applies to a field or event type
6. From the Show action in dropdown list, select Event menu
7. From the Action type dropdown list, select link
8. Enter the URI where the user will be directed
9. Specify if the link should open in a New window or the Current window
10. Select the Link method of GET
11. Save

Data Normalization
• Field Alias: A mapping that allows one field name to be referenced by another name, making search queries more flexible and standardized without changing the original data.
• Calculated Field: A new field created at search time from a mathematical, logical, or string expression over existing fields in your events.

Data Models
A data model in Splunk is a structured, hierarchical representation of your data, optimized for pivoting, reporting, and accelerated searches.
• Hierarchical: Parent & child relationships
• Database Searching: Select the specific data model and dataset you want to search
• Normalization Tool: CIM compliance; data is mapped to a model that fits that type of data
• Large Data Searches: Search larger amounts of data faster, with tstats and accelerated data models
• Serve as the foundation for Splunk's CIM (Common Information Model) and apps like ITSI and ES.
Data models make Splunk powerful for visual analysis and operational intelligence, enabling users to abstract complex SPL searches into reusable, structured representations of data.
Data Models are composed of:
• Event Datasets: The simplest dataset type; contains raw events that match a constraint.
  • Use: Basic searches and pivoting.
  • Ex. All_Web object with constraint sourcetype=access_combined.
• Search Datasets: Dataset based on a saved search.
  • Use: Reuse existing search logic within a data model.
  • Ex. High_Error_Web_Requests dataset using the search status>=500
• Transaction Datasets: Dataset that groups related events into a transaction based on transaction rules.
  • Use: Track multi-event sessions or workflows.
  • Ex. User_Sessions grouping events by user and session_id
• Child Datasets (Object Inheritance): Datasets inheriting from a parent object with additional constraints or fields.
  • Use: Refine or specialize a dataset for pivot or reporting.
  • Ex. Failed_Logins as a child of All_Authentication with constraint outcome=failure

Syntax:
    | datamodel [data model name] [dataset name] [search mode {search, flat, acceleration_search}]
You can create datasets for your data model by defining their constraints and fields.

Common Information Model
A shared, standardized schema in Splunk that normalizes field names, event types, and data structures across multiple data sources for consistent analysis and correlation.
CIM maps diverse data sources to a common set of field names and event categories.
Suppose you have login events from multiple sources:
• web_logs: user_ip, status_code
• firewall_logs: src_ip, action
Using CIM:
• Map user_ip & src_ip → src
• Map status_code & action → outcome
Now a CIM-compliant search can detect failed logins across all sources using standardized fields:
    tag=authentication outcome=failure | stats count by src, user
CIM is life or death.

CIM Add-On: A Splunk app or add-on that maps raw data from specific sources to the Common Information Model (CIM), ensuring the data is normalized and CIM-compliant for use in searches, dashboards, and apps like Enterprise Security (ES).
Suppose you have multiple firewall logs with different field names for IP addresses: src_ip, sourceAddress, client_ip
A CIM Add-on for Firewalls would:
• Map all these fields to the CIM-standard field src
• Define event types like network_traffic
• Enable Enterprise Security correlation searches to detect threats consistently

The CIM Add-on Builder is a Splunk app that allows users to create their own CIM-compliant add-ons for custom or unsupported data sources.
You have logs from a custom IoT sensor:
• Fields: deviceIP, sensorStatus, temperature
Using CIM Add-on Builder:
• Map deviceIP → CIM src
• Map sensorStatus → CIM outcome
• Define event type: iot_event
• Package as a CIM-compliant add-on for deployment

Data Models included in the CIM Add-on
Alerts, Authentication, Certificates, Change, Compute Inventory, Configuration Management, Databases, Email, Endpoint, Inventory, Intrusion Detection, JVM (Java Virtual Machine), Malware, Network Resolution (DNS), Network Sessions, Network Traffic, Performance, Ticket Management, Updates, Vulnerabilities, Web

Splunk Data Pipeline
• Input: Forwarded data, uploaded data, network data, scripts
• Parsing: Examines the data, adds metadata
• Indexing: Data is divided into events and written to disk in "buckets"
• Searching: User interaction with the data

Deployment Models

Splunk Platform
• Splunk Cloud
• Splunk Enterprise

Splunk Enterprise S1 Architecture: Single Server Deployment
Splunk Enterprise S1 (Single Server) architecture refers to a deployment model where all core Splunk components (Indexing, Search, and Forwarding) are running on a single server.
• Single node deployment where Splunk handles indexing, search, and forwarder tasks on the same machine
• Non-critical data
• Daily Ingest Limit: 500 GB

Splunk Enterprise C1 Architecture: Distributed Environment for Single Site
Splunk Enterprise C1 architecture refers to a distributed Splunk deployment where multiple components (Indexer, Search Head, and Forwarder) are deployed across different machines or nodes to ensure scalability, high availability, and data redundancy.
• Clustered deployment where core Splunk components are split across multiple nodes to handle large volumes of data and user queries.

Splunk Enterprise C3 Architecture: Distributed Clustered Environment with search head clustering for a single site
Splunk Enterprise C3 architecture refers to a distributed and clustered Splunk deployment where Search Head Clustering (SHC) is used in a single site, along with Indexer Clustering, to provide scalability, redundancy, and high availability within a single data center or site.
• A single-site deployment that leverages distributed indexing and search head clustering to handle large data volumes with high availability, load balancing, and fault tolerance.
• Search Head Clustering (SHC) ensures that multiple search heads can share the same configuration, provide load balancing, and maintain high availability of search functions.

Splunk Cloud Platform
Splunk Cloud Platform is a fully managed, cloud-based version of Splunk Enterprise that provides the same powerful features as Splunk Enterprise but with a scalable, high-availability infrastructure hosted on cloud services (e.g., AWS, Google Cloud, Azure). It simplifies the deployment, management, and maintenance of Splunk, allowing organizations to focus on data analysis rather than infrastructure management.
Data Storage

Index
• A repository for Splunk data
• Splunk transforms incoming data into events and stores them in indexes
• Default Index: main
• Event: A single row of data made up of fields
• Fields: Key-value pairs

Buckets
• Hot Bucket: Where newly indexed data is stored. It is actively written to as data is ingested by Splunk.
  • The only writable bucket
  • Data in this bucket is being written to and is immediately available for searches.
  • Data remains in the Hot bucket until it reaches the size limit or its age surpasses the retention threshold.
  • Searchable and accessible for real-time searches.
  • Dir Path: $SPLUNK_HOME/var/lib/splunk/defaultdb/db/*
• Warm Bucket: After a Hot bucket reaches its size or age limit, the data moves to a Warm bucket.
  • The data in this bucket is no longer being written to, but it is still regularly queried.
  • Searchable, with slower access compared to Hot buckets.
  • Dir Path: $SPLUNK_HOME/var/lib/splunk/defaultdb/db/*
• Cold Bucket: Contains older, infrequently accessed data that is still needed for long-term retention or compliance.
  • Data in Cold buckets is rarely queried but can be accessed when necessary.
  • Data is compressed for storage optimization.
  • Searchable, but access is slower compared to Hot and Warm buckets.
  • Dir Path: $SPLUNK_HOME/var/lib/splunk/defaultdb/colddb/*
• Frozen Bucket: Contains data that has surpassed its retention period and is no longer needed for regular searches.
  • Data in this bucket is either deleted or archived for long-term storage.
  • Not searchable
  • Dir Path: Location you specify
• Thawed Bucket: A Frozen bucket that has been restored to a searchable state. This is typically done when there's a need to query historical data that has been archived.
  • Dir Path: $SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/*

SmartStore
SmartStore is a Splunk feature that optimizes data storage management by offloading indexed data to external object storage (e.g., Amazon S3, Google Cloud Storage, or Azure Blob Storage), while keeping a local cache of recently used data for fast searches. It combines the scalability of cloud storage with local search performance, improving storage efficiency and search capabilities.

Enterprise Licensing
• Volume Based: A licensing model where the cost of the license is determined by the amount of data indexed per day (measured in gigabytes per day, GB/day).
  • Daily indexing volume is measured from midnight to midnight by the clock on the license manager.
  • Designed to give organizations the flexibility to scale their usage based on their data ingestion needs, rather than purchasing a fixed number of users or hardware.
• Infrastructure Based: A licensing model based on the number of machines or hosts monitored by Splunk.
  • Typically applies to Splunk Enterprise and Splunk Cloud when you want to license based on the infrastructure you're managing.
• Anything that indexes data needs a license
• License Manager clients: Heavy Forwarders, Search Heads, Indexers
• License Pooling: Pools are created from License Stacks
  • Pools are sized for specific purposes
  • Managed by the license manager

Ports
• Forwarder (data sent to indexers): 9997
• Search Head → Indexer: 8089
• Splunk Management: 8089 (License Manager, Distributed Search, Splunk Deployment Server)
• Splunk Web: 8000
• Splunk HTTP Event Collector (HEC): 8088
• App server (the Splunk Web backend that serves apps such as the Monitoring Console and Enterprise Security): 8065
• KV store: 8191
• Syslog: 514

License Types

Standard: aka the "Standard" production license
The full Splunk Enterprise license used for production; enables all Enterprise features and is measured/managed by your licensed capacity (volume/day or, in some offerings, workload/compute). Ex.
A 200-GB/day Enterprise license allocated across multiple pools for different business units.

Enterprise Trial: A built-in, time-limited trial that lets you use Enterprise features and index up to 500 MB/day for 60 days.
• After expiry, the instance moves to the Free group unless a new license is applied.
• Ex. Fresh install used to evaluate indexer clustering and RBAC under the 60-day/500 MB/day trial.

Sales Trial: A trial license issued by Splunk Sales for evaluations larger/longer than the default trial; belongs to the Enterprise/Sales Trial license group.
• Ex. Extended POCs requiring more data or time than the default trial.

Dev/Test (Personalized Dev/Test): A no-cost, non-production Enterprise license for existing Splunk customers, typically 50 GB/day for six months, issued per user.
• Use case: App upgrades, TA onboarding, configuration testing outside prod.
• Ex. QA team uses a 50 GB/day Dev/Test license for six months to validate new data sources.

Free: A perpetual, standalone/single-instance license with limited features (no auth/multi-user, no distributed features) and a small daily ingest cap.
• Not for production; ingestion cap applies (varies by version).
• Ex. Standalone lab indexing small log samples under Free.

Industrial IoT (Splunk for Industrial IoT / IAI): A product-specific license for Splunk for Industrial IoT (legacy bundle including Splunk Enterprise + IAI + OPC add-on + MLTK) that must be applied to use IIoT features.
• IAI: Industrial Asset Intelligence
• Use case: Asset health monitoring, anomaly detection, predictive maintenance in OT.
• Ex. Apply the IIoT license in a plant monitoring deployment using OPC UA data via the OPC add-on.

Forwarder (Forwarding License): License for forwarders used to collect and send data; the universal forwarder has a preinstalled forwarder license.
• Universal forwarder: forward-only, no local indexing, forwarder license auto-applied.
  Steps:
  1. Configure receiving on the Splunk indexer
  2. Download & install the UF
  3. Start the UF
  4. Configure the UF to send data
  5. Configure the UF to collect data from the host system
• Heavy forwarder: if it parses/forwards only, the Forwarder license suffices; if it indexes or runs searches, it needs Enterprise license access.
  • Supports conditional routing
• Ex. UF on Linux hosts forwarding syslog to indexers; HF using the Forwarder license to parse and forward without storing.
    sudo ./splunk add forward-server 192.168.1.122:9997

License Violation
A state reached after too many daily license warnings in a rolling window.
• Warning: Warnings occur when you exceed your licensed daily indexing volume (measured midnight-to-midnight).
• When in violation, indexing continues but search is disabled for the offending license pool/peers until you're back under the limit or apply a reset/new license.
• What triggers it: Each day you exceed your allowed ingest creates a license warning; accumulate enough warnings → violation.
• Thresholds (Enterprise family): 5+ warnings in 30 days → violation (Enterprise, Enterprise Trial, Dev/Test, Sales Trial).
• Threshold (Free): 3+ warnings in 30 days → violation; search disabled until warnings drop below 3.
• Enforcement note (<100 GB stacks): Conditional enforcement can disable search sooner on small stacks (Splunk Enterprise 8.1+ policy).
• What happens: Indexing continues; search is blocked for violating pools/peers until you recover/apply capacity.
• Ex. Enterprise threshold: Your environment exceeds licensed volume on 5 distinct days within 30 days → search is disabled on the violating pool, but indexers keep ingesting; search returns once the rolling 30-day count drops below 5 or you add/reset license capacity.
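The rolling-window rule above (N or more warning days within 30 days) can be sketched in Python. This is only a model of the counting logic as these notes describe it, not Splunk's enforcement code; the function name in_violation and the exact handling of the window boundary are assumptions.

```python
from datetime import date, timedelta


def in_violation(warning_days, today, threshold=5, window_days=30):
    """Return True if `threshold` or more warning days fall in the window.

    warning_days: iterable of datetime.date, one per day the licensed
    volume was exceeded. Duplicate dates count once.
    Assumption: the window covers the `window_days` days ending today.
    """
    window_start = today - timedelta(days=window_days)
    recent = {d for d in warning_days if window_start < d <= today}
    return len(recent) >= threshold


today = date(2024, 6, 30)
warnings = [today - timedelta(days=n) for n in (1, 4, 9, 15, 22)]

# Enterprise-family threshold from the notes: 5 warnings in 30 days.
print(in_violation(warnings, today))
# Free threshold from the notes: 3 warnings in 30 days.
print(in_violation(warnings[:3], today, threshold=3))
```

Because the window rolls forward, a violation clears without intervention once enough old warning days age out of the last 30 days.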
• View license messages: splunk list licenser-messages

License Terminology

License Manager (formerly "License Master"): A Splunk Enterprise instance that serves as the central license repository; other Splunk instances ("license peers") talk to it for entitlement and enforcement.
• Orchestrates groups → stacks → pools and assigns peers to pools.
• You install licenses here, then point peers at it.
• Ex. Install Enterprise licenses on a dedicated manager; configure indexers/search heads as license peers to that manager.

License Peers (formerly "License Slaves"): A Splunk Enterprise instance configured to contact the license manager for access to features and (if it indexes) a slice of license volume.
• Any Splunk component (e.g., indexer, search head) can be a peer to the manager.
• Configure in Splunk Web: Settings > Licensing > Change to Peer and provide the manager URI.
• Use case: Point all indexers/search heads in a distributed deployment to the central manager.
• Ex. On an indexer: Settings > Licensing > Change to Peer > enter manager URI > assign to the appropriate pool.

License Stack: A collection of compatible licenses combined into one aggregate daily capacity; you can carve pools out of a stack. (Enterprise & Enterprise Sales Trial can stack; Free/Enterprise Trial cannot.)
• "Stacking" aggregates volumes from multiple keys.
• Stacks feed one or more license pools.
• Use case: Combine multiple Enterprise keys into a single capacity number.
• Ex. Stack a 100-GB/day and a 50-GB/day Enterprise key → one 150-GB/day stack, then split into pools (e.g., 120 GB prod, 30 GB QA).

License Group: A set of one or more license stacks; only one group is active at a time in a Splunk Enterprise deployment. (Examples include the Enterprise/Sales Trial group, Free group, etc.)
• Groups separate incompatible license types; the active group governs enforcement.
• A stack may belong to one group only.
• Use case: Transitioning from Trial/Free to Enterprise (change the active group).
• Ex.
After applying Enterprise keys, the Enterprise/Sales Trial group becomes active and enforcement uses those stacks/pools.

License Pool: A quantity of license volume allocated from a stack and assigned to one or more license peers; pools are managed by the license manager.
• Lets you divide a stack across teams, sites, or environments.
• Peers must be assigned to a pool to consume capacity.
• Pools can be created/edited in Splunk Web or managed via CLI/REST.
• Ex. Allocate 120 GB/day to "Prod-Indexers" and 30 GB/day to "QA-Indexers."
• Settings > Licensing > Add pool > choose stack & amount > select peers (indexers) that can draw from that pool.

Configuration Files
• inputs.conf: Defines data inputs (file/directory monitors, TCP/UDP, scripts, WinEventLog, HEC).
• outputs.conf: Sets where and how forwarders send data to receivers (indexers/other forwarders), including indexer addresses, SSL, load balancing, and routing.
• props.conf: Sets sourcetype behaviors: line breaking, timestamping/encoding, index-time vs search-time field extractions, and routing hooks to transforms.
• indexes.conf: Creates/manages indexes: storage paths, retention, size limits, TSIDX settings, and data types (events vs metrics).
• server.conf: Configures Splunk server identity, clustering, SSL, and general behavior.
• transforms.conf: Defines regex-based field extractions, routing (INDEXED/queue), and metadata rewrites (host, sourcetype, source).
• eventtypes.conf: Defines event types (saved search predicates with names) for reuse and tagging.
• savedsearches.conf: Stores scheduled/on-demand searches, reports, and alerts (search string, schedule, actions, permissions).
• macros.conf: Defines reusable SPL fragments (search macros) with arguments & eval context.
- web.conf: Splunk Web settings (HTTPS, reverse proxy, UI features).
- authentication.conf: Configures authentication backends and policies (e.g., Splunk native, LDAP, SAML; password policy).

btool

btool is Splunk's CLI utility that simulates the merge of all on-disk .conf files (respecting layering and precedence) and shows the effective settings; --debug reveals the file each setting comes from.
- Use cases:
  - Find where a setting "wins" (file path + stanza) when multiple apps define it.
  - Confirm index-time or search-time behavior (e.g., line breaking, routing, retention) before rollout.
  - Validate a deployment or troubleshoot "why didn't my change apply?" issues.
- Ex.
  - Show merged props with file origins (all stanzas): splunk btool props list --debug
  - Narrow to one sourcetype stanza: splunk btool props list my_sourcetype --debug
  - Find where an index is defined: splunk btool indexes list my_index --debug
  - Inspect inputs on a UF/HF with origins: splunk btool inputs list monitor:///var/log/app/*.log --debug
  - Validate all configs for format errors before restarting: splunk btool check

Indexes

Logical data stores on indexers that hold your data on disk as buckets containing compressed rawdata and index (tsidx) files; created and managed by the indexer. Use them to separate data by retention, security, or workload (e.g., "prod", "sec", "pci").

Types of data:
- Event Data: free-form text broken into events during parsing; searched with SPL over events. Examples: logs, security telemetry, audit trails.
- Metric Data: highly structured numeric values (metric, value, dimensions); searched as aggregates. Examples: system/IoT performance time series, SRE/observability KPIs.
- Ex. Convert logs to metrics (L2M) to store CPU utilization as metric_name=cpu.utilization, value=73, host=web01.

Types of Indexes:

Event Index: The default index type; stores event data (log-style records) with compressed rawdata and tsidx files for fast search.
- Supports broad SPL over text, field extraction, and ad hoc searches.
- Uses the bucket lifecycle for retention and archival (hot → warm → cold → frozen).
- Use case: Application logs, auth events, network and security data lakes.
- Ex. Define an events index named auth in indexes.conf for authentication logs; searches run against index=auth.

Metric Index: An index type optimized for metrics; stores numeric measurements with a metric name and dimensions. You cannot convert an events index to a metrics index (or vice versa).
- Structured format for high-volume, low-latency metrics workloads.
- Metrics data is searched as aggregates and behaves differently from events.
- Each metric event is subject to a per-event size cap.
- Use cases: Infra/IoT telemetry, app performance KPIs, log-to-metrics conversions (L2M). Metrics can be sent directly to a metrics index via HEC.
- Ex. {"time": 1724102400, "event":"metric", "source":"hec", "fields":{"metric_name":"cpu.util","host":"web01","region":"us-west","_value":73}} → indexed into index=metrics_iot

Data → Process → Index → Bucket

Integrity: Splunk Double Hash
- Level 1 hash (L1, per slice): computed on newly indexed data. As data is ingested, Splunk writes a SHA-256 digest for each rawdata slice to l1Hashes.
- Level 2 hash (L2, per bucket): computed when the bucket rolls. On the hot → warm roll, Splunk hashes the contents of l1Hashes and stores the result as l2Hash, "sealing" the bucket.
- Both l1Hashes and l2Hash live under the bucket's rawdata/ directory.
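The two-level hashing described above can be illustrated with a small sketch. This is not Splunk's implementation: the function names, the way slices are represented as byte strings, and the newline-joined format used for the level-2 input are all assumptions made for illustration. Only the idea (one SHA-256 digest per slice, then one digest over the list of slice digests) comes from the notes.

```python
import hashlib

def l1_hashes(slices):
    """Return one SHA-256 hex digest per rawdata slice (the level-1 hashes)."""
    return [hashlib.sha256(s).hexdigest() for s in slices]

def l2_hash(l1):
    """Hash the list of level-1 digests, standing in for the l1Hashes file contents."""
    joined = "\n".join(l1).encode("utf-8")  # assumed on-disk layout, for illustration only
    return hashlib.sha256(joined).hexdigest()

def verify(slices, stored_l1, stored_l2):
    """Recompute both levels and compare them against the stored values."""
    return l1_hashes(slices) == stored_l1 and l2_hash(stored_l1) == stored_l2

# "Seal" a toy bucket made of two slices.
slices = [b"event one\n", b"event two\n"]
stored_l1 = l1_hashes(slices)
stored_l2 = l2_hash(stored_l1)

intact = verify(slices, stored_l1, stored_l2)
tampered = verify([b"event one\n", b"event 2\n"], stored_l1, stored_l2)
print(intact, tampered)
```

Because the level-2 digest covers every level-1 digest, changing any slice after the bucket is sealed is detectable without keeping a separate copy of the data.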
- Check hashes to validate a bucket: ./splunk check-integrity -bucketPath [path] [verbose]
- Verify an entire index: $SPLUNK_HOME/bin/splunk check-integrity -index secure_logs
- Configure data integrity control: enableDataIntegrityControl = true (in indexes.conf)

indexes.conf Options

Global: applies to all indexes unless overridden
- Settings placed outside any stanza or in the [default] stanza of indexes.conf.
- They act as defaults for every index on that indexer (including SmartStore defaults) until a per-index stanza overrides them.
- Use [default] to seed common retention and size settings once (e.g., frozenTimePeriodInSecs, maxTotalDataSizeMB).
- Any index stanza can override these defaults locally.

Per Index: event/metrics indexes stored locally or with SmartStore
- Settings that apply to a single index stanza such as [web] or [auth], covering paths, retention, size, clustering replication, data type, and (if used) SmartStore placement.
- Options:
  - Paths: homePath (hot + warm), coldPath, thawedPath
  - Retention / size: frozenTimePeriodInSecs (time) and maxTotalDataSizeMB (size). Whichever limit is reached first freezes (ages out) data.
  - Clustering: repFactor sets the copy policy for that index in an indexer cluster.
  - Index type: datatype = event | metric
  - SmartStore (per index): set or override remotePath to put this index in a remote volume. Local paths must still be present.

Per Provider Family: Virtual Indexing / External Providers
- A provider family groups shared settings for multiple external data providers used by Virtual Indexes (Hadoop/HDFS/S3-based archives, etc.).
- Family stanzas look like [provider-family:<name>].
- Family stanzas capture common knobs (for example, the Java command used to launch external processes).
- Providers inherit from their family; if a setting exists in both, the provider's value wins.
- Used with Virtual Indexes (VIX) and the archive/search on Hadoop/S3 features.
- Use cases:
  - Maintain one place for Hadoop client options shared by many providers.
  - Keep a consistent execution mode/command for all Hadoop providers.

Per Provider: Virtual Index external systems
- A provider defines connection and runtime settings for an external system (for example, a specific HDFS/EMR/S3 endpoint) used by a Virtual Index.
- Stanzas look like [provider:<name>].
- Key attributes: vix.family (which family to inherit from), environment variables (e.g., vix.env.JAVA_HOME, vix.env.HADOOP_HOME), and vix.command.arg.N (jars/args).
- Every virtual index references exactly one provider.
- Use cases:
  - Point different virtual indexes to different Hadoop clusters/EMR accounts.
  - Tune provider-specific classpaths or auth.

Per Virtual Index (VIX): A virtual index is a search-time construct that makes external data (for example, in Hadoop/S3) look like a Splunk index.
- Stanzas are named for the virtual index and use vix.* keys.
- Core properties: vix.provider (which provider to use), vix.family (often inherited), vix.input.<N>.* (where the data lives: paths, regexes, time extraction hints).
- Commonly used for searching archived buckets in Hadoop/S3 or for Splunk Analytics for Hadoop-style access.
- Use cases:
  - Define an archive you can search without re-ingesting (Hadoop/S3).
  - Segment different external datasets into separate VIX "namespaces."

Fish Bucket

An internal directory/database where Splunk tracks how far it has read each file monitor input (CRC/seek checkpoints) so it can detect new data and avoid duplicate re-ingestion.
- CRC (Cyclic Redundancy Check): used to detect changes in the data.
- seekaddress: the position within a monitored file up to which Splunk has read. Splunk uses the seekaddress to resume reading a file from where it left off after a restart or interruption.
- seekcrc: when Splunk monitors a file, it calculates a CRC (checksum) over the initial bytes of the file (by default, the first 256). This CRC is called the seekcrc; Splunk stores it in the fishbucket to identify whether a file has been seen before.

How it works:
1. When Splunk starts monitoring a file, it calculates the seekcrc from the beginning of the file.
2. It checks the fishbucket to see whether this seekcrc is already known.
3. If the seekcrc is found, Splunk has encountered this file before (possibly under a different name) and uses the stored seekaddress to determine where to resume reading.
4. If the seekcrc is not found, the file is considered new and Splunk begins indexing it from the start.
5. As Splunk processes the file, it updates the seekaddress in the fishbucket to reflect the current position.

Notes:
- Default location on Enterprise deployments: $SPLUNK_DB/fishbucket/splunk_private_db/
- Used by monitor inputs to store per-file checkpoints (not by all input types).
- Lets Splunk know what is already indexed and where to resume in each file.
- Not a user data index; historically not searchable as "_fishbucket" / "_thefishbucket".
- Can be inspected and modified via btprobe (a support tool); Splunk should be stopped before use, and changes take effect after a restart.

Use cases:
- Re-ingest one file after fixing sourcetype/props: reset that file's checkpoint in the fishbucket.
- Force a forwarder to re-index everything (lab/testing): remove the fishbucket (with Splunk stopped). Use with caution because of the license impact.
- Troubleshoot "why didn't this file index?": validate the file's checkpoint record.

Ex.
- Check a single file's checkpoint: $SPLUNK_HOME/bin/splunk cmd btprobe -d $SPLUNK_DB/fishbucket/splunk_private_db --file /var/log/app.log --validate
- Reset a single file so Splunk re-reads it: $SPLUNK_HOME/bin/splunk cmd btprobe -d $SPLUNK_DB/fishbucket/splunk_private_db --file /var/log/app.log --reset
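The seekcrc / seekaddress bookkeeping described above can be mimicked with a toy checkpoint store. This is a simplified sketch, not the real fishbucket format: the FishBucket class, its method name, and the use of zlib.crc32 as the checksum are assumptions for illustration; only the "CRC of the first 256 bytes" rule and the resume-from-seekaddress behavior come from the notes.

```python
import zlib

INIT_CRC_LENGTH = 256  # bytes checksummed from the start of the file (default named above)

class FishBucket:
    """Toy checkpoint store mapping seekcrc -> seekaddress (hypothetical structure)."""

    def __init__(self):
        self.checkpoints = {}

    def read_new_data(self, content: bytes) -> bytes:
        """Return only the bytes not read before, then advance the checkpoint."""
        seekcrc = zlib.crc32(content[:INIT_CRC_LENGTH])
        seekaddress = self.checkpoints.get(seekcrc, 0)  # unknown CRC: new file, start at 0
        new_data = content[seekaddress:]
        self.checkpoints[seekcrc] = len(content)
        return new_data

bucket = FishBucket()

# Build a log whose first 256 bytes stay stable as the file grows.
log = b"".join(b"2024-01-01 00:00:%02d event %d\n" % (i, i) for i in range(20))

first = bucket.read_new_data(log)            # new file: everything is read
grown = log + b"2024-01-01 00:01:00 appended\n"
second = bucket.read_new_data(grown)         # known seekcrc: only the appended line
third = bucket.read_new_data(grown)          # same content again (e.g. renamed): nothing new

print(len(first), second, third)
```

The sketch also shows why a file that is copied or renamed without changes is not re-ingested: its first 256 bytes produce the same seekcrc, so the stored seekaddress applies.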
User Types

Admin: A Splunk Admin has full access to all configuration, data, and management functions within the system. Admins are responsible for configuring the system, managing users, and overseeing all administrative tasks.
- Capabilities:
  - Full access to all indexes, search heads, data sources, apps, and knowledge objects.
  - Can perform system-level tasks such as adding/removing apps, configuring indexing, managing storage paths, and setting up distributed environments.
  - Create, edit, and delete user roles and assign permissions.
  - Access to Splunk's internal logs and configuration files.
  - Delete data and manage retention settings (e.g., frozen data).
- Use cases:
  - System configuration: Admins perform initial setup, configure indexing, and manage large-scale Splunk instances.
  - User management: Granting roles and permissions to users, managing authentication settings.
  - Security/Compliance: Admins are responsible for data security, retention policies, and compliance within the system.
- Ex. Adding new users and assigning roles like Power or User with permissions to search specific data; configuring a new Splunk deployment and managing cluster configurations.

Power: A Power user can search and view most data but does not have the full administrative privileges of an Admin. The role is often given to analysts who need access to data but should not be able to modify the system configuration.
- Capabilities:
  - Search data across the platform, run reports, and create dashboards.
  - Can schedule saved searches and manage their own alerts.
  - Access knowledge objects, including event types, tags, and field extractions.
  - Cannot modify system configurations or manage users/permissions.
  - Cannot delete data or modify indexing.
- Use cases:
  - Data analysis: Power users explore data, create complex searches, and generate actionable insights for the organization.
  - Reporting: Power users create and schedule reports for operational monitoring.
- Ex. An analyst using the Splunk search interface to pull logs and create dashboards for network monitoring.

User: A User has basic access, primarily for searching, viewing, and interacting with dashboards and reports. Their permissions are typically limited to specific roles and datasets. Users cannot edit, create, or manage knowledge objects or settings.
- Capabilities:
  - Run searches and view results, but not modify shared searches.
  - Can view and interact with dashboards and reports created by Power or Admin users.
  - Limited data access: can access only the data for which they have been explicitly granted permissions.
  - Cannot delete data or modify system settings.
- Use cases:
  - View data: Basic users can be restricted to viewing specific datasets or dashboards for operational oversight.
  - Read-only access: Non-administrative roles that need read-only access to data for analysis.
- Ex. A sales team member using dashboards to view sales performance metrics but unable to modify queries or create new reports.

Can Delete (role name: can_delete): A Can Delete user has additional permission to delete data within Splunk, often required for users managing logs, compliance, or sensitive data. The role is assigned in conjunction with a base role such as User, Power, or Admin.
- Capabilities:
  - Delete data from the system (either individual events or entire datasets).
  - Can remove data from specific indexes based on retention policies or manual deletions.
  - Cannot configure system settings or modify access control.
- Use cases:
  - Data management: Deleting logs or sensitive data that is no longer needed.
  - Data pruning: Manually removing expired or archived data from indexes.
- Ex. A compliance officer deleting logs that are older than the required retention period (as per internal policies).

Viewer: A Viewer role is typically a read-only role for non-interactive users who need to see data without making changes or running their own searches.
It is often used for consuming dashboards and reports.
- Capabilities:
  - View dashboards, reports, and pre-built queries.
  - No search or creation access; cannot save, modify, or create any data or reports.
  - Cannot delete data, configure settings, or access raw event data directly.
- Use cases:
  - Read-only reporting: Users who only need to see specific reports or dashboards.
  - Non-interactive data consumption: Those who do not need to interact with or manipulate data directly.
- Ex. A manager viewing pre-built dashboards showing system status and performance metrics without being able to modify the underlying data or queries.

authorize.conf: Roles can be added or removed in this file or in Splunk Web.

LDAP Authentication

Splunk's LDAP (Lightweight Directory Access Protocol) integration lets you authenticate and authorize users against an external LDAP directory service (such as Microsoft Active Directory or OpenLDAP) for centralized identity management. It provides centralized login and role assignment based on LDAP group membership.
- Authentication: Splunk uses LDAP credentials for user login; authentication is handled by the LDAP server.
- Authorization: Splunk maps LDAP groups to Splunk roles, assigning permissions based on group membership.
- Syncing users: Splunk can import LDAP users and assign roles automatically or manually based on LDAP group membership.
- SSL support: Splunk supports LDAP over SSL (LDAPS) for secure connections.
- Search scoping: base DNs and search filters limit which users and groups Splunk looks up.
- authentication.conf: where LDAP authentication settings are defined, including the mapping of LDAP groups to Splunk roles (in a [roleMap_<strategy>] stanza).
- authorize.conf: where the Splunk roles and capabilities that those groups map to are defined.
- LDAP over SSL (LDAPS): use port 636 for secure LDAP communication.

X.500-style directory: Defines a hierarchical database of objects on the network. Objects can be users, groups, printers, disks, or almost anything else on a network.
- Attributes:
  - DN: Distinguished Name, an entry's unique identifier; composed of multiple attributes
  - CN: Common Name
  - OU: Organizational Unit
  - DC: Domain Component
  - SN: Surname

Splunk LDAP settings (authentication.conf):
- bindDN: The Distinguished Name that Splunk uses to bind to the LDAP server; typically a service account with permission to search for users in the directory.
    bindDN = cn=admin,dc=example,dc=com
- bindDNpassword: The password for the bindDN account, used by Splunk to authenticate against the LDAP server.
    bindDNpassword = password123
- userNameAttribute: The LDAP attribute that holds the username. Commonly uid or sAMAccountName.
    userNameAttribute = uid
- userBaseDN: The base DN under which Splunk searches for users; the starting point in the directory for user lookups.
    userBaseDN = ou=users,dc=example,dc=com
- userBaseFilter: An LDAP search filter that restricts which user entries are considered.
    userBaseFilter = (objectclass=person)

Example (the strategy name corp_ldap is arbitrary):

    [authentication]
    authType = LDAP
    authSettings = corp_ldap

    [corp_ldap]
    host = ldap.example.com
    port = 636
    SSLEnabled = 1
    bindDN = cn=admin,dc=example,dc=com
    bindDNpassword = password123
    userBaseDN = ou=users,dc=example,dc=com
    userBaseFilter = (objectclass=person)
    userNameAttribute = uid
    realNameAttribute = cn
    groupBaseDN = ou=groups,dc=example,dc=com
    groupNameAttribute = cn
    groupMemberAttribute = member
    groupMappingAttribute = dn

    [roleMap_corp_ldap]
    admin = admins
    power = power_users

Attributes (as shown in Splunk Web):
- LDAP Connection Settings: Host, Port, Bind DN
- User Settings: User base DN, User base filter, User name attribute, Real name attribute, Email attribute, Group mapping attribute
- Group Settings: Group base DN, Static group search filter, Group name attribute, Static member attribute
- Dynamic Group Settings: Dynamic member attribute, Dynamic group search filter

Distributed Search

Distributed Search in Splunk allows you to search across multiple Splunk instances (indexers, search heads) that are part of a distributed deployment.
- Search Heads: The instances that run search jobs and aggregate results from multiple indexers. Splunk can scale horizontally by adding more search heads.
- Indexers: Handle the actual data storage and indexing. In a distributed environment, indexers store data and run searches on it when queried by the search heads.
- Provides horizontal scaling, access control, and search over geo-dispersed data.

Search Head Clustering (SHC) is a distributed architecture in Splunk that provides high availability, scalability, and load balancing for search heads. A deployer is a special Splunk instance used to distribute configuration files and apps to all search heads in a Search Head Cluster.

How it works:
- Search heads send search requests to search peers (indexers) for processing and merge the results.
- The captain, a cluster member elected to that role, coordinates scheduled search jobs and the replication of knowledge objects across the cluster members.
- The deployer keeps the search heads synchronized by distributing configuration files and app updates.

Indexing Phases
- Data Input: forwarding, monitoring, network, scripted
- Parsing: breaks the data stream into individual events
- Indexing: writes the parsed data into index buckets

Configuring Forwarders

Steps:
1. Configure receiving on an indexer or cluster
2. Install the UF
3. Start the UF and accept the license agreement
4. Configure the UF to send data
5. Configure the UF to collect data from the host it is on

Intermediate Forwarders: A forwarder that receives data from one or more forwarders and then forwards it on to indexers (or another forwarder).
- A universal forwarder can relay data, but when events must be parsed, routed, or filtered in transit the role is filled by a Heavy Forwarder (Splunk Enterprise).
- A heavy forwarder lets you route and filter events (props/transforms) and optionally drop, mask, or split traffic before it reaches indexers.
- Configured with outputs.conf to define target indexer groups, load balancing, SSL, and (optionally) indexer acknowledgment.
Multiple Pipeline Sets: The forwarder can process multiple data streams at the same time.
- A feature that lets an indexer (or forwarder) run more than one end-to-end ingestion pipeline set in parallel. Each pipeline set receives data, parses it, and writes to disk, boosting ingest throughput on multi-core hosts.
- Useful for machines with more than one core.
- Setup: $SPLUNK_HOME/etc/system/local/server.conf

    [general]
    parallelIngestionPipelines = 2

Forwarder Management

Splunk's Forwarder Management is the Splunk Web UI on a deployment server that lets you group deployment clients into server classes and map deployment apps to them for centralized config/app distribution and lifecycle control. It writes to serverclass.conf and coordinates clients that "phone home" via the management port.

Deployment Server

A Splunk Enterprise instance that centrally distributes apps and configuration updates to groups of deployment clients (UF/HF/Enterprise nodes). The Forwarder Management UI on the deployment server maps server classes → deployment apps, and clients phone home over the management port to fetch changes.
- Groups Splunk components by common characteristics.
- Distributes content based on those groups, for example:
  - Windows Server Group
  - Linux Server Group
  - Email Group

Deployment Server Architecture
- Deployment Server: A Splunk instance that acts as a centralized configuration manager.
  - Refresh command: splunk reload deploy-server
- Deployment Client: A Splunk instance remotely configured by a deployment server.
- Deployment App: A set of content (including configuration files) maintained on the deployment server. It is a packaged set of configs/content placed on the DS under etc/deployment-apps/<appname>/ that the DS distributes to matching clients; once downloaded, the app is installed under etc/apps/<appname>/ on the client. You define which clients get which apps via server classes in serverclass.conf.
- Server Classes: A group of deployment clients that receive particular deployment apps.
  - Grouping can be based on many different criteria.
  - A client can belong to more than one server class.

How it works:
1. Each deployment client calls home periodically.
2. The deployment server determines the set of deployment apps for the client, based on which server classes the client belongs to.
3. The deployment server gives the client the list of apps that belong to it.
4. The client compares the app info from the deployment server with its own app info to determine whether there are any new or changed apps that it needs to download.
5. If there are new or updated apps, the deployment client downloads them.
6. Depending on the configuration for a given app, the client might restart itself before the app changes take effect.

┌───────────────────────────┐
│     Deployment Server     │◄── clients phone home / fetch apps (mgmt port 8089)
│  • Forwarder Mgmt (UI)    │
│  • serverclass.conf       │
│  • etc/deployment-apps/   │
└─────────────┬─────────────┘
              │ maps apps to server classes
         ┌────┴───────────────────────┐
         ▼                            ▼
┌─────────────────┐    ┌─────────────────────────────┐
│ Server Class A  │    │       Server Class B        │
│ App(s): TA_linux│    │ App(s): TA_windows, outputs │
└────────┬────────┘    └──────────────┬──────────────┘
         ▼                            ▼
┌────────────────────┐      ┌────────────────────┐
│ Deployment client  │      │ Deployment client  │
│ (UF/HF/Enterprise) │      │ (UF/HF/Enterprise) │
└────────────────────┘      └────────────────────┘

Network Inputs

Network inputs let Splunk listen on TCP or UDP ports for incoming data (raw streams or Splunk-to-Splunk traffic from forwarders) via stanzas in inputs.conf.
- TCP is recommended.
- Splunk can act as a syslog server or a syslog message sender.
- Splunk recommends an intermediate universal forwarder to listen on the UDP port.
- SNMP (Simple Network Management Protocol): write the data to a file on the Splunk Enterprise server, then monitor that file.

Scripted Inputs

A scripted input is an executable (shell/Python/PowerShell, etc.) that emits events, which Splunk ingests on a schedule or continuously; it is configured with [script://...] stanzas in inputs.conf.
- Lets you prepare data from a non-standard source so that Splunk parses it properly.
- Types: streaming and writing to a file.
- What they do: Run scripts to pull from APIs, CLIs, message queues, or databases, or to emit host metrics (vmstat/iostat).
- Where to run: On forwarders (UF/HF) or full Splunk Enterprise; forwarders usually forward rather than index.
- How to configure: Place the script in the app's bin/ directory and define [script://./bin/<script>] with interval (seconds), or run continuously with interval = 0.

    # Poll an API every 60s
    [script://./bin/poll_api.sh]
    interval = 60
    sourcetype = api:json
    index = ingest
    disabled = 0

Agentless Inputs

Ways to get data into Splunk without installing a Splunk forwarder on the source.

Windows Input Types:
- Windows Event Logs
- perfmon
- WMI (Windows Management Instrumentation)
- Registry
- Active Directory

HEC (HTTP/HTTPS push): Applications and services send JSON over HTTP(S) with a HEC token; supports indexer acknowledgment for reliability.
- Lets you send data directly to Splunk over HTTP(S).
- Especially useful for client-side web application monitoring (SPA).
- Process:
  1. Logging data
  2. Using a token with the data
  3. Sending the data request
  4. Verifying the token
  5. Indexing the data
- Format to send data to HEC: <protocol>://<host>:<port>/<endpoint>
- Send to: /services/collector or /services/collector/raw
- Ex. curl <protocol>://<host>:<port>/<endpoint> -H "Authorization: Splunk <token>" -d <data>

Syslog / raw network ports: Splunk listens on TCP/UDP (e.g., 514) or S2S (splunktcp); Splunk recommends SC4S for scalable syslog collection.

WMI (Windows agentless): A Splunk instance (often an HF) remotely polls Windows hosts for Event Logs/Perf data via WMI; no UF is required on the targets.
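As an illustration of the HEC request shape described above (endpoint URL plus an "Authorization: Splunk <token>" header), the following sketch builds such a request without sending it. The host name, the token value, the sourcetype, and the helper name build_hec_request are hypothetical; port 8088 is assumed as the HEC port and should be replaced with whatever your deployment uses.

```python
import json
import urllib.request

def build_hec_request(host, port, token, event, sourcetype=None, protocol="https"):
    """Build (but do not send) a POST to <protocol>://<host>:<port>/services/collector."""
    url = f"{protocol}://{host}:{port}/services/collector"
    payload = {"event": event}
    if sourcetype:
        payload["sourcetype"] = sourcetype
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",  # HEC token authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder values; none of these identify a real endpoint or token.
req = build_hec_request(
    host="splunk.example.com",
    port=8088,
    token="00000000-0000-0000-0000-000000000000",
    event={"action": "login", "status": "ok"},
    sourcetype="app:json",
)

print(req.full_url)
print(req.get_header("Authorization"))
# To actually send it, pass req to urllib.request.urlopen() against a reachable HEC endpoint.
```

Keeping the request construction separate from the network call makes the payload and headers easy to inspect before pointing the code at a live collector.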
Fine Tuning Inputs

Practical knobs in inputs.conf (and a few related configs) to control what, how, and how fast data is ingested, covering file monitors, network listeners, scripted inputs, and HEC.
- Setting a sourcetype: When you add data, you can specify an existing sourcetype or create a new one.
- Application context: Which app Splunk writes the .conf files to.
- Configure host value: The default value is the DNS name of the machine. You can set it to use the IP address or specify your own hostname.
- Index: You can use a built-in index or create a new index.
- Character set encoding:
  - Splunk defaults to UTF-8 character encoding.
  - If the source does not use UTF-8, Splunk attempts to convert it.
  - You can specify a character set in props.conf using CHARSET = <string>.

Parsing Phase

1. Data break: Splunk splits the incoming stream into events using line-breaking rules defined per sourcetype in props.conf (on the parsing tier: HF/indexer). Best practice: disable line merging and set an explicit LINE_BREAKER.

    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s

   - [\r\n]+ : regex that Splunk uses to search for line breaks in raw data
   - \r : carriage return
   - \n : line feed
   - [] : character set
   - + : match one or more of the preceding token(s)

2. Timestamps: For each new event, Splunk finds and parses the time to set _time. Control with:
   - TIME_PREFIX (where to start): TIME_PREFIX = ^\[
   - MAX_TIMESTAMP_LOOKAHEAD (how far to read): MAX_TIMESTAMP_LOOKAHEAD = 30
   - TIME_FORMAT (the pattern): TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
   - TZ (optional): time zone
   - Default TZ for timestamps: the time zone of the Splunk instance. When neither the event nor props.conf specifies a time zone, Splunk assumes the data is in the time zone of the Splunk instance.
   - Values in the _time field are stored in UNIX time.

3. Annotate events (add metadata): Events are stamped with default fields (host, source, sourcetype, and target index) derived from inputs and/or rules. You can override them at parse time with props.conf + transforms.conf using the DEST_KEY targets MetaData:Host, MetaData:Sourcetype, and _MetaData:Index.

4. Transform (manipulate, route, or drop): Regex-based transforms run during parsing to mask, rewrite, route, or drop events. Tools: TRANSFORMS-* settings (to set metadata or queue = nullQueue) and SEDCMD-* for inline redaction.

Parsing = Break → Timestamp → Annotate → Transform; events then move to the indexing phase for storage.

Manipulating Raw Data

Index-time controls that change, route, or drop events before they are written to disk, implemented with props.conf + transforms.conf (parsing tier: HF/indexer).

Why transform:
- Masking sensitive data before it gets indexed (HIPAA, PCI, GDPR).
- Event routing: route specific data to specific indexes.

Methods to transform / route / mask:
- SEDCMD (props.conf): Inline sed-style substitution to redact or clean text (fast, simple). Parsing tier only.
- TRANSFORMS (props + transforms): Regex-driven actions to route (_MetaData:Index), retag (MetaData:Host, MetaData:Sourcetype), or drop (queue = nullQueue).
- INGEST_EVAL: Add or alter fields (e.g., hash a value) at ingest using eval functions.
- Scope and where: Apply these on Heavy Forwarders / indexers (the parsing tier), not UFs.

Transformations with props.conf & transforms.conf:
- Anonymize data: Masks or rewrites sensitive substrings in the event body before indexing.
    SEDCMD-hide_pwd = s/password=\S+/password=********/g
- Override sourcetype or host: Sets or corrects sourcetype and host at parse time, globally or per event. Use a transform that writes to MetaData:Sourcetype and/or MetaData:Host. For per-event sourcetype overrides, see "Advanced sourcetype overrides."
- Route events to specific indexes: Sends some events (by pattern, source, or host) to a particular index. Write _MetaData:Index in a transform and attach the transform in props.conf.
- Prevent unwanted events from being indexed: Drops matched events during parsing so they never reach disk. Send them to queue = nullQueue via a transform. You can combine "drop all" then "keep some" by chaining transforms.
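The Break → Timestamp → Transform steps above can be mimicked outside Splunk to test regexes before putting them in props.conf. The sketch below is an approximation under stated assumptions: Python's re engine stands in for Splunk's PCRE, the DROP rule and the sample log lines are made up, and the timestamp format is a simplified "%Y-%m-%d %H:%M:%S" rather than the TIME_FORMAT shown earlier. The LINE_BREAKER pattern and the password mask are the ones from the notes.

```python
import re
from datetime import datetime

# Same pattern as the LINE_BREAKER example: capture group 1 is the text discarded between events.
LINE_BREAKER = re.compile(r"([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"              # simplified format, assumed for this sample data
MASK = (re.compile(r"password=\S+"), "password=********")  # mirrors the SEDCMD example
DROP = re.compile(r"\bDEBUG\b")                # hypothetical rule: matches go to the "nullQueue"

def break_events(stream):
    """Split the stream wherever LINE_BREAKER matches, discarding capture group 1."""
    events, start = [], 0
    for m in LINE_BREAKER.finditer(stream):
        events.append(stream[start:m.start(1)])
        start = m.end(1)
    events.append(stream[start:])
    return [e for e in events if e.strip()]

def parse(stream):
    """Break, drop, timestamp, and mask events; returns a list of dicts."""
    parsed = []
    for raw in break_events(stream):
        if DROP.search(raw):                   # dropped events never reach the "index"
            continue
        ts = datetime.strptime(raw[:19], TIME_FORMAT)
        parsed.append({"_time": ts, "_raw": MASK[0].sub(MASK[1], raw)})
    return parsed

stream = (
    "2024-05-01 10:00:00 INFO login user=amy password=hunter2\n"
    "2024-05-01 10:00:01 ERROR stack trace follows\n"
    "  at app.main(app.py:10)\n"
    "2024-05-01 10:00:02 DEBUG heartbeat\n"
)
events = parse(stream)
for e in events:
    print(e["_time"].isoformat(), repr(e["_raw"]))
```

Because the breaker only fires on a newline followed by a timestamp, the indented stack-trace line stays attached to the ERROR event instead of becoming an event of its own.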
SEDCMD

SEDCMD-<label> is a props.conf setting that applies a sed-style substitution to the event's _raw text at index time (on the parsing tier: HF or indexer). It is commonly used to mask/anonymize, remove, or reformat substrings before the event is written.
- SED: the UNIX/Linux stream editor; edits data in real time as it streams in.
- Supports replacing strings (s) and substituting characters (y).
- SEDCMD settings are key-value pairs in props.conf; inputs.conf is where we tell Splunk where the data is.
- Mask a password value: SEDCMD-mask_pwd = s/password=\S+/password=********/g
- Format: s/<regex>/<replacement>/g
- g flag: global; replaces every match in the event rather than only the first.

Splunk Platform Directory Structure

SPLUNK_HOME
├── bin
│   └── Splunk commands
├── etc
│   ├── apps
│   │   └── Config files packaged as apps
│   ├── users
│   │   └── User-specific config files
│   └── system
│       └── System-wide config files
└── var
    ├── log
    │   └── Log files
    └── lib
        └── Indexes

Splunk App Directory Structure

$SPLUNK_HOME/etc/apps/<app>
├── bin
│   └── Scripts
├── default
│   └── Configuration files
│       ├── app.conf
│       ├── limits.conf
│       └── ...
├── local
│   └── Configuration files
│       ├── app.conf
│       ├── limits.conf
│       └── ...
└── metadata
    ├── default.meta
    └── local.meta

Index-time Precedence (global context)

etc
├── system
│   ├── local    (1)
│   └── default  (6)
└── apps
    ├── app1
    │   ├── local    (2)
    │   └── default  (4)
    └── search
        ├── local    (3)
        └── default  (5)

Among apps, directories are evaluated in ASCII order of the app name, so app1 precedes search.

Search-time Precedence (app/user context; current user john, current app app1)

etc
├── users
│   └── john (username)
│       └── app1
│           └── local  (1)
├── apps
│   ├── app1
│   │   ├── local    (2)
│   │   └── default  (3)
│   └── search
│       ├── local    (4)
│       └── default  (5)
└── system
    ├── local    (6)
    └── default  (7)

Notes

Training
- Splunk makes use of dashboards to find patterns in data.
- Configuration files (.conf) contain stanzas and settings (or attributes).
- The deployer manages search head clusters.
- Splunk Enterprise components fall into two categories: Processing and Management.
- Splunk home directory: /opt/splunk
- Default Splunk index: main
- Universal Forwarders can be configured to send data to Splunk Cloud Platform using the forwarder credentials app.
- Timestamps are converted to UNIX time and stored in the _time field.
- The Search Job Manager (accessible via Activity > Jobs) lets you monitor, pause, inspect, and manage active and completed search jobs.
- Lookup definitions: Settings → Lookups → Lookup Definitions
- Automatic lookups: can be configured in props.conf using the LOOKUP-<lookup-name> setting, specifying the lookup definition and the fields to match and add.
- Many user-specific preferences, including the default time range, can be customized in your user profile settings.
- values(): returns a list of the unique values of a given field for each group.
- _time: _time field extraction and format are primarily defined by configurations associated with the sourcetype in props.conf.
- last(): returns the last-seen value of the specified field, in the order the events are processed.
- latest(): returns the value of the specified field from the chronologically latest event in a group; it is not an alias for last().
- Field type indicators:
  - a : alphanumeric values (both characters and numbers)
  - # : only contains numbers
  - " : only characters, no numbers
- Process of creating automatic lookups:
  1. Go to Settings → Lookups.
  2. Click Add New.
  3. Choose the destination app.
  4. Give your automatic lookup a unique name.
  5. Select the lookup table you want to use.
  6. In the Apply to menu, select a host, source, or source type value to apply the lookup to and give it a name in the named field.
  7. Under lookup input fields, provide one or more pairs of input fields.
  8. Under lookup output fields, provide one or more pairs of output fields.
  9. Select Overwrite field values to overwrite existing field values in events when the lookup runs.
  10. Click Save.
- By default, Splunk retains a completed search job for 10 minutes after it finishes. This can be extended to 7 days → by scheduling a report.
- Scheduled reports and alerts can only run as Owner. If you share a report so that it runs as User and then schedule that report, its permissions change to run as Owner.
- inputlookup: command used to review the contents of a specified static lookup file.
- top command, common options: limit, showperc, countfield
- Naming convention for dashboards: Group_Object_Description
- | dbinspect index=* : shows bucketId and other information such as bucket state.
- A pipe can always follow a macro in Splunk. This lets users further manipulate and analyze the macro's output using additional commands in the search pipeline.
- Splunk Field Extractor (FX) delimiters: tabs, pipes, spaces, colons
- Fields extracted using the Field Extractor persist as knowledge objects.
- Users are the group of individuals who would most likely use pivots in Splunk.
Priority determines the color displayed for an event when multiple event types with different colors match the same event: the color of the highest-priority event type is used
Event Actions > Extract Fields: Automatically identifies the data type, source type, and sample event
By default, acceleration is turned off in the Splunk Common Information Model (CIM) add-on, so its data models are not pre-summarized for faster searches unless the user enables acceleration explicitly
tostring Formats, used as tostring(<value>, "<format>") in eval
  - "hex": Converts a number to a hexadecimal string. Example: 255 -> "0xFF"
  - "commas": Adds thousands separators and rounds decimals to two places. Example: 1234567 -> "1,234,567"
  - "duration": Converts seconds to HH:MM:SS. Example: 3661 -> "01:01:01"
  - "binary": Converts a number to a binary string (listed in newer Splunk documentation). Example: 5 -> "101"
  - With no format argument, tostring converts the value to a string as-is
  - "auto", "oct", "round", "fixed=" and "scientific" are not documented tostring formats; for rounding or a fixed number of decimal places, use the round() or printf() eval functions instead
fillnull
  - Use fillnull to replace null values in fields
  - Use value=<string> to specify the string you want displayed instead. Example: fillnull value=NULL
  - If there is no value= clause, the default replacement value is 0
You can create datasets for your data model by defining their constraints and fields
Bubble Chart: Provides a visual way to view a three-dimensional series. Each bubble plots against two dimensions on the X and Y axes, and the size of the bubble represents the value of the third dimension
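The fillnull behavior noted above (replace nulls with 0 unless value= is given) can be mimicked outside Splunk. The following is a minimal Python sketch, not Splunk code: the `fillnull` function, its parameters, and the sample events are hypothetical and only illustrate the rule.

```python
from typing import Any, Dict, List, Optional


def fillnull(
    events: List[Dict[str, Any]],
    value: Any = 0,
    fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Return copies of the events with missing or None fields set to `value`."""
    if fields is None:
        # No field list given: consider every field seen in any event.
        fields = sorted({name for event in events for name in event})
    filled = []
    for event in events:
        row = dict(event)  # copy so the input events stay untouched
        for name in fields:
            if row.get(name) is None:
                row[name] = value
        filled.append(row)
    return filled


# Hypothetical sample events: one explicit null and one missing field.
events = [
    {"host": "web01", "status": 200},
    {"host": "web02", "status": None},
    {"host": "web03"},
]

default_fill = fillnull(events)               # like: ... | fillnull
custom_fill = fillnull(events, value="NULL")  # like: ... | fillnull value=NULL
```

A value that is present but falsy, such as 0, is left alone; only missing or None values are replaced.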
Splunk Home Directory: /opt/splunk
Splunk supports configuration management through third-party tools like SaltStack, Puppet, Chef
Batched files are ingested, then deleted
Splunk stores data in 64k block size
Default Host Value: DNS name of the machine
splunk validate files: Command used to validate the syntax of the config files without needing to restart the Splunk instance
Replication Factor: Determines the number of copies of data kept across indexer peers for redundancy
Search Factor in a Splunk Indexer Cluster: Specifies how many searchable copies of the data the cluster maintains, which is the minimum number of searchable copies needed for data availability during searches
splunk btool: Command used to display the effective configuration settings for a specific file
transforms.conf: Used to define rules for transforming & manipulating data during indexing
HTTP Event Collector (HEC)
  - Data is sent to a specific HTTP endpoint on the Splunk instance
  - HEC uses tokens for authentication
Problem: Recent data not appearing in searches. Possible causes:
  - Indexers have reached their license capacity & are blocking indexing
  - The splunkd process on the indexers is not running or is unhealthy
  - Disk space on the indexers' data volumes is full
Attributes for Data Retention Policy in indexes.conf: maxWarmDBCount, coldPath, frozenTimePeriodInSecs
_internal index: Preconfigured with Splunk Enterprise
serverclass.conf: File maintained by the Deployment Server (DS) that maps apps to forwarders. It defines which forwarders receive which apps based on specific criteria
Indexes are stored at: $SPLUNK_HOME/var/lib/splunk
Once data is written to a Splunk index, it is generally considered immutable
Splunk Developer License
  - Allows 10 GB/day; exceeding this triggers license warnings
  - Issued for 6 months
  - Cannot be stacked with any other licenses
The priority of Splunk config files depends on the context
  - Global context
  - User/App context
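How the context changes which copy of a setting wins can be sketched as a layered lookup in which the first layer that defines a setting takes precedence. This is a minimal Python sketch, not Splunk's implementation: the directory layers are reduced to one app and one user, and the setting names and values are hypothetical.

```python
from collections import ChainMap

# Hypothetical values of three settings as found in each directory.
system_default = {"setting_a": "sys-default", "setting_b": "sys-default"}
system_local = {"setting_a": "sys-local"}
app_default = {"setting_b": "app-default", "setting_c": "app-default"}
app_local = {"setting_c": "app-local"}
user_local = {"setting_a": "user-local"}

# Global (index-time) context: system/local, then app local, then app
# default, then system/default. ChainMap returns the first layer's value.
global_context = ChainMap(system_local, app_local, app_default, system_default)

# App/user (search-time) context: the current user's directory comes first,
# then the app's local and default, then system/local and system/default.
app_user_context = ChainMap(
    user_local, app_local, app_default, system_local, system_default
)

effective_global = dict(global_context)
effective_app_user = dict(app_user_context)
```

In this sketch `setting_a` resolves to the system/local value in the global context but to the user's value in the app/user context, which is the practical difference between the two orders.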
distsearch.conf: Used to set up distributed search groups
Splunk Re-index: splunk clean eventdata -index _thefishbucket
repFactor=auto: Configure in indexes.conf on all peer nodes to activate replication for an index in an indexer cluster
How to remove missing forwarders from the Monitoring Console:
  - Manually remove individual forwarders
  - Rebuild forwarder assets
In a network input, hostname: Default host attribute
collections.conf: Used to define a KV Store collection
Minimum Reference Server for Indexer: 12 CPU cores, 12 GB RAM, 800 IOPS
deploymentclient.conf: This file on the forwarder stores the information needed to establish a connection with a deployment server
Naming Convention for Warm Buckets: db_<newest_time>_<oldest_time>_<localid>
persistentQueueSize: Enables the storage of data on disk when the in-memory queue is full
Splunk License Violation
  - Free: Immediate block if the daily cap is exceeded; search is disabled until the next day
  - Enterprise Trial: 5 warnings in 30 days -> violation; search is disabled until resolved
  - Enterprise <100 GB/day: >=45 warnings in 30 days -> violation; search is disabled until resolved
  - Enterprise >100 GB/day: Unlimited warnings; search is never blocked
_TCP_ROUTING: Setting in inputs.conf that selectively forwards data to specific indexer(s)
The native Splunk authentication scheme takes precedence over any external schemes
User Role Inheritance: Splunk allows a child role to inherit capabilities & index access from a parent role
Persistent Queues (PQ): A feature that helps prevent data loss when a forwarder or indexer is temporarily unable to send or process data
There is no hard limit on the maximum number of SHC members that the KV Store can support
KV Store lookups are backed by a configuration called a collection
_introspection: This index stores Splunk resource consumption & performance data
External Lookup: Also known as a scripted lookup, because it uses a script to fetch data from external sources at search time
autoLBFrequency: Setting in outputs.conf that specifies the number of seconds the forwarder waits before performing a load-balancing check & potentially switching to another indexer in the list
splunk set deploy-poll <deploy-server:port>: Creates the deploymentclient.conf file in the $SPLUNK_HOME/etc/system/local directory
Wildcards in monitored paths
  - *: wildcard
  - ...: recurses through subdirectories
Settings in transforms.conf to manipulate or remove events: REGEX, DEST_KEY, FORMAT
License Location: $SPLUNK_HOME/etc/licenses
Server Class: The attribute in the forwarder management interface that determines which apps clients install
Bucket Storage: colddb, db, thaweddb
Data Integrity Checking: indexes.conf
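The difference between the two path wildcards noted above (* and ...) is easier to see once each is translated to a regular expression. The following is a minimal Python sketch under the assumption that ... behaves like `.*` (crossing directory separators) and * behaves like `[^/]*` (staying inside one path segment); the function name and the sample paths are hypothetical.

```python
import re


def monitor_path_to_regex(path: str) -> "re.Pattern[str]":
    """Translate a monitor-style path with '*' and '...' into a regex."""
    # Split on the two wildcard tokens and keep them in the result.
    parts = re.split(r"(\.\.\.|\*)", path)
    pieces = []
    for part in parts:
        if part == "...":
            pieces.append(".*")      # recurse through any number of directories
        elif part == "*":
            pieces.append("[^/]*")   # anything within a single path segment
        else:
            pieces.append(re.escape(part))
    return re.compile("^" + "".join(pieces) + "$")


single_level = monitor_path_to_regex("/var/log/*.log")
recursive = monitor_path_to_regex("/var/log/.../*.log")

matches_flat = bool(single_level.match("/var/log/app.log"))
matches_nested = bool(single_level.match("/var/log/nginx/app.log"))
recursive_one = bool(recursive.match("/var/log/nginx/app.log"))
recursive_two = bool(recursive.match("/var/log/a/b/app.log"))
```

Under these assumptions the `*` pattern stops at the first directory separator, while the `...` pattern reaches files any number of levels down.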