On Wednesday, February 28th, Aditum presented a webinar titled “How to Maximize Splunk Search Queries.” The webinar covered a number of ways to optimize Splunk queries to improve response time and reach insight faster.
The presentation included an overview of the basics of Splunk query optimization, examples of SPL best practices, and techniques for searching very large data sets. If you were unable to attend the webinar but would like to review the content, you can download the on-demand version here.
Writing Better SPL
During the presentation, we detailed 7 Splunk Search Processing Language (SPL) best practices for faster search. As a rule of thumb, it’s best to be as specific as possible when writing queries. After that, keep the practices below in mind:
- Filter data as early and as much as possible. This means narrowing the results in the base search rather than in later stages of the search pipeline.
- Avoid wildcards
- Use macros and subsearches instead of wildcards for list filtering
- Avoid using “NOT” – because the way Splunk implements NOT is NOT the way you might expect
- Avoid tags and eventtypes when writing an optimized search
- Run searches in FAST mode when using the Search bar to do interactive searching
- Position streaming commands, especially distributable streaming commands, before transforming commands.
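One commonly cited reason that NOT surprises people is that NOT field=value also returns events that have no such field at all, while field!=value returns only events where the field exists with a different value. The toy Python model below illustrates that difference. It is only a sketch of the semantics, not Splunk code, and the event data and function names are made up for illustration.

```python
# Toy model of two Splunk filter semantics; events are plain dicts.
events = [
    {"host": "web01", "status": "200"},
    {"host": "web02", "status": "404"},
    {"host": "web03"},  # no status field at all
]

def not_equals(events, field, value):
    """Model of `field!=value`: the field must exist and differ from value."""
    return [e for e in events if field in e and e[field] != value]

def not_filter(events, field, value):
    """Model of `NOT field=value`: drop only events where field equals value."""
    return [e for e in events if e.get(field) != value]

print([e["host"] for e in not_equals(events, "status", "200")])  # ['web02']
print([e["host"] for e in not_filter(events, "status", "200")])  # ['web02', 'web03']
```

The event without a status field survives the NOT filter, which may or may not be what the search author intended.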
You can learn more details on how to implement each of these best practices by viewing the on-demand version of the webinar. We also provided a quick reference guide with 20 Tips for Writing Better SPL. This helpful resource is ideal for printing, posting, and using in your day-to-day Splunk searches.
Key Questions on Splunk Query Optimization
After the webinar, we set aside time for a question and answer session. Below is a review of the key questions and answers.
Q: Can you explain the difference between a Summary Index and Data Model Acceleration? Why use one over the other?
The key difference is in the nuts and bolts of how each one works. With Summary Indexing, you run a scheduled search that you have defined, and the results of that search are saved in a Splunk index just like any other indexed Splunk data. Summary Indexing was in use years before Data Model Acceleration was released. Its weakness was that if Splunk was down or had to be restarted when the scheduled search was supposed to run, the search results for that time frame were never written, leaving gaps in the data. It was difficult to manage Summary Indexing to be sure that results were up-to-date, accurate, and complete.
Data Model Acceleration solved the problem by automating all of this for you. If Splunk misses one of its search intervals, it will catch up the next time the search is run. This ensures that the data is complete over time. Data Model Acceleration is an evolution of Summary Indexing. In most cases, particularly if you’re using a premium product such as Enterprise Security (ES) or IT Service Intelligence (ITSI), it’s the preferred way to summarize your data.
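The gap problem and the catch-up behavior can be pictured with a small toy model in Python. This is only a sketch of the idea, assuming hourly intervals identified by integers. It says nothing about how Splunk implements either feature internally, and the function name is hypothetical.

```python
def missing_intervals(expected, summarized):
    """Return the expected interval start times that have no summary yet."""
    done = set(summarized)
    return [t for t in expected if t not in done]

# Hourly intervals 0..5; the scheduler was down during hours 2 and 3.
expected = list(range(6))
summarized = [0, 1, 4, 5]

# Summary Indexing model: the gaps stay until someone backfills them by hand.
gaps = missing_intervals(expected, summarized)
print(gaps)  # [2, 3]

# Acceleration model: the next run also summarizes anything that was missed.
summarized_after_catch_up = sorted(summarized + gaps)
print(missing_intervals(expected, summarized_after_catch_up))  # []
```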
Q: Is it better to search for something like 127.0.0.1 THEN in a later step, src_ip=127.0.0.1 (to further restrict it)?
There’s a bit of nuance here. On one hand, Splunk’s raw search is fast, so just searching on src_ip=127.0.0.1 in your initial search is going to be quick. On the other hand, if you limit to src_ip=127.0.0.1 in a later step, the field extraction regex still has to be run. It can depend on the specifics of your search string, but in general, the approach described in the question is the best one. It may not make much of a difference in terms of search performance, but it may make a difference in terms of better limiting the search.
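As a rough illustration of the two steps discussed in this answer, the Python sketch below does a cheap raw-text match first and then runs a field extraction regex only on the events that survived. It is a toy model, not a description of Splunk internals. The sample events, the SRC_IP pattern, and the function names are all assumptions made up for the example.

```python
import re

# Hypothetical extraction pattern for a src_ip field; real extractions vary by sourcetype.
SRC_IP = re.compile(r"src_ip=(\d{1,3}(?:\.\d{1,3}){3})")

raw_events = [
    "action=allow src_ip=127.0.0.1 dest_ip=10.0.0.5",
    "action=deny src_ip=10.0.0.9 dest_ip=127.0.0.1",
    "action=allow src_ip=192.168.1.4 dest_ip=10.0.0.5",
]

def term_search(events, term):
    """Cheap first pass: keep events whose raw text contains the term."""
    return [e for e in events if term in e]

def field_filter(events, value):
    """Second pass: run the extraction regex and compare the field value."""
    out = []
    for e in events:
        m = SRC_IP.search(e)
        if m and m.group(1) == value:
            out.append(e)
    return out

candidates = term_search(raw_events, "127.0.0.1")
matches = field_filter(candidates, "127.0.0.1")
print(len(candidates), len(matches))  # 2 1
```

The raw term match keeps two events, and the field filter then discards the one where 127.0.0.1 appears only as dest_ip. That is the sense in which the second step limits the search better.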
Q: What is the preferred approach for dealing with nested JSON objects? Mvexpand works well for short time ranges but uses a lot of memory.
Nested JSON data can be tricky. A JSON event in Splunk can contain arrays, and you’ll end up with multi-valued fields. You might use something like mvexpand to expand those events, but that can be a huge memory eater. For many data sources, a good first step is to try to break the arrays apart on ingestion. When I ingest that type of data, I might write a response handler in Python and break those events out when possible. If that doesn’t work, there are other techniques like spath. But depending on how big your JSON is, you’re probably going to end up with memory issues unless you can find a way to divide that event. It depends on the specifics of your situation.
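A minimal sketch of breaking arrays apart before ingestion might look like the Python below. It assumes the array sits under a single known top-level key. The function name, its parameters, and the sample event are hypothetical, and it does not follow the signature of any particular Splunk handler interface.

```python
import json

def split_event(raw, array_key):
    """Yield one JSON string per element of raw[array_key], copying the other fields."""
    event = json.loads(raw)
    items = event.pop(array_key, None)
    if not isinstance(items, list):
        # Nothing to split: pass the event through unchanged.
        yield raw
        return
    for item in items:
        yield json.dumps({**event, array_key: item}, sort_keys=True)

raw = '{"host": "fw01", "alerts": [{"id": 1}, {"id": 2}]}'
for line in split_event(raw, "alerts"):
    print(line)
```

Each output line is a small standalone event that carries the shared fields plus one array element, so there is no multi-valued field left to expand at search time.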
Splunk query optimization is a large topic and there are many different areas to explore. It’s important to familiarize yourself with Splunk documentation and Splunk reference pages on this topic. These are listed below, along with additional information on query optimization:
- Splunk Docs on Search
- A Quick Guide to Search Optimization
- Aditum’s Quick-Tips for Splunk Search Optimization
- Splunk Security Ninjutsu Part Four (VERY Advanced)
Splunk Query Optimization – One Size Does Not Fit All
When it comes to Splunk query optimization, one size (or one best practice) does not necessarily fit every situation. Organizations implement Splunk in different ways, and what works well for one deployment may not be the best fit for another. That’s where Aditum can add value to your organization’s Splunk deployment.
Aditum’s Splunk Professional Services consultants can assist your team with best practices to optimize your Splunk deployment and get more from Splunk. Our certified Splunk Architects and Splunk Consultants manage successful Splunk deployments; environment upgrades and scaling; dashboard, search, and report creation; and Splunk Health Checks. Aditum also has a team of accomplished Splunk Developers who focus on building Splunk apps and technical add-ons.
Contact us directly to learn more.