Unfortunately, the &persistent attribute for variables in Zeek has been deprecated. Even though several people in the Zeek community made good arguments for retaining this attribute, the Zeek development team has remained completely silent and has chosen to remove it from Zeek.
What's the Problem?
Certainly, we should all be thankful that this wonderful tool remains open source and that active development continues apace! Still, I (and others) view this as a real loss for users. &persistent provided a very simple and elegant interface that allowed a script author to mark variables that should persist automatically across runs:
global importantCoolData: set[addr] &persistent;
With that simple &persistent attribute, Zeek would automatically load the data for that variable from disk (a file in a hidden .bst directory in the Zeek execution directory) and store that state back into the file when execution ended.
The Zeek team chose to eliminate this option in favor of SQLite-backed data stores implemented through the Broker-enabled cluster communication module. Unfortunately, for mundane tasks, this represents a significant upfront learning curve for users, with no easy-to-use canned interface that directly replaces &persistent.
Solutions?
While the Broker-enabled cluster communication system is robust and has tremendous possibilities in terms of external interfaces between Zeek and other tools, I thought it would be worthwhile to explore the feasibility of some boilerplate code that could be used to replicate &persistent.
Storing data out seems as though it should be trivial; after all, there is a file interface that can be used to create and write to arbitrary files. Unfortunately, this interface has some big limitations. The first is that it offers no facility for reading files. Personally, I have to wonder if this has more to do with Corelight's obvious desire to completely harden their appliances against user tampering; in a practical sense, this means that anything that could potentially be used to circumvent those local protections on the sensor has been restricted within the Zeek programming language. So much for the commercial version having "more" or "better" features. It's one of the first times I've been horrified to discover that the commercial version is more restrictive than the open source version of a tool!
Returning to the problem: because the file interface can't read, we must rely on the Input Framework to retrieve our stored data. While this does require that the file be structured much like a Zeek log file, that should represent no real challenge:
#fields ip port
8.8.8.8 53/udp
1.2.3.4 80/tcp
While you certainly can't see it here, the fields must be tab-delimited for the Input Framework to parse the file properly. The Input Framework documentation says as much:
An example input file could look like this (note that all fields must be tab-separated):
https://docs.zeek.org/en/stable/frameworks/input.html
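To make the reading side concrete, here is a minimal sketch that loads a single-column, tab-delimited file into a set. The module name SeedReader, the file name seed_addrs.tsv, and the stream name seed_addrs are hypothetical, chosen only for illustration; the file is assumed to start with a `#fields ip` header line. One detail worth knowing: the Input Framework reads asynchronously, so the sketch waits for Input::end_of_data before removing the stream rather than removing it right after adding it.

```zeek
# Hypothetical module, file, and stream names, for illustration only.
module SeedReader;

export {
    # The column name must match the "#fields" header in the file ("ip").
    type Idx: record {
        ip: addr;
    };

    # Destination for the addresses read from the file.
    global seen: set[addr];
}

event zeek_init()
    {
    # This only starts the reader; the data arrives asynchronously.
    Input::add_table([$source="seed_addrs.tsv", $name="seed_addrs",
                      $idx=Idx, $destination=seen]);
    }

# Raised once the reader has delivered every line of the file.
event Input::end_of_data(name: string, source: string)
    {
    if ( name != "seed_addrs" )
        return;

    print fmt("loaded %d addresses from %s", |seen|, source);

    # Only now is it safe to tear the input stream down.
    Input::remove(name);
    }
```

With a seed_addrs.tsv containing the tab-separated header and one address per line, the set is populated shortly after startup rather than at the instant zeek_init runs.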
No problem, right? We'll just write our file using tab-delimited output! Here's a first naive approach (be warned, this won't work):
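event writeCurrentData(shutdown: bool)
    {
    # Assumes importantCoolData holds (address, port) pairs,
    # i.e. it is declared as set[addr, port] to match the
    # ip/port columns in the file.
    local f = open("data_to_persist.tsv");
    print f, "#fields\tip\tport";
    for ( [ip, p] in importantCoolData )
        print f, fmt("%s\t%s", ip, p);
    close(f);
    }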
Job done! ... or is it... It turns out that a huge limitation in Zeek's exposed file-handling functionality now rears its head. When you view the written file, you will find the following:
#fields\x09ip\x09port
8.8.8.8\x0953/udp
1.2.3.4\x0980/tcp
I'm sure you can see the problem... Each tab has been written out as an escape sequence for its hexadecimal ASCII value. What's going on here? The documentation explains:
If a string contains non-printable characters (i.e., byte values that are not in the range 32 - 126), then the “print” statement converts each non-printable character to an escape sequence before it is printed.
https://docs.zeek.org/en/stable/script-reference/statements.html#keyword-print
Curses! Well, surely fmt will have something to say about this. After all, the documentation for print tells us that fmt will give us more control. Unfortunately, fmt only builds the string; the escaping happens when print writes it out, so our tabs are mangled just the same. Our first approach has been, sadly, foiled.
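One way to see why fmt can't rescue us is to separate building the string from writing it. The sketch below (the output file name escape_demo.txt is hypothetical) shows that the string returned by fmt really does contain a single tab byte; it is print that applies the escaping described in the documentation when the string is written to the file.

```zeek
# Illustrative only: escape_demo.txt is a hypothetical file name.
event zeek_init()
    {
    # fmt copies the "\t" in its format string as one real tab byte.
    local line = fmt("%s\t%s", 8.8.8.8, 53/udp);

    # "8.8.8.8" (7 bytes) + tab (1 byte) + "53/udp" (6 bytes) = 14 bytes.
    print |line|;

    # Writing the same string with print escapes the non-printable
    # tab, so the file receives an escape sequence, not the raw byte.
    local f = open("escape_demo.txt");
    print f, line;
    close(f);
    }
```

Running this prints 14 on standard output, confirming that the tab survives fmt intact and is only lost at the print step.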
A Messy Alternative
Is there any alternative to using the Broker solution? Yes, but it's messy. In order to write out tab-delimited files, we need to take advantage of the Logging Framework, whose ASCII logs are exactly the tab-separated format that the Input Framework can read back. Consider this very messy example:
module DNSServers;

export {
    # The data that we are going to persist
    global dnsServers: set[addr];

    # Even though we really don't need it for our purposes,
    # we need to use a record type to read and write using
    # the Input and the Logging frameworks.
    type Idx: record {
        ip: addr &log;
    };

    # Add our custom log that will be used to persist the data
    redef enum Log::ID += { DNSLog };
}

# Define an event that will be used to update the log, which
# will give us checkpoints of the data. The boolean argument
# is intended to give us a way to tell the event not to reschedule
# itself if it is being called from zeek_done
event DNSServers::writeDNSLogs(shutdown: bool)
    {
    for ( server in dnsServers )
        Log::write(DNSServers::DNSLog, [$ip=server]);

    # Check how we were invoked. If we weren't told to shut down,
    # reschedule ourselves to output the current state in 30 minutes.
    if ( ! shutdown )
        schedule 30min { DNSServers::writeDNSLogs(F) };
    }

event zeek_init()
    {
    # Begin by using the Input Framework to read the current/last
    # log file into our set.
    #
    # THIS NEEDS WORK for production use. As this stands, it
    # assumes that the log is in the execution directory, which
    # it almost certainly would not be in production. This would
    # need to have logic added to grab the log path and then check
    # for a current log to read from.
    Input::add_table([$source="dnsServers.log", $name="dns",
                      $idx=Idx, $destination=dnsServers]);

    # Create a new log stream, identified by the DNSLog ID that we
    # added to Log::ID above, which writes to "dnsServers.log".
    Log::create_stream(DNSLog, [$columns=Idx, $path="dnsServers"]);

    # Schedule an event to write our status out in 30 minutes.
    # If Zeek shuts down first, the zeek_done() handler below
    # forces an immediate flush of the data to the log file.
    schedule 30min { DNSServers::writeDNSLogs(F) };
    }

# The Input Framework reads asynchronously, so the "dns" input
# stream is only removed once it reports that all data has been read.
event Input::end_of_data(name: string, source: string)
    {
    if ( name == "dns" )
        Input::remove("dns");
    }

# Just a sample event to find DNS servers and add them
# to our set. This is only here to give us something to store.
event dns_message(c: connection, is_orig: bool, msg: dns_msg, len: count)
    {
    add dnsServers[c$id$resp_h];
    }

# On shutdown, force a flush to disk of the current data.
event zeek_done()
    {
    event DNSServers::writeDNSLogs(T);
    }
While this certainly isn't pretty, it does work. Next time, we'll have a look at the Broker-enabled cluster data store capabilities and how to use them.
David Hoelzer is the author and maintainer of the SANS SEC503 Advanced Intrusion Detection course, the leading class for advanced network analysis in the industry. With more than 30 years of experience in information technology and security, he is the author of and a contributor to a number of open source defensive tools. In addition to acting as the Chief of Operations for Enclave Forensics, Inc., an incident response, secure coding, and managed services corporation, David is also the Dean of Faculty for the SANS Technology Institute (STI).