Datadog Agent’s Persistent Cache

September 15, 2022

When I first started at RapDev, I heard of a mythical feature of the Datadog Agent class: the agent’s persistent caching functions. These functions allow the agent to read from and write to a file located on the host, one that persists across agent and host restarts. This is useful when you want to monitor changes in the state of an event or check, such as an open investigation becoming closed, or the last submitted check status versus the current status.

When the Datadog Agent starts a custom check or integration, the `__init__` definition is run to create an instance of said check/integration. Any variables declared within this instance can be read from and written to while the check runs. These variables can also be referenced in subsequent check runs and will report the state last written to them.

This is where the Datadog Agent’s persistent cache function comes into play. Since the check instance is created when the agent starts, anything written to it is ephemeral, surviving only as long as the agent continues to run. The persistent cache, in my opinion, alleviates this limitation if implemented correctly.

To use the Agent’s persistent cache in your agent check, call the following agent base definitions, which read from and write to the cache, respectively. The cache itself is a dictionary object and should be handled as such.

def read_persistent_cache(self, key):
    return self._cache.get(key, '')

def write_persistent_cache(self, key, value):
    self._cache[key] = value
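
To make the difference between instance state and cached state concrete, here is a minimal sketch that mimics the two definitions above with an in-memory stand-in. `StubCheck` and `_DISK` are hypothetical names used only for illustration; a real check inherits these methods from the agent's base class, and the real cache lives in a file on the host, not in a Python dict.

```python
# Hypothetical stand-in for the file the agent keeps on the host.
_DISK = {}


class StubCheck:
    """Illustrative stand-in for an agent check (not the real base class)."""

    def __init__(self):
        # Instance attributes like this one are lost when the agent restarts.
        self.run_count = 0
        # The stand-in cache outlives any single instance.
        self._cache = _DISK

    def read_persistent_cache(self, key):
        return self._cache.get(key, '')

    def write_persistent_cache(self, key, value):
        self._cache[key] = value


first = StubCheck()
first.run_count += 1
first.write_persistent_cache('last_status', 'OK')

# Simulate an agent restart: a brand-new instance is created.
second = StubCheck()
restored_status = second.read_persistent_cache('last_status')
missing_value = second.read_persistent_cache('never_written')
```

In this sketch the new instance starts with `run_count` back at zero, while the value written under `last_status` is still readable. A key that was never written comes back as an empty string, which is why the first run of a check needs special handling.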

My recommendation for implementing these functions in your check is to add helper functions for handling your variables. Specifically, add a helper that reads from the cache: it calls read_persistent_cache() and handles both the first-time initialization of a check and normal check runs. I also recommend a second helper that writes to the cache: it calls write_persistent_cache() and handles the conversion from other types into a JSON-formatted string.

The following is an example of how I implemented the use of the Agent’s persistent cache in our Rapid7 investigation integration. I used the persistent cache to track which investigations were already submitted to the event stream to ensure there aren’t duplicate events generated.

import json

def read_from_cache(self):
    # READ LATEST STATE OF DICT FROM PERSISTENT STORAGE
    self.persistent = self.read_persistent_cache('dict')
    # TRY TO LOAD CACHE AS JSON
    try:
        self.persistent = json.loads(self.persistent)
    except ValueError:
        # EMPTY OR INVALID CACHE (E.G. FIRST RUN): START WITH A NEW DICT
        self.persistent = {
            'open': {},
        }
        self.log.error(
            f'ERROR READING PERSISTENT DICTIONARY TO JSON\nCREATING NEW DICT: {self.persistent}')

def write_to_cache(self):
    try:
        self.write_persistent_cache('dict', json.dumps(self.persistent))
    except (TypeError, ValueError):
        # json.dumps RAISES TypeError FOR VALUES IT CANNOT SERIALIZE
        self.log.error('ERROR WRITING DICTIONARY TO CACHE')
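
To show how helpers like these might fit into a check run, here is a rough sketch of the de-duplication idea. Everything in it is an assumption made for illustration: `StubInvestigationCheck`, the `submitted` list standing in for the event stream, the shape of the investigation dicts, and a `check()` method that takes the investigations directly. It is not the actual Rapid7 integration code.

```python
import json


class StubInvestigationCheck:
    """Hypothetical stand-in; a real check would subclass the agent's base class."""

    def __init__(self):
        self._cache = {}      # stand-in for the agent's persistent cache
        self.persistent = {}
        self.submitted = []   # stand-in for events sent to the event stream

    def read_persistent_cache(self, key):
        return self._cache.get(key, '')

    def write_persistent_cache(self, key, value):
        self._cache[key] = value

    def read_from_cache(self):
        try:
            self.persistent = json.loads(self.read_persistent_cache('dict'))
        except ValueError:
            # Nothing cached yet (first run): start with an empty dict.
            self.persistent = {'open': {}}

    def write_to_cache(self):
        self.write_persistent_cache('dict', json.dumps(self.persistent))

    def check(self, investigations):
        self.read_from_cache()
        for inv in investigations:
            # Only submit investigations that are not already in the cache.
            if inv['id'] not in self.persistent['open']:
                self.submitted.append(inv['id'])
                self.persistent['open'][inv['id']] = inv['title']
        self.write_to_cache()


check = StubInvestigationCheck()
check.check([{'id': 'a1', 'title': 'Suspicious login'}])
check.check([{'id': 'a1', 'title': 'Suspicious login'},
             {'id': 'b2', 'title': 'Malware alert'}])
```

After the second run, investigation `a1` has been submitted only once, because the first run recorded it in the cached dictionary.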

I leave you with some warnings regarding caching, as incorrect usage could lead to memory and storage issues. What you write into persistent memory should be of a bounded domain. You must ensure that anything written into persistent memory is handled correctly and is eventually removed (popped) from the cache.
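
One way to keep the cached dictionary bounded is to pop entries once they are no longer relevant. The sketch below assumes the `{'open': {...}}` layout from the example above; `prune_closed` and `still_open_ids` are hypothetical names, and how you learn which investigations are still open depends on the API you are polling.

```python
import json


def prune_closed(persistent, still_open_ids):
    """Drop cached investigations that are no longer open."""
    # Iterate over a copy of the keys so entries can be popped safely.
    for inv_id in list(persistent['open']):
        if inv_id not in still_open_ids:
            persistent['open'].pop(inv_id)
    return persistent


cached = {'open': {'a1': 'Suspicious login', 'b2': 'Malware alert'}}
pruned = prune_closed(cached, still_open_ids={'b2'})
serialized = json.dumps(pruned)  # the string that would be written to the cache
```

Running the prune step before each write means the cache only ever holds as many entries as there are open investigations.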

written by
Mitch Nethercott
Datadog engineer with experience in network administration and configuration, application/network performance monitoring, and automation using configuration management tools. Born and raised in Connecticut, he’s been using computers since preschool and is more than equipped to troubleshoot a wide variety of problems.