CPU and Memory monitoring in .NET

(Just as a reminder to myself: this is the first post after my child was born, and now I'm writing while he is cooing and speaking to us in his own baby language.)

Recently, I worked on implementing a throttling mechanism for our service at Microsoft. Typically, services throttle user requests based on a requests-per-hour limit. For instance, GitHub allows 5,000 requests per hour for authenticated users. Our service is called by various internal systems within Microsoft, and we have defined a per-upstream limit on the number of requests each of them can make per hour. However, we decided to take a slightly different approach to enforcing this policy. Instead of throttling as soon as the number of requests goes above the defined limit, our throttling enforcement activates only when the CPU and memory usage of our pods exceed a threshold, such as 70%. This approach allows our system to absorb sudden, transient spikes from our partners without throttling unless resources are actually constrained.
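To make the policy concrete, here is a minimal sketch of the decision as I described it. It is not our production code: `ThrottlePolicy`, `ShouldThrottle`, and the parameter names are hypothetical, and treating pressure on either resource as "constrained" is an assumption made for the example.

```csharp
using System;

// Hypothetical helper for illustration; not part of any library.
public static class ThrottlePolicy
{
    // Example threshold from the text (70%).
    public const double ResourceThresholdPercent = 70.0;

    public static bool ShouldThrottle(
        int requestsThisHour,
        int hourlyLimit,
        double cpuPercent,
        double memoryPercent)
    {
        // Assumption: pressure on either resource counts as "constrained".
        bool resourcesConstrained =
            cpuPercent > ResourceThresholdPercent ||
            memoryPercent > ResourceThresholdPercent;

        bool overQuota = requestsThisHour > hourlyLimit;

        // The hourly quota is enforced only while the pod is under pressure,
        // so a transient spike on a healthy pod is let through.
        return overQuota && resourcesConstrained;
    }
}
```

With this sketch, a caller that is over its quota while the pod sits at 40% CPU and 50% memory is still served; the same caller is throttled once CPU climbs above 70%.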

How do we measure the CPU and memory usage of a pod? (Yes, we run our service on Kubernetes.) In 2023, the .NET team introduced a package called Microsoft.Extensions.Diagnostics.ResourceMonitoring. One significant advantage of this library is that it provides a uniform interface for measuring resources across different environments and operating systems. While the official documentation illustrates its usage, it leaves out some details about how the library works internally. I’ve read through the source code and will shed some light on its inner workings.

Instantiation

The usage is straightforward. First, we need to instantiate the monitor:

services.AddResourceMonitoring(builder =>
{
    builder.ConfigureMonitor(configMonitor =>
    {
        configMonitor.CpuConsumptionRefreshInterval = TimeSpan.FromSeconds(5.0); // optional
        configMonitor.MemoryConsumptionRefreshInterval = TimeSpan.FromSeconds(5.0); // optional
        configMonitor.SamplingInterval = TimeSpan.FromSeconds(1.0); // optional
    });
});

The above code starts a continuous background job that measures CPU and memory usage periodically, at the rate defined by SamplingInterval. What are CpuConsumptionRefreshInterval and MemoryConsumptionRefreshInterval? The reality is that getting CPU and memory percentages is not cheap, so the library caches the last fetched percentage for the time defined by these intervals. In our example, both percentages are cached for 5 seconds. So although the background job runs every second, a new value is reported only every 5 seconds.
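The caching idea can be sketched in a few lines. This is only an illustration of "recompute at most once per refresh interval", not the library's actual implementation: `CachedReading` and its members are hypothetical names, the clock is injected so the behavior is easy to test, and the class is not thread-safe.

```csharp
using System;

// Illustration only: a reading that is recomputed at most once per refresh interval.
public sealed class CachedReading
{
    private readonly Func<double> _measure;       // the expensive measurement
    private readonly Func<DateTimeOffset> _now;   // injectable clock
    private readonly TimeSpan _refreshInterval;

    private DateTimeOffset _refreshAfter = DateTimeOffset.MinValue;
    private double _lastValue;

    public CachedReading(
        Func<double> measure,
        TimeSpan refreshInterval,
        Func<DateTimeOffset> now)
    {
        _measure = measure;
        _refreshInterval = refreshInterval;
        _now = now;
    }

    public double Get()
    {
        DateTimeOffset now = _now();

        // Re-measure only when the cached value has expired.
        if (now >= _refreshAfter)
        {
            _lastValue = _measure();
            _refreshAfter = now + _refreshInterval;
        }

        return _lastValue;
    }
}
```

With a 5-second refresh interval, calling `Get()` once per second triggers the expensive measurement on the first call and then again only after 5 seconds have passed; the calls in between return the cached value.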

Usage

How can we read CPU and memory usage? It’s a piece of cake! You just need to inject IResourceMonitor into your class:

using Microsoft.Extensions.Diagnostics.ResourceMonitoring;

public class Foo
{
    private readonly IResourceMonitor _monitor;
    private readonly TimeSpan _utilizationWindow = TimeSpan.FromSeconds(3);

    public Foo(IResourceMonitor monitor)
    {
        _monitor = monitor;
    }

    // The percentages live directly on the ResourceUtilization value
    // returned by GetUtilization, and they are doubles.
    public double CpuPercentage() => _monitor
        .GetUtilization(_utilizationWindow)
        .CpuUsedPercentage;

    public double MemoryPercentage() => _monitor
        .GetUtilization(_utilizationWindow)
        .MemoryUsedPercentage;
}

As you can see in the snippet above, the key is calling GetUtilization. Why do we need to specify a TimeSpan? To calculate the CPU percentage over that time span, the library adds up the kernel time and user time that your app has consumed and divides the sum by the total time available in the window. For example, if you specify 3 seconds as the utilization window and, on a single CPU, your service spends a combined 1.5 seconds in kernel and user mode, then the used CPU percentage is 50%.
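The arithmetic from that example can be written out as a small function. This is a sketch of the formula as I described it, not the library's code: `CpuMath` and its parameters are hypothetical, and scaling the window by the number of CPU units available to the pod is my assumption for the multi-core case.

```csharp
using System;

// Hypothetical helper that mirrors the worked example in the text.
public static class CpuMath
{
    public static double CpuUsedPercentage(
        TimeSpan kernelTime,
        TimeSpan userTime,
        TimeSpan window,
        double cpuUnits = 1.0)
    {
        if (window <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(window));
        if (cpuUnits <= 0)
            throw new ArgumentOutOfRangeException(nameof(cpuUnits));

        // CPU time the app actually consumed in the window.
        double usedSeconds = (kernelTime + userTime).TotalSeconds;

        // CPU time that was available in the window
        // (assumption: wall-clock time multiplied by the CPU units).
        double availableSeconds = window.TotalSeconds * cpuUnits;

        return Math.Min(100.0, usedSeconds / availableSeconds * 100.0);
    }
}
```

For instance, `CpuMath.CpuUsedPercentage(TimeSpan.FromSeconds(1.0), TimeSpan.FromSeconds(0.5), TimeSpan.FromSeconds(3))` evaluates to 50, matching the example above; the same usage on a pod with 2 CPU units would come out as 25.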

On our team, we use Microsoft.Extensions.Diagnostics.ResourceMonitoring to visualize the health of our system in our dashboards. On top of that, we have defined monitors that are triggered when a pod's resource usage gets too high.