Net-SNMP Free Memory Custom Trap

theocharotas · Post by **theocharotas** » 2017/08/09 22:12:25

Hello everyone,

I would like your help with configuring SNMP traps for free memory. I need to send SNMP traps to a remote server when memory reaches a critical state. The UCD-SNMP-MIB includes several metrics that could be used (memAvailReal, memAvailSwap, memTotalReal), but I think none of them reliably indicates the memory state on its own, because the system also uses as much memory as it can for cache and buffers. I think the best metric to check would be one of the following:

1. memTotalFree + memBuffer + memCached
2. Committed memory as shown by the "sar" command (as a percentage or in KB).

For example:
[root@c7-min-ansible mibs]# sar -r 1 1
09:47:43 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
09:47:44 PM   1489440    387192     20.63     23220    181464    268564      6.76    165368    113892         0
Average:      1489440    387192     20.63     23220    181464    268564      6.76    165368    113892         0

Is it possible to add or subtract values (or to use variables) in the monitor command of /etc/snmp/snmpd.conf? I mean something like this:
monitor -r 60 MymemValue (memTotalFree - memBuffer - memCached) > 2000000
Or is there any way to use the values returned by the sar command (kbcommit, %commit)?
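As far as I can tell, the monitor expression compares a single OID against a value, so the arithmetic in the line above would have to happen outside snmpd. One possible approach is a small helper script that computes the sum itself. The sketch below is only an illustration: it assumes a Linux /proc/meminfo layout, and the function names `parse_meminfo` and `reclaimable_free_kb` are made up for this example.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        # Lines look like "MemFree:  1489440 kB"; some names contain "(...)".
        match = re.match(r"^(\w+(?:\(\w+\))?):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def reclaimable_free_kb(meminfo):
    """Free memory plus buffers and page cache, in kB (assumed metric)."""
    return meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]


# Example usage on a Linux host:
#     with open("/proc/meminfo") as handle:
#         print(reclaimable_free_kb(parse_meminfo(handle.read())))
```

A script like this could print the value and be exposed to snmpd through the `extend` directive, but whether a monitor line can then watch that output on net-snmp 5.7.2 is something I have not verified.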

I'm currently testing in an environment that uses net-snmp-5.7.2-24.el7.x86_64, but I have also tested on a fully patched server.

Thanks for your help.

aks · Post by **aks** » 2017/08/17 17:55:47

You'll never get a 100% accurate picture of RAM, as it is changeable by nature.
At best you can ask for a "snapshot" at a given point in time.
What you can do is watch for change over time and then attempt to predict future consumption based on previous observations. For example, at 08h00 every day we "know" (based on past observation) that 200MB of RAM is "grabbed" by a process for (say) 2 minutes. We also "know" that other processes grab (say) between 9 and 100KB around the same time. Our current observation indicates we have 190MB of RAM free (including buffers if you are that concerned), but our swap space is 50GB and only 10% used. Should we alert? The system will (probably) swap but not FAIL. Performance will suffer but the system will not FAIL. What do you want to know about: failure, performance degradation, or both?
BTW, cache will be freed to satisfy app requests. Buffers will be freed on close (unless a bug exists....)
I usually use HOST-RESOURCES-MIB's hrStorageUsed and hrStorageSize to perform these calculations (you just need to work out what the index of the RAM instance is - look at the index column). The frequency of your observations is also quite important - for example, if you read every 5 minutes you could miss a huge spike in your app. But I guess we can "toss" that one over the fence and blame the performance testers.

It's also worth mentioning that just because SNMP now says memory is X, does that mean memory is X? Probably not. When did SNMP last execute (i.e. how "fresh" is the data)? What is the latency between the target and the monitoring system? How many page refreshes/CPU cycles have happened between then and now?
And all these things are app specific. For example, in my day job, I'll happily issue a warning when 75% of RAM is used and "go critical" at 85% for a specific system (i.e. an app or set of apps), because I know that at an undetermined (i.e. event-based) time, the system will consume a whole bunch of RAM (something approaching 20%) - but only for performance reasons. My system is provisioned with sufficient resources for what it is doing (generally) and I'll happily fall back to swapping to keep the system alive. Whereas with other application(s) - "better behaved", in my words - I'll warn at 95%. It is a very complex topic and specific to the behaviour of the application(s) over time.
Decide what you are monitoring for. Then sample the behaviours and select specific things that are indicative of future behaviour (so as a first port of call, "test to destruction"). Yes, it is hard, but if you can pull it off, people will think that you can pull rabbits out of a hat .... (and no, I can't).
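The per-application warning/critical split described above can be sketched as a small classifier. This is only an illustration: the 75% and 85% defaults are the figures quoted above for one specific system, and the function name `classify_memory` is hypothetical.

```python
def classify_memory(used_pct, warn_pct=75.0, crit_pct=85.0):
    """Map a RAM usage percentage to "ok", "warning" or "critical".

    The thresholds are per-application assumptions and should be tuned
    from observed behaviour, not taken as general recommendations.
    """
    if crit_pct < warn_pct:
        raise ValueError("critical threshold must not be below warning")
    if used_pct >= crit_pct:
        return "critical"
    if used_pct >= warn_pct:
        return "warning"
    return "ok"


# A "better behaved" application might use a higher warning level, e.g.:
#     classify_memory(used_pct, warn_pct=95.0, crit_pct=98.0)
# (the 98.0 here is an invented value, not one from this thread)
```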

theocharotas · Post by **theocharotas** » 2017/08/18 21:32:04

Thank you for your kind response, aks, about the general approach I should follow in the future.

In that case, I would be happy just to monitor the result of this equation: MemoryMonitored = memTotalFree - memBuffer - memCached. Does snmpd.conf support calculations between metrics that return Integer32? Could you please provide an example of the format used? I mean something like this (monitor -r 60 MymemValue memTotalFree > 2000000). The main reason I want to change this is that monitoring "memTotalFree" alone produces several false alarms. Could you also send an example of the format you are using for hrStorageSize and hrStorageUsed?

Thanks for your help.

aks · Post by **aks** » 2017/08/22 16:16:40

I don't place them in the pr flags part (in snmpd.conf aka SECTION: Monitor Various Aspects of the Running Host). I just poll, so nothing is trapped: the check is driven actively by the monitoring system(s) rather than by a trap sent from the target server. I chose this because (depending on the problem) a trap might not "get there" or might be missed in some manner, and RAM is so important for just about all my applications.

First I get the index for the instance I'm interested in. To do so you can query the

Code: Select all

HOST-RESOURCES-MIB::hrStorageDescr

table, which returns strings matching what you're after, for example:

Code: Select all

HOST-RESOURCES-MIB::hrStorageDescr.1 = STRING: Physical memory
HOST-RESOURCES-MIB::hrStorageDescr.3 = STRING: Virtual memory
HOST-RESOURCES-MIB::hrStorageDescr.6 = STRING: Memory buffers
HOST-RESOURCES-MIB::hrStorageDescr.7 = STRING: Cached memory
HOST-RESOURCES-MIB::hrStorageDescr.10 = STRING: Swap space
...
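Looking up the index by hand works, but it can also be scripted. The sketch below is an assumption-laden illustration: it expects walk output in exactly the text format shown above (other output options may quote or format the strings differently), and the name `storage_index` is made up for this example.

```python
import re

# Matches lines such as:
#   HOST-RESOURCES-MIB::hrStorageDescr.1 = STRING: Physical memory
DESCR_LINE = re.compile(r"hrStorageDescr\.(\d+) = STRING: (.*)$")


def storage_index(walk_output, description):
    """Return the hrStorage index whose description matches, or None."""
    for line in walk_output.splitlines():
        match = DESCR_LINE.search(line.strip())
        if match and match.group(2).strip() == description:
            return int(match.group(1))
    return None


# Example: storage_index(text_of_the_walk, "Physical memory")
```

Resolving the index by description on each run, rather than hard-coding it, avoids relying on the index being the same on every host.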

Although the type table would yield similar results (albeit OIDs not strings):

Code: Select all

HOST-RESOURCES-MIB::hrStorageType.1 = OID: HOST-RESOURCES-TYPES::hrStorageRam
HOST-RESOURCES-MIB::hrStorageType.3 = OID: HOST-RESOURCES-TYPES::hrStorageVirtualMemory
HOST-RESOURCES-MIB::hrStorageType.6 = OID: HOST-RESOURCES-TYPES::hrStorageOther
....

So if I want Physical memory (which is index 1 in the above example), I can query:

Code: Select all

HOST-RESOURCES-MIB::hrStorageSize.1

for size.

Code: Select all

HOST-RESOURCES-MIB::hrStorageUsed.1

for used.

Code: Select all

HOST-RESOURCES-MIB::hrStorageAllocationUnits.1

for allocation units (usually 1024 bytes for RAM).
All of these return integers.
The important part is to always use the same index value for the instance you're using (so 1 in the above example) across all queries.
Simple eh?
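Once the three values for the same index have been polled, the arithmetic is short. The following is a minimal sketch, assuming the values have already been fetched as plain integers; the function name `storage_usage` is hypothetical and the numbers in the usage comment are invented.

```python
def storage_usage(size_units, used_units, allocation_unit_bytes):
    """Convert hrStorage values for one index into bytes and percent used.

    size_units            -> hrStorageSize.<index>
    used_units            -> hrStorageUsed.<index>
    allocation_unit_bytes -> hrStorageAllocationUnits.<index>
    """
    if size_units <= 0:
        raise ValueError("hrStorageSize must be positive")
    used_bytes = used_units * allocation_unit_bytes
    total_bytes = size_units * allocation_unit_bytes
    used_pct = 100.0 * used_units / size_units
    return used_bytes, total_bytes, used_pct


# Example with made-up readings and a 1024-byte allocation unit:
#     used, total, pct = storage_usage(2000000, 500000, 1024)
```

Note that the percentage does not depend on the allocation unit, as long as size and used come from the same index.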
