Hi, today I am going to explain how to monitor Linux/Windows/AIX processes and services in Zabbix using SNMP. When we start an application on a system, the system starts a process/service for it and assigns that process/service a unique ID.

A system can run one or many services/processes, and we can get the list of all of them using the main OID below.

1.3.6.1.2.1.25.4.2.1 (hrSWRunEntry)

Note: This is the main OID for listing processes/services, and it has several fields that give details about each service/process.

Fields are given below:

1.3.6.1.2.1.25.4.2.1.1 (hrSWRunIndex) //Each service/process has its own index; this attribute is used to look up the name and other attributes of that service/process

1.3.6.1.2.1.25.4.2.1.2 (hrSWRunName)  //Process or service name
1.3.6.1.2.1.25.4.2.1.3 (hrSWRunID)        //The product ID of this running piece of software
1.3.6.1.2.1.25.4.2.1.4 (hrSWRunPath)  //Path the application is running from
1.3.6.1.2.1.25.4.2.1.5 (hrSWRunParameters) //Parameters passed to the application when it was started
1.3.6.1.2.1.25.4.2.1.6 (hrSWRunType)   //Has 4 possible values (1-unknown, 2-operatingSystem, 3-deviceDriver, 4-application)
1.3.6.1.2.1.25.4.2.1.7 (hrSWRunStatus) //Shows the service/process status and has 4 possible values (1-running, 2-runnable, 3-notRunnable, 4-invalid)
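
To make the field list above easier to read, here is a minimal Python sketch that splits a full instance OID into its column name and process index, and turns the numeric type/status codes into their names. The OIDs and code tables come from the list above; the index 1234 and the name "sshd" are made-up sample values, not output from a real host.

```python
# Column OIDs under hrSWRunEntry, exactly as listed in the post.
HR_SW_RUN_ENTRY = "1.3.6.1.2.1.25.4.2.1"

COLUMNS = {
    1: "hrSWRunIndex",
    2: "hrSWRunName",
    3: "hrSWRunID",
    4: "hrSWRunPath",
    5: "hrSWRunParameters",
    6: "hrSWRunType",
    7: "hrSWRunStatus",
}

RUN_TYPE = {1: "unknown", 2: "operatingSystem", 3: "deviceDriver", 4: "application"}
RUN_STATUS = {1: "running", 2: "runnable", 3: "notRunnable", 4: "invalid"}


def split_oid(oid):
    """Split a full instance OID into (column name, process index)."""
    prefix = HR_SW_RUN_ENTRY + "."
    if not oid.startswith(prefix):
        raise ValueError("not an hrSWRunEntry OID: " + oid)
    column, index = oid[len(prefix):].split(".", 1)
    return COLUMNS[int(column)], int(index)


def describe(oid, value):
    """Return a readable line for one OID/value pair."""
    column, index = split_oid(oid)
    if column == "hrSWRunType":
        value = RUN_TYPE[int(value)]
    elif column == "hrSWRunStatus":
        value = RUN_STATUS[int(value)]
    return f"{column}[{index}] = {value}"


# Sample values only (hypothetical process index and name).
print(describe("1.3.6.1.2.1.25.4.2.1.7.1234", 1))
print(describe("1.3.6.1.2.1.25.4.2.1.2.1234", "sshd"))
```

The last number of each instance OID is the process index, which is the same value Zabbix later exposes as {#SNMPINDEX}.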

 

Now let's create a discovery rule in Zabbix. (I assume you have already onboarded the host whose services/processes we are going to monitor.)

1. Click on the “Create discovery rule” button under the “Discovery rules” menu.

discovery_rule

2. Now fill in the discovery rule form.

discovery_rule_form

 

name: the discovery rule name.

type: discovery happens over SNMP, so choose SNMPv2 agent.

key: any unique string (without spaces) can be used as the key.

SNMP OID: give the OIDs whose values the item prototypes will need, such as name, status and parameters. Here I list the column OIDs of all the attributes I need and map each one to a macro such as “{#IFNAME}”.

discovery[{#SNMPVALUE},1.3.6.1.2.1.25.4.2.1.4,{#IFADMINSTATUS},1.3.6.1.2.1.25.4.2.1.7,{#IFNAME},1.3.6.1.2.1.25.4.2.1.2,{#IFTYPE},1.3.6.1.2.1.25.4.2.1.5,{#STYPE},1.3.6.1.2.1.25.4.2.1.6]

SNMP community: the community string used for the SNMP connection to the given server.

update interval: how often Zabbix repeats the discovery. Normally keep it as high as you can; I have set 4 hours.

keep lost resources period: if a discovered item becomes invalid and is no longer available on that server, how long to wait before that item is deleted. I kept 1 hour.

filter tab: add a filter based on a macro if you want to monitor only specific processes/services.

Once we are done with all these steps, click on the “Add” button.
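
The discovery rule above can be pictured with a small Python sketch. It rebuilds the SNMP OID string from macro/OID pairs, joins the values of each walked column by their shared index into one row per process, and applies a regular-expression filter on one macro. This only illustrates the idea and is not Zabbix's own code; the function names and the sample values (indexes 1234 and 2345, "sshd", "crond") are hypothetical.

```python
import re

BASE = "1.3.6.1.2.1.25.4.2.1"

# Macro / column OID pairs, in the same order as the discovery string above.
MACRO_OIDS = [
    ("{#SNMPVALUE}", BASE + ".4"),      # hrSWRunPath
    ("{#IFADMINSTATUS}", BASE + ".7"),  # hrSWRunStatus
    ("{#IFNAME}", BASE + ".2"),         # hrSWRunName
    ("{#IFTYPE}", BASE + ".5"),         # hrSWRunParameters
    ("{#STYPE}", BASE + ".6"),          # hrSWRunType
]


def build_discovery_oid(pairs):
    """Build the discovery[...] string for the SNMP OID field."""
    parts = []
    for macro, oid in pairs:
        parts.extend([macro, oid])
    return "discovery[" + ",".join(parts) + "]"


def join_by_index(walks):
    """Turn {macro: {index: value}} into one row per index."""
    rows = {}
    for macro, values in walks.items():
        for index, value in values.items():
            rows.setdefault(index, {"{#SNMPINDEX}": index})[macro] = value
    return [rows[i] for i in sorted(rows)]


def filter_rows(rows, macro, pattern):
    """Keep only rows whose macro value matches the regular expression."""
    rx = re.compile(pattern)
    return [row for row in rows if rx.search(str(row.get(macro, "")))]


# Hypothetical walk results for two processes (sample data only).
walks = {
    "{#IFNAME}": {1234: "sshd", 2345: "crond"},
    "{#SNMPVALUE}": {1234: "/usr/sbin/sshd", 2345: "/usr/sbin/crond"},
    "{#IFADMINSTATUS}": {1234: 1, 2345: 2},
}

rows = join_by_index(walks)
wanted = filter_rows(rows, "{#IFNAME}", r"^sshd$")

print(build_discovery_oid(MACRO_OIDS))
print(wanted)
```

Each resulting row carries {#SNMPINDEX} plus the mapped macros, which is what the item and trigger prototypes use later on.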

 

Now let's create the item prototype, named after the services/processes, so that we can monitor their status.

Click on the “Create item prototype” link and fill in the form as below:

lld_rule_form

Name: I have given ## {#SNMPINDEX} | {#IFNAME} | {#SNMPVALUE} | {#IFTYPE}, using macros to generate the index, name, path and parameters of each process/service dynamically.

Type: SNMPv2 agent

Key: must be unique, so include {#SNMPINDEX}, which is unique for every service/process (the trigger expressions later in this post use the key process[status.{#SNMPINDEX}]). Sometimes it is better to use {#IFNAME} as well, but it depends on the situation.

SNMP OID: As the name/type/parameters are already in the item name, here we use the OID that returns the service/process status: “1.3.6.1.2.1.25.4.2.1.7.{#SNMPINDEX}”

SNMP Community : {$SNMP_COMMUNITY}

Type of information: the type of value returned by the SNMP OID (1.3.6.1.2.1.25.4.2.1.7.{#SNMPINDEX}). In this example we use Character for another reason, but you can also use “Numeric (unsigned)”.

Update interval: how often the status of the services/processes is checked. I have given 180, which means 3 minutes. If you need a different schedule, such as every day or every week, you can use the custom interval option.

Storage period: this value is used by the housekeeper. If you are not using the default Zabbix housekeeper you can give any value here; otherwise the item's data is kept only for the specified time.

History storage period: how long the item's history is stored.

Trend storage period: how long the trend (aggregated) data is stored.

Application: we can create a new application and assign this item to it, or assign the item to an existing application. Applications group items together for filtering and similar purposes.

Create enabled: this is the main option. When checked, the items generated for the services/processes are created in an enabled state; when unchecked, they are created disabled and will not collect data.

Note: if you want to perform some pre-calculation or pre-operation on the value, use the “Preprocessing” tab, which is next to the “Item prototype” tab at the top.
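
As a rough illustration of what the item prototype does with the Name, Key and SNMP OID fields above, the Python sketch below substitutes the macros of one discovered row into each template. The `expand` helper and the sample row (index 1234, "sshd", "/usr/sbin/sshd", "-D") are hypothetical; the key is the one used in the trigger expressions later in this post.

```python
def expand(template, row):
    """Replace every LLD macro in the template with its value from the row."""
    result = template
    for macro, value in row.items():
        result = result.replace(macro, str(value))
    return result


# Templates taken from the item prototype fields in this post.
NAME = "## {#SNMPINDEX} | {#IFNAME} | {#SNMPVALUE} | {#IFTYPE}"
KEY = "process[status.{#SNMPINDEX}]"
OID = "1.3.6.1.2.1.25.4.2.1.7.{#SNMPINDEX}"

# One discovered row (sample data only).
row = {
    "{#SNMPINDEX}": 1234,
    "{#IFNAME}": "sshd",
    "{#SNMPVALUE}": "/usr/sbin/sshd",
    "{#IFTYPE}": "-D",
}

print(expand(NAME, row))
print(expand(KEY, row))
print(expand(OID, row))
```

Because {#SNMPINDEX} differs for every process, each generated item gets its own key and its own status OID.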

 

Once we are done with all these details, simply click on the “Add” button to add the item prototype. It will look like this:

LLD_rule_list

Note: Once the discovery rule's update interval has elapsed, it will discover all services/processes on that particular system.

Now, if you want a trigger to fire when any service/process is down, we also need to create a “trigger prototype”. Let's see how to create one.

1. Click on “Trigger prototypes”, then click on the “Create trigger prototype” button and fill in the details as required.

trigger_prototype_item

2. Name: give a name that includes the service/process name, because this name is shown on the Problems screen together with the issue status.

## {#SNMPINDEX} | {#IFNAME} | {#SNMPVALUE} | {#IFTYPE}: required process not running

Severity: how critical the issue is, for example just informational or a major/critical issue.

Problem expression: this is the condition, and in this section we use the format below:

{<<HOST>>:<<ITEM_KEY>>.<<FUNCTION>>}<<comparison operator>><<value to compare>>

{<<HOST>>:process[status.{#SNMPINDEX}].last()}<>1

Note: for this item, SNMP returns one of these values (1-running, 2-runnable, 3-notRunnable, 4-invalid), so I treat any value <>1 as a problem.

For resolution, value = 1 means the service/process is back to its normal condition.

Recovery expression : {<<HOST>>:process[status.{#SNMPINDEX}].last()}=1

Leave the other fields as they are and go directly to “Allow manual close”. Check this checkbox if you want to be able to close the event from the acknowledge option on the Problems page.

“Create enabled”: again, when checked, the triggers generated for all discovered items are created in an enabled state.

Then click on the “Add” button.
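
The problem and recovery expressions above boil down to a very small rule, sketched here in Python. This is only an illustration of the logic, not how Zabbix evaluates triggers internally; the function names and the sample value sequence are made up. Values are converted with `int()` because the item in this post is stored as Character.

```python
RUNNING = 1  # hrSWRunStatus value for "running"


def is_problem(last_value):
    """Problem expression: last() <> 1."""
    return int(last_value) != RUNNING


def is_recovered(last_value):
    """Recovery expression: last() = 1."""
    return int(last_value) == RUNNING


def next_state(in_problem, last_value):
    """Return True if the trigger is in the problem state after a new value."""
    if not in_problem:
        return is_problem(last_value)
    return not is_recovered(last_value)


# Sample sequence of collected status values (hypothetical).
values = ["1", "1", "3", "3", "1"]

state = False
history = []
for value in values:
    state = next_state(state, value)
    history.append(state)

print(history)
```

With this sample sequence the trigger goes into the problem state when the status changes to 3 (notRunnable) and recovers when it returns to 1 (running).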

Now we are done with the setup, so let's wait until the discovery runs. Once discovery has completed, you will see the discovered items with their triggers as below:

services_listing

 

 

That's it. Please share your feedback and comments. Read how to monitor overall CPU utilization.

 
