Zabbix
Links
- Homepage
- zabbix 4 database schema
- Zabbix compatibility matrix
- https://www.digitalocean.com/community/tutorials/introduction-to-queries-mysql
- compilation instructions
- Documentation
- Examples of Common Queries
- Custom scripts
- Various scripts to automate tasks in Zabbix
- Tuning mysql for zabbix
- https://huyabbix.com
- Migrating zabbix database with minimal downtime
- Zabbix Bug tracker
- Clean up database
- Zabbix and selinux
- Apache/SSL checks
- Zabbix on RHEL/Centos
- Grafana
- Zabbix HA cluster: https://blog.zabbix.com/zabbix-ha-cluster-setups/8264/
- Active vs Passive
- Very old network diagram
- Fighting zabbix alert floods
- zabbix-cli
Documentation
Triggers
Function str() searches for substrings
AND OR case
The not, and, and or operators are case-sensitive and must be lowercase. They must also be surrounded by spaces or parentheses.
Installation
Installing Zabbix from git
git clone https://github.com/zabbix/zabbix.git
cd zabbix
./bootstrap.sh
./configure --help
autoreconf -fvi
If you don't have a Makefile, try
./config.status Makefile
and then
./configure
again
Zabbix API
Zabbix agent paths
Ubuntu:
/etc/zabbix/zabbix_agentd.conf.d/
/etc/zabbix/zabbix_agentd.conf
Simple check
Zabbix and SNMP
External check
Scripts usually in /usr/lib/zabbix/externalscripts/
Zabbix error codes
Z3005
Database issue
Items
Item dialog
Units
- B
- uptime
- unixtime
- s
proc.mem
proc.mem[<name>,<user>,<mode>,<cmdline>,<memtype>]
name
??
cmdline
regex like php-fpm:
memtype
Item preprocessing
Preprocessing regular expressions
See Regular expressions: example
XML/xpath preprocessing
https://blog.zabbix.com/zabbix-xpath-preprocessing/7936/
NOTE xq -x does not want the number() bit
Incorrect value for field "Prev. time": a relative time is expected.
Prev. Time should be something like
now-30s
Windows performance counters
https://www.zabbix.com/documentation/current/en/manual/config/items/perfcounters
Templates
Template App MySQL
https://github.com/tiramiseb/zabbix-templates/blob/master/Template%20App%20MySQL.txt
TODO shouldn't this be user zabbix?
mysql user account:
create user 'monitor'@'localhost' identified with auth_socket;
grant PROCESS,SHOW DATABASES,SHOW VIEW on *.* to 'monitor'@'localhost';
flush privileges;
Configuration
Zabbix agent active
On client
Make sure port 10051 on the server is reachable from the client, and set in the agent config:
ServerActive=zabbix.ser.ver
On server
Set Agent IP to 0.0.0.0
Zabbix and SQL
Find hosts with hostmacro defined
select h.host, m.macro, m.value from hosts h, hostmacro m where macro like '%FOO%' and h.hostid = m.hostid;
most frequent items in history_uint
select itemid,count(itemid) as freq from history_uint group by itemid order by freq desc limit 5;
and then
select name from items where itemid = whateveryoufind;
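The two lookups above can be combined into a single join. A minimal sketch, using Python's sqlite3 with a tiny in-memory stand-in for the items and history_uint tables. The rows and the reduced column set are made up for illustration; the real Zabbix schema has many more columns, and on a large history_uint table this query can take a long time.

```python
import sqlite3

# Toy stand-ins for the Zabbix tables; only the columns used here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table items (itemid integer primary key, name text);
    create table history_uint (itemid integer, clock integer, value integer);
    insert into items values (1, 'hypothetical item A'), (2, 'hypothetical item B');
    insert into history_uint values (1, 100, 5), (1, 160, 6), (1, 220, 7), (2, 100, 1);
""")

# Count history rows per item and resolve the item name in one query.
QUERY = """
    select i.itemid, i.name, count(*) as freq
    from history_uint h
    join items i on i.itemid = h.itemid
    group by i.itemid, i.name
    order by freq desc
    limit 5
"""

top_items = conn.execute(QUERY).fetchall()
for itemid, name, freq in top_items:
    print(itemid, name, freq)
```

The SELECT itself uses only plain joins and grouping, so it should carry over to the MySQL prompt unchanged.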
HOWTO
Define discovery filters
LLD with JSON
- LLD with JSON and dependent items
- https://www.zabbix.com/forum/zabbix-help/383827-json-and-lld-understanding
- https://www.zabbix.com/forum/zabbix-troubleshooting-and-problems/456663-lld-macros-with-json
if you want multiple keys, use jsonpath like
$[?(@.share=='{#FSTYPE}' && @.name=='{#NAME}')].size.first()
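To reason about what such a filter selects, here is a rough Python emulation of the expression above after the LLD macros have been substituted. The sample document and the values ext4, xfs, root and home are invented for illustration; Zabbix evaluates the JSONPath itself, so this is only a sketch for checking expectations.

```python
import json

# Hypothetical master item value: a JSON array of objects.
SAMPLE = json.loads("""
[
  {"share": "ext4", "name": "root", "size": 100},
  {"share": "ext4", "name": "home", "size": 250},
  {"share": "xfs",  "name": "root", "size": 400}
]
""")


def first_size(entries, share, name):
    """Emulate $[?(@.share=='<share>' && @.name=='<name>')].size.first()."""
    matches = [e["size"] for e in entries
               if e.get("share") == share and e.get("name") == name]
    # .first() yields the first match; None stands for "no match" here.
    return matches[0] if matches else None


print(first_size(SAMPLE, "ext4", "root"))
```

Both conditions have to hold for the same object, which is why a single filter with && is used instead of two separate filters.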
testing jsonpath preprocessing
In the Value field paste valid JSON, then add a macro with name {#NAME} and value somevalue
Test trapper
Reset admin password
Mysql prompt:
select * from users where username='Admin';
bcrypt your new password:
htpasswd -nbBC 10 USER YOURPASSWORD|awk -F ':' '{ print $2 }'
Mysql prompt:
update users set passwd = 'your bcrypted pass' where userid = 1;
Zabbix and PSK
See:
FAQ
Where is last userid stored?
Table ids:
table_name: users
field_name: userid
nextid: the next id
SERVER
Adjust loglevel
zabbix_server --runtime-control log_level_increase=trapper
Reload zabbix server configuration
You can't reload the configuration file without a restart, but you might want to reload the configuration cache:
zabbix_server -c /etc/zabbix/zabbix_server.conf -R config_cache_reload
No media defined for user
The frontend does not match Zabbix database.
Probably version conflict between frontend and server
value cache working in low memory mode
Increase ValueCacheSize
Message from 1.2.3.4 is missing header. Message ignored.
PROXY
cannot send proxy data to server
empty string received
failed to update local proxy configuration copy: unexpected field "host_inventory.type"
Front end
Round numbers
Preprocessing javascript
2 decimals:
return Math.round(value * 100) / 100
0 decimals:
return Math.round(value)
Visible name vs hostname
Visible name: {HOST.NAME}
Hostname: {HOST.HOST}
Host IP: (as defined in Interface->IP/DNS) {HOST.CONN}
Acknowledge multiple items
Monitoring->Problems: apply filters, select all, mass update
No permissions to referred object or it does not exist!
Graph no longer exists. Probably items no longer discovered
Maybe you've been editing a template file, remember to replace template name everywhere
Cannot add host
??
Monitoring SNMP
Cannot find host interface on "xxxhost" for item key foo
Might mean you're trying to import an SNMP template before configuring SNMP for the host
No SNMP data
snmp_parse_oid(): cannot parse OID "IF-MIB::ifSpeed.3
Timeout while connecting
Could be a wrong community string. Remember the delay when using a proxy.
Agent side ping check
UserParameter=pingtime[*],fping -e $1|sed 's/^.*(\([0-9].*\) ms).*$/\1/g'
UserParameter=pingalive[*],fping $1|grep -q alive;echo $?
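What the sed expression in pingtime extracts can be tried out without a network. A small sketch with the same pattern in Python; the sample lines are an assumption about what fping -e prints, so compare them with real output on your system.

```python
import re

# Same pattern as the sed expression: capture what sits between "(" and " ms)".
PING_TIME = re.compile(r"^.*\(([0-9].*) ms\).*$")


def ping_time(line):
    """Return the round-trip time as a string, or None if the line has no timing."""
    match = PING_TIME.match(line)
    return match.group(1) if match else None


# Assumed shape of an "fping -e" line for a reachable host.
print(ping_time("192.0.2.1 is alive (0.52 ms)"))
# Assumed shape for an unreachable host: no "(... ms)" part to extract.
print(ping_time("192.0.2.1 is unreachable"))
```

Unlike this function, sed prints the line unchanged when the pattern does not match, so for an unreachable host the item receives text instead of a number.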
LLD/Discovery
Discover: value must be a JSON object
Could mean you need to escape slashes, check output with zabbix_get
Cannot create item: item with the same key already exists
Make sure the item prototype key contains an LLD macro such as "{#SOMENAME}", so every discovered item gets a unique key
Discovery data example
Output of a discovery script should look like:
{"data":[
{"{#VAR1}":"value11","{#VAR2}":"value12"},
{"{#VAR1}":"value21","{#VAR2}":"value22"}
]}
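Building this by hand is error-prone, so a discovery script can let a JSON library do the quoting. A minimal sketch of such a script; the function name lld_payload and the sample rows are hypothetical, and a real script would enumerate disks, shares or whatever is being discovered.

```python
import json


def lld_payload(rows):
    """Wrap rows in the {"data": [...]} envelope, turning keys into {#MACRO} names."""
    data = [{"{#%s}" % key.upper(): str(value) for key, value in row.items()}
            for row in rows]
    return json.dumps({"data": data})


# Hypothetical discovered entities.
rows = [
    {"var1": "value11", "var2": "value12"},
    {"var1": "value21", "var2": "value22"},
]
print(lld_payload(rows))
```

Because json.dumps handles quotes and backslashes in the values, the "value must be a JSON object" class of errors is less likely than with hand-assembled strings.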
IPMI
IPMI Monitoring account for zabbix
https://www.thomas-krenn.com/en/wiki/Configuring_IPMI_under_Linux_using_ipmitool
ipmitool user set name 3 monitor
ipmitool user set password 3
ipmitool channel setaccess 1 3 link=on ipmi=on callin=on privilege=2
ipmitool user enable 3
To test these:
ipmitool -I lanplus -H <host> -L USER -U monitor sdr elist full
Zabbix credentials
Privilege Level
User
Authentication algorithm
Default
but what is that?
cannot connect to IPMI host: [22] Operation canceled
Usually temporary because of broken ipmi lib, ignore it
cannot connect to IPMI host: [16777411] Unknown error 16777411
classic, probably authentication problem
cannot connect to IPMI host: [22] Invalid argument
wrong password/credentials?
zabbix_sender
processed: 0; failed: 1
Possible causes:
- incorrect hostname
- incorrect item key
- item not in the server configuration cache yet
- Allowed hosts in trapper item
- phase of moon
- aliens
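When none of the causes above explains a failed value, it can help to look at what is actually sent. A minimal sketch of a sender request, assuming the usual framing of a ZBXD marker, a flags byte, an 8-byte little-endian length and a JSON body. The host and key are placeholders, and the framing should be checked against the protocol documentation for your Zabbix version.

```python
import json
import struct


def sender_packet(host, key, value):
    """Build the bytes of a 'sender data' request for one trapper value."""
    body = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    # "ZBXD" + protocol flags 0x01 + payload length as 64-bit little-endian.
    header = struct.pack("<4sBQ", b"ZBXD", 1, len(body))
    return header + body


# Placeholder host and key; both must match the server configuration exactly.
packet = sender_packet("my.host.com", "my.trapper.key", 42)
print(packet[:13], packet[13:].decode("utf-8"))
```

The host and key strings in the body are exactly what the server compares against its configuration, which is why a typo in either one shows up as failed: 1.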
Testing zabbix_sender
zabbix_sender stuff
Filters
The regular expressions referred to in discovery are found under Administration->General, and then "Regular expressions" in the dropdown at top right of the page
cannot connect to IPMI host: [125] Operation canceled
possibly authentication method issue
Calculated items
See Calculated items explained
Cannot create item: Invalid first parameter
For calculated items use last("youritemkey")
Cannot create item, error in formula
Probably a calculated item, try double-quoting the item key:
last("foo[bar]")
Invalid parameter "/1/params"
Maybe you forgot to use last()? You might need to double-quote your item keys, or prepend them with double slashes (//)
Reset trigger/alert
For example after you changed the settings: just disable it, wait a bit, and enable it again.
Install recent zabbix on CentOS/RHEL
rpm -ivh https://repo.zabbix.com/zabbix/3.4/rhel/7/x86_64/zabbix-release-3.4-2.el7.noarch.rpm
yum install zabbix-agent
Backing up tables
https://www.zabbix.org/wiki/Docs/howto/mysql_backup_script
cannot send list of active checks
Verify hostname:
/usr/sbin/zabbix_agentd -t 'agent.hostname'
If in agent log: most likely ServerActive is defined in agent config, while not used at all
It is also possible agent is sending some active check to server while host is monitored via proxy.
In proxy/server log: most likely Hostname in agent config does not match hostname used on server.
cannot send list of active checks to "127.0.0.1": host [Zabbix server] not found
??
active check configuration update started to fail
??
Latest 20 issues
DEFAULT_LATEST_ISSUES_CNT in /usr/share/zabbix/include/defines.inc.php
Zabbix unreachable poller processes more than 75% busy
Increase StartPollersUnreachable
Zabbix poller processes more than 75% busy
another mystery
More than 100 items having missing data for more than 10 minutes
Could be high load. Also check Administration->Queue
Zabbix escalator processes more than 75% busy
probably high system load overall
Check agent
zabbix_get -s my.host.com -k agent.version
ZBX_NOTSUPPORTED
Could be anything, enable logging on agent. It could be version mismatch. Check
zabbix_get -s yourhost -k agent.version
If that works, you're calling for an undefined or unsupported key.
Incorrect trigger expression. Host "xx" does not exist or you have no access to this host.
Means there's no related item.
zabbix_get returns nothing
best look at log on agent side
run playbook on single host
ansible-playbook -l somehost someplay.yml
Zabbix server is not running: the information displayed may not be current
Might be selinux: http://sysads.co.uk/2013/11/zabbix-server-running-alert/
Monitoring vmware
vmware.hv.cpu.usage[{$URL},{HOST.HOST}]" became not supported: Couldn't resolve host name
Set macro {$URL} to https://your.ip/sdk/ (shouldn't discovery figure that out from {$HOST}?)
Couldn't resolve host name
Sometimes it's a matter of waiting a few hours
vmware events collector returned empty result
???
No "vmware collector" processes started.
Check StartVMwareCollectors on server or proxy
unsupported item key
This might mean it's expecting a value from the script you're calling.
echo 1
remember: not supported is not disabled, server/proxy will try again after interval
became not supported: Not supported by Zabbix Agent
probably output by userparameter/script
ansible or API not showing host groups
Permissions!! See Administration->User Groups
failed to update local proxy configuration copy: invalid field name "items.lastlogsize"
check everything :)
Received value [11] is not suitable for value type [Numeric (unsigned)] and data type [Decimal]
This probably means the agent returned 1\n1
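A quick sanity check on a script's output before blaming the server: a Numeric (unsigned) item expects a single non-negative integer. A rough sketch of that check; it does not reproduce whatever trimming Zabbix applies itself, and the function name is hypothetical.

```python
import re


def is_single_unsigned(output):
    """True if the output is exactly one line holding a non-negative integer."""
    lines = output.strip().splitlines()
    return len(lines) == 1 and re.fullmatch(r"[0-9]+", lines[0].strip()) is not None


# One value on one line.
print(is_single_unsigned("11\n"))
# Two lines, for example a command that printed its result twice.
print(is_single_unsigned("1\n1\n"))
```

Feeding it the raw output of the userparameter command shows quickly whether an extra line is sneaking in.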
database is down: retrying in 10 seconds
try upping max_connections
Incorrect key file for table 'items'; try to repair it
Could be something /tmp related
another network error, wait for 8 seconds
UnreachableDelay=8
failed: first network error
Setting Timeout in server configuration
also Timeout in agents?
no active checks on server
- Hostname in agent config (check with -k agent.hostname) must match the host name on the server
- simply no connection possible? firewall?
- maybe the host was configured to be monitored by (another) proxy?
show cpu utilization
Monitoring->host->graphs
fuzzytime on command line
TS=lotsofseconds
- output in hours
echo $(( ($(date +%s) - $TS) / 3600 ))
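The same arithmetic as a small Python function, as a sketch for use in scripts. The name hours_since and the optional now argument are made up here, the latter only so the result can be reproduced with a fixed clock.

```python
import time


def hours_since(ts, now=None):
    """Whole hours between a past Unix timestamp and now, using integer division."""
    if now is None:
        now = int(time.time())
    return (now - ts) // 3600


# Fixed "now" so the result is reproducible; drop it to use the current time.
print(hours_since(1_700_000_000, now=1_700_007_200))
```

For timestamps in the future the two versions differ slightly: shell arithmetic truncates toward zero, while Python's // rounds down.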
duplicate entry adding user/group
Check table 'ids'
Troubleshooting
zabbix_get: no route to host
Check the firewall