Alerts observed:
Too many processes on
Zabbix poller processes more than 75% busy
Zabbix unreachable poller processes more than 75% busy
 
 
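1. A host monitored through the Zabbix agent is still in the Monitored state, but the machine has hung or the agent has died for some other reason, so the server gets no data from it. The unreachable poller load then rises.

2. A host monitored through the Zabbix agent is in the Monitored state, but the agent takes too long to answer and requests often exceed the Timeout configured on the server. The unreachable poller load rises in this case as well.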
 
 
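Optimization ideas:
1. Make sure Zabbix's own internal components are being monitored (this is the basis of all tuning!)

2. Use a server with sufficiently powerful hardware

3. Separate the roles and give each its own server

4. Use active mode

5. Put zabbixtmp on a tmpfs file system

6. Use a distributed deployment

7. Tune MySQL

8. Tune Zabbix's own configuration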
 
 
 
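Optimization steps:
1. Measure Zabbix performance
Zabbix performance is measured in NVPS (new values processed per second). The Zabbix dashboard shows a rough estimate of this figure.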
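As a rough illustration of where an NVPS estimate comes from: each enabled item produces one new value per update interval, so the expected rate is the sum of 1/interval over all items. The sketch below assumes that simple model. The function name, the input format and the item counts are made up for the example.

```python
def estimate_nvps(items_by_interval):
    """Estimate new values per second.

    items_by_interval maps an update interval in seconds to the number of
    enabled items collected at that interval (hypothetical input format).
    """
    return sum(count / interval for interval, count in items_by_interval.items())


# Hypothetical installation: 3000 items every 30 s, 6000 items every 60 s.
nvps = estimate_nvps({30: 3000, 60: 6000})
print(f"estimated NVPS: {nvps:.1f}")
```

Comparing such an estimate before and after adding a batch of hosts gives a quick feel for how much extra load the server will see.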
 
 
2. Check the working state of Zabbix's internal components
 
 
 
3. Use a tmpfs file system
cd /
mkdir zabbixtmp
chown mysql:mysql zabbixtmp
vi /etc/fstab    # add the following line to /etc/fstab
tmpfs /zabbixtmp tmpfs rw,size=400m,nr_inodes=10k,mode=0700,uid=mysql,gid=mysql 0 0

Pay attention to the size value in the /etc/fstab entry. As a rule of thumb, set it to 8%-10% of physical memory.
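The sizing rule can be turned into a small calculation. This is only a sketch: the helper names are hypothetical, and the 4 GB host is an assumed example, not a figure from these notes.

```python
def tmpfs_size_mb(physical_mem_mb, fraction=0.10):
    """Return a tmpfs size in MB as a fraction of physical memory."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(physical_mem_mb * fraction)


def fstab_line(size_mb, mount_point="/zabbixtmp", owner="mysql"):
    """Build the /etc/fstab entry for the tmpfs mount."""
    opts = f"rw,size={size_mb}m,nr_inodes=10k,mode=0700,uid={owner},gid={owner}"
    return f"tmpfs {mount_point} tmpfs {opts} 0 0"


# Assumed host with 4 GB of RAM, using the upper end of the 8%-10% guideline.
print(fstab_line(tmpfs_size_mb(4096, 0.10)))
```

For a 4 GB machine this lands close to the size=400m used in the fstab entry.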
 
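4. Use active mode and proxy-based distributed monitoring
When the number of hosts grows too large and the server itself collects all the data (passive checks), Zabbix runs into serious performance problems, mainly:
(1) Once the number of monitored hosts reaches a certain scale, the web interface becomes sluggish and 502 errors are common
(2) Graphs show gaps
(3) Too many poller processes are started; even if you reduce the number of items, adding another batch of machines later brings the problem back
Directions to consider:
a. Add proxy nodes, or use Node mode, for distributed monitoring
b. Switch agentd to active mode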
 
zabbix_agentd.conf on the monitored host
vim zabbix_agentd.conf
LogFile=/tmp/zabbix_agentd.log
StartAgents=0
ServerActive=ip
Hostname=
RefreshActiveChecks=1800
BufferSize=200
Timeout=10
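Before rolling such a file out to many hosts, it can help to check that the placeholders were actually filled in. The following is a minimal sketch of such a check, not part of Zabbix. The function names, the sample IP address and the sample host name are assumptions for illustration.

```python
def parse_conf(text):
    """Parse Key=Value lines, skipping blank lines and # comment lines."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def active_mode_problems(params):
    """List reasons the agent is not ready for active-only checks."""
    problems = []
    if params.get("StartAgents") != "0":
        problems.append("StartAgents is not 0, so passive checks stay enabled")
    if not params.get("ServerActive"):
        problems.append("ServerActive is empty")
    if not params.get("Hostname"):
        problems.append("Hostname is empty (active checks are matched by host name)")
    return problems


# Hypothetical filled-in configuration for one host.
sample = """
StartAgents=0
ServerActive=10.0.0.5
Hostname=web01
"""
print(active_mode_problems(parse_conf(sample)) or "looks ready for active mode")
```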
 
zabbix_server.conf adjustments on the server side
StartPollers=100
StartTrappers=200

In the Zabbix templates, mass-update the items to the Zabbix agent (active) type
 
5. Zabbix MySQL tuning
[mysqld]  
datadir=/var/lib/mysql  
socket=/var/lib/mysql/mysql.sock  
user=mysql  
 
# Disabling symbolic-links is recommended to prevent assorted security risks  
tmpdir=/zabbixtmp  
#network          
connect_timeout =60  
wait_timeout =5000  
max_connections =400  
max_allowed_packet =16M  
max_connect_errors =400  
#limits  
tmp_table_size =256M  
max_heap_table_size =64M  
table_cache =256  
#logs  
slow_query_log_file =/var/log/slowquery.log  
 
log_error =/var/log/mysql-error.log  
long_query_time =10  
slow_query_log =1  
#innodb  
 
#innodb_data_file_path =ibdata1:128M;ibdata2:128M:autoextend:max:4096M  
innodb_file_per_table =1     # one file per table
innodb_status_file =1  
 
innodb_additional_mem_pool_size =128M  
innodb_buffer_pool_size =2800M  # usually set to 70%-80% of the server's physical memory
innodb_flush_method =O_DIRECT  
#innodb_io_capacity =1000  
innodb_support_xa =0  
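# The Zabbix database is write-heavy, so a larger log file size keeps MySQL from
# constantly flushing the log files into the tables. The side effect is that
# starting and stopping the database gets a bit slower.
innodb_log_file_size =64M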
innodb_log_buffer_size =32M  
symbolic-links=0  
#log-queries-not-using-indexes  
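# thread_cache_size seems to affect the thread cache hit rate derived from
# Threads_created and Connections in SHOW GLOBAL STATUS. With a value of 4 there
# were 3228483 Connections and 5840 Threads_created, a hit rate of about 99.8%.
# The lower Threads_created is, the better.
thread_cache_size=4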
query_cache_size=128M  
#join_buffer_size=512K  
join_buffer_size=128M  
read_buffer_size=128M  
read_rnd_buffer_size=128M  
key_buffer=128M  
innodb_flush_log_at_trx_commit=2  
[mysqld_safe]  
log-error=/var/log/mysqld.log  
pid-file=/var/run/mysqld/mysqld.pid  
#DisableHousekeeping=1  # set in zabbix_server.conf, not my.cnf: turn off the housekeeper when using partitioned tables
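The thread cache hit rate mentioned in the thread_cache_size note can be recomputed from the two status counters as 1 - Threads_created / Connections. A minimal sketch, using the counter values quoted in that note (the function name is made up):

```python
def thread_cache_hit_rate(connections, threads_created):
    """Percentage of connections that were served by a cached thread."""
    if connections <= 0:
        raise ValueError("connections must be positive")
    return 100.0 * (1 - threads_created / connections)


# Counters quoted in the thread_cache_size note.
rate = thread_cache_hit_rate(3228483, 5840)
print(f"thread cache hit rate: {rate:.1f}%")
```

Both counters come from SHOW GLOBAL STATUS, so the same calculation can be repeated after changing thread_cache_size to see whether the change helped.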
 
6. Adjust the number of Zabbix worker processes
vim zabbix_server.conf
StartPollers=90
StartPingers=10
StartPollersUnreachable=80
StartIPMIPollers=10 
StartTrappers=20 
StartDBSyncers=8 
LogSlowQueries=1000 
 
7. Zabbix DB partitioning
 
Step 1. Prepare the tables
ALTER TABLE `acknowledges` DROP PRIMARY KEY, ADD KEY `acknowledgedid` (`acknowledgeid`); 
ALTER TABLE `alerts` DROP PRIMARY KEY, ADD KEY `alertid` (`alertid`); 
ALTER TABLE `auditlog` DROP PRIMARY KEY, ADD KEY `auditid` (`auditid`); 
ALTER TABLE `events` DROP PRIMARY KEY, ADD KEY `eventid` (`eventid`); 
ALTER TABLE `service_alarms` DROP PRIMARY KEY, ADD KEY `servicealarmid` (`servicealarmid`); 
ALTER TABLE `history_log` DROP PRIMARY KEY, ADD PRIMARY KEY (`itemid`,`id`,`clock`); 
ALTER TABLE `history_log` DROP KEY `history_log_2`; 
ALTER TABLE `history_text` DROP PRIMARY KEY, ADD PRIMARY KEY (`itemid`,`id`,`clock`); 
ALTER TABLE `history_text` DROP KEY `history_text_2`; 
 
 
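Step 2. Set up monthly partitions
Repeat the following for the tables prepared in step 1. The example below creates monthly partitions for the events table from 2011-05 through 2011-12.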
ALTER TABLE `events` PARTITION BY RANGE( clock ) ( 
PARTITION p201105 VALUES LESS THAN (UNIX_TIMESTAMP("2011-06-01 00:00:00")), 
PARTITION p201106 VALUES LESS THAN (UNIX_TIMESTAMP("2011-07-01 00:00:00")), 
PARTITION p201107 VALUES LESS THAN (UNIX_TIMESTAMP("2011-08-01 00:00:00")), 
PARTITION p201108 VALUES LESS THAN (UNIX_TIMESTAMP("2011-09-01 00:00:00")), 
PARTITION p201109 VALUES LESS THAN (UNIX_TIMESTAMP("2011-10-01 00:00:00")), 
PARTITION p201110 VALUES LESS THAN (UNIX_TIMESTAMP("2011-11-01 00:00:00")), 
PARTITION p201111 VALUES LESS THAN (UNIX_TIMESTAMP("2011-12-01 00:00:00")), 
PARTITION p201112 VALUES LESS THAN (UNIX_TIMESTAMP("2012-01-01 00:00:00")) 
); 
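Since the same statement has to be written for several tables, it can be generated instead of typed by hand. The sketch below is one way to do that. The function names are hypothetical, and the table name passed in at the end is only an example.

```python
from datetime import date


def next_month(d):
    """First day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def monthly_partition_sql(table, first, last):
    """Build ALTER TABLE ... PARTITION BY RANGE with one partition per month.

    first and last are dates inside the first and last month to cover.
    """
    parts = []
    month = date(first.year, first.month, 1)
    while month <= last:
        upper = next_month(month)
        parts.append(
            f"PARTITION p{month:%Y%m} VALUES LESS THAN "
            f'(UNIX_TIMESTAMP("{upper:%Y-%m-%d} 00:00:00"))'
        )
        month = upper
    body = ",\n".join(parts)
    return f"ALTER TABLE `{table}` PARTITION BY RANGE( clock ) (\n{body}\n);"


# Same month range as the events example, for another table.
print(monthly_partition_sql("alerts", date(2011, 5, 1), date(2011, 12, 1)))
```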
 
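Step 3. Set up daily partitions
Repeat the following for each history table. The example below creates daily partitions for the history_uint table from 2011-05-15 through 2011-05-22.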
ALTER TABLE `history_uint` PARTITION BY RANGE( clock ) ( 
PARTITION p20110515 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-16 00:00:00")), 
PARTITION p20110516 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-17 00:00:00")), 
PARTITION p20110517 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-18 00:00:00")), 
PARTITION p20110518 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-19 00:00:00")), 
PARTITION p20110519 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-20 00:00:00")), 
PARTITION p20110520 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-21 00:00:00")), 
PARTITION p20110521 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-22 00:00:00")), 
PARTITION p20110522 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-23 00:00:00")) 
); 
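The daily statement can be generated for every history table in one go. This is a sketch with a hypothetical helper name. The table list is the one handled by the create_zabbix_partitions procedure in step 4.

```python
from datetime import date, timedelta


def daily_partition_sql(table, first, last):
    """Build ALTER TABLE ... PARTITION BY RANGE with one partition per day."""
    parts = []
    day = first
    while day <= last:
        upper = day + timedelta(days=1)
        parts.append(
            f"PARTITION p{day:%Y%m%d} VALUES LESS THAN "
            f'(UNIX_TIMESTAMP("{upper:%Y-%m-%d} 00:00:00"))'
        )
        day = upper
    body = ",\n".join(parts)
    return f"ALTER TABLE `{table}` PARTITION BY RANGE( clock ) (\n{body}\n);"


HISTORY_TABLES = ("history", "history_log", "history_str", "history_text", "history_uint")

for table in HISTORY_TABLES:
    print(daily_partition_sql(table, date(2011, 5, 15), date(2011, 5, 22)))
```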
 
 
Maintaining partitions by hand:
Add a new partition
ALTER TABLE `history_uint` ADD PARTITION ( 
PARTITION p20110523 VALUES LESS THAN (UNIX_TIMESTAMP("2011-05-24 00:00:00")) 
); 
 
Drop an old partition (this takes over the job of housekeeping)
ALTER TABLE `history_uint` DROP PARTITION p20110515; 
 
 
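Step 4. Automatic daily partitioning
Make sure the history tables were partitioned correctly in step 3.
The script below drops old daily partitions and creates new ones automatically. By default it keeps only the most recent 3 days; to keep more, change the @mindays variable.


Don't forget to add this command to your cron!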
mysql -B -h localhost -u zabbix -pPASSWORD zabbix -e "CALL create_zabbix_partitions();" 
 
 
Script that creates the partitions automatically:
https://github.com/xsbr/zabbixzo ... utopartitioning.sql 
 
 
DELIMITER // 
DROP PROCEDURE IF EXISTS `zabbix`.`create_zabbix_partitions` // 
CREATE PROCEDURE `zabbix`.`create_zabbix_partitions` () 
BEGIN 
CALL zabbix.create_next_partitions("zabbix","history"); 
CALL zabbix.create_next_partitions("zabbix","history_log"); 
CALL zabbix.create_next_partitions("zabbix","history_str"); 
CALL zabbix.create_next_partitions("zabbix","history_text"); 
CALL zabbix.create_next_partitions("zabbix","history_uint"); 
CALL zabbix.drop_old_partitions("zabbix","history"); 
CALL zabbix.drop_old_partitions("zabbix","history_log"); 
CALL zabbix.drop_old_partitions("zabbix","history_str"); 
CALL zabbix.drop_old_partitions("zabbix","history_text"); 
CALL zabbix.drop_old_partitions("zabbix","history_uint"); 
END // 
DROP PROCEDURE IF EXISTS `zabbix`.`create_next_partitions` // 
CREATE PROCEDURE `zabbix`.`create_next_partitions` (SCHEMANAME varchar(64), TABLENAME varchar(64)) 
BEGIN 
DECLARE NEXTCLOCK timestamp; 
DECLARE PARTITIONNAME varchar(16); 
DECLARE CLOCK int; 
SET @totaldays = 7; 
SET @i = 1; 
createloop: LOOP 
SET NEXTCLOCK = DATE_ADD(NOW(),INTERVAL @i DAY); 
SET PARTITIONNAME = DATE_FORMAT( NEXTCLOCK, 'p%Y%m%d' ); 
SET CLOCK = UNIX_TIMESTAMP(DATE_FORMAT(DATE_ADD( NEXTCLOCK ,INTERVAL 1 DAY),'%Y-%m-%d 00:00:00')); 
CALL zabbix.create_partition( SCHEMANAME, TABLENAME, PARTITIONNAME, CLOCK ); 
SET @i=@i+1; 
IF @i > @totaldays THEN 
LEAVE createloop; 
END IF; 
END LOOP; 
END // 
DROP PROCEDURE IF EXISTS `zabbix`.`drop_old_partitions` // 
CREATE PROCEDURE `zabbix`.`drop_old_partitions` (SCHEMANAME varchar(64), TABLENAME varchar(64)) 
BEGIN 
DECLARE OLDCLOCK timestamp; 
DECLARE PARTITIONNAME varchar(16); 
DECLARE CLOCK int; 
SET @mindays = 3; 
SET @maxdays = @mindays+4; 
SET @i = @maxdays; 
droploop: LOOP 
SET OLDCLOCK = DATE_SUB(NOW(),INTERVAL @i DAY); 
SET PARTITIONNAME = DATE_FORMAT( OLDCLOCK, 'p%Y%m%d' ); 
CALL zabbix.drop_partition( SCHEMANAME, TABLENAME, PARTITIONNAME ); 
SET @i=@i-1; 
IF @i <= @mindays THEN 
LEAVE droploop; 
END IF; 
END LOOP; 
END // 
DROP PROCEDURE IF EXISTS `zabbix`.`create_partition` // 
CREATE PROCEDURE `zabbix`.`create_partition` (SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64), CLOCK int) 
BEGIN 
DECLARE RETROWS int; 
SELECT COUNT(1) INTO RETROWS 
FROM `information_schema`.`partitions` 
WHERE `table_schema` = SCHEMANAME AND `table_name` = TABLENAME AND `partition_name` = PARTITIONNAME; 
 
IF RETROWS = 0 THEN 
SELECT CONCAT( "create_partition(", SCHEMANAME, ",", TABLENAME, ",", PARTITIONNAME, ",", CLOCK, ")" ) AS msg; 
SET @sql = CONCAT( 'ALTER TABLE `', SCHEMANAME, '`.`', TABLENAME, '`', 
' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', CLOCK, '));' ); 
PREPARE STMT FROM @sql; 
EXECUTE STMT; 
DEALLOCATE PREPARE STMT; 
END IF; 
END // 
DROP PROCEDURE IF EXISTS `zabbix`.`drop_partition` // 
CREATE PROCEDURE `zabbix`.`drop_partition` (SCHEMANAME varchar(64), TABLENAME varchar(64), PARTITIONNAME varchar(64)) 
BEGIN 
DECLARE RETROWS int; 
SELECT COUNT(1) INTO RETROWS 
FROM `information_schema`.`partitions` 
WHERE `table_schema` = SCHEMANAME AND `table_name` = TABLENAME AND `partition_name` = PARTITIONNAME; 
 
IF RETROWS = 1 THEN 
SELECT CONCAT( "drop_partition(", SCHEMANAME, ",", TABLENAME, ",", PARTITIONNAME, ")" ) AS msg; 
SET @sql = CONCAT( 'ALTER TABLE `', SCHEMANAME, '`.`', TABLENAME, '`', 
' DROP PARTITION ', PARTITIONNAME, ';' ); 
PREPARE STMT FROM @sql; 
EXECUTE STMT; 
DEALLOCATE PREPARE STMT; 
END IF; 
END // 
DELIMITER ; 
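To see what a nightly run of these procedures will touch, the two loops can be mirrored outside the database. The sketch below only reproduces the partition names that create_next_partitions and drop_old_partitions compute. The Python function names and the example date are assumptions.

```python
from datetime import date, timedelta


def partitions_to_create(today, total_days=7):
    """Names the create loop tries to add: today+1 .. today+total_days."""
    return [f"p{today + timedelta(days=i):%Y%m%d}" for i in range(1, total_days + 1)]


def partitions_to_drop(today, min_days=3):
    """Names the drop loop tries to drop: today-(min_days+4) .. today-(min_days+1)."""
    max_days = min_days + 4
    return [f"p{today - timedelta(days=i):%Y%m%d}" for i in range(max_days, min_days, -1)]


today = date(2011, 5, 23)  # example run date
print("create:", partitions_to_create(today))
print("drop:  ", partitions_to_drop(today))
```

Note that the drop loop only looks at a four-day window. If the cron job does not run for longer than that, partitions older than the window are left in place and have to be dropped by hand.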
 
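Summary: as the number of monitored machines grows, the optimization approach is to
1. Increase the number of Zabbix worker processes
2. Use active mode, so the agents push data themselves
3. Use proxies for distributed monitoring
4. Tune MySQL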
 
 
 
 
 
 
 
 
 
 