guyuehhh 发表于 2018-9-30 10:07:23

MySQL压测--注意事项和FAQ

  上次我们讲了TPCC的安装和一些使用方法,今天先不着急真正的做MySQL压力测试,先把我最近做压力测试的遇到的一些问题罗列出来,这样
  后面做测试可以事半功倍。
  1.注意事项
  
  (1).提前规划好具体要测试什么,即测试目的,比如

[*]  MySQL5.6与5.7的性能差异;
[*]  异步复制和半同步复制的TPS;
[*]  设置双1(innodb_flush_log_at_trx_commit=1、sync_binlog=1)性能对数据库性能影响多少;
[*]  确认即将上线的新业务对MySQL负载影响多少,是否能承载得住,是否需要对服务器进行扩容或升级配置;
  (2).压力测试比较耗时,我们不可能时刻监督每次压测什么时候结束,然后手动的进行下一次测试,所以要写个脚本帮助我们一键测试;
  (3).根据测试环境的硬件配置给予机器适当的压力,对于有限的硬件配置,不能无限的增压,这样数据库里全是lock wait timeout等报错,最后会使MySQL Crash的,这个可以参考大师叶老师的博客《MySQL压力测试基准值》
  http://imysql.com/2015/07/28/mysql-benchmark-reference.shtml
  2.FAQ
  这一块主要讲测试的时候系统和数据库出现的问题,导致压测无法进行
  (1)tpcc数据库在预热rampup时候的报错
sht-sgmhadoopcm-01:mysqladmin:/usr/local/mysql/tpcc-mysql-master/logs:>less tpcc_runlog_parameter_20180620234749_512_THREADS  
RAMP-UP TIME.(120 sec.)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  
1461, 42000, Can't create more than max_prepared_stmt_count statements (current value: 16382)
  解决方法:
  错误很明显,参数max_prepared_stmt_count(默认16382)现在的大小已经无法满足负载需求了,这个参数是限制数据库中已经准备的statements数量,即sql数量,我们把它增大十倍。
  sht-sgmhadoopdn-02.telenav.cn:mysqladmin:/usr/local/mysql/data:>vim /ect/my.cnf
  max_prepared_stmt_count = 163820
  然后重启MySQL
  mysql> show variables like '%max_prepared_stmt_count%';
  +-------------------------+-------+
  | Variable_name         | Value |
  +-------------------------+-------+
  | max_prepared_stmt_count =| 163820 |
  +-------------------------+-------+
  参数官方解析:
  This variable limits the total number of prepared statements in the server. It can be used in environments where there is the potential for denial-of-service attacks based on running the server out of memory by preparing huge numbers of statements. If the value is set lower than the current number of prepared statements, existing statements are not affected and can be used, but no new statements can be prepared until the current number drops below the limit. The default value is 16,382. The permissible range of values is from 0 to 1 million. Setting the value to 0 disables prepared statements.
  (2)MySQL负载过高,导致crash
  数据库日志也出现大量的日志信息:
sht-sgmhadoopdn-02.telenav.cn:mysqladmin:/usr/local/mysql/data:>vim sht-sgmhadoopdn-01.err  
2018-06-20T17:04:42.935422Z 0 InnoDB: page_cleaner: 1000ms intended loop took 5690ms. The settings might not be optimal. (flushed=1119 and evicted=0, during the time.)
  
2018-06-20T17:05:45.061790Z 0 Error in accept: Too many open files
  
2018-06-20T17:05:47.846070Z 0 InnoDB: page_cleaner: 1000ms intended loop took 5057ms. The settings might not be optimal. (flushed=912 and evicted=0, during the time.)
  
2018-06-20T17:06:09.564302Z 0 InnoDB: page_cleaner: 1000ms intended loop took 4569ms. The settings might not be optimal. (flushed=913 and evicted=0, during the time.)
  
2018-06-20T17:06:35.456128Z 0 InnoDB: page_cleaner: 1000ms intended loop took 5822ms. The settings might not be optimal. (flushed=959 and evicted=0, during the time.)
  
2018-06-20T17:06:47.324711Z 0 InnoDB: page_cleaner: 1000ms intended loop took 4312ms. The settings might not be optimal. (flushed=673 and evicted=0, during the time.)
  
2018-06-20T17:06:55.651725Z 0 InnoDB: page_cleaner: 1000ms intended loop took 5018ms. The settings might not be optimal. (flushed=592 and evicted=0, during the time.)
  
2018-06-20T17:07:09.104978Z 0 InnoDB: page_cleaner: 1000ms intended loop took 4983ms. The settings might not be optimal. (flushed=936 and evicted=0, during the time.)
  
2018-06-20T17:07:21.873005Z 0 InnoDB: page_cleaner: 1000ms intended loop took 4723ms. The settings might not be optimal. (flushed=677 and evicted=0, during the time.)
  
2018-06-20T17:07:40.903065Z 0 InnoDB: page_cleaner: 1000ms intended loop took 4432ms. The settings might not be optimal. (flushed=431 and evicted=0, during the time.)
  
2018-06-20T17:08:13.283422Z 1085 Can't generate a unique log-filename /usr/local/mysql/arch/mysql-bin.(1-999)
  
2018-06-20T17:08:13.283501Z 1085 mysqld: Binary logging not possible. Message: Either disk is full or file system is read only while rotating the binlog. Aborting the server.
  
17:08:13 UTC - mysqld got signal 6 ;
  
This could be because you hit a bug. It is also possible that this binary
  
or one of the libraries it was linked against is corrupt, improperly built,
  
or misconfigured. This error can also be caused by malfunctioning hardware.
  
Attempting to collect some information that could help diagnose the problem.
  
As this is a crash and something is definitely wrong, the information
  
collection process might fail.
  
key_buffer_size=268435456
  
read_buffer_size=2097152
  
max_used_connections=981
  
max_threads=214
  
thread_count=981
  
connection_count=981
  
It is possible that mysqld could use up to
  
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1272281 Kbytes of memory
  
Hope that's ok; if not, decrease some variables in the equation.
  
Thread pointer: 0x7f55ddfa9710
  
Attempting backtrace. You can use the following information to find out
  
where mysqld died. If you see no messages after this, something went
  
terribly wrong...
  
stack_bottom = 7f55bac30ea8 thread_stack 0x40000
  
mysqld(my_print_stacktrace+0x35)
  
mysqld(handle_fatal_signal+0x4a4)
  
/lib64/libpthread.so.0(+0xf5e0)
  
/lib64/libc.so.6(gsignal+0x37)
  
/lib64/libc.so.6(abort+0x148)
  
mysqld
  
mysqld(_ZN13MYSQL_BIN_LOG13new_file_implEbP28Format_description_log_event+0x50c)
  
mysqld(_ZN13MYSQL_BIN_LOG6rotateEbPb+0x3e)
  
mysqld(_ZN13MYSQL_BIN_LOG14ordered_commitEP3THDbb+0x497)
  
mysqld(_ZN13MYSQL_BIN_LOG6commitEP3THDb+0x585)
  
mysqld(_Z15ha_commit_transP3THDbb+0x174)
  
mysqld(_Z12trans_commitP3THD+0x49)
  
mysqld(_Z21mysql_execute_commandP3THDb+0x2be0)
  
mysqld(_Z11mysql_parseP3THDP12Parser_state+0x40d)
  
mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x119a)
  
mysqld(_Z10do_commandP3THD+0x194)
  
mysqld(handle_connection+0x29c)
  
mysqld(pfs_spawn_thread+0x174)
  
/lib64/libpthread.so.0(+0x7e25)
  
/lib64/libc.so.6(clone+0x6d)
  
Trying to get some variables.
  
Some pointers may be invalid and cause the dump to abort.
  
Query (7f55de4918b0): is an invalid pointer
  
Connection ID (thread ID): 1085
  
Status: NOT_KILLED
  
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
  
information that should help you find out what is causing the crash.
  解决方法:
  建议:不要给与数据库太大的压力,控制好thread并发数
  (3)linux某一用户达到max user processes后,导致暂时无法使用该用户mysqladmin登录和无法使用任何bash命令
  -bash-4.2$ sudo su mysqladmin
  su: failed to execute /bin/bash: Resource temporarily unavailable
  sht-sgmhadoopcm-01:mysqladmin:/usr/local/mysql/tpcc-mysql-master/logs:>ls
  bash: fork: retry: No child processes
  bash: fork: retry: No child processes
  bash: fork: retry: No child processes
  bash: fork: retry: No child processes
  bash: fork: Resource temporarily unavailable
  解决方法:
  查看系统的一些限制设置,发现max user processes 只有4096,进程被用完,所以无论写入什么命令,系统都无法及时作出响应
  # ulimit -a

  core file>
  data seg>  scheduling priority             (-e) 0

  file>  pending signals               (-i) 31207
  max locked memory       (kbytes, -l) 64

  max memory>  open files                      (-n) 65536

  pipe>  POSIX message queues   (bytes, -q) 819200
  real-time priority            (-r) 0

  stack>  cpu time               (seconds, -t) unlimited
  max user processes            (-u) 4096
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited
  # vim /etc/security/limits.conf
  *               soft    nproc         65536
  *               hard    nproc         65536
  *               soft    nofile          65536
  *               hard    nofile          65536
  还需要修改一个默认的配置文件才能真正的生效:
  # vim /etc/security/limits.d/20-nproc.conf
  # Default limit for number of user's processes to prevent
  # accidental fork bombs.
  # See rhbz #432903 for reasoning.
  *          soft    nproc   65536 #把这里的4096改大为65536或者为unlimited
  root       soft    nproc   unlimited
  查看已经生效:
  sht-sgmhadoopcm-01:mysqladmin:/usr/local/mysql/tpcc-mysql-master/logs:>ulimit -u
  65536
  (4)并发太大,导致大量的lock
  当线程数达到1536及以上的时候,日志中出现大量的lock信息:
$less tpcc_runlog_parameter_20180621170023_1536_THREADS  
payment 1466:1
  
payment 247:1
  
1205, HY000, Lock wait timeout exceeded; try restarting transaction
  
1205, HY000, Lock wait timeout exceeded; try restarting transaction
  
payment 954:1
  
1205, HY000, Lock wait timeout exceeded; try restarting transaction
  
payment 485:1
  
1205, HY000, Lock wait timeout exceeded; try restarting transaction
  
payment 266:1
  
1205, HY000, Lock wait timeout exceeded; try restarting transaction
  
......
  
.......
  
2013, HY000, Lost connection to MySQL server at 'reading initial communication packet', system error: 104
  
2013, HY000, Lost connection to MySQL server at 'reading initial communication packet', system error: 104
  
2013, HY000, Lost connection to MySQL server at 'reading initial communication packet', system error: 104
  
2013, HY000, Lost connection to MySQL server at 'reading initial communication packet', system error: 104
  
2013, HY000, Lost connection to MySQL server at 'reading initial communication packet', system error: 104
  解决方法:
  此时查看MySQL中的一些Lock状态信息,可以看出来全部都是一些x锁,说明现在并发过大,导致数据库处理不过来,产生大量排它锁。
mysql> select * from INFORMATION_SCHEMA.INNODB_LOCKS;  
+------------------+-------------+-----------+-----------+----------------------+------------+------------+-----------+----------+-----------+
  
| lock_id          | lock_trx_id | lock_mode | lock_type | lock_table         | lock_index | lock_space | lock_page | lock_rec | lock_data |
  
+------------------+-------------+-----------+-----------+----------------------+------------+------------+-----------+----------+-----------+
  
| 15043927:63:3:8| 15043927    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
| 15043750:63:3:8| 15043750    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
| 15043742:63:3:8| 15043742    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
| 15043700:63:3:8| 15043700    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
| 15043656:63:3:8| 15043656    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
| 15043646:63:3:8| 15043646    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
| 15043607:63:3:8| 15043607    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
| 15043399:63:3:8| 15043399    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
| 15043388:63:3:8| 15043388    | X         | RECORD    | `tpccdb`.`warehouse` | PRIMARY    |         63 |         3 |      8 | 7         |
  
......
  
| 15043727:64:3:56 | 15043727    | X         | RECORD    | `tpccdb`.`district`| PRIMARY    |         64 |         3 |       56 | 6, 5      |
  
| 15035687:64:3:62 | 15035687    | X         | RECORD    | `tpccdb`.`district`| PRIMARY    |         64 |         3 |       62 | 7, 1      |
  
| 15043855:64:3:62 | 15043855    | X         | RECORD    | `tpccdb`.`district`| PRIMARY    |         64 |         3 |       62 | 7, 1      |
  
| 15032784:64:3:56 | 15032784    | X         | RECORD    | `tpccdb`.`district`| PRIMARY    |         64 |         3 |       56 | 6, 5      |
  
+------------------+-------------+-----------+-----------+----------------------+------------+------------+-----------+----------+-----------+
  
1756 rows in set, 1 warning (3.87 sec)
  建议:不要给与数据库太大的压力,控制好thread并发数
  (5)由于磁盘空间不足,导致OS和MySQL相关报错

[*]  在启动MySQL的时候err日志报的错误:
sht-sgmhadoopdn-01.telenav.cn:mysqladmin:/usr/local/mysql/data:>tail -f hostname.err  
2018-06-23T08:21:35.417981Z 0 --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
  
2018-06-23T08:21:35.418138Z 0 mysqld (mysqld 5.7.21-log) starting as process 16512 ...
  
2018-06-23T08:21:35.438102Z 0 InnoDB: PUNCH HOLE support available
  
2018-06-23T08:21:35.438190Z 0 InnoDB: Mutexes and rw_locks use GCC atomic builtins
  
2018-06-23T08:21:35.438202Z 0 InnoDB: Uses event mutexes
  
2018-06-23T08:21:35.438242Z 0 InnoDB: GCC builtin __sync_synchronize() is used for memory barrier
  
2018-06-23T08:21:35.438340Z 0 InnoDB: Compressed tables use zlib 1.2.3
  
2018-06-23T08:21:35.438390Z 0 InnoDB: Using Linux native AIO
  
2018-06-23T08:21:35.449551Z 0 InnoDB: Number of pools: 1
  
2018-06-23T08:21:35.449862Z 0 InnoDB: Using CPU crc32 instructions
  
2018-06-23T08:21:35.456460Z 0 InnoDB: Initializing buffer pool, total size = 5G, instances = 8, chunk size = 128M
  
2018-06-23T08:21:36.274392Z 0 InnoDB: Completed initialization of buffer pool
  
2018-06-23T08:21:36.426894Z 0 InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
  
2018-06-23T08:21:36.441471Z 0 InnoDB: Opened 3 undo tablespaces
  
2018-06-23T08:21:36.441504Z 0 InnoDB: 3 undo tablespaces made active
  
2018-06-23T08:21:36.441913Z 0 InnoDB: Highest supported file format is Barracuda.
  
2018-06-23T08:21:36.629202Z 0 InnoDB: Creating shared tablespace for temporary tables
  
2018-06-23T08:21:36.629339Z 0 InnoDB: Setting file '/usr/local/mysql/data/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
  
2018-06-23T08:21:36.646819Z 0 InnoDB: 1048576 bytes should have been written. Only 794624 bytes written. Retrying for the remaining bytes.
  
2018-06-23T08:21:36.790869Z 0 InnoDB: 1048576 bytes should have been written. Only 921600 bytes written. Retrying for the remaining bytes.
  
2018-06-23T08:21:38.380026Z 0 InnoDB: 1048576 bytes should have been written. Only 925696 bytes written. Retrying for the remaining bytes.
  
2018-06-23T08:21:38.390611Z 0 InnoDB: 1048576 bytes should have been written. Only 929792 bytes written. Retrying for the remaining bytes.
  
2018-06-23T08:21:38.399767Z 0 InnoDB: 1048576 bytes should have been written. Only 933888 bytes written. Retrying for the remaining bytes.
  
2018-06-23T08:21:38.412285Z 0 InnoDB: 1048576 bytes should have been written. Only 937984 bytes written. Retrying for the remaining bytes.
  
2018-06-23T08:21:38.419670Z 0 InnoDB: 1048576 bytes should have been written. Only 942080 bytes written. Retrying for the remaining bytes.
  
2018-06-23T08:21:38.436278Z 0 InnoDB: Retry attempts for writing partial data failed.
  
2018-06-23T08:21:38.436345Z 0 InnoDB: Write to file /usr/local/mysql/data/ibtmp1failed at offset 11534336, 1048576 bytes should have been written, only 942080 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
  
2018-06-23T08:21:38.436408Z 0 InnoDB: Error number 28 means 'No space left on device'
  
2018-06-23T08:21:38.436426Z 0 InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
  
2018-06-23T08:21:38.436442Z 0 InnoDB: Could not set the file size of '/usr/local/mysql/data/ibtmp1'. Probably out of disk space
  
2018-06-23T08:21:38.436457Z 0 InnoDB: Unable to create the shared innodb_temporary
  
2018-06-23T08:21:38.436467Z 0 InnoDB: Plugin initialization aborted with error Generic error
  
2018-06-23T08:21:39.040518Z 0 InnoDB: Removed temporary tablespace data file: "ibtmp1"
  
2018-06-23T08:21:39.040570Z 0 Plugin 'InnoDB' init function returned error.
  
2018-06-23T08:21:39.040581Z 0 Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
  
2018-06-23T08:21:39.040597Z 0 Failed to initialize builtin plugins.
  
2018-06-23T08:21:39.040673Z 0 Aborting
  
2018-06-23T08:21:39.040699Z 0 Binlog end
  
2018-06-23T08:21:39.041472Z 0 mysqld: Shutdown complete
  

  
从库同步也会遇到错误:
  
root@localhost [(none)]>show slave status\G
  
*************************** 1. row ***************************
  
               Slave_IO_State: Reconnecting after a failed master event read
  
                  Master_Host: 172.16.101.58
  
                  Master_User: repl
  
                  Master_Port: 3306
  
                Connect_Retry: 60
  
            Master_Log_File: mysql-bin.000041
  
          Read_Master_Log_Pos: 245615
  
               Relay_Log_File: relay-bin.000025
  
                Relay_Log_Pos: 925225142
  
      Relay_Master_Log_File: mysql-bin.000014
  
             Slave_IO_Running: Connecting
  
            Slave_SQL_Running: Yes
  
            Replicate_Do_DB:
  
          Replicate_Ignore_DB:
  
         Replicate_Do_Table:
  
       Replicate_Ignore_Table:
  
      Replicate_Wild_Do_Table:
  
Replicate_Wild_Ignore_Table:
  
                   Last_Errno: 0
  
                   Last_Error:
  
               Skip_Counter: 0
  
          Exec_Master_Log_Pos: 925224929
  
            Relay_Log_Space: 21652623260
  
            Until_Condition: None
  
               Until_Log_File:
  
                Until_Log_Pos: 0
  
         Master_SSL_Allowed: No
  
         Master_SSL_CA_File:
  
         Master_SSL_CA_Path:
  
            Master_SSL_Cert:
  
            Master_SSL_Cipher:
  
               Master_SSL_Key:
  
      Seconds_Behind_Master: 123889
  
Master_SSL_Verify_Server_Cert: No
  
                Last_IO_Errno: 2003
  
                Last_IO_Error: error reconnecting to master 'repl@172.16.101.58:3306' - retry-time: 60retries: 119
  
               Last_SQL_Errno: 0
  
               Last_SQL_Error:
  
Replicate_Ignore_Server_Ids:
  
             Master_Server_Id: 583306
  
                  Master_UUID: 7dac3e2a-7789-11e8-ae76-0050568211bd
  
             Master_Info_File: mysql.slave_master_info
  
                  SQL_Delay: 0
  
          SQL_Remaining_Delay: NULL
  
      Slave_SQL_Running_State: System lock
  
         Master_Retry_Count: 86400
  
                  Master_Bind:
  
      Last_IO_Error_Timestamp: 180626 11:26:30
  
   Last_SQL_Error_Timestamp:
  
               Master_SSL_Crl:
  
         Master_SSL_Crlpath:
  
         Retrieved_Gtid_Set: 7dac3e2a-7789-11e8-ae76-0050568211bd:9-6216578
  
            Executed_Gtid_Set: 7dac3e2a-7789-11e8-ae76-0050568211bd:1-1912222
  
                Auto_Position: 1
  
         Replicate_Rewrite_DB:
  
               Channel_Name:
  
         Master_TLS_Version:
  
1 row in set (0.00 sec)

[*]  TPCC压测时候的日志报错
  sht-sgmhadoopcm-01:mysqladmin:/usr/local/mysql/tpcc-mysql-master/logs:>less tpcc_runlog_parameter_20180625075957_1024_THREADS
  delivery 241:5
  1114, HY000, The table 'order_line' is full
  delivery 823:5
  1114, HY000, The table 'order_line' is full
  delivery 618:5
  ......
  ......

[*]  使用vim打开一个文件的时候报错
  E297: Write error in swap file
  "hostname.err" 3437L, 417792C
  Press ENTER or type command to continue
  解决方法:
  sht-sgmhadoopdn-01.telenav.cn:mysqladmin:/usr/local/mysql/data:>df -Th
  Filesystem            Type      SizeUsed Avail Use% Mounted on
  /dev/mapper/centos-root xfs      77G   77G   12M 100% /
  devtmpfs                devtmpfs2.9G   02.9G   0% /dev
  tmpfs                   tmpfs   2.9G   02.9G   0% /dev/shm
  tmpfs                   tmpfs   2.9G   65M2.8G   3% /run
  tmpfs                   tmpfs   2.9G   02.9G   0% /sys/fs/cgroup
  /dev/sda1               xfs       497M132M366M27% /boot
  建议:删除无用的文件释放disk space,或者mount一个更大的磁盘
  参考链接:
  MySQL压测工具--TPCC安装,测试
  http://imysql.com/2015/07/28/mysql-benchmark-reference.shtml



页: [1]
查看完整版本: MySQL压测--注意事项和FAQ