Linux 操作系统日常巡检[原创]

date：2017-10-06

author：杨然儿

操作系统日常巡检

0.查询是否被攻击


统计登录成功的IP
    last | awk '{ print $3}' | sort | uniq -c | sort -n
登录失败的IP
    lastb | awk '{ print $3}' | sort | uniq -c | sort -n
查看是否被攻击
    grep :x:0: /etc/passwd  --只返回一行
是否存在可疑登录
    登录成功的
        grep "Accept" /var/log/secure*|awk '{print $9 '=' $11}'|sort|uniq -c
    登录失败的
        grep "Failed" /var/log/secure*|awk '{print $9 '=' $11}'|sort|uniq -c

1.查操作系统日志是否存在错误:


more /var/log/messages|grep -E '[Ee]erro|fail|[Ww]arning'

2.查主机是否存在硬件错误


dmesg|more

3.文件使用率


df -h

4.检查是否存在僵尸进程


ps -ef|grep defunct

5.检查网关是否正确


netstat -nr
ping  网关

6.检查网络通信是否正常


netstat -ni
RX-DRP 值为0或者保持稳定

7.查网络连接状态


netstat -an|egrep 'ESTABLISHED|FIN_WAIT_1|FIN_WAIT_2|TIME_WAIT|CLOSING|CLOSE_WAIT|LAST_ACK'

netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

8.查看是否丢包


cat /proc/net/dev
注意：dropped:0==> 如果这个值不为 0 , 则说明有丢包现象

9.网卡的工作状态


ifconfig -a
如果网卡中状态无为down状态

ethtool eth1
ethtool -i eth1
iftop -B -n  -N -p
fftop -i eth1 -B -n  -N -p
查看网卡速度

10.CPU是否存在瓶颈


.CPU IDel大于90％则OK
        top -i -n 1|grep id|awk '{print $8}'

.CPU load average 正常值：< CPU个数 * 核数 *0.7
        top -i -n 1|grep load|awk '{print $13}'

        一般来说只要每个CPU的当前活动进程数不大于3那么系统的性能就是良好的，
        如果每个CPU的任务数大于5，那么就表示这台机器的性能有严重问题。
        实际上这个进程数就是每个CPU的负载.

11.检查内存是否充足


内存有足够空间
free -m

没有发生换页
vmstat 1 6
        如果 si 和so 的值总是不为零，则说明，swap 有很高的换页率
        如果 bi 和 bo 的值很高说明系统在进行大量的IO读些操作

12.检查IO


查看读写速度
        iostat -d -x -k 1 5

        Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
        xvdb              0.00    23.00    0.00   47.00     0.00   280.00    11.91     0.54   11.55   0.49   2.30
        xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
        xvdc              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
        xvdd              0.00     2.00    0.00  388.00     0.00  1560.00     8.04     1.26    3.26   0.34  13.10

        rrqm/s 每秒读请求合并的次数
        wrqm/s 每秒写请求合并的次数
        r/s 每秒读请求次数，这个值通常会跟avgrq-sz一起观察，avgrq-sz大则有可能r/s比较小
        w/s 每秒写请求次数，这个值通常会跟avgrq-sz一起观察，avgrq-sz大则有可能w/s比较小
        rkB/s 这个指标跟vmstat的bi值通常应该是很接近的
        wkB/s 这个指标跟vmstat的bo值通常应该是很接近的
        avgrq-sz 每个请求平均大小，单位是扇区数，一般在200~400之间算是正常和理想的状态，如果这个值比较小，比方说只在100左右，说明太多的IO请求没有被合并，或者大的IO请求被“打散”，在写操作时，过多小的请求会造成磁盘磁头的频繁移动，降低IO性能。
        avgqu-sz 平均请求队列长度，这个值在正常的系统中不应超过113太多，如果在200左右，甚至上千那说明发生了IO拥塞，而系统还在向IO请求队列中放请求（有一个例外是在执行sync,fsync操作时，该值到达最大值8192是正常的）
        await 单位是毫秒，不应超过两位数，几百甚至上千都是不可接受的。
        svctm 平均每次设备I/O操作的服务时间(毫秒)。即delta(use)/delta(rio+wio)
        %util 尽可能的让该值达到100%，否则说明IO能力没有完全被利用。

        如果svctm比较接近await,说明I/O几乎没有等待时间;
        如果await远大于svctm,说明I/O队列太长，应用得到的响应时间变慢。


    磁盘平均队列长度 avgqu-sz 大于 3 就需要关注，
        iostat -xt 1


    查看磁盘统计信息
        cat /proc/diskstats

     主设备号 次 设备名称   第1个域
     202      32 xvdc  			4959692 347 411803562 33790444 8109304 76242434 674813944 1340903295 0 14873863 1374687623
     202      33 xvdc1 			4959649 347 411803218 33790119 8109304 76242434 674813944 1340903295 0 14873753 1374687299
     202      48 xvdd  			222375610 17443 16714996538 923047010 5307720710 1063308104 50968261080 2447567938 0 1816063211 3368385818
     202      49 xvdd1 			222375567 17443 16714996194 923046942 5307720710 1063308104 50968261080 2447567938 0 1816063228 3368386366


    第1个域：读完成次数 ----- 读磁盘的次数，成功完成读的总次数。
    （number of issued reads. This is the total number of reads completed successfully.）

    第2个域：合并读完成次数， 第6个域：合并写完成次数。为了效率可能会合并相邻的读和写。从而两次4K的读在它最终被处理到磁盘上之前可能会变成一次8K的读，才被计数（和排队），因此只有一次I/O操作。这个域使你知道这样的操作有多频繁。
    （number of reads merged）

    第3个域：读扇区的次数，成功读过的扇区总次数。
    （number of sectors read. This is the total number of sectors read successfully.）

    第4个域：读花费的毫秒数，这是所有读操作所花费的毫秒数（用__make_request()到end_that_request_last()测量）。
    （number of milliseconds spent reading. This is the total number of milliseconds spent by all reads (as measured from __make_request() to end_that_request_last()).）

    第5个域：写完成次数 ----写完成的次数，成功写完成的总次数。
    （number of writes completed. This is the total number of writes completed successfully.）

    第6个域：合并写完成次数 -----合并写次数。
    （number of writes merged Reads and writes which are adjacent to each other may be merged for efficiency. Thus two 4K reads may become one 8K read before it is ultimately handed to the disk, and so it will be counted (and queued) as only one I/O. This field lets you know how often this was done.）

    第7个域：写扇区次数 ---- 写扇区的次数，成功写扇区总次数。
    （number of sectors written. This is the total number of sectors written successfully.）

    第8个域：写操作花费的毫秒数  ---  写花费的毫秒数，这是所有写操作所花费的毫秒数（用__make_request()到end_that_request_last()测量）。
    （number of milliseconds spent writing This is the total number of milliseconds spent by all writes (as measured from __make_request() to end_that_request_last()).）

    第9个域：正在处理的输入/输出请求数 -- -I/O的当前进度，只有这个域应该是0。当请求被交给适当的request_queue_t时增加和请求完成时减小。
    （number of I/Os currently in progress. The only field that should go to zero. Incremented as requests are given to appropriate request_queue_t and decremented as they finish.）

    第10个域：输入/输出操作花费的毫秒数  ----花在I/O操作上的毫秒数，这个域会增长只要field 9不为0。
    （number of milliseconds spent doing I/Os. This field is increased so long as field 9 is nonzero.）

    第11个域：输入/输出操作花费的加权毫秒数 -----  加权， 花在I/O操作上的毫秒数，在每次I/O开始，I/O结束，I/O合并时这个域都会增加。这可以给I/O完成时间和存储那些可以累积的提供一个便利的测量标准。
    （number of milliseconds spent doing I/Os. This field is incremented at each I/O start, I/O completion, I/O merge, or read of these stats by the number of I/Os in progress (field 9) times the number of milliseconds spent doing I/O since the last update of this field. This can provide an easy measure of both I/O completion time and the backlog that may be accumulating.）

以上操作经过验证可以直接拿去。

auther breakEval13

https://github.com/breakEval13