Scenario: Diagnosing Disk I/O Latency
You suspect that a disk is experiencing high latency during peak traffic. To monitor real-time disk performance (the -y flag below tells iostat to omit the initial report of statistics since boot, so only interval data is shown), you run:
iostat -y -x 5 3
Output:
avg-cpu: %user %nice %system %iowait %steal %idle
1.50 0.00 0.40 0.10 0.00 98.00
Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.00 10.00 20.00 0.50 1.00 100.00 0.50 25.00 20.00 30.00 5.00 15.00
avg-cpu: %user %nice %system %iowait %steal %idle
2.00 0.00 0.50 0.20 0.00 97.30
Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 2.00 15.00 25.00 0.75 1.25 120.00 0.60 30.00 25.00 35.00 6.00 20.00
What is iostat?
iostat is a Linux/Unix command-line utility that provides detailed statistics about CPU usage and input/output (I/O) performance of storage devices (disks, partitions, or logical volumes). It is part of the sysstat package and is widely used by system administrators and SREs to diagnose performance bottlenecks related to disk I/O and CPU utilization.
Let’s go through the full output of iostat step by step, explaining each section and metric in detail. The walkthrough uses a slightly simpler invocation:
iostat -x 5 2
-x: Displays extended statistics for devices.
5: Interval in seconds between reports.
2: Number of reports to generate (including the first one, which shows averages since boot).
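If you want to collect these numbers from a script instead of reading them by eye, the report can be parsed by pairing the header tokens with the values on each device row. The Python sketch below is only an illustration under that assumption: it shells out to iostat, keeps whatever column names your sysstat version prints (they vary slightly between releases), and returns one dictionary per report. Nothing here is part of iostat itself.

import subprocess

def iostat_extended(interval=5, count=2):
    """Run `iostat -x <interval> <count>` and return one dict per report,
    mapping each device name to its column values."""
    out = subprocess.run(
        ["iostat", "-x", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    ).stdout

    reports, current, columns, in_devices = [], {}, [], False
    for line in out.splitlines():
        fields = line.split()
        if not fields:
            in_devices = False            # blank line ends a device table
            continue
        if fields[0] == "Device":         # header row: remember column names
            if current:
                reports.append(current)
                current = {}
            columns = fields[1:]
            in_devices = True
        elif in_devices:
            try:
                current[fields[0]] = dict(zip(columns, map(float, fields[1:])))
            except ValueError:
                in_devices = False        # not a device row after all
    if current:
        reports.append(current)
    return reports

if __name__ == "__main__":
    for i, report in enumerate(iostat_extended(), start=1):
        print(f"report {i}:", report)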
Section 1: CPU Statistics
The first section of the output shows CPU utilization metrics.
Header:
avg-cpu: %user %nice %system %iowait %steal %idle
Metrics Explained:
%user: Percentage of CPU time spent on user processes (non-kernel processes).
%nice: Percentage of CPU time spent on user processes with a "nice" priority (low-priority tasks).
%system: Percentage of CPU time spent on kernel/system processes.
%iowait: Percentage of time the CPU was idle while the system had outstanding disk I/O requests. Sustained high values suggest processes are blocked waiting on storage.
%steal: Percentage of CPU time "stolen" by the hypervisor for other virtual machines (in virtualized environments).
%idle: Percentage of CPU time spent idle (not doing any work).
Example:
avg-cpu: %user %nice %system %iowait %steal %idle
2.50 0.00 1.00 5.00 0.00 91.50
Interpretation:
2.5% of the CPU is being used by user processes.
1% is being used by system/kernel processes.
5% of the CPU is waiting for I/O operations to complete (this is significant and could indicate a disk bottleneck).
91.5% of the CPU is idle, meaning there is plenty of CPU capacity available.
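The avg-cpu section is computed from the cumulative tick counters in /proc/stat: iostat samples them at the start and end of each interval and reports each field's share of the elapsed ticks. The sketch below is a rough approximation of that calculation (not iostat's actual code) and assumes the standard aggregate cpu line at the top of /proc/stat.

import time

# First eight counters of the aggregate "cpu" line in /proc/stat (in ticks).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def read_cpu_ticks():
    with open("/proc/stat") as f:
        parts = f.readline().split()      # e.g. ['cpu', '10132153', '290696', ...]
    return dict(zip(FIELDS, map(int, parts[1:9])))

def cpu_percentages(interval=5):
    """Approximate iostat's avg-cpu report over one interval."""
    before = read_cpu_ticks()
    time.sleep(interval)
    after = read_cpu_ticks()
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values()) or 1      # avoid division by zero
    return {k: 100.0 * v / total for k, v in delta.items()}

if __name__ == "__main__":
    p = cpu_percentages()
    print(f"%user {p['user']:.2f}  %system {p['system']:.2f}  "
          f"%iowait {p['iowait']:.2f}  %idle {p['idle']:.2f}")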
Section 2: Device I/O Statistics
The second section provides detailed statistics for each storage device (e.g., /dev/sda).
Header:
Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
Metrics Explained:
Device: The name of the storage device (e.g., sda, nvme0n1).
rrqm/s: The number of read requests merged per second. If multiple read requests are queued for adjacent blocks, they are merged into one request.
wrqm/s: The number of write requests merged per second.
r/s: The number of read requests completed per second.
w/s: The number of write requests completed per second.
rMB/s: The amount of data read from the device per second (in megabytes).
wMB/s: The amount of data written to the device per second (in megabytes).
avgrq-sz: The average size of I/O requests (in sectors). Larger values indicate larger I/O operations.
avgqu-sz: The average number of I/O requests in the queue. Higher values indicate more queuing and potential contention.
await: The average time (in milliseconds) for I/O requests to be completed, including both queue time and service time. High await values indicate that I/O operations are taking too long, which could be due to disk contention or slow storage.
r_await: The average time (in milliseconds) for read requests to be completed.
w_await: The average time (in milliseconds) for write requests to be completed.
svctm: The average service time (in milliseconds) for I/O requests. This is the time the device spends servicing requests, excluding queue time. Note that svctm is considered unreliable and has been removed from recent sysstat releases.
%util: The percentage of time the device was busy handling I/O requests. If this value is close to 100%, the device is saturated and may be a bottleneck.
Example:
Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.00 10.00 20.00 0.50 1.00 100.00 0.50 25.00 20.00 30.00 5.00 15.00
Interpretation:
rrqm/s and wrqm/s: Very low values (0.00 and 1.00), meaning there is little merging of I/O requests.
r/s and w/s: The device is handling 10 read requests and 20 write requests per second.
rMB/s and wMB/s: The device is reading 0.5 MB/s and writing 1 MB/s.
avgrq-sz: The average request size is 100 sectors (50 KB per request, as 1 sector = 512 bytes).
avgqu-sz: The average queue size is 0.5, meaning there is some queuing but not excessive.
await: The average time for I/O requests is 25 ms, which is relatively high and could indicate a performance issue.
r_await: Read requests take 20 ms on average.
w_await: Write requests take 30 ms on average.
svctm: The service time is 5 ms, meaning the device itself is fast, but the queuing time is causing delays.
%util: The device is 15% utilized, so it is not saturated.
Second Report
The second report shows updated statistics after 5 seconds.
Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 2.00 15.00 25.00 0.75 1.25 120.00 0.60 30.00 25.00 35.00 6.00 20.00
Changes:
r/s and w/s: Read and write requests have increased to 15 and 25 per second, respectively.
rMB/s and wMB/s: Read and write throughput have increased to 0.75 MB/s and 1.25 MB/s.
await: The average time for I/O requests has increased to 30 ms, indicating worsening performance.
%util: The device utilization has increased to 20%, meaning the disk is busier.
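All of these per-device figures come from the cumulative counters in /proc/diskstats, which iostat samples at each interval; the deltas are then divided by the elapsed time. The sketch below reproduces a few of them (r/s, w/s, r_await, w_await, %util) under that assumption. It is an illustration rather than iostat's implementation, and it ignores the discard and flush fields that newer kernels append.

import time

def read_diskstats():
    """Return {device: [11 cumulative counters]} from /proc/diskstats.
    Counter order: reads, reads_merged, sectors_read, ms_reading,
    writes, writes_merged, sectors_written, ms_writing,
    ios_in_progress, ms_doing_io, weighted_ms_doing_io."""
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            stats[parts[2]] = [int(x) for x in parts[3:14]]
    return stats

def device_rates(device="sda", interval=5):
    before = read_diskstats()[device]
    start = time.time()
    time.sleep(interval)
    after = read_diskstats()[device]
    elapsed = time.time() - start
    d = [a - b for a, b in zip(after, before)]
    reads, _, _, ms_read, writes, _, _, ms_write, _, ms_io, _ = d
    return {
        "r/s": reads / elapsed,
        "w/s": writes / elapsed,
        "r_await": ms_read / reads if reads else 0.0,   # avg ms per completed read
        "w_await": ms_write / writes if writes else 0.0,
        "%util": 100.0 * (ms_io / 1000.0) / elapsed,    # share of time device was busy
    }

if __name__ == "__main__":
    print(device_rates())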
How to Use This Data
High await and %util: If await is high and %util is close to 100%, the disk is likely a bottleneck.
Solution: Upgrade to faster storage (e.g., SSDs) or optimize the application to reduce disk I/O.
High avgqu-sz: If avgqu-sz is high, it indicates queuing, which could be due to contention or insufficient IOPS.
Solution: Increase IOPS (e.g., provisioned IOPS on AWS EBS) or reduce the number of concurrent I/O operations.
Low svctm but high await: If svctm is low but await is high, the delay is in the queue rather than in the device itself.
Solution: Investigate the application or workload causing excessive I/O.
High r_await or w_await: If read or write latency is significantly higher than the other, it could indicate a specific issue with read or write operations.
Solution: Optimize the workload (e.g., caching for reads, batching for writes, buffered I/O).
The sketch below shows how these rules of thumb can be checked automatically against parsed iostat output.
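The following sketch applies those rules of thumb to one device's extended statistics. The thresholds and the helper name are made up for illustration, and the sample values are taken from the second report above; tune the limits to your own hardware.

def flag_disk_issues(dev, await_ms=20.0, util_pct=80.0, queue_depth=2.0):
    """Return human-readable warnings for one device's iostat -x columns.
    Thresholds are illustrative defaults, not universal constants."""
    warnings = []
    if dev.get("await", 0) > await_ms and dev.get("%util", 0) > util_pct:
        warnings.append("high await with near-saturated %util: disk is likely the bottleneck")
    if dev.get("avgqu-sz", 0) > queue_depth:
        warnings.append("deep request queue: contention or insufficient IOPS")
    if dev.get("await", 0) > await_ms and dev.get("svctm", 0) < dev.get("await", 0) / 3:
        warnings.append("svctm much lower than await: delay is queuing, not the device")
    if dev.get("w_await", 0) > 2 * max(dev.get("r_await", 0), 0.1):
        warnings.append("writes much slower than reads: consider batching or buffered I/O")
    return warnings

# Values taken from the second report above.
sda = {"await": 30.0, "r_await": 25.0, "w_await": 35.0,
       "svctm": 6.0, "avgqu-sz": 0.60, "%util": 20.0}
print(flag_disk_issues(sda) or "no obvious disk bottleneck")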
Batching for writes is a technique used to group multiple write operations together and execute them as a single operation. This reduces the number of I/O operations, improves performance, and minimizes the overhead associated with frequent small writes. It is particularly useful in scenarios where write latency or disk I/O is a bottleneck.
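As a deliberately simplified illustration of the idea, the sketch below buffers small records in memory and flushes them as one larger write when either a size limit or a time limit is reached. The class name, limits, and file path are invented for this example; a real implementation would also have to address the durability trade-offs listed below.

import os
import time

class BatchedWriter:
    """Buffer small writes and flush them as one larger write (illustrative only)."""

    def __init__(self, path, max_records=100, max_delay_s=1.0):
        self._file = open(path, "a")
        self._buffer = []
        self._max_records = max_records
        self._max_delay_s = max_delay_s
        self._last_flush = time.monotonic()

    def write(self, record):
        self._buffer.append(record)
        # Flush when the batch is large enough or has waited long enough.
        if (len(self._buffer) >= self._max_records
                or time.monotonic() - self._last_flush >= self._max_delay_s):
            self.flush()

    def flush(self):
        if self._buffer:
            # One write() call and one fsync instead of one per record.
            self._file.write("\n".join(self._buffer) + "\n")
            self._file.flush()
            os.fsync(self._file.fileno())
            self._buffer.clear()
        self._last_flush = time.monotonic()

    def close(self):
        self.flush()
        self._file.close()

# Usage: 250 small records become a handful of larger disk writes.
writer = BatchedWriter("/tmp/batched.log")
for i in range(250):
    writer.write(f"event {i}")
writer.close()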
Benefits of Write Batching
Reduced Disk I/O:
- Fewer write operations mean less overhead and better performance.
Improved Throughput:
- Grouping writes allows the system to process them more efficiently.
Lower Latency:
- With fewer, larger I/O operations the disk spends less time saturated, so queue times (await) drop for the system as a whole, even though an individual record may sit briefly in the buffer before being flushed.
Cost Savings:
- In cloud environments (e.g., AWS EBS), fewer IOPS can reduce costs.
Trade-Offs
Increased Memory Usage:
- Data must be buffered in memory before being written, which can increase memory usage.
Risk of Data Loss:
- If the application crashes before flushing the buffer, some data may be lost. This can be mitigated by using durable queues or periodic flushing.
Complexity:
- Implementing batching adds complexity to the application logic.
Conclusion
The iostat command provides a wealth of information about CPU and disk performance. By understanding the metrics and their relationships, you can diagnose performance bottlenecks and take corrective actions. In this example, the high await values and increasing %util suggest that the disk is becoming a bottleneck, and further investigation or optimization is needed.