Linux Kernel Performance Tuning
Modern high-throughput systems don't fail because of a lack of hardware; they fail because of inefficient defaults. The Linux kernel is powerful, but its out-of-the-box configuration is designed for general-purpose workloads, not for latency-sensitive or throughput-intensive production systems.
This guide walks through practical kernel tuning techniques used in real-world systems, along with a data-driven benchmarking methodology so you can validate improvements instead of relying on assumptions.
Tuning Philosophy
Before touching any kernel parameter:
- Measure first, then tune
- Change one variable at a time
- Validate using repeatable benchmarks
- Always have a rollback strategy
Kernel tuning without measurement is just guesswork.
CPU & Scheduler Optimization
CPU Governor
Set the CPU frequency governor to performance to avoid frequency-scaling latency:
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Or persist it across reboots (Debian/Ubuntu, via cpufrequtils):
sudo apt install cpufrequtils
echo 'GOVERNOR="performance"' | sudo tee /etc/default/cpufrequtils
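The governor is set per CPU, and CPUs without a cpufreq driver (often the case in VMs) have no scaling_governor file at all, so it is worth confirming the result. A minimal POSIX sh sketch; check_governors is a hypothetical helper name, and the optional sysfs root argument exists only so it can be tried against a fake directory tree:

```sh
#!/bin/sh
# check_governors: list CPUs whose cpufreq governor differs from the wanted one.
# Usage: check_governors [wanted] [sysfs_root]
# Exit status: 0 = all match, 1 = mismatch found, 2 = no cpufreq interface.
check_governors() {
  local wanted="${1:-performance}" root="${2:-/sys}" f cpu gov status=0 found=0
  for f in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue        # unmatched glob or unreadable file
    found=1
    gov=$(cat "$f")
    cpu=${f#"$root"/devices/system/cpu/}
    cpu=${cpu%%/*}                 # e.g. "cpu3"
    if [ "$gov" != "$wanted" ]; then
      echo "$cpu: $gov"
      status=1
    fi
  done
  [ "$found" -eq 1 ] || { echo "no cpufreq interface under $root" >&2; return 2; }
  return "$status"
}
```

After a reboot, check_governors || echo "governor not applied everywhere" is a quick way to confirm that the persisted setting took effect.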
I/O Scheduler Selection
Check the current I/O scheduler for a device (the active one is shown in brackets):
cat /sys/block/sda/queue/scheduler
Common options:
- mq-deadline → balanced; a good default for SATA SSDs
- none → usually best for NVMe devices (no scheduling overhead)
- bfq → interactive/desktop workloads
Set the scheduler (per device; the change does not survive a reboot):
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler
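Because the setting is per device, a quick inventory of every device's active scheduler helps before and after a change. A sketch; active_scheduler and list_schedulers are hypothetical helper names, and the sysfs root argument is only there for testing:

```sh
#!/bin/sh
# active_scheduler: print the active (bracketed) entry of a scheduler line,
# e.g. "[mq-deadline] kyber bfq none" -> "mq-deadline".
# A line with a single unbracketed word (shown by some devices) is printed as-is.
active_scheduler() {
  printf '%s\n' "$1" | awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^\[.*\]$/) { print substr($i, 2, length($i) - 2); exit 0 }
    if (NF == 1) { print $1; exit 0 }
    exit 1
  }'
}

# list_schedulers: print "<device> <active scheduler>" for every block device.
# Usage: list_schedulers [sysfs_root]
list_schedulers() {
  local root="${1:-/sys}" f dev sched
  for f in "$root"/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#"$root"/block/}
    dev=${dev%%/*}
    sched=$(active_scheduler "$(cat "$f")") || sched=unknown
    echo "$dev $sched"
  done
}
```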
CPU Pinning (Isolation)
Pin a workload to specific cores to avoid CPU migrations and keep its caches warm:
taskset -c 2,3 ./your_app
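Or isolate CPUs from the general scheduler at boot by adding these parameters to GRUB_CMDLINE_LINUX in /etc/default/grub:
GRUB_CMDLINE_LINUX="isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3"
Then regenerate the GRUB configuration (Debian/Ubuntu) and reboot: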
sudo update-grub
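The isolation parameters only take effect after a reboot, and a typo in /etc/default/grub is easy to miss, so it helps to read them back from the running kernel's command line. A sketch; kernel_param is a hypothetical helper, and it only handles NAME=VALUE parameters without embedded spaces:

```sh
#!/bin/sh
# kernel_param: print VALUE for NAME=VALUE on a kernel command line.
# Usage: kernel_param NAME [cmdline_file]   (default file: /proc/cmdline)
# Exit status 1 if the parameter is absent. Quoted values with spaces and
# flag-only parameters (no "=") are not handled.
kernel_param() {
  awk -v name="$1" '{
    for (i = 1; i <= NF; i++)
      if (index($i, name "=") == 1) { print substr($i, length(name) + 2); found = 1; exit }
  }
  END { exit !found }' "${2:-/proc/cmdline}"
}
```

After rebooting, kernel_param isolcpus should print 2,3; cat /sys/devices/system/cpu/isolated and taskset -pc on the pinned process's PID show the same picture from the scheduler's side.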
Memory Management Tuning
vm.swappiness
Controls how aggressively the kernel swaps out anonymous memory instead of reclaiming page cache.
sudo sysctl -w vm.swappiness=10
Persist:
echo "vm.swappiness=10" | sudo tee -a /etc/sysctl.conf
- 0-10 → latency-sensitive workloads
- 60 → default (often too high for production systems)
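Appending to /etc/sysctl.conf works, but a dedicated drop-in under /etc/sysctl.d/ is easier to review and roll back. A sketch; write_sysctl_dropin is a hypothetical helper, and the file name 99-tuning.conf in the usage note is an arbitrary choice:

```sh
#!/bin/sh
# write_sysctl_dropin: write "key = value" lines to a sysctl.d-style file.
# Every argument is validated first, so a bad argument leaves FILE untouched.
# Usage: write_sysctl_dropin FILE key=value [key=value ...]
write_sysctl_dropin() {
  local file="$1" kv
  shift
  for kv in "$@"; do
    if ! printf '%s\n' "$kv" | grep -Eq '^[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)+=.+$'; then
      echo "invalid setting: $kv" >&2
      return 1
    fi
  done
  for kv in "$@"; do
    printf '%s = %s\n' "${kv%%=*}" "${kv#*=}"
  done > "$file"
}
```

For example: tmp=$(mktemp) && write_sysctl_dropin "$tmp" vm.swappiness=10 && sudo install -m 0644 "$tmp" /etc/sysctl.d/99-tuning.conf && sudo sysctl --system. Rolling back then means deleting that one file and re-applying the previous value (or rebooting).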
Transparent Huge Pages (THP)
THP can introduce latency spikes in some workloads (databases especially).
Disable:
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
Persist via systemd:
sudo nano /etc/systemd/system/disable-thp.service
[Unit]
Description=Disable Transparent Huge Pages
[Service]
Type=oneshot
ExecStart=/bin/sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
[Install]
WantedBy=multi-user.target
sudo systemctl enable disable-thp
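After a reboot, confirm that the unit actually ran. The sketch below prints the active enabled and defrag modes plus AnonHugePages from /proc/meminfo. Huge pages that already exist are not necessarily broken up when the mode changes, so that number may not drop to zero immediately. thp_status is a hypothetical helper name:

```sh
#!/bin/sh
# thp_status: print the active THP modes and the anonymous memory currently
# backed by transparent huge pages.
# Usage: thp_status [sysfs_root] [meminfo_file]
thp_status() {
  local root="${1:-/sys}" meminfo="${2:-/proc/meminfo}" knob mode
  for knob in enabled defrag; do
    # The active mode is the bracketed word, e.g. "always madvise [never]".
    mode=$(awk '{ for (i = 1; i <= NF; i++)
                    if ($i ~ /^\[.*\]$/) print substr($i, 2, length($i) - 2) }' \
           "$root/kernel/mm/transparent_hugepage/$knob") || return 1
    echo "$knob=$mode"
  done
  awk '/^AnonHugePages:/ { print "AnonHugePages=" $2 " kB" }' "$meminfo"
}
```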
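vm.dirty_ratio & vm.dirty_background_ratio
These control when the kernel writes dirty pages back to disk. vm.dirty_background_ratio is the percentage of available memory (free plus reclaimable pages) at which background writeback starts; vm.dirty_ratio is the percentage at which processes that are writing get throttled and must write back dirty data themselves.
sudo sysctl -w vm.dirty_background_ratio=5
sudo sysctl -w vm.dirty_ratio=10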
- Lower values → more consistent latency
- Higher values → better throughput, but worse latency spikes
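Since the ratios are percentages of dirtyable memory rather than of total RAM, the absolute thresholds depend on how much memory is currently free or reclaimable. A back-of-the-envelope sketch that uses MemAvailable from /proc/meminfo as a stand-in for that figure; this is an approximation, and dirty_thresholds is a hypothetical name:

```sh
#!/bin/sh
# dirty_thresholds: rough writeback thresholds in MiB.
# Approximation: treats MemAvailable as the kernel's dirtyable memory; the
# kernel's own calculation differs in detail.
# Usage: dirty_thresholds BG_RATIO RATIO [meminfo_file]
dirty_thresholds() {
  local bg="$1" ratio="$2" meminfo="${3:-/proc/meminfo}" avail_kb
  avail_kb=$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")
  [ -n "$avail_kb" ] || return 1
  # kB * percent / 100 / 1024 -> MiB, integer arithmetic (rounds down)
  echo "background writeback starts at ~$(( avail_kb * bg / 100 / 1024 )) MiB"
  echo "writers are throttled at ~$(( avail_kb * ratio / 100 / 1024 )) MiB"
}
```

With MemAvailable at 16 GiB (16777216 kB), dirty_thresholds 5 10 prints about 819 MiB and 1638 MiB. When percentages are too coarse on large-memory hosts, vm.dirty_background_bytes and vm.dirty_bytes set absolute limits instead.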
Disk & I/O Optimization
Read-Ahead Buffer
Check:
sudo blockdev --getra /dev/sda
Set (the value is in 512-byte sectors, so 4096 means 2 MiB):
sudo blockdev --setra 4096 /dev/sda
- Larger values → good for sequential workloads
- Smaller values → better for random I/O
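blockdev counts read-ahead in 512-byte sectors, while sysfs exposes the same setting as /sys/block/<dev>/queue/read_ahead_kb in KiB, which makes it easy to be off by a factor of two. A small conversion sketch; the function names are hypothetical:

```sh
#!/bin/sh
# blockdev --getra/--setra work in 512-byte sectors; sysfs read_ahead_kb
# works in KiB. 1 KiB = 2 sectors.
ra_sectors_to_kb() { echo $(( $1 / 2 )); }
ra_kb_to_sectors() { echo $(( $1 * 2 )); }

# show_readahead: print a device's read-ahead in both units.
# Usage: show_readahead DEV [sysfs_root]
show_readahead() {
  local dev="$1" root="${2:-/sys}" kb
  kb=$(cat "$root/block/$dev/queue/read_ahead_kb") || return 1
  echo "$dev: $kb KiB = $(ra_kb_to_sectors "$kb") sectors"
}
```

For example, sudo blockdev --setra 4096 /dev/sda and echo 2048 | sudo tee /sys/block/sda/queue/read_ahead_kb request the same 2 MiB read-ahead.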
I/O Queue Depth
For NVMe:
cat /sys/block/nvme0n1/queue/nr_requests
Tune (the kernel may reject a value larger than the device or scheduler supports):
echo 1024 | sudo tee /sys/block/nvme0n1/queue/nr_requests
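Multi-Queue (blk-mq)
Since Linux 5.0 the legacy single-queue block layer is gone, so every block device uses blk-mq. To see how many hardware queues a device exposes, list its mq directory (each numbered subdirectory is one hardware queue):
ls /sys/block/nvme0n1/mq/
More hardware queues → more I/O parallelism (NVMe devices often expose one queue per CPU)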
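Counting those numbered directories gives the hardware queue count per device, which is handy when comparing machines. A sketch; count_hw_queues is a hypothetical name, and the sysfs root argument is only there for testing:

```sh
#!/bin/sh
# count_hw_queues: count blk-mq hardware queues for a device by counting
# the numbered subdirectories under /sys/block/<dev>/mq/.
# Usage: count_hw_queues DEV [sysfs_root]
count_hw_queues() {
  local dev="$1" root="${2:-/sys}" d n=0
  [ -d "$root/block/$dev/mq" ] || return 1
  for d in "$root/block/$dev/mq"/[0-9]*; do
    if [ -d "$d" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```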
Network Stack Optimization
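Increase Socket Buffers
Raise the maximum socket buffer sizes to 128 MiB (134217728 bytes):
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728
These cap what applications can request with SO_RCVBUF/SO_SNDBUF. TCP autotuning is bounded separately by the third (maximum) value of net.ipv4.tcp_rmem and net.ipv4.tcp_wmem, so raise those too if you rely on autotuning.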
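A reasonable ceiling follows from the bandwidth-delay product (BDP): the bytes that must be in flight to keep a path full, bandwidth × RTT. The sketch below does that arithmetic in integer shell math; the units (Mbit/s and milliseconds) and the function names are choices made for this example:

```sh
#!/bin/sh
# bdp_bytes: bandwidth-delay product, the data in flight needed to keep a
# path busy: bytes = (bits/s / 8) * seconds.
# With Mbit/s and ms: Mbit/s * 1000000 / 8 * ms / 1000 = Mbit/s * 125 * ms.
# Usage: bdp_bytes MBIT_PER_S RTT_MS
bdp_bytes() { echo $(( $1 * 125 * $2 )); }

# to_mib: bytes -> MiB, rounded up.
to_mib() { echo $(( ($1 + 1048575) / 1048576 )); }
```

For a 10 Gbit/s path with a 100 ms RTT, bdp_bytes 10000 100 gives 125000000 bytes, and to_mib rounds that up to 120 MiB, which fits under a 134217728-byte (128 MiB) maximum.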
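TCP Tuning
Check that BBR is available; if it is not listed, load it with sudo modprobe tcp_bbr:
sysctl net.ipv4.tcp_available_congestion_control
Then enable the fq qdisc and BBR:
sudo sysctl -w net.core.default_qdisc=fq
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr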
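Runtime sysctl -w changes are lost at reboot, and a setting can fail if its module is not loaded, so it helps to compare the live values against what you intended. A sketch that reads /proc/sys directly; sysctl_check, norm, and the SYSCTL_ROOT override are hypothetical, and keys whose components contain literal dots (for example VLAN interface names) are not handled:

```sh
#!/bin/sh
# norm: collapse runs of whitespace (multi-value keys such as tcp_rmem are
# tab-separated in /proc/sys) and trim leading/trailing blanks.
norm() { printf '%s\n' "$1" | awk '{ $1 = $1; print }'; }

# sysctl_check: report keys whose live value differs from the wanted one.
# Usage: sysctl_check key=value [key=value ...]
# Reads ${SYSCTL_ROOT:-/proc/sys}/<key with dots turned into slashes>.
sysctl_check() {
  local root="${SYSCTL_ROOT:-/proc/sys}" kv key want have path rc=0
  for kv in "$@"; do
    key=${kv%%=*}
    want=$(norm "${kv#*=}")
    path="$root/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$path" ]; then
      echo "$key: not found"
      rc=1
      continue
    fi
    have=$(norm "$(cat "$path")")
    if [ "$have" != "$want" ]; then
      echo "$key: want '$want', have '$have'"
      rc=1
    fi
  done
  return "$rc"
}
```

Running sysctl_check net.core.default_qdisc=fq net.ipv4.tcp_congestion_control=bbr prints nothing and exits 0 when both settings are live.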
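File Descriptors
Raise the open-file limit for the current shell (a non-root user cannot go above the hard limit):
ulimit -n 1048576
Persist for login sessions (limits.conf is applied by PAM; services started by systemd ignore it and need LimitNOFILE= in their unit instead):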
echo "* soft nofile 1048576" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 1048576" | sudo tee -a /etc/security/limits.conf
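Because limits.conf only affects PAM sessions, the reliable check is the limit a running process actually has, which the kernel reports in /proc/<pid>/limits. A sketch; nofile_limits is a hypothetical name:

```sh
#!/bin/sh
# nofile_limits: print the soft and hard open-file limits of a process.
# Usage: nofile_limits [pid | limits_file]   (default: this shell's PID)
nofile_limits() {
  local src="${1:-$$}"
  case $src in
    ''|*[!0-9]*) ;;                 # not a bare number: treat as a file path
    *) src="/proc/$src/limits" ;;
  esac
  # Line format: "Max open files   <soft>   <hard>   files"
  awk '/^Max open files/ { print "soft=" $4, "hard=" $5; found = 1 }
       END { exit !found }' "$src"
}
```

For example, nofile_limits "$(pgrep -o your_app)" inspects the oldest process matching your_app.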
Benchmarking Methodology
Tools
- sysbench → CPU, memory, I/O
- fio → disk benchmarking
- iperf3 → network throughput
- perf → kernel-level profiling
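Example: Disk Benchmark
Use --direct=1 to bypass the page cache (otherwise the test may measure RAM rather than the device), --iodepth to set how many I/Os each job keeps in flight, and --time_based so the job runs for the full --runtime instead of stopping once --size has been read:
fio --name=randread \
--ioengine=libaio \
--direct=1 \
--rw=randread \
--bs=4k \
--iodepth=32 \
--numjobs=4 \
--size=1G \
--runtime=60 \
--time_based \
--group_reporting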
Key Metrics
Track:
- Throughput (MB/s)
- Latency (avg, p95, p99)
- CPU utilization
- Context switches
- I/O wait
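Context switches and I/O wait can be sampled with vmstat, whose cs and wa columns report them per interval. Its first data row is an average since boot, so it should be excluded. A sketch; vmstat_avg is a hypothetical name:

```sh
#!/bin/sh
# vmstat_avg: average one vmstat column (by header name, e.g. cs or wa)
# over all samples except the first, which vmstat reports as since-boot averages.
# Usage: vmstat 1 11 | vmstat_avg cs
vmstat_avg() {
  awk -v col="$1" '
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    $1 !~ /^[0-9]+$/ { next }   # group header line or other non-data lines
    !seen++ { next }            # skip the since-boot sample
    idx { sum += $idx; n++ }
    END { if (n) printf "%.1f\n", sum / n; else exit 1 }'
}
```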
Testing Strategy
- Capture baseline
- Apply one tuning change
- Re-run benchmark
- Compare results
- Keep or rollback
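Comparing results is easiest to keep honest with one number per metric. A sketch that reports the relative change from a baseline run to a tuned run; pct_change is a hypothetical name. The sign means different things per metric (higher throughput is better, higher p99 latency is worse), and a change smaller than the spread between repeated baseline runs is noise, not an improvement:

```sh
#!/bin/sh
# pct_change: relative change from a baseline measurement to a new one.
# Usage: pct_change BASELINE NEW   (e.g. IOPS, MB/s, or p99 latency)
# Exit status 1 if BASELINE is zero.
pct_change() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (a == 0) exit 1
    printf "%+.2f%%\n", (b - a) / a * 100
  }'
}
```

For example, pct_change 100 110 prints +10.00%.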
Common Pitfalls
- Tuning everything at once → impossible to isolate impact
- Ignoring workload type → wrong optimizations
- Blindly copying configs → dangerous in production
- Not testing under realistic load
Putting It All Together
A typical production-tuned system might include:
- CPU governor → performance
- I/O scheduler → none (NVMe)
- vm.swappiness=10
- THP disabled
- Tuned dirty ratios
- Increased I/O queue depth
- BBR congestion control
- High file descriptor limits
But the exact combination depends on your workload.
Final Thoughts
Linux kernel tuning is not about memorizing sysctl values; it's about:
- Understanding system behavior
- Identifying bottlenecks
- Applying targeted optimizations
- Validating with data
The best engineers don't just tune systems; they prove improvements with evidence.
Next Steps
If you're building high-performance platforms:
- Automate tuning via Ansible or Terraform
- Integrate benchmarks into CI pipelines
- Combine kernel tuning with observability (eBPF, perf, tracing)
Measure. Tune. Validate. Repeat.