In-Depth Analysis of Linux System Performance Tuning
Optimizing system performance on Linux requires a deep understanding of kernel internals and systemd’s resource management. This article explores the most important tuning vectors that allow for precise adjustments tailored to specific workloads – particularly in demanding scenarios such as virtualization, high-performance databases, or HPC environments.
1. Virtual Memory (VM) Tuning with sysctl
The file /etc/sysctl.conf – or preferably drop-in files under /etc/sysctl.d/*.conf – serves as the main interface for modifying kernel runtime parameters.
Swapping & Memory Pressure
vm.swappiness (0–100)
Controls the kernel’s tendency to swap out memory pages.
- Low (1–10): Ideal for databases and in-memory caches.
- 0 does not completely disable swapping but prevents it until memory pressure is extreme.
- Combination for minimal swapping:
vm.swappiness = 0
vm.min_free_kbytes = 524288   # 512 MiB reserved (~5% on a 10 GB system)
This prevents unnecessary swapping while still keeping emergency memory reserves for the kernel before invoking the OOM killer.
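These values can also be applied at runtime for testing before persisting them (a minimal sketch mirroring the snippet above):
sudo sysctl -w vm.swappiness=0
sudo sysctl -w vm.min_free_kbytes=524288
sysctl vm.swappiness vm.min_free_kbytes   # verify the active settings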
Dirty Pages & Write I/O
- vm.dirty_ratio (default: 20): Max percentage of memory allowed as dirty pages.
- vm.dirty_background_ratio (default: 10): When exceeded, the kernel starts background flushing.
- vm.dirty_expire_centisecs (default: 3000 = 30 s): How long data may stay dirty before forced writeback.
Example for write-heavy workloads (e.g., logging):
vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs = 6000
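To watch these thresholds at work, the volume of dirty memory can be observed while the write-heavy job runs (a simple sketch using standard procfs fields):
watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'   # dirty and in-flight writeback pages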
Additional Memory Parameters
vm.overcommit_memory
- 0: Default heuristic
- 1: Aggressive overcommit (used in HPC/scientific computing)
- 2: Strict mode (commit limit enforced)
vm.overcommit_ratio: Percentage of physical RAM that, added to swap, forms the commit limit when vm.overcommit_memory = 2.
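All of the VM parameters above can be persisted in a drop-in file and loaded without a reboot (the file name is an illustrative choice):
# /etc/sysctl.d/90-vm-tuning.conf
vm.swappiness = 10
vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs = 6000
Then reload all fragments:
sudo sysctl --system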
2. Block I/O and Filesystem Tuning
I/O Scheduler Selection
- none / noop – best for VMs and SSD/NVMe (host or controller handles scheduling).
- mq-deadline – good default for SATA/SAS disks, balanced latency vs throughput.
- bfq – fairness-oriented, suited for desktops.
- kyber – low-latency scheduler designed for fast multi-queue devices (SSD/NVMe).
Check scheduler:
cat /sys/block/sda/queue/scheduler
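Switching the scheduler at runtime, and persisting the choice via udev, can look like this (sda, mq-deadline, and the rule file name are placeholders):
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
# /etc/udev/rules.d/60-iosched.rules – reapplied on every boot
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"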
Filesystem Mount Options
- noatime – never update access times (saves I/O).
- relatime (default) – only update atime when older than mtime.
- discard – inline TRIM for SSDs (better via fstrim.timer).
- ext4: data=writeback → reduces journaling overhead (metadata-only), faster but riskier.
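In /etc/fstab these options are appended to the mount entry; the UUID and mount point below are placeholders:
UUID=xxxx-xxxx  /var/lib/mysql  ext4  noatime,data=writeback  0  2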
3. Resource Management with systemd & cgroups
Modern systemd releases rely on the unified cgroups v2 hierarchy for fine-grained process resource control.
Example: mysql.service unit:
[Service]
CPUAffinity=0 1
MemoryMax=8G
MemoryHigh=6G
# cgroups v2 directives (CPUShares and BlockIO* are deprecated v1 names)
CPUWeight=100
IOWeight=100
IOReadBandwidthMax=/dev/sda 10M
IOWriteBandwidthMax=/dev/sdb 50M
TasksMax=10000
Apply changes:
sudo systemctl daemon-reload
sudo systemctl restart mysql
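The same limits can also be applied as a drop-in override, which survives package updates (a minimal sketch):
sudo systemctl edit mysql   # creates /etc/systemd/system/mysql.service.d/override.conf
systemctl show mysql -p MemoryMax -p CPUWeight   # inspect the effective values
For a live per-cgroup view of CPU, memory, and I/O consumption, systemd-cgtop is the tool of choice.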
4. CPU and NUMA Optimization
CPU Frequency Scaling
sudo cpupower frequency-set -g performance
- performance → fixed max frequency (recommended for servers).
- schedutil → dynamic scaling based on scheduler load info (good for modern CPUs).
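The active governor and the hardware frequency limits can be inspected per CPU via standard sysfs paths:
cpupower frequency-info
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # single core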
Process Affinity
- CPU pinning:
taskset -cp 0,2,4-6 <pid>
- NUMA binding:
numactl --cpunodebind=0 --membind=0 /usr/bin/app
Helpful tools: numastat, hwloc.
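Before binding, it is worth checking the node layout and per-node memory usage:
numactl --hardware   # nodes, their CPUs, and free memory per node
numastat -p <pid>    # per-node allocation statistics of a running process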
5. Additional Optimization Techniques
Transparent Huge Pages (THP)
Often harmful for databases → disable:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
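This sysfs write is lost on reboot. A common way to persist it is the kernel command line (GRUB paths vary by distribution; the "..." stands for the existing parameters):
# /etc/default/grub
GRUB_CMDLINE_LINUX="... transparent_hugepage=never"
sudo grub-mkconfig -o /boot/grub/grub.cfg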
Classical HugePages
Beneficial for DBs (Postgres, Oracle, MySQL) → reduces TLB misses:
vm.nr_hugepages = 2048
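Whether the reservation succeeded (contiguous memory can be scarce on long-running systems) is visible in /proc/meminfo; with the default 2 MiB page size, 2048 pages equal 4 GiB:
grep -i huge /proc/meminfo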
IRQ Affinity
Distribute interrupts across CPUs:
cat /proc/interrupts
echo 2 > /proc/irq/32/smp_affinity   # hexadecimal CPU bitmask: 2 = CPU 1
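The list interface is often more readable than the hex bitmask (IRQ numbers here are illustrative):
echo 1 > /proc/irq/32/smp_affinity_list    # same effect: pin IRQ 32 to CPU 1
echo 2-3 > /proc/irq/45/smp_affinity_list  # pin a NIC queue IRQ to CPUs 2-3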
Tools: irqbalance, tuned-adm.
6. Monitoring & Analysis
Optimization without measurement is pointless. Key tools:
- perf – CPU profiler (cache misses, cycles).
- ftrace / trace-cmd – kernel tracing.
- iostat -xzm 1 – per-device I/O utilization & latency.
- pidstat -d – per-process I/O stats.
- vmstat 1 – virtual memory activity.
- bcc/eBPF (biolatency, execsnoop, tcpconnect) – modern deep-dive analysis.
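A typical deep-dive session combines several of these tools (on some distributions the bcc tools carry a -bpfcc suffix):
sudo biolatency 10 1   # one 10-second block-I/O latency histogram
sudo execsnoop         # trace every new process execution system-wide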
7. Best-Practice Tables
Recommended Kernel Settings by Workload
| Workload | Swappiness | I/O Scheduler | CPU Governor | atime | Extra Notes |
|---|---|---|---|---|---|
| Database | 1–10 | noop/none (SSD), mq-deadline (HDD) | performance | noatime | THP off, HugePages on |
| Webserver | 10–30 | none (SSD) | performance/schedutil | relatime | tcp_tw_reuse=1 |
| Virtualization | 10–20 | none/noop | performance | relatime | IRQ balancing |
| HPC/Scientific | 0–5 | noop | performance | noatime | overcommit_memory=1 |
| Desktop | 30–60 | bfq | schedutil | relatime | fairness over raw perf |
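The tcp_tw_reuse setting referenced for webservers is likewise a sysctl:
sudo sysctl -w net.ipv4.tcp_tw_reuse=1   # reuse TIME_WAIT sockets for outgoing connections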
8. The TOTE Model: Systematic Optimization
Tuning should follow a feedback loop based on the TOTE principle (Test–Operate–Test–Exit):
- Test: Measure baseline (perf, iostat, vmstat).
- Operate: Apply one targeted change (e.g., swappiness=10).
- Test: Measure again, compare with baseline.
- Exit: Keep changes only if results are positive.
If results fall short, refine the hypothesis and repeat – avoid blind trial-and-error.
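A minimal before/after harness for one loop iteration might look like this (file names are illustrative; the same workload must run during both measurements):
iostat -xzm 1 30 > baseline.txt   # Test: 30 one-second samples under load
sudo sysctl -w vm.swappiness=10   # Operate: one targeted change
iostat -xzm 1 30 > after.txt      # Test: re-measure under the same load
diff baseline.txt after.txt       # Exit or iterate based on the comparison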
9. Conclusion
Linux provides an extensive toolkit for performance tuning – from VM parameters, I/O schedulers, and cgroups to NUMA binding and IRQ affinity. The key is not random parameter tweaking but a measurement-driven approach. By applying the TOTE model systematically, administrators can achieve sustainable and verifiable performance gains across diverse workloads.