Linux Performance Tuning

In-Depth Analysis of Linux System Performance Tuning

Optimizing system performance on Linux requires a deep understanding of kernel internals and systemd’s resource management. This article explores the most important tuning vectors that allow for precise adjustments tailored to specific workloads – particularly in demanding scenarios such as virtualization, high-performance databases, or HPC environments.


1. Virtual Memory (VM) Tuning with sysctl

The files /etc/sysctl.conf or preferably /etc/sysctl.d/*.conf serve as the main interface for modifying kernel runtime parameters.

Swapping & Memory Pressure

  • vm.swappiness (0–100)
    Controls the kernel’s tendency to swap out memory pages.
    • Low (1–10): Ideal for databases and in-memory caches.
    • 0 does not completely disable swapping but prevents it until memory pressure is extreme.
  • Combination for minimal swapping:

    vm.swappiness = 0
    vm.min_free_kbytes = 524288   # 512 MiB reserved for the kernel

This prevents unnecessary swapping while still keeping emergency memory reserves for the kernel before invoking the OOM killer.
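Both settings can be persisted together in a sysctl drop-in file; a minimal sketch (the file name is illustrative):

```
# /etc/sysctl.d/99-vm-tuning.conf (illustrative name)
vm.swappiness = 0
vm.min_free_kbytes = 524288

# Apply without rebooting:
#   sudo sysctl --system
```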

Dirty Pages & Write I/O

  • vm.dirty_ratio (default: 20): Max percentage of memory allowed as dirty pages.
  • vm.dirty_background_ratio (default: 10): When exceeded, the kernel starts background flushing.
  • vm.dirty_expire_centisecs: How long (in hundredths of a second) data may stay dirty before forced writeback.

Example for write-heavy workloads (e.g., logging):

vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs = 6000   # 60 seconds
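Whether these thresholds are actually being approached can be observed directly; a read-only check against the standard /proc interface:

```shell
# Show how much memory is currently dirty or under writeback.
# These values climb toward the vm.dirty_* thresholds during heavy writes.
grep -E '^(Dirty|Writeback):' /proc/meminfo
```

Combined with `watch -n1`, this makes background flushing visible in real time.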

Additional Memory Parameters

  • vm.overcommit_memory
    • 0: Default heuristic
    • 1: Aggressive overcommit (used in HPC/scientific computing)
    • 2: Strict mode
  • vm.overcommit_ratio: In strict mode (2), the percentage of physical RAM counted toward the commit limit (CommitLimit = swap + RAM × ratio / 100).

2. Block I/O and Filesystem Tuning

I/O Scheduler Selection

  • none / noop – best for VMs and SSD/NVMe (host or controller handles scheduling).
  • mq-deadline – good default for SATA/SAS disks, balanced latency vs throughput.
  • bfq – fairness-oriented, suited for desktops.
  • kyber – token-based, low-latency scheduler aimed at fast multi-queue devices (SSD/NVMe).

Check the active scheduler (the entry in brackets is currently in use):

cat /sys/block/sda/queue/scheduler
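A runtime change via sysfs is lost on reboot; a udev rule is one common way to persist the choice. A sketch, with an illustrative rule file name:

```
# Runtime (not persistent):
#   echo none | sudo tee /sys/block/sda/queue/scheduler

# /etc/udev/rules.d/60-ioscheduler.rules (illustrative name):
# non-rotational (SSD) -> none, rotational (HDD) -> mq-deadline
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
```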

Filesystem Mount Options

  • noatime – never update access times (saves I/O).
  • relatime (default) – only update atime if it is older than mtime/ctime, or at most once per day.
  • discard – inline TRIM for SSDs (better via fstrim.timer).
  • ext4: data=writeback → reduces journaling overhead (metadata-only), faster but riskier.
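Put together, a database data volume might be mounted like this (device, mount point, and filesystem are illustrative; data=writeback carries the risk noted above):

```
# /etc/fstab
/dev/sdb1  /var/lib/mysql  ext4  defaults,noatime,data=writeback  0  2
```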

3. Resource Management with systemd & cgroups

systemd uses cgroups (the v2 unified hierarchy on current distributions) for fine-grained resource control of services.

Example: mysql.service unit:

[Service]
CPUAffinity=0 1
MemoryMax=8G
MemoryHigh=6G
CPUWeight=100
IOWeight=100
IOReadBandwidthMax=/dev/sda 10M
IOWriteBandwidthMax=/dev/sda 50M
TasksMax=10000

Apply changes:

sudo systemctl daemon-reload
sudo systemctl restart mysql
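The same limits can also be applied without touching the packaged unit file, either via a drop-in or at runtime; a sketch using the cgroup v2 directives:

```
# sudo systemctl edit mysql.service
# creates /etc/systemd/system/mysql.service.d/override.conf:
[Service]
MemoryMax=8G
MemoryHigh=6G
CPUWeight=100

# Or adjust a single property on the fly (persistent unless --runtime is given):
#   sudo systemctl set-property mysql.service MemoryMax=8G
```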

4. CPU and NUMA Optimization

CPU Frequency Scaling

sudo cpupower frequency-set -g performance

  • performance → fixed max frequency (recommended for servers).
  • schedutil → dynamic scaling based on scheduler load info (good for modern CPUs).

Process Affinity

  • CPU pinning: taskset -cp 0,2,4-6 <pid>
  • NUMA binding: numactl --cpunodebind=0 --membind=0 /usr/bin/app

Helpful tools: numastat, hwloc.
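Affinity can be inspected without root; a small read-only sketch (the pinning command at the end is illustrative and left commented out):

```shell
# Show which CPUs the current shell is allowed to run on:
taskset -cp $$

# Example: start a process pinned to CPUs 0 and 2 (adjust to your topology):
# taskset -c 0,2 /usr/bin/app
```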


5. Additional Optimization Techniques

Transparent Huge Pages (THP)

Often harmful for databases (compaction can cause latency spikes) → disable at runtime, or persist via the transparent_hugepage=never kernel boot parameter:

echo never > /sys/kernel/mm/transparent_hugepage/enabled

Classical HugePages

Beneficial for DBs (Postgres, Oracle, MySQL) → reduces TLB misses:

vm.nr_hugepages = 2048
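With the default 2 MiB huge page size, 2048 pages reserve 4 GiB. Whether the reservation succeeded can be checked read-only:

```shell
# HugePages_Total should match vm.nr_hugepages once the setting is applied;
# HugePages_Free shows how many pages are still unused.
grep -E '^HugePages_(Total|Free)' /proc/meminfo
```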

IRQ Affinity

Distribute interrupts across CPUs:

cat /proc/interrupts
echo 2 > /proc/irq/32/smp_affinity   # bitmask: 2 = CPU 1
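The value written to smp_affinity is a hexadecimal CPU bitmask in which bit n selects CPU n; shell arithmetic makes the masks easy to derive:

```shell
# Bit n set -> the IRQ may be handled by CPU n.
printf '%x\n' $((1 << 1))                   # CPU 1 only   -> 2
printf '%x\n' $((1 << 3))                   # CPU 3 only   -> 8
printf '%x\n' $(( (1 << 0) | (1 << 2) ))    # CPUs 0 and 2 -> 5
```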

Tools: irqbalance, tuned-adm.


6. Monitoring & Analysis

Optimization without measurement is pointless. Key tools:

  • perf – CPU profiler (cache misses, cycles).
  • ftrace / trace-cmd – kernel tracing.
  • iostat -xzm 1 – per-device I/O utilization & latency.
  • pidstat -d – per-process I/O stats.
  • vmstat 1 – virtual memory activity.
  • bcc/eBPF (biolatency, execsnoop, tcpconnect) – modern deep-dive analysis.

7. Best-Practice Tables

Recommended Kernel Settings by Workload

Workload        | Swappiness | I/O Scheduler                      | CPU Governor          | atime    | Extra Notes
----------------|------------|------------------------------------|-----------------------|----------|------------------------
Database        | 1–10       | noop/none (SSD), mq-deadline (HDD) | performance           | noatime  | THP off, HugePages on
Webserver       | 10–30      | none (SSD)                         | performance/schedutil | relatime | tcp_tw_reuse=1
Virtualization  | 10–20      | none/noop                          | performance           | relatime | IRQ balancing
HPC/Scientific  | 0–5        | noop                               | performance           | noatime  | overcommit_memory=1
Desktop         | 30–60      | bfq                                | schedutil             | relatime | fairness over raw perf

8. The TOTE Model: Systematic Optimization

Tuning should follow a feedback loop based on the TOTE principle (Test–Operate–Test–Exit):

  1. Test: Measure baseline (perf, iostat, vmstat).
  2. Operate: Apply one targeted change (e.g., swappiness=10).
  3. Test: Measure again, compare with baseline.
  4. Exit: Keep changes only if results are positive.

If results fall short, refine the hypothesis and repeat – avoid blind trial-and-error.


9. Conclusion

Linux provides an extensive toolkit for performance tuning – from VM parameters, I/O schedulers, and cgroups to NUMA binding and IRQ affinity. The key is not random parameter tweaking but a measurement-driven approach. By applying the TOTE model systematically, administrators can achieve sustainable and verifiable performance gains across diverse workloads.
