NeFut

[C++ Magic] Optimizing My Epoll HTTP Server from 15k to 125k req/sec

Published at: 2026-06-05 07:20 Last updated: 2026-06-06 13:04
#algorithm #optimization #C++

Greetings to my fellow nerds. A month ago, I had zero network programming experience, so I decided to fix that by building an epoll-based HTTP server from scratch and benchmarking every major architectural change along the way.

Performance Benchmarks

Benchmark command:

wrk -t4 -c10000 -d10s http://127.0.0.1:8080/

Request: GET /index.html
Response: Static HTML file (~1500 bytes)
CPU: Intel i5-13420H (13th Gen)
Compiler: Clang (-O3)

| Architecture | Throughput (req/sec) | Description |
|---|---|---|
| Blocking | ~15k | Single-threaded blocking accept/read/write |
| Epoll (LT) | ~34k | Single-threaded event loop using non-blocking I/O multiplexing |
| Epoll (LT, keep-alive) | ~37.5k | Single-threaded event loop with persistent connections |
| Epoll (LT, keep-alive, sendfile) | ~41k | Single-threaded event loop with persistent connections and zero-copy file serving |
| Epoll (LT, keep-alive, sendfile, multithreading) | ~125k | Multithreaded architecture running 4 concurrent epoll loops (the fastest configuration on the test machine) |
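To make "level-triggered" concrete, here is a minimal sketch (not the author's code) that uses a Linux socketpair in place of a real client. With EPOLLIN registered in level-triggered mode (the default when EPOLLET is not set), epoll_wait keeps reporting the socket as readable for as long as unread bytes remain, so a handler that reads only part of a request is simply woken again. The helper names `set_nonblocking` and `count_lt_wakeups` are hypothetical.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: put a descriptor into non-blocking mode,
// as an epoll-based server does for every accepted socket.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Writes `total` bytes into one end of a socketpair, then reads the other end
// at most `chunk` bytes per wakeup. Returns how many times epoll_wait reported
// the socket readable: in level-triggered mode, once per partial read.
int count_lt_wakeups(int total, int chunk) {
    if (total <= 0 || total > 4096 || chunk <= 0 || chunk > 4096) return -1;
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return -1;
    set_nonblocking(sv[0]);

    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;  // no EPOLLET flag: level-triggered
    ev.data.fd = sv[0];
    epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev);

    char request[4096] = {};
    int wakeups = -1;
    if (write(sv[1], request, total) == total) {  // the peer "sends" a request
        wakeups = 0;
        int remaining = total;
        char buf[4096];
        while (remaining > 0) {
            epoll_event ready[8];
            if (epoll_wait(ep, ready, 8, 1000) <= 0) break;  // timeout or error
            ++wakeups;
            ssize_t got = read(sv[0], buf, chunk);  // deliberately partial read
            if (got > 0) remaining -= static_cast<int>(got);
        }
    }
    close(ep);
    close(sv[0]);
    close(sv[1]);
    return wakeups;
}
```

For example, `count_lt_wakeups(1500, 500)` gets three wakeups for a ~1500-byte request, because the leftover bytes keep the socket "ready" after each partial read. That forgiving behavior is what makes LT mode the easier starting point.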

Some Surprising Observations

sendfile() mattered less than I expected (~37.5k to ~41k req/sec, roughly 9%). For a server whose entire purpose is serving files, I was expecting a bigger gain, but my file was only ~1.5 KB, so there was probably very little copying for zero-copy to eliminate.
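For reference, serving a file with sendfile(2) on Linux typically looks like the sketch below; this is an assumption about the general pattern, not the author's code. The kernel copies from the file's page cache straight into the socket, and because the offset is passed by pointer, a non-blocking server can store it in the connection state and resume after EAGAIN. With a ~1.5 KB file a single call usually moves everything, which is consistent with the modest gain above. `send_file_chunk` is a hypothetical name.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: push as much of `file_fd` into `sock_fd` as the socket
// accepts. sendfile() advances `offset` itself, so the caller can keep it in
// per-connection state and call again on the next EPOLLOUT.
// Returns 1 when the whole file is sent, 0 if the socket is full (EAGAIN),
// and -1 on error or unexpected end of file.
int send_file_chunk(int sock_fd, int file_fd, off_t& offset, off_t file_size) {
    while (offset < file_size) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset, file_size - offset);
        if (n > 0) continue;
        if (n == -1 && errno == EINTR) continue;
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 0;
        return -1;  // n == 0 (file shrank) or a real error
    }
    return 1;
}
```

In a server, `file_size` would usually come from fstat() (`st_size`) on the opened file, which also gives the value for the Content-Length header.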

More threads helped up to 4, then made things worse:

| Worker threads | Throughput (req/sec) |
|---|---|
| 1 | ~40k |
| 2 | ~95k |
| 3 | ~115k |
| 4 | ~125k |
| 5 | ~90k |
| 6 | ~90k |
| 8 | ~75k |
| 10 | ~70k |
| 12 | ~65k |

My CPU has 8 physical cores (4 performance cores and 4 efficient cores) and 12 logical processors, and wrk was running on the same machine with 4 threads of its own. I suspect that at higher thread counts the cost of the per-loop syscalls, context switching, and lock contention on shared kernel objects started to dominate, though I haven't fully investigated it yet.
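The post doesn't say how the 4 loops share incoming connections. One common way to avoid contention on a single shared listening socket, assumed here, is to give every worker thread its own epoll instance and its own listener bound to the same port with SO_REUSEPORT, so the kernel spreads new connections across the loops. The sketch below (hypothetical names `make_listener`, `worker`, `run_reuseport_demo`) only accepts and closes connections to stay short; a real server would register each client with the worker's epoll instance.

```cpp
#include <arpa/inet.h>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>
#include <vector>

// Hypothetical helper: a non-blocking listener on 127.0.0.1:port with
// SO_REUSEPORT set before bind(), so several sockets can share the port.
int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd == -1) return -1;
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == -1 ||
        listen(fd, SOMAXCONN) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

// One event loop per thread: its own epoll instance and its own listener.
void worker(int listen_fd, std::atomic<bool>& stop, std::atomic<int>& accepted) {
    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);
    epoll_event ready[64];
    while (!stop.load()) {
        int n = epoll_wait(ep, ready, 64, 50);  // short timeout so `stop` is noticed
        for (int i = 0; i < n; ++i) {
            int client;
            // Drain the accept queue until accept4() fails (EAGAIN when empty).
            while ((client = accept4(listen_fd, nullptr, nullptr, SOCK_NONBLOCK)) != -1) {
                accepted.fetch_add(1);
                close(client);  // a real server would add `client` to `ep`
            }
        }
    }
    close(ep);
}

// Starts `threads` loops on one port, opens `clients` connections, and
// returns how many connections the loops accepted in total.
int run_reuseport_demo(int threads, int clients) {
    int first = make_listener(0);  // port 0: let the kernel pick a free port
    if (first == -1) return -1;
    sockaddr_in addr{};
    socklen_t len = sizeof addr;
    getsockname(first, reinterpret_cast<sockaddr*>(&addr), &len);
    uint16_t port = ntohs(addr.sin_port);

    std::vector<int> listeners{first};
    for (int i = 1; i < threads; ++i) listeners.push_back(make_listener(port));

    std::atomic<bool> stop{false};
    std::atomic<int> accepted{0};
    std::vector<std::thread> pool;
    for (int fd : listeners)
        pool.emplace_back(worker, fd, std::ref(stop), std::ref(accepted));

    for (int i = 0; i < clients; ++i) {
        int c = socket(AF_INET, SOCK_STREAM, 0);
        connect(c, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
        close(c);
    }
    for (int spins = 0; accepted.load() < clients && spins < 100; ++spins)
        std::this_thread::sleep_for(std::chrono::milliseconds(10));

    stop = true;
    for (auto& t : pool) t.join();
    for (int fd : listeners) close(fd);
    return accepted.load();
}
```

The design choice here is that no accept queue or user-space lock is shared between loops; whether it would change the scaling curve above is untested.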

Profiling with perf

| Function | Approx. share of CPU samples |
|---|---|
| readSock() | ~22% |
| writeSock() | ~16% |
| parse() | ~8% |
| std::format() | ~7% |
| open() | ~3% |
| sendfile() | ~2.5% |

It turns out I'm still spending more time reading and parsing requests (readSock() and parse(), ~30% combined) than sending responses (writeSock() and sendfile(), ~18.5% combined), so there may still be room for batched reads or buffer pooling in a future iteration.
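As one illustration of the buffer-pooling idea (a hypothetical design, not something the post implements), each event-loop thread could keep a free list of fixed-size read buffers, so connections borrow a buffer while a request is in flight instead of allocating a fresh one. Because every loop owns its own pool, no locking is needed. `BufferPool`, `acquire`, `release`, and the 16 KiB buffer size are all assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical per-thread pool of fixed-size read buffers.
// Deliberately not thread-safe: each epoll loop owns exactly one pool.
class BufferPool {
public:
    static constexpr std::size_t kBufSize = 16 * 1024;  // assumed size
    using Buffer = std::array<char, kBufSize>;

    // Reuse a released buffer if one is available, otherwise allocate.
    std::unique_ptr<Buffer> acquire() {
        if (free_.empty()) {
            ++allocations_;
            return std::make_unique<Buffer>();
        }
        std::unique_ptr<Buffer> buf = std::move(free_.back());
        free_.pop_back();
        return buf;
    }

    // Hand a buffer back once its connection has nothing left buffered.
    void release(std::unique_ptr<Buffer> buf) {
        assert(buf != nullptr);
        free_.push_back(std::move(buf));
    }

    std::size_t allocations() const { return allocations_; }
    std::size_t idle() const { return free_.size(); }

private:
    std::vector<std::unique_ptr<Buffer>> free_;
    std::size_t allocations_ = 0;
};
```

Whether this pays off depends on where readSock() actually spends its time; if most of it is inside the read(2) syscall itself, reducing the number of reads would matter more than avoiding allocations.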

Final Thoughts

I could hunt for more micro-optimizations or even experiment with an edge-triggered architecture, but I'm kinda burnt out at this point, and this feels like a good place to end the project. The codebase is pretty small (~1k LOC), so if anyone's interested in taking a look: GitHub Repository
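For anyone picking up the edge-triggered experiment: with EPOLLET, epoll reports a socket only when new data arrives, so the handler must keep reading until read() fails with EAGAIN, or leftover bytes can sit unnoticed until the client sends more. A minimal sketch of the registration and the drain loop follows; `watch_edge_triggered` and `drain_socket` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: register `fd` for edge-triggered reads. ET needs a
// non-blocking fd, or the final read in the drain loop would block once the
// buffer is empty.
bool watch_edge_triggered(int ep, int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return false;
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Hypothetical EPOLLET read handler: read until the kernel buffer is empty
// (EAGAIN), appending everything to `out`. Returns true when drained,
// false if the peer closed the connection or a real error occurred.
bool drain_socket(int fd, std::string& out) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            out.append(buf, static_cast<std::size_t>(n));
            continue;
        }
        if (n == 0) return false;  // peer closed
        if (errno == EINTR) continue;
        return errno == EAGAIN || errno == EWOULDBLOCK;  // drained vs. error
    }
}
```

The trade-off is fewer wakeups per request in exchange for stricter handler logic; the same read-until-EAGAIN rule also applies to accept() on an edge-triggered listener.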

Blogger's Review: This article walks through the technical challenges and solutions encountered while optimizing an epoll-based HTTP server, most notably how throughput scales, and then degrades, as the thread count grows. Its benchmarks and profiling data point to concrete directions for future optimization.

Original Source: https://www.reddit.com/r/cpp/comments/1twh4e1/i_spent_a_month_optimizing_my_epoll_based_http/
