People monitor their systems for two main reasons: to keep their system healthy and to understand its performance. Almost everyone does both wrong, for the same reason: they monitor so they can react to failures, rather than measuring their workload so that they can predict problems.
Recent articles
Fast query log with tcpdump and tshark
dbbench is a tool I've been working on for a while at MemSQL. It is an open source database workload driver that engineers at MemSQL and I use for performance testing. One often-overlooked feature in dbbench is the ability to replay query log files. Previously, this was a somewhat manual process …
An informal survey of Linux dynamic tracers
I survey some dynamic tracers (e.g. perf, sysdig) available on Linux.
Dtrace isn't just a tool; it's a philosophy
I document some pain points from recent performance investigations and then speculate that such issues are endemic to the Linux community.
Using off-cpu flame graphs on Linux
I use off-cpu flame graphs to identify that repeated mmap calls are slowing my database.
Why are builds on HGFS so slow?
I use flame graphs to identify hgfs as the bottleneck in my build.
TCP Keepalive is a lie
In the past few months, I’ve had to debug some gnarly issues related to TCP_KEEPALIVE. Through these issues, I’ve learned that it is harder than one might think to ensure that your sockets fail after a short time when the network is disconnected. This blog post is intended …
Bash Performance Tricks
My coworkers presented a silly programming interview style question to me the other day: given a list of words, find the largest set of words from that list that all have the same hash value. Everyone was playing around with a different language, and someone made the claim that it …
Achieving maximum memory bandwidth
I embarked upon a quest to understand some unexpected behavior and write a program that achieved the theoretical maximum memory bandwidth.
A cross-platform monotonic timer
I've been working on writing a memory bandwidth benchmark for a while and needed to use a monotonic timer to compute accurate timings. I have since learned that this is more challenging to do than I initially expected and that each platform has a different way of doing it.
Why is omp_get_num_procs so slow?
Some students had difficulty profiling their code because omp_get_num_procs was dominating the profiling traces. I tracked it down and found that the profiling tools emitted misleading results when the library didn't have symbols.
Introduction to Using Profiling Tools
In this article, you will see several performance tools used to identify bottlenecks in a simple program.
Analysis of a Parallel Memory Allocator
I implemented and tested different configurations of a modern parallel memory allocator.