1. # Why everyone fails at monitoring; and what you can do about it

People monitor their systems for two main reasons: to keep their system healthy and to understand its performance. Almost everyone does both wrong, for the same reason: they monitor so they can react to failures, rather than measuring their workload so that they can predict problems.

2. # Fast query log with tcpdump and tshark

Thu 21 July 2016

dbbench is a tool I've been working on for a while at MemSQL. It is an open source database workload driver that engineers at MemSQL and I use for performance testing. One often-overlooked feature in dbbench is the ability to replay query log files. Previously, this was a somewhat manual process …

3. # An informal survey of Linux dynamic tracers

Sat 09 January 2016

I survey some dynamic tracers (e.g. perf, sysdig) available on Linux.

4. # Dtrace isn't just a tool; it's a philosophy

I document some pain points from recent performance investigations and then speculate that such issues are endemic to the Linux community.

5. # Using off-cpu flame graphs on Linux

Sun 20 December 2015

I use off-cpu flame graphs to identify that repeated mmap calls are slowing my database.

6. # Why are builds on HGFS so slow?

I use flame graphs to identify HGFS as the bottleneck in my build.

7. # TCP Keepalive is a lie

Fri 28 August 2015

In the past few months, I’ve had to debug some gnarly issues related to TCP_KEEPALIVE. Through these issues, I’ve learned that it is harder than one might think to ensure that your sockets fail after a short time when the network is disconnected. This blog post is intended …

8. # Bash Performance Tricks

My coworkers presented a silly programming-interview-style question to me the other day: given a list of words, find the largest set of words from that list that all have the same hash value. Everyone was playing around with a different language, and someone made the claim that it …

9. # Achieving maximum memory bandwidth

I embarked upon a quest to understand some unexpected behavior and write a program that achieved the theoretical maximum memory bandwidth.

10. # A cross-platform monotonic timer

I've been working on writing a memory bandwidth benchmark for a while and needed to use a monotonic timer to compute accurate timings. I have since learned that this is more challenging to do than I initially expected, and each platform has a different way of doing it.

11. # Why is omp_get_num_procs so slow?

Some students had difficulty profiling their code because omp_get_num_procs was dominating the profiling traces. I tracked it down and found that the profiling tools emitted misleading results when the library didn't have symbols.