Are you participating in the 1 billion row challenge (lnkd.in/gMVXiueF)? The challenge is to read 1 billion rows from a file, where each row consists of a location name and a temperature value, and to produce the min, max, and mean temperature for each location as fast as possible. It's an interesting exercise for learning how every small decision in your code affects performance once you hit a certain scale, and how to analyze and improve that performance.
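For context, here is a minimal single-threaded baseline sketch in Java (the language the challenge originated in), assuming the standard `measurements.txt` format of `station;temperature` lines. This is not my solution, just the obvious starting point that the optimizations below improve on:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Naive baseline: read line by line, aggregate per station in a HashMap.
public class Baseline {
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        long count = 0;

        void add(double t) {
            min = Math.min(min, t);
            max = Math.max(max, t);
            sum += t;
            count++;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Stats> byStation = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(Path.of("measurements.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int sep = line.indexOf(';');
                String station = line.substring(0, sep);
                double temp = Double.parseDouble(line.substring(sep + 1));
                byStation.computeIfAbsent(station, k -> new Stats()).add(temp);
            }
        }
        byStation.forEach((station, s) -> System.out.printf(
                "%s: min=%.1f mean=%.1f max=%.1f%n",
                station, s.min, s.sum / s.count, s.max));
    }
}
```

At a billion rows, nearly every line of this loop becomes a cost center: the string allocations, the `Double.parseDouble` call, the hash lookups, and the single-threaded I/O.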
I've been attempting it for the last few days. I started with a runtime of 540 seconds, and at this point it is down to 7.3 seconds. I've reached the point where the flamegraphs hardly show any obvious candidates to eliminate, but there are still plenty of lower-level things to improve, such as branches and cache misses.
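As one example of the kind of lower-level change involved, here is a hedged sketch (not my actual code; the buffer layout and method name are assumptions) of a common trick in this challenge: parsing the temperature directly from bytes as an integer number of tenths, instead of going through `Double.parseDouble`:

```java
// Parse a temperature like "-12.3" or "5.7" from raw bytes into tenths
// of a degree (e.g. "-12.3" -> -123), avoiding string allocation and
// floating-point parsing in the hot loop.
static int parseTenths(byte[] buf, int start, int end) {
    int i = start;
    int sign = 1;
    if (buf[i] == '-') {   // one data-dependent branch for the sign
        sign = -1;
        i++;
    }
    int value = 0;
    for (; i < end; i++) {
        byte b = buf[i];
        if (b != '.') {    // skip the decimal point, accumulate digits
            value = value * 10 + (b - '0');
        }
    }
    return sign * value;
}
```

Keeping the aggregation in integer tenths (and only converting to a decimal at print time) removes floating-point work from the inner loop entirely.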
I am planning to write a series of articles on some of these performance-engineering topics in the context of this problem. Sign up if you are interested in this sort of thing.
Jan 14, 2024 at 1:29 PM