There comes a point where you can't improve performance any further without rethinking storage.
Moving to a leader-follower replication topology and denormalizing the data turned out to be a really good move.
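To make the two ideas concrete, here is a minimal in-memory sketch, not the actual setup. It assumes that all writes go to a single leader, that reads are spread round-robin across followers, and that a post row carries a copy of the author's name so a read needs no join. The names `Node`, `Cluster`, `post:1`, and `author_name` are hypothetical, and replication is done synchronously only to keep the example short.

```python
import itertools


class Node:
    """One storage node; a plain dict stands in for a real database."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class Cluster:
    """Toy leader-follower topology: one writer, several read replicas."""

    def __init__(self, follower_count=2):
        self.leader = Node("leader")
        self.followers = [Node(f"follower-{i}") for i in range(follower_count)]
        # Round-robin iterator over the followers for read traffic.
        self._next_reader = itertools.cycle(self.followers)

    def write(self, key, row):
        # Every write is applied on the leader first.
        self.leader.rows[key] = dict(row)
        # Assumption: replication is synchronous here for simplicity;
        # real systems often replicate asynchronously and followers can lag.
        for follower in self.followers:
            follower.rows[key] = dict(row)

    def read(self, key):
        # Reads never touch the leader, which keeps it free for writes.
        node = next(self._next_reader)
        return node.name, node.rows.get(key)


cluster = Cluster()

# Denormalized row: author_name is copied into the post instead of being
# looked up in a separate authors table at read time.
cluster.write("post:1", {"title": "Hello", "author_id": 7, "author_name": "Ada"})

served_by, post = cluster.read("post:1")
print(served_by, post["title"], post["author_name"])
```

The trade-off in this sketch is the usual one: reads get cheaper and scale with the number of followers, while every write costs more because it is copied to each follower and any duplicated field such as `author_name` has to be updated wherever it was copied.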