At Wanelo we are pretty ardent fans of PostgreSQL database server, but try not to be dogmatic about it.
I have personally used PostgreSQL since version 7.4, dating back to some time in 2003 or 4. I was always impressed with how easy it was to get PostgreSQL installed on a UNIX system, how quick it was to configure (only two config files to edit), and how simple it was to create and authenticate users.
Of course I also played with MySQL back then, and always found it a bit wanting. Its user authentication and password configuration made little sense to me, and I found myself constantly having to look it up just to start using the database. MySQL won the battle during those early days by providing a native Windows one-click installer and good performance out of the box, but it seems to have lost ground in recent years as a growing number of high-profile companies (notably Heroku and Instagram) adopt PostgreSQL, and as fear and loathing over the fuzziness of MySQL's open source nature and Oracle's involvement increases.
Traditionally, PostgreSQL has had a richer feature set, better stability and higher scalability, but slightly poorer performance. I don't believe the performance difference is anywhere near what it used to be, if at all, but doing yet another benchmark is beyond the scope of this post. We the developers get to choose our tools from time to time, and my open source database of choice has always been PostgreSQL.
Wanelo is currently hosted on a high-performance cloud -- Joyent Cloud, something we've mentioned in previous posts. Joyent Cloud comes with close to native hardware RAID I/O performance, and as Wanelo has been scaling up rapidly, we've been taking advantage of that high I/O throughput. Even with that advantage though we recently vertically sharded our database in order to spread the write load across more than one RAID server.
In this post, I'll go over some of our settings in postgresql.conf, which have been adjusted for high-performance/throughput and large RAM sizes. I would like to credit Josh Berkus and his PGExperts consultancy for providing us with timely and necessary assistance in tuning PostgreSQL these last few months.
You can grab a gist of our postgresql.conf file here.
Additionally, you can review our open source Chef cookbook for installing PostgreSQL on Joyent Cloud by compiling it from sources, which comes with some of those settings by default.
As we are running most of our databases on 80GB instances, we set shared_buffers to 12GB. We did this because PostgreSQL takes great advantage of file system caches. The related parameter effective_cache_size tells the PostgreSQL query planner exactly how much RAM is available for caching. On Joyent this cache is provided by the extremely efficient ZFS ARC cache.
We've set work_mem to a somewhat large value of 65MB (per process) so that our SQL sorts don't go to disk.
We've set asynchronous commit to off, so that we can buffer several writes together. Commit delay is set to 100 microseconds, so that any commits arriving within that time are buffered and synced together. This provides benefit mostly during peak times, when write volume is very high. During other times it simply delays each commit by 100 microseconds, which is acceptable for us.
Please let us know if you have any questions about any of the values in this config file. Discussion welcome :)