Wanelo on Github

Welcome to the engineering blog of Wanelo, featuring technical tales of triumph, daring and woe. Sometimes cats. We are definitely hiring. Please email play AT wanelo.com if you're curious...

  • Romeo is Bleeding (CVE-2014-0160)

    First posted on Thursday, 10 Apr 2014

    This week was arguably one of the worst weeks to work in systems operations in the history of the Internet. The revelation of what has been called Heartbleed, a bug in OpenSSL that allows attackers to read memory from vulnerable servers (and potentially retrieve memory from vulnerable clients), has had many administrators scrambling. This bug makes it trivial for hackers to obtain the private keys to a site's SSL certificate, as well as private data that might be in-process such as usernames and passwords.

    While there is a huge potential for multiple blog posts regarding our learnings from this week, in this post I'll focus on the current state of affairs, as well as a timeline of events.

    tl;dr — wanelo.com was affected by Heartbleed. As of 1am April 8, the public-facing parts of Wanelo were no longer vulnerable. Through the rest of this week we have followed up to ensure that internal components are also secure. This afternoon we deployed new SSL certificates and revoked our old ones. We have no indication that our site was hacked, but there is no way to be certain.

    ⟹ full post...

  • Capistrano 3, You've Changed! (Since Version 2)

    First posted on Monday, 31 Mar 2014

    Capistrano has been around for almost as long as Rails has been around, perhaps short by just a year or so. Back in the early days it introduced much needed sanity into the world of deployment automation, including documenting in code some of the best practices for application deployment, such as a directory layout that included a 'releases' folder with the ability to roll back, and a 'shared' folder with the ability to maintain continuity from release to release. Capistrano was built upon the concept of having roles for application servers. Finally, being written in Ruby, Capistrano always offered remarkable levels of flexibility and customization. So it should not come as a surprise that it became highly popular, and that subsequent infrastructure automation tools like Chef and Puppet include Capistrano-like deployment automation recipes.

    These days it is not uncommon to bump into Python, Java, or Scala applications that are deployed to production using Capistrano (which itself is written in Ruby). That's because a lot of the assumptions that Capistrano makes are not language- or framework-specific.

    It's worth noting that in its entire history, Capistrano has never had an upgrade so dramatically different from the previous version that it requires, in some way, rewiring some of your brain's neurons to grasp the new concepts, the new callbacks, and the new mappings between roles and servers, for example.

    This blog post represents a typical tale of "We upgraded from version X to version Y. It was hard! But here's what we learned." And amazingly, despite Capistrano 3 having been released more than 4 months ago, there is still a massive shortage of quality Capistrano 3 documentation (or upgrade paths) online. With this post I am hoping to bridge this gap a tiny bit, and perhaps help a few folks out there upgrading their deployment scripts.

    ⟹ full post...

  • 12-Step Program for Scaling Web Applications on PostgreSQL

    First posted on Friday, 21 Mar 2014

    On Tuesday night this week Wanelo hosted a monthly meeting of SFPUG, the San Francisco PostgreSQL User Group, and I gave a talk summarizing Wanelo's performance journey to date. The presentation ended up being much longer than I originally anticipated, and went on for an hour and a half. Whoops! With over a dozen questions near the end, it felt good to share the tips and tricks that we learned while scaling our app.

    The presentation was recorded on video, but unfortunately the quality is not very good.

    In the meantime, you can see the slides for it :)

    ⟹ full post...

  • Lake Tahoe is Perfect for fun, skiing, and... hackathons :)

    First posted on Monday, 10 Mar 2014

    This week the entire Wanelo crew packed up and went up to Tahoe City, a small town on the shore of beautiful Lake Tahoe. We've done a hackathon before, but never outside of our main office in San Francisco.

    On Sunday after dinner everyone pitched their ideas and tried to get a team assembled to work on a project. There were a total of 19 project submissions, and given that we have 15 engineers, I would call this a huge success.

    ⟹ full post...

  • A Brief History of Sprout Wrap

    First posted on Monday, 27 Jan 2014

    When Wanelo gets a brand new workstation the first thing we install on it is Sprout. Sprout is a collection of OS X-specific recipes that allow you to install common utilities and applications that every Ruby developer has and will appreciate.

    ⟹ full post...

  • Just enough client-side error tracking

    First posted on Wednesday, 18 Dec 2013

    Deploying at Wanelo tends to be high-frequency and low-stress, since we have most aspects of our systems performance graphed in real time. We can roll out new code to a percentage of app servers, monitor app server and db performance, check error rates, and then finish up the deploy. 

    However, there’s one area where I’ve always wanted better metrics: on the client side. In particular, I want better visibility into uncaught JavaScript exceptions. Client-side error tracking is a notoriously difficult problem -- browser extensions can throw errors, adding noise to your reports; issues may manifest only in certain browsers or under certain network conditions; exception messages tend to be generic, and line numbers are unhelpful, since scripts are usually minified; data has to be captured and collected from users’ browsers and reported via HTTP before a user navigates to a new page. And on and on.

    On the other hand, many sites are moving more and more functionality client-side these days, so it’s becoming increasingly important to know when there are problems in the browser.

    ⟹ full post...

  • Multi-process or multi-threaded design for Ruby daemons? GIL to the rescue :)

    First posted on Wednesday, 11 Dec 2013

    MRI Ruby has a global interpreter lock (GIL), meaning that even in multi-threaded Ruby code only a single thread is executing on a CPU at any point in time. Other implementations of Ruby have done away with the GIL, but even in MRI threads can be useful. The Sidekiq background worker gem takes advantage of this, running multiple workers in separate threads within a single process.

    If the workload of a job blocks on I/O, Ruby can context-switch to other threads and do other work until the I/O finishes. This could happen when the workload reaches out to an external API, shells out to another command, or accesses the file system.

    If the workload of a process does not block on I/O, it is CPU-bound and will not benefit from thread switching under a GIL. In this case, multiple processes will be more efficient, and will be able to take better advantage of multi-core systems.
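
    The I/O-bound case described above can be sketched in a few lines of Ruby. This is only an illustration, not code from the full post: `io_bound_job`, `run_sequentially` and `run_in_threads` are hypothetical names, and `sleep` stands in for a blocking call such as an external API request.

```ruby
require "benchmark"

# Hypothetical stand-in for an I/O-bound job. Like a blocking network or
# file-system call, sleep lets MRI switch to another thread while it waits.
def io_bound_job
  sleep 0.2
end

# Run the jobs one after another and return the elapsed wall-clock seconds.
def run_sequentially(count)
  Benchmark.realtime { count.times { io_bound_job } }
end

# Run each job in its own thread, wait for all of them to finish, and
# return the elapsed wall-clock seconds.
def run_in_threads(count)
  Benchmark.realtime do
    Array.new(count) { Thread.new { io_bound_job } }.each(&:join)
  end
end

puts format("sequential: %.2fs", run_sequentially(4))
puts format("threaded:   %.2fs", run_in_threads(4))
```

    Under MRI, the threaded run should take roughly as long as a single job, because the waits overlap. Replacing the `sleep` with a pure computation would be expected to remove most of that gain, which is the CPU-bound case where separate processes do better.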

    So… why not skip threads and just deal with processes? A number of reasons.

    ⟹ full post...

  • Quick heads-up on our upcoming webinar with Joyent on Manta

    First posted on Friday, 18 Oct 2013

    A few months back, one of our engineers, Atasay Gokkaya, published a fantastic overview of how we at Wanelo use Joyent's innovative new object store Manta for a massively parallelized user retention analysis, using just a few lines of basic UNIX commands in combination with the map/reduce paradigm.

    ⟹ full post...

  • Detangling Business Logic in Rails Apps with PORO Events and Observers

    First posted on Monday, 05 Aug 2013

    With any Rails app that evolves along with substantial user growth and active feature development, pretty soon a moment comes when there appears to be a decent amount of tangled logic, AKA "technical debt."

    ⟹ full post...

  • Really Really Really Deleting SMF Service Instances on Illumos

    First posted on Tuesday, 23 Jul 2013

    We recently ran into a tricky situation with a custom SMF service we maintain on our Joyent SmartOS hosts.  The namespace for the service instance (defined in upstream code) had changed, which meant that as our Chef automation upgraded the service instances to the latest code, we ended up with a lot of duplicate service instances that each had a unique namespace.

    ⟹ full post...

  • A Cost-effective Approach to Scaling Event-based Data Collection and Analysis

    First posted on Friday, 28 Jun 2013

    With millions of people now using Wanelo across various platforms, collecting and analyzing user actions and events becomes a pretty fun problem to solve. In most services, user actions generate some aggregated records in database systems, and keeping those actions non-aggregated is not explicitly required for the product itself. It is, however, critical for other reasons such as user history, behavioral analytics, spam detection and ad hoc querying.

    ⟹ full post...

  • Scaling Wanelo 100x in Six Months

    First posted on Saturday, 25 May 2013

    We recently gave a talk at the SFRoR Meetup here in San Francisco about how we scaled this Rails app to 200K RPM in six months. There were a lot of excellent questions at the meetup, so we decided to put the slides up on SlideShare.

    ⟹ full post...

  • High Read/Write Performance PostgreSQL 9.2 and Joyent Cloud

    First posted on Wednesday, 13 Feb 2013

    At Wanelo we are pretty ardent fans of the PostgreSQL database server, but try not to be dogmatic about it.

    I have personally used PostgreSQL since version 7.4, dating back to some time in 2003 or 2004. I was always impressed with how easy it was to get PostgreSQL installed on a UNIX system, how quick it was to configure (only two config files to edit), and how simple it was to create and authenticate users.

    ⟹ full post...

  • How Alerts Can Tell You When Beyoncé Is On

    First posted on Wednesday, 06 Feb 2013

    This past weekend a number of us were focused on a really important annual prime time television event (the Puppy Bowl, of course). Turns out other people out there were watching some other sporting event, which leads to the rest of this story.

    ⟹ full post...

  • The Case for Vertical Sharding

    First posted on Tuesday, 05 Feb 2013

    Wanelo's recent surge in popularity rewarded our engineers with a healthy stream of scaling problems to solve.

    Among the many performance initiatives launched over the last few weeks, vertical sharding has been the most impactful and interesting so far.

    ⟹ full post...

  • The Big Switch: How We Rebuilt Wanelo from Scratch and Lived to Tell About It

    First posted on Friday, 14 Sep 2012

    The Wanelo you see today is a completely different website than the one that existed a few months ago. It’s been rewritten and rebuilt from the ground up, as part of a process that took about two months. We thought we’d share the details of what we did and what we learned, in case someone out there ever finds themselves in a similar situation, weighing the risks of either working with a legacy stack or going full steam ahead with a rewrite.

    ⟹ full post...