Multi-process or multi-threaded design for Ruby daemons? GIL to the rescue :)
MRI Ruby has a global interpreter lock (GIL), meaning that even in multi-threaded Ruby code only a single thread is on-CPU at any point in time. Other Ruby implementations have done away with the GIL, but even in MRI threads can be useful. The Sidekiq background worker gem takes advantage of this, running multiple workers in separate threads within a single process.
If the workload of a job blocks on I/O, Ruby can context-switch to other threads and do other work until the I/O finishes. This can happen when the workload reaches out to an external API, shells out to another command, or accesses the file system.
If the workload of a process does not block on I/O, it is CPU-bound and will not benefit from thread switching under a GIL. In this case, multiple processes are more efficient and can take better advantage of multi-core systems.
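To make the I/O-bound case concrete, here is a minimal sketch (the function and timings are illustrative, not from any particular codebase) comparing serial and threaded execution of a blocking call under MRI. Because a sleeping or I/O-blocked thread releases the GIL, the threaded version finishes in roughly the time of a single call:

```ruby
require "benchmark"

# Simulated I/O-bound job: sleep stands in for a blocking network call.
# Under MRI's GIL, a thread blocked on I/O (or sleep) releases the lock,
# so other threads can run in the meantime.
def fetch_remote(id)
  sleep 0.2 # pretend this is an HTTP request
  id
end

# Run ten jobs back to back on one thread.
serial_time = Benchmark.realtime { 10.times { |i| fetch_remote(i) } }

# Run the same ten jobs on ten threads, waiting for all to finish.
threaded_time = Benchmark.realtime do
  10.times.map { |i| Thread.new { fetch_remote(i) } }.each(&:join)
end

puts format("serial: %.2fs, threaded: %.2fs", serial_time, threaded_time)
```

On an I/O-bound workload like this, the threaded run takes roughly 0.2s versus roughly 2s serially; swap the sleep for a busy loop and the GIL erases the difference.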
So… why not skip threads and just deal with processes? There are a number of reasons.
Quick Shout Out for Our Upcoming Webinar w/ Joyent on Manta
A few months back, one of our engineers, Atasay Gokkaya, published a fantastic overview of how we at Wanelo use Joyent’s innovative new object store, Manta, for massively parallelized user retention analysis, using just a few lines of basic UNIX commands in combination with the map/reduce paradigm.
I also recently went onstage with Joyent’s VP of Engineering Bryan Cantrill for a fireside chat at VentureBeat’s CloudBeat, discussing Wanelo’s use of Manta, as well as our excitement about Joyent’s cloud. If you missed it, or are interested in learning more about the subject, we’re continuing the discussion with a live webinar on Tuesday, October 29th.
Atasay and I will dive deep into our team’s experience using Joyent Manta storage and big data analytics service.
It’s an hour-long webinar, and we’ll cover the following:
- How we solved the problem of user event data collection on a massive scale, and very cheaply
- How Joyent Manta storage and big data analytics service allowed us to use the collected data to analyze user behavior and retention over many months, and run our queries in mere minutes
- The unique benefits of using the Joyent Manta Storage Service, including ease of use, flexibility, performance, and cost savings
- Questions from the audience, as time permits
To join, please register here.
Detangling Business Logic in Rails Apps with PORO Events and Observers
With any Rails app that evolves alongside substantial user growth and active feature development, there soon comes a moment when a decent amount of tangled logic, AKA “technical debt,” has accumulated.
A typical example would be a user registration controller’s “register” action, which upon a successful registration might coordinate a bunch of actions related to the registration but unrelated to one another, such as:
- Sending the user a welcome email
- Logging an analytics event for future reporting
- Queueing up a job to notify the user’s Facebook friends
- Running a check against a spam database of IP addresses to validate the new account
- Running recommendation engine logic to suggest topics to follow
These are all concerns that are independent of one another, but happen when a user registers. Some of these actions happen immediately, some even within a single transaction, and some asynchronously (in another thread, or in a background job).
This topic has been given a lot of discussion on this famous thread, where even DHH chimed in. We’ll use the example discussed in that thread, in the version that DHH presented (slightly compacted) below: a controller that creates a comment and then performs a bunch of related actions, such as posting to Twitter and Facebook, or running it through a spam check.
class PostsController
  def create
    @entry = current_user.entries.find(params[:id])
    return head(:bad_request) if SpamChecker.spammy?(params[:post][:body])

    @comment = @entry.comments.
      create!(params[:post].
        permit(:title, :body).
        merge(author: current_user))

    Notifications.new_comment(@comment).deliver

    if @comment.share_on_twitter?
      TwitterPoster.new(current_user, @comment.body).post
    end

    if @comment.share_on_facebook?
      FacebookPoster.new(current_user, @comment.body).
        action(:comment)
    end
  end
end
In this blog post we’ll examine an event-based approach to decoupling this business logic, a method that’s been pretty successful within the Wanelo codebase thus far.
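To picture the approach before diving in: events are plain Ruby objects, and independent observers subscribe to them. Here is a minimal sketch under our own naming (EventBus, CommentCreated, and the observers are illustrative, not the actual Wanelo classes):

```ruby
# A bare-bones publish/subscribe bus. Observers register for an event
# class and are called whenever an event of that class is published.
class EventBus
  def self.subscriptions
    @subscriptions ||= Hash.new { |hash, key| hash[key] = [] }
  end

  # Register any callable to run when an event of this class fires.
  def self.subscribe(event_class, observer)
    subscriptions[event_class] << observer
  end

  # Notify every observer registered for this event's class.
  def self.publish(event)
    subscriptions[event.class].each { |observer| observer.call(event) }
  end
end

# Events are plain Ruby objects carrying just the data observers need.
CommentCreated = Struct.new(:comment_body, :author)

# Each side effect lives in its own observer; the controller no longer
# needs to know about analytics, notifications, or social posting.
EventBus.subscribe(CommentCreated, ->(e) { puts "analytics: comment by #{e.author}" })
EventBus.subscribe(CommentCreated, ->(e) { puts "notifying followers of #{e.author}" })

# The create action shrinks to persisting the record and publishing one event.
EventBus.publish(CommentCreated.new("Nice post!", "alice"))
```

The payoff is that adding a new concern (say, a spam check) means adding one observer, without touching the controller or the other observers.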
Really Really Really Deleting SMF Service Instances on Illumos
We recently ran into a tricky situation with a custom SMF service we maintain on our Joyent SmartOS hosts. The namespace for the service instance (defined in upstream code) had changed, which meant that as our Chef automation upgraded the service instances to the latest code, we ended up with a lot of duplicate service instances that each had a unique namespace.
After wrestling with the best way to batch delete/reinstall the service (using Chef’s knife cli), we found a way to improve our old process.
Normally, we would delete services with something like svccfg delete <service_name>, but this doesn’t work well if you need to delete a number of services, especially if they have similar namespaces. Further, we found that running this in a loop against the output of svcs -a -H | grep <service_name> wasn’t effective because service configurations could linger even after the service instance had been deleted.
Digging into man svccfg, we came up with a way to enumerate services and service configurations more cleanly with svccfg:
for service in $(svccfg list | grep nad); do
  sudo svcadm disable -s $service
done

for instance in $(svccfg list | grep nad); do
  sudo svccfg delete $instance
done
A Cost-effective Approach to Scaling Event-based Data Collection and Analysis
With millions of people now using Wanelo across various platforms, collecting and analyzing user actions and events becomes a pretty fun problem to solve. In most services, user actions generate aggregated records in database systems, and keeping those actions in non-aggregated form is not explicitly required by the product itself. It is, however, critical for other reasons, such as user history, behavioral analytics, spam detection and ad hoc querying.
If we were to split this problem into two sub-problems, they would probably be “data collection” and “data aggregation and analysis.”
UPDATE: please check out the following presentation from the Surge 2013 conference for another view into this project: