Selecting the Right Cloud Platform

While the range of cloud platforms (Platform as a Service) has grown significantly in recent years, it is still hard to find the perfect host for your website. The obvious advantages of cloud platforms over dedicated servers start with cost and extend to ease of maintenance. For instance, on EC2 it takes 30 minutes or less to fire up a new server instance, and Amazon's ability to charge its customers on a per-hour basis makes EC2 a very attractive option. Auto Scaling is another great add-on to the EC2 platform: it lets the customer automate computing power usage based on user load.

We recently did extended research on the available cloud platforms for a client. The requirements, however, were not straightforward. The client's top priorities were the platform's ability to scale automatically, ease of maintenance, and infrastructure performance. Then, to make the task challenging and exciting, the client wanted geographical fail-over in case of a disaster. Amazon alone had two major outages in 2011, so users are quite concerned about business continuity when disaster strikes.

We evaluated many options, including Amazon EC2, RDS, Heroku, Bluebox, RackSpace, High Velocity, and Xeround, and concluded that no single provider would be able to meet all of our client's requirements. So we divided the architecture into an app layer and a DB layer and chose a different host for each. In such a division the biggest concern is network latency, but the interesting fact is that many of these platforms are built over EC2 with additional wrapping, so latency within the EC2 network is not large. We ran some tests with Heroku (app) and Xeround (DB) and found network latency to be less than half a second per transaction.

Below you can find a presentation we put together after our analysis of different cloud hosts. One slide links to a Google spreadsheet with the estimated cost calculation for our proposed infrastructure. Feel free to drop a comment to correct me or ask me a question.

Presentation:

Selecting the Right Cloud Host

Kung-Fu Chop To the Load Beast – Part Deux

This is part II of my post on load testing a server-side solution. Part I discussed an overall strategy and the importance of running organized load tests. Here I will discuss some tips that helped us scale our Ruby app to bear a load of 50+ TPS on a Large Amazon EC2 instance. You can find part I of my post here

As mentioned in part I, this information is by no means exhaustive; it is meant to serve as a guideline for performance tuning.

Load Test Simulator

We used a very simple and handy tool, JMeter, to simulate load on our server. It gives nice flexibility in setting up request load (multiple threads), and we can also configure a data pool to diversify the payload. Load tests can be scheduled or set up to run for a fixed time. The results are real time and quantitative (no charts or graphics though). Attached are a couple of sample screenshots from JMeter that show thread settings and real-time results of a sample load test

Some Cheeky Moves that Enhance Performance

In-Memory Processing is a Blessing: Processing (calculation, comparison, searching) is much faster in memory than on disk. MySQL comes with a native query cache (read here), and other options such as Memcached are also available that give the user more control over caching policies

Group Database Requests: A common programming practice is to loop over DB inserts programmatically. For large data sets this becomes quite a performance bottleneck because a separate DB transaction is initiated every time (new connection, new statement object, new commit on every iteration). This bottleneck can be avoided by using a group insert, e.g.: insert into _table (col1, col2) values ('val1', 'val2'), ('val3', 'val4'), ('val5', 'val6') ...
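The group insert can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name demo and the helper bulk_insert are hypothetical names, not part of our codebase.

```python
import sqlite3


def bulk_insert(conn, rows):
    """Insert all rows with a single multi-row INSERT in one transaction."""
    if not rows:
        return 0
    # One "(?, ?)" group per row, e.g. "(?, ?), (?, ?), (?, ?)"
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    flat_values = [value for row in rows for value in row]
    with conn:  # commits once on success, rolls back on error
        conn.execute(
            f"INSERT INTO demo (col1, col2) VALUES {placeholders}",
            flat_values,
        )
    return len(rows)


# In-memory database used purely for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col1 TEXT, col2 TEXT)")
inserted = bulk_insert(
    conn, [("val1", "val2"), ("val3", "val4"), ("val5", "val6")]
)
row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```

For very large data sets the rows would probably need to be sent in chunks, since databases limit statement size and the number of bound parameters.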

Move Independent Processing to the Background: In our experience quite a lot of processing (like sending emails or push notifications, or updating OLAP schemas) can be moved to the background to run independently using a good background processing solution. The one we used was Resque. It is backed by Redis, where it keeps its queues under a namespace, and is quite efficient
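The idea of handing work to a background queue can be sketched as follows. This is not our production setup: it uses an in-process Python queue and a single worker thread as a stand-in for a real job queue, and handle_request, worker, and the "email" job are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
sent = []  # records what the background worker has processed


def worker():
    """Process jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        sent.append(f"email to {job}")  # stand-in for the slow work
        jobs.task_done()


def handle_request(user):
    """Enqueue the slow work and return to the caller immediately."""
    jobs.put(user)
    return "ok"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_request(user) for user in ("alice", "bob")]

jobs.put(None)  # tell the worker to stop
jobs.join()     # wait until every queued job has been processed
```

The request handler only pays the cost of enqueuing; the slow part runs independently of the response.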

:include is faster than JOIN: take a look at http://stackoverflow.com/questions/1208636/rails-include-vs-joins

Calibrating App Server and DB Threads

A normal processing sequence for a request landing on a server is: HTTP Server -> Interpreter/Compiler -> DB

Modern web, application, and DB servers come with settings to increase or decrease thread counts. More threads mean more parallel processing BUT also more system resources, so it's very important to:

a) Set the number of threads for each server layer so that it neither burns out your CPU and memory nor leaves them sitting idle

b) Keep a balanced ratio between server layers. If there are too many application server threads and too few DB threads, then most of the time the application server threads will be hitting the DB asking for a connection, producing load that does not help the user
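A rough way to sanity-check both points is a back-of-the-envelope calculation like the one below. It is only a sketch: the memory figures, the one-connection-per-worker assumption, and the DB connection limit of 100 are hypothetical numbers, not measurements from our servers.

```python
def max_workers_by_memory(total_ram_mb, reserved_mb, per_worker_mb):
    """Upper bound on app-server workers that fit in RAM without swapping."""
    return max((total_ram_mb - reserved_mb) // per_worker_mb, 0)


def db_connections_needed(app_workers, connections_per_worker=1):
    """Connections the DB layer must accept if every worker is busy."""
    return app_workers * connections_per_worker


# Hypothetical machine: 7680 MB RAM, 2560 MB kept for the OS and the DB,
# roughly 100 MB per application server worker.
workers = max_workers_by_memory(
    total_ram_mb=7680, reserved_mb=2560, per_worker_mb=100
)
needed = db_connections_needed(workers)

# Hypothetical DB connection limit; the layers are out of balance
# if the app servers can ask for more connections than the DB accepts.
balanced = needed <= 100
```

Numbers like these are only a starting point; the load tests decide the final values.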

Apache Server Threads

The Apache server config can be calibrated to set an optimal value for MaxClients, which represents the number of simultaneous requests the Apache server will handle. At any time the Apache threads can be viewed by hitting http://your-server/server-status (provided mod_status is enabled). A sample snapshot follows:

MySQL Threads

MySQL uses a thread-based client connection model. The more connections there are, the higher the load on MySQL and on system hardware resources. Read more

Monitor Your Load Tests

Load tests are only as good as the results deduced from them. At a minimum, the results should record the following:

a) System Throughput (number of requests going in per second vs. number of requests completed per second) – we used JMeter for this purpose

Screenshot:

b) System Response Time (time taken to serve a single request; there is no point in taking on a very high load while keeping users waiting for a response) – we used JMeter to record system response time

Screenshot:

c) Server Hardware Status (CPU, memory, and disk swap are the most important in my opinion) – we used monit for this purpose

Screenshot:
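If you collect raw samples yourself, throughput and response time can be derived with a few lines of code. The sketch below assumes each sample is an (elapsed milliseconds, success flag) pair; that format and the function name summarize are my own assumptions, not JMeter's output format.

```python
def summarize(samples, duration_s):
    """Summarize one test run.

    samples: list of (elapsed_ms, success) tuples, one per request.
    duration_s: wall-clock length of the run in seconds.
    """
    elapsed = [ms for ms, _ in samples]
    completed = sum(1 for _, ok in samples if ok)
    return {
        "requests_in_per_s": len(samples) / duration_s,
        "completed_per_s": completed / duration_s,
        "avg_ms": sum(elapsed) / len(elapsed),
        "max_ms": max(elapsed),
    }


# Four hypothetical requests over a two-second run, one of them failed
summary = summarize(
    [(100, True), (200, True), (300, True), (400, False)], duration_s=2
)
```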

Have fun optimizing!

Kung-Fu Chop To the Load Beast – Part I

A common goal every website or online service provider shares is to have as many visitors or server hits as possible. That, of course, does not include spam or denial-of-service attacks on the server. While more traffic means more business, it also creates the need to ensure the scalability and load capacity of the server. Interestingly, most engineers rush to test the load capacity of the server, put tons of load on it the very first day, and expect to see great results. Guess what? Either the server becomes non-responsive (meaning it has probably crashed), or it takes the load but the engineer does not realize the hardware was overkill and receives a shockingly high server invoice at the end of the month.

Clearly one wants neither too little server muscle nor too big and expensive a server. To look the load creep in the eye, there is a need for a strategy. Here is part I of my kung-fu attempt to chop it down. I would like to keep it high level and strategic. In part II I will discuss specific details of our recent optimization exercise on Rails. I must add the disclaimer that server tuning and optimization is a big topic with tons of material available, and what follows is my experience with load testing and optimizing a Ruby on Rails server for a messaging engine project.

Before You Start Optimizing and Scaling:

a) Size the Beast: Estimate your target load on the server. I prefer a number like transactions per minute rather than transactions per second. The per-second figure is a much smaller number when you have heavy processing and a normal transaction spans a few seconds. It is also important to clearly define what a transaction means in your system: does a single DB hit count as a transaction, or does serving a user request end to end count as one?

b) Benchmark Response Times: The goal of optimization and scaling should not be solely to handle more load on the server, but also to serve requests within a decent response time. A server that handles tons of load but keeps users waiting for long periods will soon put the CEO out of business

c) Choose an appropriate load generation mechanism: This could be a free tool like JMeter or SoapUI, which can create massive numbers of HTTP hits on the server. The flexibility these tools provide is quite nice, ranging from configuring the exact load to put on the server using multi-threading to attaching a data pool to vary request data. You can also write your own code to generate load if the request structure is complex. In our case, we used both.
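A home-grown load generator does not need to be elaborate. The sketch below is one possible shape, not the code we used: run_load is a hypothetical helper, and fake_request is a stand-in for whatever real HTTP call your system needs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send_request, payloads, threads):
    """Call send_request(payload) for every payload using a thread pool.

    Returns a list of (elapsed_seconds, success) pairs and the total
    wall-clock time of the run.
    """
    def timed(payload):
        start = time.perf_counter()
        ok = True
        try:
            send_request(payload)
        except Exception:
            ok = False  # count the failure instead of aborting the run
        return (time.perf_counter() - start, ok)

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(timed, payloads))
    wall = time.perf_counter() - started
    return results, wall


def fake_request(payload):
    """Stand-in for a real HTTP call; fails when the payload says so."""
    time.sleep(0.01)
    if payload.get("fail"):
        raise RuntimeError("simulated server error")


# The data pool: 20 varied payloads plus one that triggers a failure
payloads = [{"user": i} for i in range(20)] + [{"fail": True}]
results, wall = run_load(fake_request, payloads, threads=5)
```

The thread count plays the same role as the thread setting in JMeter, and the payload list plays the role of the data pool.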

Places to look for Optimization:

a) Start with code optimizations. Hotspots are DB calls, third-party web service calls, and parsing of large JSON or XML documents. In my experience, an async approach to DB writes and JSON/XML parsing (wherever possible) greatly improves system performance and user experience. We improved the performance of one of our routines by 800% using asynchronous DB writes

b) Application server threads: The number of application server request threads should always be in proportion to the hardware muscle. You don't want to do too much or too little parallel request handling on the application server. Too much will lead to CPU or memory starvation, and too little means you are getting an oversized server invoice at month end. With our pretty standard request size, we have set MaxClients to 50 for Apache on a standard EC2 XLarge instance and are hitting about 50% of CPU capacity

c) Caching: Caching saves us from disk and network latency by reusing already fetched data. Caching is available at multiple levels, from web server caching and the SQL caching provided by standard RDBMSs to third-party caching such as Memcached

d) DB Indexing: This is nothing new or cutting-edge and has been in use for a long while, but there is a catch. Normally we create DB indexes on the tables we hit the most in searches. However, if there are also massive CUD (Create, Update, and Delete) operations on the table, the indexes will really slow them down because the database has to update the index B-trees on every write
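Before adding an index it helps to check that the query actually uses it, so that you only pay the write cost for indexes that earn their keep. The sketch below shows one way to check, using Python's built-in sqlite3 module as a stand-in for your RDBMS; the messages table and the index name are hypothetical, and the sketch does not measure the write slowdown itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages (user_id, body) VALUES (?, ?)",
    [(i % 50, "hi") for i in range(1000)],
)


def plan(sql):
    """Return the query plan details for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # column 3 holds the description


query = "SELECT body FROM messages WHERE user_id = 7"

before = plan(query)  # full table scan, no index available yet
conn.execute("CREATE INDEX idx_messages_user ON messages (user_id)")
after = plan(query)   # the planner can now search via the index
```

Every insert, update, or delete on messages now has to maintain idx_messages_user as well, which is the catch described in d).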

Guide Yourself in Load Testing:

a) A cyclic approach is what works. Run more than one test and record each of them. I have found it useful to create a simple spreadsheet that records the details and results of every test run. I suggest recording basic information like hardware profile, change in settings/hardware from the previous test, load put on the server, throughput of the server, exceptions/crashes, and duration of the test

b) It's important to make one change at a time to the system, be it a DB index, memcached, or more memory attached to the system. If we make more than one change to the system for a test run, it will be hard to determine the adverse or positive effect of each change independently

c) Profile your system: We recorded the following information during the tests: CPU, memory, and disk usage using Munin, system throughput using New Relic, and system response times using JMeter

d) Do not forget longevity tests: While we run many short-duration tests, it is important to run 10-hour or day-long tests as well to find out whether there are any dormant memory leaks that might crash the system in a few days' time
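The test log suggested in a) can also be kept as a CSV file that opens in any spreadsheet. The sketch below is one possible layout; the column names, the append_run helper, and the two sample runs are hypothetical.

```python
import csv
import io

# Columns mirror the basic information suggested for each test run
FIELDS = [
    "run", "hardware", "change", "load_tpm",
    "throughput_tpm", "errors", "duration_min",
]


def append_run(handle, run):
    """Append one test run to the log, writing the header for a new log."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    if handle.tell() == 0:
        writer.writeheader()
    writer.writerow(run)


# An in-memory file keeps the example self-contained; a real log would
# be opened with open("load_tests.csv", "a", newline="")
log = io.StringIO()
append_run(log, {
    "run": 1, "hardware": "Large", "change": "baseline",
    "load_tpm": 1000, "throughput_tpm": 800, "errors": 12,
    "duration_min": 15,
})
append_run(log, {
    "run": 2, "hardware": "Large", "change": "added DB index",
    "load_tpm": 1000, "throughput_tpm": 950, "errors": 3,
    "duration_min": 15,
})

rows = list(csv.DictReader(io.StringIO(log.getvalue())))
```

Keeping the change column to a single item per row makes it easy to stick to the one-change-at-a-time rule.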

Below is my attempt to picture the optimization process in a simple flow chart:

In part II I will discuss specifics of our recent load testing exercise on Rails

Starling Queue over Unix Socket

Now that we have a memcache-client that uses Unix sockets instead of TCP sockets, we automagically have a Starling client that is capable of communicating over Unix sockets, simply by requiring 'memcache_extensions.rb' before creating the Starling client. However, while Starling uses the memcache protocol, the Starling server itself only provides the option to listen on a TCP socket. (more…)

Ruby Memcache Client using Unix Sockets

While TCP sockets provide portability and location independence, there is an inherent performance overhead due to TCP headers, checksums, flow control, and marshalling/unmarshalling of data packets. On Unix machines, this overhead can be avoided through the use of Unix domain sockets, resulting in performance gains.
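For readers who have not used Unix domain sockets, the sketch below shows the basic mechanics in Python: the only visible difference from TCP is the address family and a filesystem path instead of a host and port. It runs on Unix-like systems only, and the toy one-shot server is a stand-in, not memcached or our Ruby client.

```python
import os
import socket
import tempfile
import threading


def serve_once(server):
    """Accept one connection and reply with the upper-cased request."""
    conn, _ = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


# The socket lives at a filesystem path rather than a host:port pair
path = os.path.join(tempfile.mkdtemp(), "demo.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)
thread = threading.Thread(target=serve_once, args=(server,))
thread.start()

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)
client.sendall(b"get key")
reply = client.recv(1024)

client.close()
thread.join()
server.close()
os.remove(path)
```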

The memcached server supports Unix sockets, but the existing memcache clients available for Ruby only support TCP sockets. (more…)

Integrating Sphinx into WiceGrid

WiceGrid is a Ruby data grid scaffolding plugin with some unique features, such as sorting and out-of-the-box search for grids generated using the plugin. While WiceGrid works well for small data sets, it does not scale, and searches are extremely slow when working with millions of records.

To make WiceGrid, which offers some great functionality, scale, we went about integrating it with a search engine. This avoids database table locking during read queries and results in a significant performance improvement by using indexes optimized for fetching data summaries and associations. Our choice of search engine is Sphinx, although others can also be used to achieve similar results. (more…)

SEO – Comprehensive Search Engine Optimization Tutorial

We have regular tech sessions @ Confiz on a variety of web-related technologies and tools. Recently I got an opportunity to give a comprehensive tutorial on search engine optimization.

The presentation covers topics ranging from effective page titles, meta descriptions, URL structures, custom 404 pages, effective content and keyword analysis, image SEO, sitemaps, and usage of robots.txt to PageRank and beyond.

SEO – Comprehensive Search Engine Optimization Tutorial – Confiz Solutions

Looking for help with search engine optimization for your website? Click here to contact us.