A week or so ago I had my first experience using someone else’s cluster on Amazon EC2. EC2 is the Amazon Elastic Compute Cloud. Users set up a virtual computing platform that runs on Amazon’s servers “in the cloud.” Amazon EC2 is not just another cluster. EC2 allows the user to create a disk image containing an operating system and all of the software they need to perform their computations. In my case, the disk image would contain Hadoop, R, Python and all of the R and Python packages I need for my work. This prevents the user (and the provider) from having to worry about providing or upgrading software and having compatibility issues.
No subscription is required. Users pay for the amount of resources used for the computing session. Hourly prices are very cheap, but accrue quickly. Additionally, Amazon charges for pretty much everything single thing you can do with an OS: transferring data to/from the cloud per GB, data storage per GB, CPU time per hour per core etc.
This is somewhat of a tangent, but EC2 was a brilliant business move in my opinion.
Anyway, life gets a bit more difficult when the EC2 instance […]