AWS php sysadmin WebApps

Building PHP with nginx, and fast-cgi on EC2

Here’s my quick and dirty guide to building PHP with nginx and fast-cgi on EC2:

yum install mysql mysql-server mysql-devel
service mysqld start
/usr/bin/mysqladmin -u root password 'your_password'
/usr/bin/mysqld_safe &
yum install php-fpm php-cli php-mysql php-gd php-imap php-ldap php-odbc php-pear \
    php-xml php-xmlrpc php-eaccelerator php-magickwand php-magpierss php-mbstring \
    php-mcrypt php-mssql php-shout php-snmp php-soap php-tidy
yum install spawn-fcgi
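# Next, download the spawn-fcgi init.d shell script and move it into place
# (./php_cgi is a placeholder for wherever you saved the download):
mv ./php_cgi /etc/init.d/php_cgi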
chmod +x /etc/init.d/php_cgi
# Start php app server, enter:
/etc/init.d/php_cgi start

# check to see if it's running
netstat -tulpn | grep :9000

Your /etc/nginx/nginx.conf file should look like this:

scalability hacking sysadmin TechBiz WebApps

Why is Foursquare Down? 3 Educated Guesses

Why is Foursquare down?

Update (5 October 2010 at 5:36 pm PDT): The folks at Foursquare tell us why in a post-mortem. There are autosharding issues with MongoDB. Yup, my guesses were wrong, unless you consider MongoDB a kind of cache. 😉

I used to work for a few sites that required high scalability expertise. Now that we’re over 5 hours into the outage I’ll share some of my thoughts.

But before I do, I’d just like to say, I really hope that it’s nothing bad and I really like the Foursquare peeps. I’m not putting out this article to harsh on anybody, but just to share some knowledge I have. Outages happen to everybody!

Also, I do not feel that this meltdown is in any way indicative of Amazon’s EC2. I have a site that shares the same IP space and facility as Foursquare and we have had no outages today.

  • The worst-case scenario is a full-scale Magnolia-style meltdown, where a broken backup process means they can never restore from backup. Odds: unlikely.
  • Someone turned off caching. I’m not sure how cache-dependent the architecture is at Foursquare. If someone turned off the cache and the cache is just plain gone, then the caches have to be rebuilt. Rebuilding a cache entry, depending on the time and complexity of each query, can take up to 100x longer than retrieving it from the cache. If there’s some cached item that takes 100 seconds per user to rebuild, the site will be down for a long time. They can only put users back on Foursquare at a rate of one per 100 seconds if that’s the case, unless they can rebuild the cache concurrently.
  • A hacker has broken through security and is wreaking havoc on Foursquare. It’s happened to the best sites, e.g. Google in the 90s, and it’s pretty tough to recover from. Sometimes you let the criminals in and do their worst while keeping the site up. Sometimes you have zero tolerance.

I wish Foursquare the best of luck. I am more than happy to lend a hand with their issues, if they need another pair of eyes.
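
To put rough numbers on the cache guess above, here is a minimal shell sketch of the rebuild arithmetic. The user count and the number of concurrent workers are made-up inputs for illustration, not Foursquare figures; only the 100 seconds per user comes from the scenario above.

```shell
# Estimate how long a cold cache takes to rebuild, in whole seconds.
# usage: rebuild_seconds USERS SECONDS_PER_USER WORKERS
# All inputs are hypothetical; WORKERS is how many rebuilds run at once.
rebuild_seconds() (
    users=$1; per_user=$2; workers=$3
    # Total work divided across the workers, rounded up.
    echo $(( (users * per_user + workers - 1) / workers ))
)
```

For example, `rebuild_seconds 1000 100 1` prints 100000 (more than a day for just a thousand users), while `rebuild_seconds 1000 100 50` prints 2000. That gap is why running the rebuild concurrently matters so much.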

command-line How-To sysadmin

Commands I Use Frequently

Here’s a list of commands I use frequently, where the first number represents the number of times I used that command today:

86 git – the best version control software ever
59 cd – used to change directories on the command-line
54 ls – used to list files in a directory
41 vim – when TextMate just isn’t fast enough for moving and manipulating text I use this text editor
24 grep – this is great for searching through code
21 sudo – I use this for stopping and starting servers and anything that requires super user access

I figured this out by using the following:

history | cut -c8-20 | sort > commands.txt

I created the following script in Perl:

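#!/usr/bin/env perl

use strict;
use warnings;

my %h_list = ();
my @sorted = ();
my @listed = ();

# Count how often each command (the first word on a line) appears
open(my $ls, '<', 'commands.txt') or die "Cannot open commands.txt: $!";
while (<$ls>) {
    if ($_ =~ /(\w+)/) {
        $h_list{$1}++;
    }
}
close($ls);

# Build "count<TAB>command" lines
foreach my $key (keys %h_list) {
    push @listed, $h_list{$key} . "\t" . $key;
}

# Sort by the leading count, highest first
@sorted = sort { ($b =~ /^(\d+)/)[0] <=> ($a =~ /^(\d+)/)[0] } @listed;
foreach (@sorted) {
    print $_ . "\n";
}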
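
The same tally can also be sketched without Perl, using only standard shell tools. This is an alternative to the script above, not what I originally ran; the function name count_commands is made up for illustration.

```shell
# Tally the first word of each line on stdin, most frequent first.
# Prints "count<TAB>command", one per line.
count_commands() {
    awk 'NF { print $1 }' |           # keep the command name, skip blank lines
        sort |                        # group identical commands together
        uniq -c |                     # prefix each group with its count
        sort -k1,1nr -k2,2 |          # highest count first, then by name
        awk '{ print $1 "\t" $2 }'    # normalise the spacing uniq adds
}
```

Usage would be something like `history | cut -c8-20 | count_commands`, which skips the intermediate commands.txt file.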

AWS How-To sysadmin

EC2 Backup Script

This is a quick and dirty EC2 backup script for virtual unix servers that works just fine when crontabbed:


DATE=$(date +%m%d%Y-%H%M%S)

s3cmd mb s3://$BUCKET

cd /mnt
mkdir img
ec2-bundle-vol -d /mnt/img -k /mnt/$PRIVATE_KEY -c /mnt/$PRIVATE_CERT -u $USERID -s 9999 --arch i386
cd /dev
mkdir loop
cd loop
mknod 0 b 7 0

ec2-upload-bundle -b $BUCKET -m /mnt/img/image.manifest.xml -a $AWS_ACCESS_ID -s $AWS_SECRET

# rm -rf /mnt/img
echo "please register $BUCKET/image.manifest.xml" >> /mnt/registerbackups.txt
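
Since the script leans on several variables being set before it runs from cron, a small guard can help. This is a minimal sketch and not part of the original script; the helper name require_vars is made up, while the variable names in the usage note are the ones the script above already uses.

```shell
# Succeed only if every named variable is set and non-empty.
# usage: require_vars NAME...
require_vars() (
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

At the top of the backup script this could be called as `require_vars BUCKET PRIVATE_KEY PRIVATE_CERT USERID AWS_ACCESS_ID AWS_SECRET || exit 1`, so a misconfigured cron job stops before bundling anything.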

sysadmin TechBiz

Amazon EC2 in the Enterprise

This is just a quick summary of what it was like implementing Amazon’s EC2 in an enterprise environment.

1. You’ll need to write your own LDAP plug-ins to interface with any access control lists. For example, where I work, WordPress is used for corporate communications, so an LDAP plug-in had to be written to make sure the right people saw the right information.

2. Migration can be expensive if you’re using EBS on the first go. On Windows, and I’m not sure why, it can cost about $50 to migrate 2GB of data into EBS. On Linux, it happens at a fraction of that cost, as advertised.

3. Windows can be very expensive. Although they say it’s 12 cents per hour per small instance, beware of hidden costs like authentication services and SQL Server. With both, you are using a server at the cost of $1.35 / hour, which IMHO could be replaced by a small Linux instance doing the same thing at 10 cents per hour.
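
To see what those hourly rates add up to, here is a minimal sketch of the arithmetic. It assumes an instance that runs around the clock for a 30-day month (720 hours), which is my simplification; the function name monthly_cost is made up for illustration.

```shell
# Monthly cost in dollars for one instance at a given hourly rate.
# usage: monthly_cost HOURLY_RATE [HOURS]
# HOURS defaults to 720, an assumed 30-day month of continuous running.
monthly_cost() {
    awk -v rate="$1" -v hours="${2:-720}" 'BEGIN { printf "%.2f\n", rate * hours }'
}
```

Under that assumption, `monthly_cost 1.35` prints 972.00 and `monthly_cost 0.10` prints 72.00, so the Windows setup costs roughly 13.5 times as much per month as the small Linux instance.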

I’m pretty sure that with the right Amazon EC2 setup you could run a cluster of servers for a Fortune 500 company for under $1000.00 (one thousand dollars) per month without the CapEx costs associated with new hardware.

If you have any more questions about Amazon EC2 in the enterprise I’d be happy to answer them. Please ask them in the comments below.

How-To scalability hacking sysadmin TechBiz WebApps

How to Load Balance and Auto Scale with Amazon’s EC2

This blog post is a quick introduction to load balancing and auto scaling with Amazon’s EC2.

I was kinda amazed about how easy it was.

Prelims: Download the load balancer API software, auto scaling software, and CloudWatch software. You can get all three at a download page on Amazon.

Let’s load balance two servers.

elb-create-lb lb-example --headers \
--listener "lb-port=80,instance-port=80,protocol=http" \
--availability-zones us-east-1a

The above creates a load balancer called “lb-example,” and will load balance traffic on port 80, i.e. the web pages that you serve.

To attach specific servers to the load balancer you just type:

elb-register-instances-with-lb lb-example --headers \
--instances i-example,i-example2

where i-example and i-example2 are the instance IDs of the servers you want added to the load balancer.

You’ll also want to monitor the health of the load balanced servers, so please add a health check:

elb-configure-healthcheck lb-example --headers \
--target "HTTP:80/index.html" --interval 30 --timeout 3 \
--unhealthy-threshold 2 --healthy-threshold 2

Now let’s set up autoscaling:

as-create-launch-config example3autoscale --image-id ami-mydefaultami \
--instance-type m1.small
as-create-auto-scaling-group example3autoscalegroup  \
--launch-configuration example3autoscale \
--availability-zones us-east-1a \
--min-size 2 --max-size 20 \
--load-balancers lb-example
as-create-or-update-trigger example3trigger \
--auto-scaling-group example3autoscalegroup --namespace "AWS/EC2" \
--measure CPUUtilization --statistic Average \
--dimensions "AutoScalingGroupName=example3autoscalegroup" \
--period 60 --lower-threshold 20 --upper-threshold 40 \
--lower-breach-increment=-1 --upper-breach-increment 1 \
--breach-duration 120

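With the 3 commands above I’ve created an auto-scaling scenario where a new server is spawned and added to the load balancer whenever average CPU utilization stays above 40% (the upper threshold) for two minutes, and a server is removed whenever it stays below 20% (the lower threshold) for two minutes.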

Ideally you want to set --lower-threshold to something higher, like 70, and --upper-threshold to 90, but I set them to 20 and 40 respectively just to be able to test.
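
To make the threshold logic concrete, here is a minimal shell sketch of how I understand the trigger to decide. It is a simplified model for reasoning about the settings, not Amazon's implementation; the function name trigger_action and its output words are made up. PERIODS is the breach duration divided by the period, so 120 / 60 = 2 with the settings above.

```shell
# Print the action a threshold trigger would take, given the most recent
# per-period CPU averages (whole percentages, oldest first).
# usage: trigger_action LOWER UPPER PERIODS SAMPLE...
trigger_action() (
    lower=$1; upper=$2; periods=$3
    shift 3
    # Not enough samples yet to cover the breach duration.
    if [ "$#" -lt "$periods" ]; then
        echo none
        exit 0
    fi
    # Keep only the last $periods samples.
    while [ "$#" -gt "$periods" ]; do
        shift
    done
    above=0; below=0
    for cpu in "$@"; do
        if [ "$cpu" -gt "$upper" ]; then above=$((above + 1)); fi
        if [ "$cpu" -lt "$lower" ]; then below=$((below + 1)); fi
    done
    # Act only if every sample in the window breached the same threshold.
    if [ "$above" -eq "$periods" ]; then
        echo scale-up
    elif [ "$below" -eq "$periods" ]; then
        echo scale-down
    else
        echo none
    fi
)
```

With the test settings, `trigger_action 20 40 2 35 45 50` prints scale-up because the last two minutes were both above 40, while `trigger_action 20 40 2 45 30` prints none because the breach did not last the full two minutes.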

I tested using siege.

Caveats: the auto-termination part is buggy, or simply didn’t work. As the load went down, the number of servers online remained the same. Anybody have thoughts on this?

What do auto-scaling and load balancing in the cloud mean? Well, the total cost of ownership for scalable, enterprise infrastructure just went down by a lot. It also means that IT departments can just hire a cloud expert and deploy solutions from a single laptop instead of having to figure out the cost for hardware load balancers and physical servers.

The age of Just-In-Time IT just got ushered in with auto-scaling and load balancing in the cloud.

sysadmin TechBiz WebApps

Monitoring Websites on the Cheap: Screen and Sitebeagle

If you don’t fail fast enough, you’re on the slow road to success.

One idea that I recently failed at was using screen and sitebeagle to monitor sites.

It’s not a complete failure… it works okay.

Due to budget constraints, I put my screen and sitebeagle set up on a production server.

For some reason that production server ran out of space and became unresponsive. Screen no doubt caused this. I was alerted to the issue and did a reboot.

After the reboot, although Amazon’s monitoring tools told me the server was okay, the server was not. The MySQL database was in an EBS volume and needed to be re-mounted.

The solution I now have in place is still screen and sitebeagle. But I use another server with screen and sitebeagle on it to monitor the production server that gave me the issue in the first place.
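
The kind of check the second server runs can be sketched in a few lines of shell. This is not sitebeagle itself, just a minimal illustration of the idea using curl; the function names are made up, and treating 2xx and 3xx as "up" is my assumption about what counts as healthy.

```shell
# Classify an HTTP status code: 2xx and 3xx count as "up", anything else as "down".
classify_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) echo up ;;
        *) echo down ;;
    esac
}

# Fetch only the status code for a URL, giving up after 10 seconds.
# curl reports 000 when the request fails outright.
http_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Print "up" or "down" for a URL.
check_site() {
    classify_status "$(http_status "$1")"
}
```

Run from cron or inside screen on the monitoring server, something like `check_site http://example.com/` (a placeholder URL) would print down both for error responses and for a server that is not answering at all.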

It’s a question of who will monitor the monitors… in a world of web sites with few users, the answer’s pretty bleak. In the world of super popular commercial sites, the answer’s clear: the wisdom of crowds will monitor the web sites.

Announcements sysadmin WebApps

A Cross Platform Browser, Windows 2003 EC2 AMI

I recently created a cross platform browser, Windows 2003 EC2 AMI: ami-69739500

It has the following pre-installed:

  • gvim
  • IE 7
  • Firefox 3 with Web Developer, yslow & Firebug
  • Opera
  • PuTTY SSH
  • PuTTY SCP

Pretty much with that list you’re all set to do troubleshooting for cross platform browser issues.

There’s IIS 6.0 and SQL Server, too.

I’ve linked the password to this ami at . It’s a short-coming of Windows AMIs on EC2 that I have to link the password, so please change it once you get into the instance.

command-line sysadmin WebApps

Doing Sysadmin on the iPhone

For checking up on sites in the enterprise, I use Alertsite. It was suggested to me by a VP I work with at McCann, Ed Recinto. It’s been a great tool.

For personal websites that I manage, I’ve been using something I rolled in newLISP, sitebeagle. Why? Because beagles are great watchdogs.

Very often, problems can be solved by tweaking code, changing permissions, or upgrading Apache or MySQL.

Very often, it’s the weekend, I’m sitting in a cafe, and get an alert from Nagios or Alertsite. With iSSH, on the iPhone, I can ssh into a LAMP server and do the work I need.

I can see things getting a bit more complex. What tools do you use to sysadmin from an iPhone?