Categories
Career Coding Questions Recruiting scalability hacking Social Media TechBiz

Coders Who Don’t Job Interview: Zed Shaw

I wrote a piece about the current state of job recruiting from the perspective of a coder looking for work. I wondered:

What would it be like if you didn’t have to do a job interview?

(The non-tl;dr summary is below.)

By “job interview,” I just mean the normal process where a job candidate replies to an ad, contacts an employer directly, or works with a recruiter, and gets a job that way. High-profile experts are courted instead, or work out a mutually beneficial deal where it doesn’t feel like an interview.

I asked around for folks who didn’t have to interview. One name that consistently rose to the top was Zed Shaw.

Zed is the creator of the Mongrel web server and of Tir, a really great framework powered by Mongrel. Personally, I first heard of him through a video Leah Culver linked to of a talk Zed gave, “The ACL is dead.” A careful viewing of that talk is always rewarded, especially if you are a coder freelancing for a corporation.

Here’s my interview with him (conducted over email). Thanks Zed!

Barce: What’s your own process for choosing the projects you want to work on?

Zed: Within my profession I try to just work on whatever is needed to get the
project or job done. Sometimes that ends up being a lot of crap work so
other people can do more important stuff. Professionally I don’t mind
this kind of work as it’s low investment and removes the pressure off
other folks who would rather do interesting things. I think I also tend
to pick off the lower level work because most of my original ideas are
usually too weird for a professional setting.

Personally, I tend to work on projects that match ideas I might have,
and usually they have a secondary motive that’s outside of programming.
Many times these ideas come from combining a couple of concepts, or
they’re based on a problem I’ve noticed, or they are just a kind of
funny joke or cool hack I thought up.

I think the most important thing is I don’t try to plan my inspiration
in my personal projects, but instead go with it when it comes. I don’t
have a “process”, and in fact I think “process” kills creativity.
Process definitely helps make creative ideas a reality, but it doesn’t
create the initial concepts very well.

Professionally though, inspiration is for amateurs and I just do my
work.

Barce: What advice can you give someone who feels trapped by their job or surrounded by recruiters?

Zed: Well, if you’re trapped by your job then I’d say start working on
getting a new one. Nobody is ever really *trapped*, but maybe you
can’t just quit right away. Instead, work on projects at home,
constantly look for new work, and move to where the work is. Even if
it’s temporary, moving to say San Francisco during the boom times could
be a major boost to your career.

I’d also say that going back to school is a good way to update your life
and change your profession. I’m a firm believer in getting government
student loans and using them to go to school. They’re cheap, low
interest, and the US government is usually very nice about letting you
pay them back. I’m not so sure about other places around the world
though.

Barce: What’s the most disruptive technology you know about right now?

Zed: If I were to be honest, I’d have to say Facebook, even though I
absolutely hate it. It’s probably the one technology in recent history,
maybe after HTTP and the Browser, that is changing the way governments,
societies, and regular people work. It’s also sort of irritating that
the most important thing to hit most people’s lives is also one of the
most privacy invading companies in the world.

After that I’d have to say the rise of automated operations and
virtualized machines. Things like Xen, kvm, and even llvm as compiler
infrastructure are changing how systems are managed and deployed, which
then leads to bigger automation for large heterogeneous networks. I’m
sort of waiting for operating systems to catch up and realize that their
configuration systems are getting in the way of real automation.

Barce: Thanks again, Zed, for the interview. The takeaways that I hope readers get from this are:

  • Zed has open source projects that free him from the normal interviewing process. Building your own open source project is one way to free yourself.
  • “Professionally though, inspiration is for amateurs and I just do my work.”
  • “[W]ork on projects at home,
    constantly look for new work, and move to where the work is.”
  • Facebook is the most disruptive technology that’s changing governments… Virtualization / Cloud technologies are a 2nd.
Categories
scalability hacking sysadmin TechBiz WebApps

Why is Foursquare Down? 3 Educated Guesses

Why is Foursquare down?

Update (5 October 2010 at 5:36 pm PDT) : The folks at Foursquare tell us why in a post-mortem. There are autosharding issues with MongoDB. Yup, my guesses were wrong, unless you consider MongoDB a kind of cache. 😉

I used to work for a few sites that required high-scalability expertise. Now that we’re over 5 hours into the outage, I’ll share some of my thoughts.

But before I do, I’d just like to say, I really hope that it’s nothing bad and I really like the Foursquare peeps. I’m not putting out this article to harsh on anybody, but just to share some knowledge I have. Outages happen to everybody!

Also, I don’t believe this meltdown reflects on Amazon’s EC2 at all. I have a site that shares the same IP space and facility as Foursquare and we have had no outages today.

  • The worst case scenario is a full-scale Ma.gnolia meltdown: a backup process was broken, so they can never restore from backup. Odds: unlikely.
  • Someone turned off caching. I’m not sure how cache-dependent Foursquare’s architecture is, but if someone turned the cache off and it’s just plain gone, then the caches have to be rebuilt. Depending on the time and complexity of each query, rebuilding a cache entry can take up to 100x longer than reading it. If some cached item takes 100 seconds per user to recompute, the site will be down for a long time: users come back onto foursquare at a rate of one per 100 seconds unless the rebuild can run concurrently.
  • A hacker has broken through security and is wreaking havoc on Foursquare. It’s happened to the best sites, e.g. Google in the 90s, and it’s pretty tough to recover from. Sometimes you let the criminals in and let them do their worst while keeping the site up; sometimes you have zero tolerance.
I wish Foursquare the best of luck, and I’m more than happy to lend another pair of eyes if they need one.
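To put rough numbers on the cache-rebuild guess above (every figure here is hypothetical, just to show the shape of the problem): a serial rebuild of expensive per-user cache entries is hopeless, and concurrency is the only way out.

```shell
# Hypothetical figures: 1,000,000 users, 100 s to recompute one user's cache.
users=1000000
secs_per_user=100
workers=50   # concurrent rebuild workers, also hypothetical

serial_hours=$(( users * secs_per_user / 3600 ))
parallel_hours=$(( serial_hours / workers ))

echo "serial rebuild: ${serial_hours} hours"
echo "with ${workers} workers: ${parallel_hours} hours"
```

Even with 50 workers you’re measuring the outage in weeks, not hours, which is why warm caches matter so much.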

Categories
Databases MySQL scalability hacking WebApps

Notes on adding more MySQL databases

Just notes for myself on adding more MySQL databases without shutting down the master database.

on existing slave:

/etc/init.d/mysqld stop

copy data dir from /var/lib/mysql and data from /var/run/mysqld to new slave database:

cd /var/lib
tar cvf Mysql_slave.tar mysql/*
scp Mysql_slave.tar root@new-db.com:/var/lib/.
cd /var/run
tar cvf Mysqld_slave.tar mysqld/*
scp Mysqld_slave.tar root@new-db.com:/var/run/.

copy /etc/my.cnf from old slave to new slave
add entry for new server-id

on new slave, unpack the data:

cd /var/lib
tar xvf Mysql_slave.tar
cd /var/run
tar xvf Mysqld_slave.tar

start existing slave:

/etc/init.d/mysqld start

start new slave:

/etc/init.d/mysqld start
mysql
start slave;

on masterdb:
e.g.:

grant replication slave on *.* to 'repl'@'192.168.107.33' identified by 'password';

test on master:
create database repl;

check on slave:
show databases; /* should show new database */

test on master:
drop database repl;

check on slave:
show databases; /* new database should be dropped */

Now it’s time to turn this into an automated shell script with Expect in there.
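A first pass at scripting the steps above. The hostname is a placeholder, and DRY_RUN=1 (the default) only prints what would run, so nothing here touches a live database; the my.cnf copy, server-id edit, and the unpack plus `start slave` on the new box still happen by hand.

```shell
#!/bin/sh
# Sketch of the slave-cloning notes above as one script.
# NEW_DB is a placeholder for your environment.
NEW_DB=${NEW_DB:-new-db.example.com}
DRY_RUN=${DRY_RUN:-1}   # set to 0 to execute for real

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# stop the existing slave, then tar up its data and run directories
run /etc/init.d/mysqld stop
run tar -C /var/lib -cvf /tmp/Mysql_slave.tar mysql
run tar -C /var/run -cvf /tmp/Mysqld_slave.tar mysqld

# ship both tarballs to the new slave
run scp /tmp/Mysql_slave.tar "root@${NEW_DB}:/var/lib/"
run scp /tmp/Mysqld_slave.tar "root@${NEW_DB}:/var/run/"

# bring the existing slave back up
run /etc/init.d/mysqld start
```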

Categories
How-To scalability hacking

Part II: Getting to 600 Concurrent Users

I couldn’t sleep last night. I’m worried we’ll lose this client.

So, just to be clear: I wasn’t part of the crew responsible for scaling this site. I had already set up a scalable architecture for the site that would automatically and horizontally scale on Amazon. That idea got shot down for legal reasons that, to my surprise, haven’t been in play for a while. Can we say, “office politics”?

I totally recommend Amazon’s Autoscaling to anybody that’s new to this.

Instead of auto-scaling, the site was architected by a local San Francisco firm who I won’t mention here.

Let’s just hope enough people read this so that they won’t even have to know the name of the company and will just know the smell of an unscalable architecture.

Scalability requirement: 100,000 concurrent users

This is how they set it up:

  • two web servers
  • one database
  • four video transcoders that hit the master database
  • one more app server that hits the master database
  • no slave db 😀

If they had even googled ‘building scalable websites’ they would have come across a book that would have avoided all of this, Cal Henderson’s Building Scalable Websites. It should be mandatory reading for anybody working on a large website, and it just scratches the surface.

So, how did we get to 600 concurrent users?

We tweaked MySQL by putting this in /etc/my.cnf:

[mysqld]
max_connections=10000
query_cache_size=50000000
thread_cache_size=16
thread_concurrency=16 # only works on Solaris and is ignored on other OSes

We ran siege and got to about 300 concurrent users without breaking a sweat, but now Apache was dying.
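For reference, a siege run like the one we used might look like the following; the URL is a placeholder for your own staging host, -c sets concurrency, -t the test length, and -b turns off the default delay between requests.

```shell
# Build the siege command; URL is a placeholder for your staging host.
URL="http://staging.example.com/"
SIEGE_CMD="siege -b -c 300 -t 1M ${URL}"
echo "$SIEGE_CMD"
# run it for real with:  eval "$SIEGE_CMD"
```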

So we tweaked Apache. We started out with this:

StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000

And ended up with this:

StartServers 150
MinSpareServers 50
MaxSpareServers 200
ServerLimit 256
MaxClients 256
MaxRequestsPerChild 4000

RAM and CPU were doubled.
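One sanity check worth doing before pushing these numbers higher: MaxClients times the resident size of one httpd child has to fit in RAM, or the box starts swapping and dies anyway. A rough budget, assuming a hypothetical 25 MB per child (measure your own with ps):

```shell
# Rough memory budget for the Apache prefork settings above.
maxclients=256
mb_per_child=25   # assumed resident size of one httpd child; measure yours

peak_mb=$(( maxclients * mb_per_child ))
echo "peak httpd RAM: ${peak_mb} MB"
```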

Categories
Databases How-To scalability hacking

Scaling from 100 to 100000 concurrent users in a day?

Well, it looks pretty bad right now. A vendor just ceded control of the web application architecture. Initial tests say the site won’t handle more than 100 concurrent users.

Who the hell makes a web app without a slave database and calls themselves website architects? Apparently these guys did.

Please start following if you want to see if this web app can make it to launch.

Categories
Databases scalability hacking WebApps

Benchmarking Inserts on Drizzle and MySQL

I’m not comparing apples to apples yet… but out of the box, drizzle does inserts faster than MySQL using the same table type, InnoDB.

Here’s what I’m comparing:
drizzle r1126 configured with defaults, and
MySQL 5.1.38 configured with

./configure --prefix=/usr/local/mysql --with-extra-charsets=complex \
--enable-thread-safe-client --enable-local-infile --enable-shared \
--with-plugins=partition,innobase

which is really nothing complicated.

SQL query caching is turned off on both database servers. Both are using the InnoDB engine plug-in.

I’m running these benchmarks on a MacBook Pro 2.4 GHz Intel Core 2 Duo with 2GB 1067 MHz DDR3 RAM.

I wrote benchmarking software about 2 years ago to test partitions but I’ve since abstracted the code to be database agnostic.

You can get the benchmarking code at Github.

At the command-line, you type:

php build_tables.php 10000 4 drizzle

where 10000 is the total number of rows and 4 is the number of partitions for those rows.

You can type the same thing for mysql:

php build_tables.php 10000 4 mysql

and get interesting results.

Here’s what I got:

MySQL

bash-3.2$ php build_tables.php 10000 4 mysql
Elapsed time between Start and Test_Code_Partition: 13.856538
last table for php partition: users_03
Elapsed time between No_Partition and Code_Partition: 14.740206
-------------------------------------------------------------
marker           time index            ex time         perct   
-------------------------------------------------------------
Start            1252376759.26094100   -                0.00%
-------------------------------------------------------------
No_Partition     1252376773.11747900   13.856538       48.45%
-------------------------------------------------------------
Code_Partition   1252376787.85768500   14.740206       51.54%
-------------------------------------------------------------
Stop             1252376787.85815000   0.000465         0.00%
-------------------------------------------------------------
total            -                     28.597209      100.00%
-------------------------------------------------------------
20000 rows inserted...

drizzle

bash-3.2$ php build_tables.php 10000 4 drizzle
Elapsed time between Start and Test_Code_Partition: 7.502141
last table for php partition: users_03
Elapsed time between No_Partition and Code_Partition: 7.072367
-------------------------------------------------------------
marker           time index            ex time         perct   
-------------------------------------------------------------
Start            1252376733.68141500   -                0.00%
-------------------------------------------------------------
No_Partition     1252376741.18355600   7.502141        51.47%
-------------------------------------------------------------
Code_Partition   1252376748.25592300   7.072367        48.52%
-------------------------------------------------------------
Stop             1252376748.25627400   0.000351         0.00%
-------------------------------------------------------------
total            -                     14.574859      100.00%
-------------------------------------------------------------
20000 rows inserted...

MySQL: 699 inserts per second
drizzle: 1372 inserts per second
As far as inserts go, drizzle is about 2 times faster out of the box than MySQL.

Categories
How-To scalability hacking sysadmin TechBiz WebApps

How to Load Balance and Auto Scale with Amazon’s EC2

This blog post is a quick introduction to load balancing and auto scaling with Amazon’s EC2.

I was kinda amazed at how easy it was.

Prelims: Download the load balancer API software, auto scaling software, and CloudWatch software. You can get all three from a download page on Amazon.

Let’s load balance two servers.

elb-create-lb lb-example --headers \
--listener "lb-port=80,instance-port=80,protocol=http" \
--availability-zones us-east-1a

The above creates a load balancer called “lb-example,” and will load balance traffic on port 80, i.e. the web pages that you serve.

To attach specific servers to the load balancer you just type:

elb-register-instances-with-lb lb-example --headers \
--instances i-example,i-example2

where i-example and i-example2 are the instance IDs of the servers you want added to the load balancer.

You’ll also want to monitor the health of the load balanced servers, so please add a health check:

elb-configure-healthcheck lb-example --headers \
--target "HTTP:80/index.html" --interval 30 --timeout 3 \
--unhealthy-threshold 2 --healthy-threshold 2

Now let’s set up autoscaling:

as-create-launch-config example3autoscale --image-id ami-mydefaultami \
--instance-type m1.small
as-create-auto-scaling-group example3autoscalegroup  \
--launch-configuration example3autoscale \
--availability-zones us-east-1a \
--min-size 2 --max-size 20 \
--load-balancers lb-example
as-create-or-update-trigger example3trigger \
--auto-scaling-group example3autoscalegroup --namespace "AWS/EC2" \
--measure CPUUtilization --statistic Average \
--dimensions "AutoScalingGroupName=example3autoscalegroup" \
--period 60 --lower-threshold 20 --upper-threshold 40 \
--lower-breach-increment=-1 --upper-breach-increment 1 \
--breach-duration 120

With the 3 commands above I’ve created an auto-scaling scenario where a new server is spawned and added to the load balancer whenever average CPU utilization stays above 40% for two minutes, and one is removed whenever it stays below 20% for two minutes.

Ideally you want to set --lower-threshold to something high like 70 and --upper-threshold to 90, but I set them to 20 and 40 respectively just to be able to test.

I tested using siege.

Caveats: the auto-termination part is buggy, or simply didn’t work. As the load went down, the number of servers online stayed the same. Anybody have thoughts on this?

What do auto-scaling and load balancing in the cloud mean? Well, the total cost of ownership for scalable, enterprise infrastructure just dropped by a lot. It also means that IT departments can hire a cloud expert and deploy solutions from a single laptop instead of having to price out hardware load balancers and physical servers.

The age of Just-In-Time IT just got ushered in with auto-scaling and load balancing in the cloud.

Categories
command-line scalability hacking

Oddments: A Great Blog For Keeping Up With Drizzle and Gearman

Alan Kasindorf just introduced me to a great blog by Eric Day, Oddments.

If you are into learning about alternatives to MySQL like Drizzle, or how to scale writes to a database using Gearman, then I wholeheartedly recommend his blog.

I really like the code samples he puts up; they act as very useful, direct tutorials for new technologies.