<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Scaling Audiogalaxy to 80 million daily page views</title>
	<atom:link href="http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/</link>
	<description></description>
	<pubDate>Fri, 29 Aug 2008 04:08:46 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: Tom</title>
		<link>http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-151</link>
		<dc:creator>Tom</dc:creator>
		<pubDate>Wed, 19 Mar 2008 18:04:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-151</guid>
		<description>Wow, Tomas -- thanks for the great notes about MySQL Cluster.  Glad to hear that you don't think Cluster is ready for primetime yet.  It is always hard to know before you use a product how well it will perform, particularly with failures.  Stress testing can only help you so much.  

Sometimes I wish there was some sort of independent wiki or resource that documented how usable projects are.  Given how proven they are, projects like Apache, PHP, or memcached are probably safe to use.  But for newer things, it can be hard to find examples of other people using them.  Benchmarks are nice, but I want to hear real world stories about how they performed under real world failures -- DNS problems, hard drive failures, bad memory, etc.</description>
		<content:encoded><![CDATA[<p>Wow, Tomas &#8212; thanks for the great notes about MySQL Cluster.  Glad to hear that you don&#8217;t think Cluster is ready for primetime yet.  It is always hard to know before you use a product how well it will perform, particularly with failures.  Stress testing can only help you so much.  </p>
<p>Sometimes I wish there was some sort of independent wiki or resource that documented how usable projects are.  Given how proven they are, projects like Apache, PHP, or memcached are probably safe to use.  But for newer things, it can be hard to find examples of other people using them.  Benchmarks are nice, but I want to hear real world stories about how they performed under real world failures &#8212; DNS problems, hard drive failures, bad memory, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tomas Doran</title>
		<link>http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-143</link>
		<dc:creator>Tomas Doran</dc:creator>
		<pubDate>Wed, 19 Mar 2008 09:44:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-143</guid>
		<description>Nice article, thanks. 

I have to say that, in my experience, MySQL Cluster is to be avoided as it's horrifically flaky. Whilst it is meant to be scalable and robust, having tried running it in a large production environment (to store metadata about cache items in memcache), I found it has a number of massive problems:

. Tendency to core dump, usually at the worst times. And not just part of your cluster core dumps, it *all* core dumps. (e.g. 2 people saying TRUNCATE TABLE for the same table at the same time == coredump)

. Never frees space. Deleting rows just marks them dead, it doesn't actually free them. So unless your data set / key values are constant then you're stuffed. This is similar to PostgreSQL, except MySQL Cluster *does not* have a VACUUM which frees the space. This obviously compounds the problem above - when your DB is about to get full, a couple of your admins get paged, and truncate a no-longer-used table. Bang goes your cluster.

. If your cluster core-dumps, and you need to re-load it from disk, this takes *AGES* (e.g. &#62; 30mins for ~ 4G of data across 4 machines)

. If you don't re-load the cluster from disk, but just nuke it and start again - your web servers start issuing queries against tables which no longer exist (until you re-create them). This brings the cluster to a grinding halt. About 10 sessions saying 'SELECT * FROM non_existant_table' or generally any query to 'non_existant_table' is *more than enough* to make the cluster unusable (&#62;3 mins for a new connection).

As our cache system had to deal with all of this flakiness, we ended up with MySQL being an *optional* component (needed for performance, but if it was not there it would not block requests - however, if it wasn't there, cache clears had to be pretty shotgun). We moved to just using a load of standalone InnoDB instances, so that any one could fall over and only affect a portion of our client-base, as it wouldn't drag all the others down with it.

If we'd known in advance that MySQL Cluster was so unreliable / untrustworthy, we would never have used it and/or would have designed our cache system in a different way.

It's a shame, as it's a good looking project, and if its failure modes weren't so abysmal, it'd be awesome, but I won't in any way trust it again until I hear of someone actually using it in production at a reasonable scale (outside of MySQL's marketing literature).</description>
		<content:encoded><![CDATA[<p>Nice article, thanks. </p>
<p>I have to say that, in my experience, MySQL Cluster is to be avoided as it&#8217;s horrifically flaky. Whilst it is meant to be scalable and robust, having tried running it in a large production environment (to store metadata about cache items in memcache), I found it has a number of massive problems:</p>
<p>. Tendency to core dump, usually at the worst times. And not just part of your cluster core dumps, it *all* core dumps. (e.g. 2 people saying TRUNCATE TABLE for the same table at the same time == coredump)</p>
<p>. Never frees space. Deleting rows just marks them dead, it doesn&#8217;t actually free them. So unless your data set / key values are constant then you&#8217;re stuffed. This is similar to PostgreSQL, except MySQL Cluster *does not* have a VACUUM which frees the space. This obviously compounds the problem above - when your DB is about to get full, a couple of your admins get paged, and truncate a no-longer-used table. Bang goes your cluster.</p>
<p>. If your cluster core-dumps, and you need to re-load it from disk, this takes *AGES* (e.g. &gt; 30mins for ~ 4G of data across 4 machines)</p>
<p>. If you don&#8217;t re-load the cluster from disk, but just nuke it and start again - your web servers start issuing queries against tables which no longer exist (until you re-create them). This brings the cluster to a grinding halt. About 10 sessions saying &#8216;SELECT * FROM non_existant_table&#8217; or generally any query to &#8216;non_existant_table&#8217; is *more than enough* to make the cluster unusable (&gt;3 mins for a new connection).</p>
<p>As our cache system had to deal with all of this flakiness, we ended up with MySQL being an *optional* component (needed for performance, but if it was not there it would not block requests - however, if it wasn&#8217;t there, cache clears had to be pretty shotgun). We moved to just using a load of standalone InnoDB instances, so that any one could fall over and only affect a portion of our client-base, as it wouldn&#8217;t drag all the others down with it.</p>
<p>If we&#8217;d known in advance that MySQL Cluster was so unreliable / untrustworthy, we would never have used it and/or would have designed our cache system in a different way.</p>
<p>It&#8217;s a shame, as it&#8217;s a good looking project, and if its failure modes weren&#8217;t so abysmal, it&#8217;d be awesome, but I won&#8217;t in any way trust it again until I hear of someone actually using it in production at a reasonable scale (outside of MySQL&#8217;s marketing literature).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Oz</title>
		<link>http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-24</link>
		<dc:creator>Oz</dc:creator>
		<pubDate>Sun, 02 Mar 2008 01:11:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-24</guid>
		<description>Found your site on news.ycombinator.com 

Damn, those were good days. I remember downloading 30 Temptations songs on Good Friday. Over dialup, too.
I wonder if my parents figured out why they got no calls that day...oh well..</description>
		<content:encoded><![CDATA[<p>Found your site on news.ycombinator.com </p>
<p>Damn, those were good days. I remember downloading 30 Temptations songs on Good Friday. Over dialup, too.<br />
I wonder if my parents figured out why they got no calls that day&#8230;oh well..</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: &#160; Scaling Audiogalaxy&#8230;&#160;by&#160;Performance Within Reach</title>
		<link>http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-8</link>
		<dc:creator>&#160; Scaling Audiogalaxy&#8230;&#160;by&#160;Performance Within Reach</dc:creator>
		<pubDate>Fri, 29 Feb 2008 13:02:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-8</guid>
		<description>[...] Scaling Audiogalaxy to 80 million daily page views &#124; Spiteful.com   For our most heavily accessed data set, we had an extremely good read/write ratio, so we were able to fan out to about 20 slaves from a single master. This particular database had several hundred million rows, which challenged the limits of our hardware (periodically, we had to clean out stale data when it got too large), so one trick we used was index-segmentation. Different sets of slaves had different indexes, and our database access layer could pick a different cluster based on the necessary index. Specifically, the tables in this database generally had an ID and a string, but the index on the string was only necessary for some queries. So, on some slaves we simply didn’t have the string index. This allowed those machines to keep the entire ID index in memory, which was a huge performance boost. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] Scaling Audiogalaxy to 80 million daily page views | Spiteful.com   For our most heavily accessed data set, we had an extremely good read/write ratio, so we were able to fan out to about 20 slaves from a single master. This particular database had several hundred million rows, which challenged the limits of our hardware (periodically, we had to clean out stale data when it got too large), so one trick we used was index-segmentation. Different sets of slaves had different indexes, and our database access layer could pick a different cluster based on the necessary index. Specifically, the tables in this database generally had an ID and a string, but the index on the string was only necessary for some queries. So, on some slaves we simply didn’t have the string index. This allowed those machines to keep the entire ID index in memory, which was a huge performance boost. [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sue Massey</title>
		<link>http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-6</link>
		<dc:creator>Sue Massey</dc:creator>
		<pubDate>Wed, 27 Feb 2008 16:24:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-6</guid>
		<description>I found your site on google blog search and read a few of your other posts.  Keep up the good work.  Just added your RSS feed to my feed reader.  Look forward to reading more from you.

- Sue.</description>
		<content:encoded><![CDATA[<p>I found your site on google blog search and read a few of your other posts.  Keep up the good work.  Just added your RSS feed to my feed reader.  Look forward to reading more from you.</p>
<p>- Sue.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kennon Ballou</title>
		<link>http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-5</link>
		<dc:creator>Kennon Ballou</dc:creator>
		<pubDate>Wed, 27 Feb 2008 16:17:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.spiteful.com/2008/02/27/scaling-audiogalaxy-to-80-million-daily-page-views/#comment-5</guid>
		<description>As one of the guys writing the PHP code that you had to scale, I have to say you did a great job :)

It's really amazing to think about what we were doing back then! What a crazy ride.</description>
		<content:encoded><![CDATA[<p>As one of the guys writing the PHP code that you had to scale, I have to say you did a great job <img src='http://www.spiteful.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
It&#8217;s really amazing to think about what we were doing back then! What a crazy ride.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

