<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Planet Nectarius &#187; hbase</title>
	<atom:link href="http://nectarius.net/tag/hbase/feed/" rel="self" type="application/rss+xml" />
	<link>http://nectarius.net</link>
	<description>Nectarines are tasty</description>
	<lastBuildDate>Thu, 12 Jan 2012 22:58:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>HBase status</title>
		<link>http://nectarius.net/2010/07/02/hbase-status/</link>
		<comments>http://nectarius.net/2010/07/02/hbase-status/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 20:20:48 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hbase]]></category>

		<guid isPermaLink="false">http://nectarius.net/?p=579</guid>
		<description><![CDATA[After a successful Hadoop summit, we went along to the HBase meetup (#11) at Facebook to see how things are going. It turns out, pretty well. The community seems stronger than ever. Some of the main committers are Facebook, StumbleUpon and Cloudera and they&#8217;re pushing significant work. The next version will be 0.90. It will [...]]]></description>
			<content:encoded><![CDATA[<p>After a successful Hadoop summit, we went along to the <a href="http://hbase.apache.org/">HBase</a> meetup (#11) at Facebook to see how things are going. It turns out, pretty well. The community seems stronger than ever. Some of the main committers are Facebook, StumbleUpon and Cloudera and they&#8217;re pushing significant work.</p>
<p>The next version will be 0.90. It will be a reliability release, but also includes performance gains. The version change will break from hadoop version numbers. 0.90 was chosen as there&#8217;s a belief it is maturing towards a 1.0 release.</p>
<p>The main points I picked up are:<br />
* New batch importing allows writing hfiles directly and then just telling hbase where they are.<br />
* Taking advantage of appends in hdfs for genuine durability.<br />
* The namenode single point of failure is being addressed; Facebook is planning to release their HA namenode.<br />
* Replication between clusters. Allows cross data center replication. Eventually consistent.<br />
* Tighter integration with zookeeper through a master rewrite.<br />
* Significant work to have less temperamental behaviour during compaction and splits.<br />
* Facebook are planning to release their distribution of hadoop and their highly available namenode.</p>
<p>All in all it&#8217;s very encouraging progress. I think there&#8217;s a case for us at Last.fm to look at HBase again soon.</p>
<p>Here are my notes from the meetup&#8230;</p>
<p>===Reliability===<br />
* Master overhaul. Facebook is making it highly available.<br />
* Test framework.<br />
* Testing failure scenarios.<br />
* Ops team friendly tools.<br />
* HBase fsck.<br />
* More performance metrics (in progress)</p>
<p>===HBase Master Rewrite===<br />
Why?<br />
* Master failover does not always work.<br />
* ZK is patched on.<br />
* Master to region server communication is inefficient.<br />
* The code is a bitch.</p>
<p>The crux<br />
* Better zookeeper integration<br />
* Use zookeeper to track how far operations have progressed. eg: moving a region to another region server.<br />
* Cleaner code.</p>
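<p>The idea of tracking operation progress in zookeeper can be sketched as below. This is only an illustration: the in-memory store stands in for zookeeper, and the state names, the path layout and the <code>continue_move</code> helper are all hypothetical, not taken from the HBase code.</p>

```python
# Hypothetical states for moving a region between region servers.
MOVE_STATES = ["CLOSING", "CLOSED", "OPENING", "OPENED"]


class CoordinationStore:
    """In-memory stand-in for ZooKeeper: a path to value mapping."""

    def __init__(self):
        self.nodes = {}

    def set(self, path, value):
        self.nodes[path] = value

    def get(self, path):
        return self.nodes.get(path)


def continue_move(store, region, stop_after=None):
    """Advance a region move, recording each completed state in the store.

    Progress lives in the store rather than in master memory, so a master
    that takes over after a failure resumes from the last recorded state.
    """
    path = "/region-moves/" + region
    done = store.get(path)
    start = 0 if done is None else MOVE_STATES.index(done) + 1
    for state in MOVE_STATES[start:]:
        store.set(path, state)
        if state == stop_after:  # simulate the master dying mid-move
            return state
    return store.get(path)


store = CoordinationStore()
continue_move(store, "region-42", stop_after="CLOSED")  # first master fails
final_state = continue_move(store, "region-42")         # new master resumes
```

<p>The second call does not start over. It reads the recorded state and only performs the remaining steps.</p>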
<p>Opens up future enhancements<br />
* Master need not do META edits.<br />
* Region servers can recover their own regions on restarts.<br />
* Reporting on shutdown.<br />
* Limit concurrent major compactions across cluster. Because we know the progress.</p>
<p>Why not put META in zk?<br />
* Could be great. It&#8217;ll take some work yet. Seems likely in the future.</p>
<p>===HBase cluster replication===<br />
What?<br />
* Fully integrated replication between clusters.<br />
* Ships edits.<br />
* Can be cross data centre.<br />
* Eventually consistent across clusters.<br />
* Master-slave, master-master, circular.</p>
<p>How?<br />
* Master push.<br />
* Write ahead log shipping.<br />
* Meta data stored in zookeeper.<br />
* Logs get shipped in batch and applied to the new region server locally using the htable client.<br />
* Cute stuff.<br />
* Timestamp based</p>
<p>Also&#8230;<br />
* There is a new separate utility program. A distcp-like map reduce job for copying tables between clusters.</p>
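<p>A rough sketch of why timestamp based log shipping ends up eventually consistent. The <code>Edit</code>, <code>SinkCluster</code> and <code>ship_log</code> names are made up for this example. The real thing ships write ahead log entries and applies them through the htable client, which this toy version does not attempt to model.</p>

```python
from collections import namedtuple

# One logged edit: which cell changed, to what value, and when.
Edit = namedtuple("Edit", ["row", "column", "value", "timestamp"])


class SinkCluster:
    """Stand-in for the receiving cluster: keeps the newest value per cell."""

    def __init__(self):
        self.cells = {}  # (row, column) -> (timestamp, value)

    def apply_batch(self, edits):
        for edit in edits:
            key = (edit.row, edit.column)
            current = self.cells.get(key)
            # Timestamp based: an older edit never overwrites a newer one,
            # so replayed or reordered batches converge to the same state.
            if current is None or edit.timestamp >= current[0]:
                self.cells[key] = (edit.timestamp, edit.value)

    def get(self, row, column):
        entry = self.cells.get((row, column))
        return None if entry is None else entry[1]


def ship_log(log, sink, batch_size=2):
    """Ship the write ahead log to the sink in fixed-size batches."""
    for start in range(0, len(log), batch_size):
        sink.apply_batch(log[start:start + batch_size])


log = [
    Edit("user1", "plays", 10, timestamp=100),
    Edit("user1", "plays", 11, timestamp=105),
    Edit("user2", "plays", 3, timestamp=101),
]
sink = SinkCluster()
ship_log(log, sink)
ship_log(list(reversed(log)), sink)  # a replay in another order is harmless
```

<p>Because the sink compares timestamps instead of trusting arrival order, shipping the same log twice leaves the cells unchanged.</p>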
<p>===Bloom Filters, they&#8217;re back!===<br />
Why were they originally removed?<br />
* Tricky bugs during compaction.<br />
** Solution: Fixed with cleverness.<br />
* We had to estimate key size ahead of time, and under certain conditions the memory use bloated enough to make them counterproductive.<br />
** Solution: Fold / compress / cleverness.</p>
<p>Now.<br />
* They don&#8217;t bloat memory uselessly.<br />
* Defaulting to 1% error rate. Configurable.<br />
* Great for exact queries (not good for scans; and if you know that what you are querying for always exists, they&#8217;re a waste).</p>
<p>Usage.<br />
* super granularity, defaults to off.<br />
* tweak max fold rate for compression if you know how big your rows are.<br />
* property for turning them off at any time.<br />
* you can turn them on on pre-existing rows, and they get added as compactions happen.</p>
<p>See HBASE-1200 for more information.</p>
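<p>To make the 1% error rate concrete, here is a minimal Bloom filter sketch. It is not the HBase implementation (there is no folding here); the class name and the hashing scheme are assumptions for illustration. The sizing uses the standard formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for n keys at error rate p.</p>

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter sized for an expected key count and error rate."""

    def __init__(self, expected_keys, error_rate=0.01):
        # Standard sizing: m bits and k hash functions for n keys at rate p.
        self.num_bits = max(1, math.ceil(
            -expected_keys * math.log(error_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(
            self.num_bits / expected_keys * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit hashes.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bloom = BloomFilter(expected_keys=1000, error_rate=0.01)
for i in range(1000):
    bloom.add(b"row-%d" % i)
```

<p>A get for a row that was never added usually returns False from <code>might_contain</code>, which is the case where a file can be skipped. If every query is for a row that exists, the answer is always True and the filter buys nothing.</p>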
<p>===HBase Bulk Loads===<br />
* Better than the last one.<br />
* Skips rpc paths.<br />
* 10x faster than api use.<br />
* Writes mapred output directly into hfiles on hdfs.<br />
* They can be loaded into hbase easily. You just have to tell it where they are.</p>
<p>steps<br />
* Run mapred job.<br />
* bin/hbase completebulkload /output-path/ tablename</p>
<p>* also new importtsv tool.</p>
<p>===Miscellaneous (fluffy stuff)===<br />
Maven now the build system, all the way.<br />
* it&#8217;s working.</p>
<p>Logo change.<br />
* no one seems too attached to the bass clef symbol.</p>
<p>HBASE-50 Facebook summer of code project.<br />
* implementing snapshots (they&#8217;ll be a transaction).<br />
* Design plan and implementation is looking so good, Stack says the committers should read it as they might learn something.</p>
<p>===Performance (in progress)===<br />
Reduce IO around splits.<br />
* Currently only triggered after compactions. So you rewrite the data on both sides of the split.<br />
* looks at the sum of all storefiles, so compactions don&#8217;t have to happen.<br />
* checked after flush not compaction.<br />
* Much faster.<br />
* Avoids 50% of the io during the split.</p>
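<p>The new split trigger described above can be sketched as follows. The function name and the threshold value are assumptions for the example, not HBase settings.</p>

```python
# Hypothetical threshold; the real value is a cluster configuration setting.
MAX_REGION_BYTES = 256 * 1024 * 1024


def should_split_after_flush(store_file_sizes, max_region_bytes=MAX_REGION_BYTES):
    """Decide on a split from the combined size of all store files.

    Because the check sums every store file, it can run right after a
    flush and does not need a compaction to produce one large file first.
    """
    return sum(store_file_sizes) > max_region_bytes


# Five 60 MB flushed files: no single file is large, but the region is.
flushed = [60 * 1024 * 1024] * 5
print(should_split_after_flush(flushed))
```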
<p>Reduce time regions go offline.<br />
* Splits, load balancing, region server failover.<br />
* Fixes<br />
** make splits faster<br />
** Double flush memstore on region close. One before the close, one after, so that the flush after close is super fast. Which means reassigning regions won&#8217;t make a region go offline for many seconds.<br />
** Using zookeeper for more intelligent region movement.</p>
<p>Concurrency and priorities.<br />
* Added multi threading to flushes and compactions.<br />
* Multi threading of master messages.<br />
* Flushes and compactions now have priorities associated with them.</p>
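<p>The notes don&#8217;t say how the priorities are assigned, so the following is only a sketch of the general idea: work items carry a priority and the most urgent one is taken first. The class name, the priority numbers and the task descriptions are invented for the example.</p>

```python
import heapq
import itertools


class PriorityWorkQueue:
    """Queue where a lower number means more urgent work."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def submit(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._order), task))

    def next_task(self):
        priority, _, task = heapq.heappop(self._heap)
        return task


queue = PriorityWorkQueue()
queue.submit(5, "minor compaction of region A")
queue.submit(1, "flush region B (memstore nearly full)")
queue.submit(5, "minor compaction of region C")

order = [queue.next_task() for _ in range(3)]
```

<p>With several worker threads pulling from a queue like this, an urgent flush does not have to wait behind compactions that were submitted earlier.</p>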
<p>HFile seek/reseek.<br />
* Projections.<br />
** seek to columns you want, not the start of the row.<br />
** seek to the versions you want, not the start of the column.</p>
<p>Configurable WAL<br />
* HDFS appends to wal, provides durability.<br />
* Optional deferred log flush that does not block requests, but constantly appends (aka scrobble server). For when 3 seconds data loss in a failure is acceptable.<br />
* You can always disable WAL completely for speed.</p>
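<p>A toy model of the deferred log flush trade-off. Everything here is an assumption made for illustration: the class is invented, and the sync is driven by a clock value passed in by the caller to keep the example deterministic, where a real server would sync from a background thread.</p>

```python
class DeferredWal:
    """Toy write ahead log that syncs at most once per interval."""

    def __init__(self, sync_interval_seconds=3.0):
        self.sync_interval = sync_interval_seconds
        self.pending = []   # appended but not yet durable
        self.durable = []   # survived a sync
        self.last_sync = 0.0

    def append(self, edit, now):
        # The writer only appends; durability arrives with the next sync.
        self.pending.append(edit)
        if now - self.last_sync >= self.sync_interval:
            self.sync(now)

    def sync(self, now):
        self.durable.extend(self.pending)
        self.pending.clear()
        self.last_sync = now

    def crash(self):
        """Return what would be lost if the server died right now."""
        lost, self.pending = self.pending, []
        return lost


wal = DeferredWal(sync_interval_seconds=3.0)
wal.append("scrobble-1", now=1.0)
wal.append("scrobble-2", now=2.0)
wal.append("scrobble-3", now=3.5)   # interval elapsed, all three are synced
wal.append("scrobble-4", now=4.0)
lost = wal.crash()
```

<p>Only the edits appended since the last sync are lost, which is the up to 3 seconds of data mentioned above.</p>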
<p>Other stuff.<br />
* Internally storing min/max timestamp of each file, for allowing skipping files that don&#8217;t overlap.<br />
* Faster enable/disable/drop of tables.</p>
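<p>The min/max timestamp trick boils down to an interval overlap test. A small sketch, with the <code>StoreFile</code> tuple and the function name being hypothetical:</p>

```python
from collections import namedtuple

# Hypothetical per-file metadata: the oldest and newest timestamp it holds.
StoreFile = namedtuple("StoreFile", ["name", "min_ts", "max_ts"])


def files_to_read(store_files, query_min_ts, query_max_ts):
    """Keep only files whose timestamp range overlaps the queried range."""
    return [
        f.name for f in store_files
        if f.max_ts >= query_min_ts and f.min_ts <= query_max_ts
    ]


store_files = [
    StoreFile("file-a", min_ts=100, max_ts=199),
    StoreFile("file-b", min_ts=200, max_ts=299),
    StoreFile("file-c", min_ts=250, max_ts=400),
]
selected = files_to_read(store_files, query_min_ts=260, query_max_ts=300)
```

<p>Here <code>file-a</code> is never opened, because nothing in it can fall inside the queried time range.</p>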
]]></content:encoded>
			<wfw:commentRss>http://nectarius.net/2010/07/02/hbase-status/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>lots of stuff happens sometimes</title>
		<link>http://nectarius.net/2009/07/18/lots-of-stuff-happens-sometimes/</link>
		<comments>http://nectarius.net/2009/07/18/lots-of-stuff-happens-sometimes/#comments</comments>
		<pubDate>Sat, 18 Jul 2009 12:51:24 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dumbo]]></category>
		<category><![CDATA[hbase]]></category>
		<category><![CDATA[last.fm]]></category>
		<category><![CDATA[moving]]></category>

		<guid isPermaLink="false">http://nectarius.net/?p=556</guid>
		<description><![CDATA[Wow it&#8217;s been a long time with no blogging.. Just thought I&#8217;d post some abbreviated updates. The lease on my current place runs out on the 31st, so I&#8217;m looking for rooms again. I saw Fever Ray at Shepherd&#8217;s Bush Empire on Thursday. It was amazing. At Last.fm we&#8217;ve been working hard on the next [...]]]></description>
			<content:encoded><![CDATA[<p>Wow it&#8217;s been a long time with no blogging.. Just thought I&#8217;d post some abbreviated updates.</p>
<p>The lease on my current place runs out on the 31st, so I&#8217;m looking for rooms again.</p>
<p>I saw <a href="http://www.last.fm/music/Fever+Ray">Fever Ray</a> at Shepherd&#8217;s Bush Empire on Thursday. It was amazing.</p>
<p>At Last.fm we&#8217;ve been working hard on the next iteration of search (with neat auto-completion) and launched a beta to subscribers on Wednesday.</p>
<p>I&#8217;ve been playing with HBase, and loaded a 700 million cell table into a 5 node cluster with no problems at all. I&#8217;m also setting up this mini cluster so I can run <a href="http://klbostee.github.com/dumbo/">Dumbo</a> jobs over HBase.</p>
<p>Also of note: <a href="http://www.last.fm/about/jobs">Last.fm is hiring</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://nectarius.net/2009/07/18/lots-of-stuff-happens-sometimes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>on the tubes</title>
		<link>http://nectarius.net/2008/08/30/on-the-tubes/</link>
		<comments>http://nectarius.net/2008/08/30/on-the-tubes/#comments</comments>
		<pubDate>Sat, 30 Aug 2008 10:24:57 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[hbase]]></category>
		<category><![CDATA[huguk]]></category>

		<guid isPermaLink="false">http://nectarius.net/?p=487</guid>
		<description><![CDATA[Tim is on the internets.]]></description>
			<content:encoded><![CDATA[<p>Tim is <a href="http://skillsmatter.com/podcast/home/postgresql-to-hbase-replication">on the internets</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://nectarius.net/2008/08/30/on-the-tubes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>something fangled</title>
		<link>http://nectarius.net/2008/06/25/something-fangled/</link>
		<comments>http://nectarius.net/2008/06/25/something-fangled/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 20:44:36 +0000</pubDate>
		<dc:creator>Tim</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hbase]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[skytools]]></category>

		<guid isPermaLink="false">http://nectarius.net/?p=481</guid>
		<description><![CDATA[I&#8217;m really excited about my current project. It&#8217;s super interesting. Not at all trivial and potentially very useful. I&#8217;m playing with things like Thrift, HBase/Hadoop, Postgresql, Skytools. I&#8217;ll be gluing things in Python, with a touch of Java to watch what happens to HBase. I&#8217;ve never programmed in Python before and I&#8217;m loving it. It [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m really excited about my current project. It&#8217;s super interesting. Not at all trivial and potentially very useful. I&#8217;m playing with things like Thrift, HBase/Hadoop, Postgresql, Skytools. I&#8217;ll be gluing things in Python, with a touch of Java to watch what happens to HBase.</p>
<p>I&#8217;ve never programmed in Python before and I&#8217;m loving it. It truly is <a title=":)" href="http://xkcd.com/353/">fun</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://nectarius.net/2008/06/25/something-fangled/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

