Tuesday, September 29, 2009

cassandra connect to thrift port 9170 instead of 7000 or 7001

cassandra> connect
Connected to
cassandra> show keyspaces

For Cassandra, make sure you set the internal address.

  <!-- TCP port, for commands and data -->
  <!-- UDP port, for membership communications (gossip) -->

If the ListenAddress is set to localhost, the server won't be able to discover itself.
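A quick way to sanity-check a candidate listen address before putting it in the config is to see whether it resolves to a loopback address. This is only a sketch in Python: the helper name `resolves_to_loopback` is made up for illustration and is not part of Cassandra.

```python
import ipaddress
import socket


def resolves_to_loopback(host: str) -> bool:
    """Return True if `host` resolves to a loopback address such as 127.0.0.1."""
    # IPv4 lookup; a literal dotted address is returned unchanged.
    resolved = socket.gethostbyname(host)
    return ipaddress.ip_address(resolved).is_loopback


if __name__ == "__main__":
    # Compare "localhost" with whatever this machine calls itself.
    for candidate in ("localhost", socket.gethostname()):
        try:
            print(candidate, "-> loopback:", resolves_to_loopback(candidate))
        except OSError as err:
            print(candidate, "could not be resolved:", err)
```

If the machine's own hostname also comes back as loopback, it is probably mapped to 127.0.0.1 in /etc/hosts, and using the internal IP directly is the safer choice.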

cassandra 0.4.0 is out

Mailing list archives
Ladies and gentlemen, I give you Cassandra 0.4.0. It's hard to imagine that it has only been 2 months since our very first release[1]; an impressive amount of progress has been made. For example:

* Nodes can now scale to billions of keys instead of millions.
* There is support for multiple keyspaces (formerly known as tables).
* You can now create snapshots
* A new bootstrap feature for adding nodes to a running cluster
* A new multiget API
* Numerous improvements to the Thrift API

And many, many, more. In fact, don't take my word for it, check out the changelog[2], or the 176 closed Jira issues (176!)[3]. Many thanks to all those that contributed patches, reviewed, tested, documented, or counseled.

Pretty cool.

On my MacBook Pro, I got about 2400 inserts / s with Cassandra 0.4.0. I was not able to get a multinode setup working in my EC2 environment.

Friday, September 18, 2009

recommended use for hbase htable and cached hbase configuration

RE: HBase Client Concerns
Many thanks again. I think I'll go initially with cached HBaseConfiguration and one new HTable instance per request thread and accept the resulting slowness overhead per request. When the HTablePool pause/retry param issue is resolved, I can switch to that. To work around the problem of restarting the client app when the HBase servers are restarted, I can then maybe wrap HTablePool into a class which essentially clears the pool cache [forcing instantiating a new HTable] when any of the HTablePool.getTable() client calls time out, so the client app need not be restarted... Cheers,
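The wrapper idea in that email (clear the pool when a call times out, so fresh handles get built on the next request) can be sketched independently of HBase. The Python below is only an illustration of the pattern: `ResettablePool`, its methods, and the `factory` callback are hypothetical names, not the HTablePool API.

```python
import queue
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class ResettablePool(Generic[T]):
    """A tiny object pool that drops every cached instance after a timeout."""

    def __init__(self, factory: Callable[[], T], max_size: int = 10) -> None:
        self._factory = factory  # builds a fresh handle (e.g. a table client)
        self._idle: "queue.Queue[T]" = queue.Queue(maxsize=max_size)

    def get(self) -> T:
        """Reuse an idle instance if there is one, otherwise build a new one."""
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            return self._factory()

    def put(self, item: T) -> None:
        """Return an instance to the pool; extras beyond max_size are dropped."""
        try:
            self._idle.put_nowait(item)
        except queue.Full:
            pass

    def clear(self) -> None:
        """Discard all idle instances so later calls start from scratch."""
        while True:
            try:
                self._idle.get_nowait()
            except queue.Empty:
                return

    def call(self, operation: Callable[[T], object]) -> object:
        """Run `operation` with a pooled instance; on TimeoutError, clear and re-raise."""
        item = self.get()
        try:
            result = operation(item)
        except TimeoutError:
            self.clear()  # the failed handle is not returned, idle ones are dropped
            raise
        self.put(item)
        return result
```

With this shape the client app does not need a restart after the servers bounce: the first timed-out call empties the pool, and the next call goes through `factory` again.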

Thursday, September 17, 2009

Check out javadocs for using hbase with hadoop!

org.apache.hadoop.hbase.mapreduce (HBase 0.20.0 API)
Package org.apache.hadoop.hbase.mapreduce Description

Provides HBase MapReduce Input/OutputFormats, a table indexing MapReduce job, and utility

Sunday, September 13, 2009

struts 2 velocity and javarebel rock!

Man, JavaRebel rocks!

By using JavaRebel with Struts 2 and Velocity I can eliminate the whole compile-deploy-wait cycle!

Use these maven opts at startup.

MAVEN_OPTS="-Xmx712m -XX:MaxPermSize=256m -noverify -Drebel.spring_plugin=false -Drebel.velocity_plugin=true -Drebel.struts2-plugin=true -Drebel.aspectj_plugin=true -javaagent:javarebel203/javarebel.jar -Drebel.dirs=target/classes,../target/classes"

To refresh my struts file.

cp src/main/resources/*struts*.xml target/work/webapp/WEB-INF/classes

To reload any classes

mvn compile

To refresh any velocity files

cp src/main/webapp/*.vm target/work/webapp/

Thursday, September 10, 2009

mysql benchmark on mac

Ran a quick benchmark on my macbook pro :P

1,000,000 rows inserted serially.
596023 ms taken.
1677 rows inserted /s.

MySQL is faster on my localhost than MemcacheDB.

Monday, September 07, 2009

MemcacheDB benchmark

On an Ubuntu Amazon EC2 large instance, using Java and the spy memcache client.
Ran memcachedb with the command:

memcachedb -m 2064 -p 11211 -u memcachedb -l -b 22000

Results - 10,000 puts of 140 characters in 22143 ms.
451 puts / s.

inserted 140 characters with a 5 character key.

key value store | time taken to insert 10,000 records | MB taken for storage | inserts /s
memcachedb      | 22143 ms                            | 1300 MB              | 451 puts / s

Note - I had to add an artificial 1 ms delay after every insert; without it the memcache client would throw a queue size exception.
Subtracting that delay (10,000 x 1 ms), the run would have taken 12143 ms, which works out to roughly 820 puts / s.
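The throughput figures here are plain arithmetic, so they are easy to recompute. A minimal sketch in Python follows; the function names are my own, and the only inputs are the numbers from this post (10,000 puts, 22143 ms, 1 ms of artificial delay per put).

```python
def throughput_per_second(operations: int, elapsed_ms: float) -> float:
    """Operations per second for a run that took `elapsed_ms` milliseconds."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return operations / (elapsed_ms / 1000.0)


def elapsed_without_delay(elapsed_ms: float, operations: int, delay_ms: float) -> float:
    """Remove a fixed per-operation sleep from the measured wall-clock time."""
    return elapsed_ms - operations * delay_ms


if __name__ == "__main__":
    puts = 10_000
    measured_ms = 22_143
    adjusted_ms = elapsed_without_delay(measured_ms, puts, delay_ms=1)
    print(f"measured: {throughput_per_second(puts, measured_ms):.1f} puts/s")
    print(f"adjusted ({adjusted_ms:.0f} ms): {throughput_per_second(puts, adjusted_ms):.1f} puts/s")
```

The adjusted number is only an estimate: it assumes each sleep cost exactly 1 ms and that the client would have kept up without it, which the queue size exception suggests it would not.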

Sunday, September 06, 2009

How to set up a single-node Hadoop cluster with HBase

This guy has good instructions - http://blog.ibd.com/

Part 1 - Setup Hadoop

1. Download and untar
wget http://apache.mirrors.hoobly.com/hadoop/core/hadoop-0.20.0/hadoop-0.20.0.tar.gz

tar xvf hadoop-0.20.0.tar.gz

2. Modify configuration files

hadoop-0.20.0/conf$ edit core-site.xml

3. Edit site configuration for replication.

hadoop-0.20.0/conf$ edit hdfs-site.xml

4. Set the MapReduce node. Note: do this even if we are not using it.

hadoop-0.20.0/conf$ edit mapred-site.xml


5. Check local ssh access

hadoop-0.20.0/conf$ ssh localhost
Last login: Sat Sep 5 11:56:25 2009
Connection to localhost closed.

6. Format / initialize hadoop file system

hadoop-0.20.0$ bin/hadoop namenode -format
09/09/05 12:01:28 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = LOLCAT.local/
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
09/09/05 12:01:28 INFO namenode.FSNamesystem: fsOwner=spicysquid,staff,com.apple.sharepoint.group.2,admin,com.apple.sharepoint.group.1
09/09/05 12:01:28 INFO namenode.FSNamesystem: supergroup=supergroup
09/09/05 12:01:28 INFO namenode.FSNamesystem: isPermissionEnabled=true
09/09/05 12:01:28 INFO common.Storage: Image file of size 100 saved in 0 seconds.
09/09/05 12:01:28 INFO common.Storage: Storage directory /tmp/hadoop-spicysquid/dfs/name has been successfully formatted.
09/09/05 12:01:28 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at LOLCAT.local/

8. Start everything up!
starting namenode, logging to /Users/spicysquid/hbase/hadoop20/hadoop-0.20.0/bin/../logs/hadoop-spicysquid-namenode-LOLCAT.local.out
localhost: starting datanode, logging to /Users/spicysquid/hbase/hadoop20/hadoop-0.20.0/bin/../logs/hadoop-spicysquid-datanode-LOLCAT.local.out
localhost: starting secondarynamenode, logging to /Users/spicysquid/hbase/hadoop20/hadoop-0.20.0/bin/../logs/hadoop-spicysquid-secondarynamenode-LOLCAT.local.out
starting jobtracker, logging to /Users/spicysquid/hbase/hadoop20/hadoop-0.20.0/bin/../logs/hadoop-spicysquid-jobtracker-LOLCAT.local.out
localhost: starting tasktracker, logging to /Users/spicysquid/hbase/hadoop20/hadoop-0.20.0/bin/../logs/hadoop-spicysquid-tasktracker-LOLCAT.local.out

9. Check urls to see if they are up:

The Job Tracker can be found at http://localhost:50030
The Task Tracker can be found at http://localhost:50060

The NameNode / Filesystem / log browser can be found at http://localhost:50070
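Instead of clicking through each URL, the three ports can be probed from a script. This is a rough sketch in Python using only the standard library; `is_port_open` is a name I made up. A successful TCP connect only shows that something is listening on the port, not that the web UI is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the URLs listed above.
    services = {"JobTracker": 50030, "TaskTracker": 50060, "NameNode": 50070}
    for name, port in services.items():
        state = "up" if is_port_open("localhost", port) else "DOWN"
        print(f"{name:12s} http://localhost:{port} {state}")
```

If a daemon shows as DOWN, the matching .out file from the startup messages is the first place to look.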

Part 2 - Setup HBASE

1. Download HBase 0.20 and untar

wget http://people.apache.org/~stack/hbase-0.20.0-candidate-3/hbase-0.20.0.tar.gz

tar xvf hbase-0.20.0.tar.gz

2. Modify conf/hbase-site.xml for hdfs server

<description>The directory shared by region servers.

3. Start HBase
Optional: I had to start ZooKeeper first.
hbase-0.20.0/bin$ ./hbase-daemon.sh start zookeeper

4. create a table.
hbase-0.20.0/bin/hbase shell
1 row(s) in 0.0170 seconds

hbase(main):003:0> disable 'test'
09/09/05 13:22:13 INFO client.HBaseAdmin: Disabled test
0 row(s) in 4.0660 seconds
hbase(main):004:0> drop 'test'
09/09/05 13:22:17 INFO client.HBaseAdmin: Deleted test
0 row(s) in 0.0120 seconds
0 row(s) in 0.0040 seconds
0 row(s) in 0.0440 seconds
hbase(main):005:0> create 'test','data'
0 row(s) in 0.0500 seconds
hbase(main):006:0> list

5. Check the HDFS web URL to see how the files are created
The NameNode / Filesystem / log browser can be found at http://localhost:50070

6. stop hbase
stop hadoop

stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode