Apache Zebra Wiki
[ 2011/7/7 15:41:00 | By: 梦翔儿 ]
 

Zebra is an open-source Apache project for columnar storage: it manages physical storage and metadata and provides efficient data serialization.

Apache Zebra Wiki

Introduction

Zebra is a storage layer that provides a high-level data access abstraction and a tabular view of data in Hadoop, and it can free Pig users from implementing their own data storage and retrieval code. It provides:

  • columnar storage format for fast data projection
  • schema language to manage physical storage metadata
  • CPU/space-efficient data serialization
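To make the "fast data projection" point concrete, here is a minimal Java sketch of the general columnar idea. It is not Zebra's API or file format: the class `ColumnarTable` and its methods are hypothetical, and the data is held in memory rather than in files on HDFS. Each column is stored separately, so a projection reads only the requested columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of columnar storage (not Zebra's API):
 * each column is kept in its own list, so a projection reads only
 * the requested columns and never touches the others.
 */
public class ColumnarTable {
    // Column name -> values for that column, in row order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();

    public ColumnarTable(String... names) {
        for (String name : names) {
            columns.put(name, new ArrayList<Object>());
        }
    }

    /** Appends one row by splitting its fields across the per-column lists. */
    public void addRow(Object... values) {
        if (values.length != columns.size()) {
            throw new IllegalArgumentException("expected " + columns.size() + " values");
        }
        int i = 0;
        for (List<Object> column : columns.values()) {
            column.add(values[i++]);
        }
    }

    /** Returns rows containing only the requested columns; other columns are never read. */
    public List<List<Object>> project(String... names) {
        List<List<Object>> selected = new ArrayList<>();
        for (String name : names) {
            List<Object> column = columns.get(name);
            if (column == null) {
                throw new IllegalArgumentException("unknown column: " + name);
            }
            selected.add(column);
        }
        int rowCount = columns.isEmpty() ? 0 : columns.values().iterator().next().size();
        List<List<Object>> rows = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            List<Object> row = new ArrayList<>();
            for (List<Object> column : selected) {
                row.add(column.get(r));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        ColumnarTable t = new ColumnarTable("url", "clicks", "country");
        t.addRow("a.com", 10, "US");
        t.addRow("b.org", 3, "DE");
        System.out.println(t.project("clicks", "url")); // [[10, a.com], [3, b.org]]
    }
}
```

Because `project` never iterates over the unrequested `country` column, the cost of a projection grows with the number of columns requested rather than with the width of the table; a row-oriented layout would have to read past every field of every record.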

In the future, Zebra could also support predicate pushdown for further performance gains. Zebra is initially released as a Pig contrib project and may later become a Hadoop subproject.
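Predicate pushdown means evaluating a filter inside the storage layer, so that rows which fail the filter are never assembled or handed to Pig. The Java sketch below illustrates only that general idea. The wiki lists it as future work for Zebra, so every name here (`PushdownSketch`, `urlsWhereClicks`) is hypothetical, and the two "columns" are plain in-memory arrays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch of predicate pushdown over columnar data.
 * None of these names come from Zebra. The filter is applied while
 * scanning one column, and the other column is read only for rows
 * that match.
 */
public class PushdownSketch {
    // Two columns of the same table, stored separately.
    static final String[] URLS = {"a.com", "b.org", "c.net", "d.io"};
    static final int[] CLICKS = {10, 3, 42, 7};

    /** Scans the CLICKS column for matches, then reads URLS only for those rows. */
    static List<String> urlsWhereClicks(IntPredicate predicate) {
        List<String> result = new ArrayList<>();
        for (int row = 0; row < CLICKS.length; row++) {
            if (predicate.test(CLICKS[row])) {
                result.add(URLS[row]); // URL column touched only for matching rows
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(urlsWhereClicks(c -> c > 5)); // [a.com, c.net, d.io]
    }
}
```

Without pushdown, the storage layer would return all four rows and the filter would run afterwards in the query engine; with pushdown, non-matching rows are discarded before they leave the storage layer.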

Prerequisite

Zebra requires a Hadoop 0.20 build that supports TFile (as of July 24, 2009, this means Hadoop 0.20 with Hadoop patch 6150) and works with Pig 0.3.0 plus patch PIG-660, which makes Pig work with Hadoop 0.20. Zebra itself has been submitted as PIG-833.

Getting Zebra

Zebra has been committed as a Pig contrib project at:

Zebra source code

Compilation prerequisites:

  • JDK 1.6
  • Ant 1.7.1
  • Javacc 4.2

How to compile:

  • Check out the latest Pig trunk.
  • Apply the latest patch from PIG-660.
  • Copy the hadoop20.jar attached to PIG-833 into Pig's top-level ./lib directory.
  • Run 'ant jar' (generates a Pig binary compatible with Hadoop 0.20).
  • Run 'ant -Dtestcase=none test-core' (needed for the Zebra tests).
  • cd contrib/zebra
  • Run 'ant jar'.
  • Run 'ant test' (runs the tests).

The Zebra jar is generated in the build/contrib/zebra directory.

Running Zebra

Sample MapReduce code and Pig scripts are attached to this wiki page.

Javadoc is available at: Zebra JavaDoc

http://wiki.apache.org/pig/zebra

 
 
  • Tags: Zebra, columnar storage
    梦翔儿's website: where dreams take flight http://www.dreamflier.net