ÔØÈëÖС£¡£¡£ 'S bLog
 
ÔØÈëÖС£¡£¡£
 
ÔØÈëÖС£¡£¡£
ÔØÈëÖС£¡£¡£
ÔØÈëÖС£¡£¡£
ÔØÈëÖС£¡£¡£
ÔØÈëÖС£¡£¡£
 
ÌîдÄúµÄÓʼþµØÖ·£¬¶©ÔÄÎÒÃǵľ«²ÊÄÚÈÝ£º


 
»ùÓÚ hive µÄÈÕÖ¾Êý¾Ýͳ¼ÆÊµÕ½
[ 2010/12/8 11:40:00 | By: ÃÎÏè¶ù ]
 

Ò»¡¢           hive ¼ò½é

        hive ÊÇÒ»¸ö»ùÓÚ hadoop µÄ¿ªÔ´Êý¾Ý²Ö¿â¹¤¾ß£¬ÓÃÓÚ´æ´¢ºÍ´¦Àíº£Á¿½á¹¹»¯Êý¾Ý¡£    Ëü°Ñº£Á¿Êý¾Ý´æ´¢ÓÚ hadoop Îļþϵͳ£¬¶ø²»ÊÇÊý¾Ý¿â£¬µ«ÌṩÁËÒ»Ì×ÀàÊý¾Ý¿âµÄÊý¾Ý´æ´¢ºÍ´¦Àí»úÖÆ£¬²¢²ÉÓà HQL £¨Àà SQL £©ÓïÑÔ¶ÔÕâЩÊý¾Ý½øÐÐ×Ô¶¯»¯¹ÜÀíºÍ´¦Àí¡£ÎÒÃÇ¿ÉÒÔ°Ñ hive Öк£Á¿½á¹¹»¯Êý¾Ý¿´³ÉÒ»¸ö¸öµÄ±í£¬¶øÊµ¼ÊÉÏÕâЩÊý¾ÝÊÇ·Ö²¼Ê½´æ´¢ÔÚ HDFS Öеġ£ Hive ¾­¹ý¶ÔÓï¾ä½øÐнâÎöºÍת»»£¬×îÖÕÉú³ÉһϵÁлùÓÚ hadoop µÄ map/reduce ÈÎÎñ£¬Í¨¹ýÖ´ÐÐÕâЩÈÎÎñÍê³ÉÊý¾Ý´¦Àí¡£

        Hive µ®ÉúÓÚ facebook µÄÈÕÖ¾·ÖÎöÐèÇó£¬Ãæ¶Ôº£Á¿µÄ½á¹¹»¯Êý¾Ý£¬ hive ÒԽϵ͵ijɱ¾Íê³ÉÁËÒÔÍùÐèÒª´ó¹æÄ£Êý¾Ý¿â²ÅÄÜÍê³ÉµÄÈÎÎñ£¬²¢ÇÒѧϰÃż÷Ïà¶Ô½ÏµÍ£¬Ó¦Óÿª·¢Áé»î¶ø¸ßЧ¡£

        Hive ×Ô 2009.4.29 ·¢²¼µÚÒ»¸ö¹Ù·½Îȶ¨°æ 0.3.0 ÖÁ½ñ£¬²»¹ýÒ»ÄêµÄʱ¼ä£¬ÕýÔÚÂýÂýÍêÉÆ£¬ÍøÉÏÄÜÕÒµ½µÄÏà¹Ø×ÊÁÏÏ൱ÉÙ£¬ÓÈÆäÖÐÎÄ×ÊÁϸüÉÙ£¬±¾ÎĽáºÏÒµÎñ¶Ô hive µÄÓ¦ÓÃ×öÁËһЩ̽Ë÷£¬²¢°ÑÕâЩ¾­Ñé×öÒ»¸ö×ܽᣬËùνǰ³µÖ®¼ø£¬Ï£Íû¶ÁÕßÄÜÉÙ×ßһЩÍä·¡£

        Hive µÄ¹Ù·½ wiki Çë²Î¿¼ÕâÀï :

        http://wiki.apache.org/hadoop/Hive

        ¹Ù·½Ö÷Ò³ÔÚÕâÀ

        http://hadoop.apache.org/hive/

        hive-0.5.0 Ô´Âë°üºÍ¶þ½øÖÆ·¢²¼°üµÄÏÂÔØµØÖ·

        http://labs.renren.com/apache-mirror/hadoop/hive/hive-0.5.0/

¶þ¡¢           ²¿Êð

        ÓÉÓÚ Hive ÊÇ»ùÓÚ hadoop µÄ¹¤¾ß£¬ËùÒÔ hive µÄ²¿ÊðÐèÒªÒ»¸öÕý³£ÔËÐÐµÄ hadoop »·¾³¡£ÒÔϽéÉÜ hive µÄ¼òµ¥²¿ÊðºÍÓ¦Óá£

        ²¿Êð»·¾³£º

        ²Ù×÷ϵͳ£º Red Hat Enterprise Linux AS release 4 (Nahant Update 7)

        Hadoop £º hadoop-0.20.2 £¬Õý³£ÔËÐÐ

        ²¿Êð²½ÖèÈçÏ£º

1¡¢   ÏÂÔØ×îа汾·¢²¼°ü hive-0.5.0-dev.tar.gz £¬´«µ½ hadoop µÄ namenode ½ÚµãÉÏ£¬½âѹµÃµ½ hive Ŀ¼¡£¼ÙÉè·¾¶Îª£º /opt/hadoop/hive-0.5.0-bin

2¡¢   ÉèÖû·¾³±äÁ¿ HIVE_HOME £¬Ö¸Ïò hive ¸ùĿ¼ /opt/hadoop/hive-0.5.0-bin ¡£ÓÉÓÚ hadoop ÒÑÔËÐУ¬¼ì²é»·¾³±äÁ¿ JAVA_HOME ºÍ HADOOP_HOME ÊÇ·ñÕýÈ·ÓÐЧ¡£

3¡¢   Çл»µ½ $HIVE_HOME Ŀ¼£¬ hive ÅäÖÃĬÈϼ´¿É£¬ÔËÐÐ bin/hive ¼´¿ÉÆô¶¯ hive £¬Èç¹ûÕý³£Æô¶¯£¬½«»á³öÏÖ¡° hive> ¡±Ìáʾ·û¡£

4¡¢   ÔÚÃüÁîÌáʾ·ûÖÐÊäÈë¡° show tables; ¡±£¬Èç¹ûÕý³£ÔËÐУ¬ËµÃ÷ÒѲ¿Êð³É¹¦£¬¿É¹©Ê¹Óá£

³£¼ûÎÊÌ⣺

1¡¢        Ö´ÐС° show tables; ¡±ÃüÁîÌáʾ¡° FAILED: Error in metadata: java.lang.IllegalArgumentException: URI:  does not have a scheme ¡±£¬ÕâÊÇÓÉÓÚ hive ÕÒ²»µ½´æ·ÅÔªÊý¾Ý¿âµÄÊý¾Ý¿â¶øµ¼Öµģ¬ÐÞ¸Ä conf/ hive-default.xml ÅäÖÃÎļþÖÐµÄ hive.metastore.local Ϊ true ¼´¿É¡£ÓÉÓÚ hive °Ñ½á¹¹»¯Êý¾ÝµÄÔªÊý¾ÝÐÅÏ¢·ÅÔÚµÚÈý·½Êý¾Ý¿â£¬´Ë´¦ÉèÖÃΪ true £¬ hive ½«ÔÚ±¾µØ´´½¨ derby Êý¾Ý¿âÓÃÓÚ´æ·ÅÔªÊý¾Ý¡£µ±È»Èç¹ûÓÐÐèÒªÒ²¿ÉÒÔ²ÉÓà mysql µÈµÚÈý·½Êý¾Ý¿â´æ·ÅÔªÊý¾Ý£¬²»¹ýÕâʱ hive.metastore.local µÄÅäÖÃֵӦΪ false ¡£

2¡¢        Èç¹ûÄãÒÑÓÐÒ»Ì× nutch1.0 ϵͳÕýÔÚÅÜ£¬¶øÄã²»Ïëµ¥¶ÀÔÙÈ¥²¿ÊðÒ»Ì× hadoop »·¾³£¬Äã¿ÉÒÔÖ±½ÓʹÓà nutch1.0 ×Ô´øµÄ hadoop »·¾³£¬µ«ÕâÑùµÄ²¿Êð»áµ¼Ö hive ²»ÄÜÕý³£ÔËÐУ¬ÌáʾÕÒ²»µ½Ä³Ð©·½·¨¡£ÕâÊÇÓÉÓÚ nutch1.0 ʹÓÃÁË commons-lang-2.1.jar Õâ¸ö°ü£¬¶ø hive ÐèÒªµÄÊÇ commons-lang-2.4.jar £¬ÏÂÔØÒ»¸ö 2.4 °æ±¾µÄ°üÌæ»»µô 2.1 ¼´¿É£¬ nutch ºÍ hive ¶¼ÄÜÕý³£ÔËÐС£

Èý¡¢           Ó¦Óó¡¾°

        ±¾ÎÄÖ÷Òª½²ÊöʹÓà hive µÄʵ¼ù£¬ÒµÎñ²»Ê**ؼü£¬¼òÒª½éÉÜÒµÎñ³¡¾°£¬±¾´ÎµÄÈÎÎñÊǶÔËÑË÷ÈÕÖ¾Êý¾Ý½øÐÐͳ¼Æ·ÖÎö¡£

        ¼¯ÍÅËÑË÷¸ÕÉÏÏß²»¾Ã£¬ÈÕÖ¾Á¿²¢²»´ó ¡£ÕâЩÈÕÖ¾·Ö²¼ÔÚ 5 ̨ǰ¶Ë»ú£¬°´Ð¡Ê±±£´æ£¬²¢ÒÔСʱΪÖÜÆÚ¶¨Ê±½«ÉÏһСʱ²úÉúµÄÊý¾Ýͬ²½µ½ÈÕÖ¾·ÖÎö»ú£¬Í³¼ÆÊý¾ÝÒªÇó°´Ð¡Ê±¸üС£ÕâЩͳ¼ÆÏ°üÀ¨¹Ø¼ü´ÊËÑË÷Á¿ pv £¬Àà±ð·ÃÎÊÁ¿£¬Ã¿Ãë·ÃÎÊÁ¿ tps µÈµÈ¡£

»ùÓÚ hive £¬ÎÒÃǽ«ÕâЩÊý¾Ý°´ÌìΪµ¥Î»½¨±í£¬Ã¿ÌìÒ»¸ö±í£¬ºǫ́½Å±¾¸ù¾Ýʱ¼ä´Á½«Ã¿Ð¡Ê±Í¬²½¹ýÀ´µÄ 5 ̨ǰ¶Ë»úµÄÈÕÖ¾Êý¾ÝºÏ²¢³ÉÒ»¸öÈÕÖ¾Îļþ£¬µ¼Èë hive ϵͳ£¬Ã¿Ð¡Ê±Í¬²½µÄÈÕÖ¾Êý¾Ý±»×·¼Óµ½µ±ÌìÊý¾Ý±íÖУ¬µ¼ÈëÍê³Éºó£¬µ±Ìì¸÷Ïîͳ¼ÆÏ±»ÖØÐ¼ÆËã²¢Êä³öͳ¼Æ½á¹û¡£

        ÒÔÉÏÐèÇóÈôÖ±½Ó»ùÓÚ hadoop ¿ª·¢£¬ÐèÒª×ÔÐйÜÀíÊý¾Ý£¬Õë¶Ô¶à¸öͳ¼ÆÐèÇ󿪷¢²»Í¬µÄ map/reduce ÔËËãÈÎÎñ£¬¶ÔºÏ²¢¡¢ÅÅÐòµÈ¶àÏî²Ù×÷½øÐж¨ÖÆ£¬²¢¼ì²âÈÎÎñÔËÐÐ״̬£¬¹¤×÷Á¿²¢²»Ð¡¡£µ«Ê¹Óà hive £¬´Óµ¼Èëµ½·ÖÎö¡¢ÅÅÐò¡¢È¥ÖØ¡¢½á¹ûÊä³ö£¬ÕâЩ²Ù×÷¶¼¿ÉÒÔÔËÓà hql Óï¾äÀ´½â¾ö£¬Ò»ÌõÓï¾ä¾­¹ý´¦Àí±»½âÎö³É¼¸¸öÈÎÎñÀ´ÔËÐУ¬¼´Ê¹Ê**ؼü´Ê·ÃÎÊÁ¿ÔöÁ¿ÕâÖÖÐèҪͬʱ·ÃÎʶàÌìÊý¾ÝµÄ½ÏΪ¸´ÔÓµÄÐèÇóÒ²ÄÜͨ¹ý±í¹ØÁªÕâÑùµÄÓï¾ä×Ô¶¯Íê³É£¬½ÚÊ¡ÁË´óÁ¿¹¤×÷Á¿¡£

ËÄ¡¢           Hive ʵս

        ³õ´ÎʹÓà hive £¬Ó¦¸Ã˵ÉÏÊÖ»¹ÊÇͦ¿ìµÄ¡£ Hive ÌṩµÄÀà SQL Óï¾äÓë mysql Óï¾ä¼«ÎªÏàËÆ£¬Óï·¨ÉÏÓдóÁ¿ÏàͬµÄµØ·½£¬Õâ¸øÎÒÃÇÉÏÊÖ´øÀ´Á˺ܴóµÄ·½±ã£¬µ«ÊÇÒªµÃÐÄÓ¦ÊÖµØÐ´ºÃÕâЩÓï¾ä£¬»¹ÐèÒª¶Ô hive ÓнϺõÄÁ˽⣬²ÅÄܽáºÏ hive ÌØÉ«Ð´³ö¾«ÃîµÄÓï¾ä¡£

        ¹ØÓÚ hive ÓïÑÔµÄÏêϸÓï·¨¿É²Î¿¼¹Ù·½ wiki µÄÓïÑÔÊÖ²á :

        http://wiki.apache.org/hadoop/Hive/LanguageManual

        ËäÈ»Óï·¨·ç¸ñΪÎÒÃÇÌṩÁ˱ãÀû£¬µ«³õ´ÎʹÓÃÓöµ½µÄÎÊÌ⻹ÊDz»Éٵģ¬ÏÂÃæÕë¶ÔÒµÎñ³¡¾°Ì¸Ì¸ÎÒÃÇÓöµ½µÄÎÊÌ⣬ºÍ¶Ô hive ¹¦Ä͍ܵ֯¡£

1¡¢ ·Ö¸ô·ûÎÊÌâ

                Ê×ÏÈÓöµ½µÄÊÇÈÕÖ¾Êý¾ÝµÄ·Ö¸ô·ûÎÊÌ⣬ÎÒÃǵÄÈÕÖ¾Êý¾ÝµÄ´óÖ¸ñʽÈçÏ£º

2010-05-24 00:00:02@$_$@QQ2010@$_$@all@$_$@NOKIA_1681C@$_$@1@$_$@10@$_$@@$_$@-1@$_$@10@$_$@application@$_$@1

        ´Ó¸ñʽ¿É¼ûÆä·Ö¸ô·ûÊÇ¡° @$_$@ ¡±£¬ÕâÊÇΪÁ˾¡¿ÉÄÜ·ÀÖ¹ÈÕÖ¾ÕýÎijöÏÖÓë·Ö¸ô·ûÏàͬµÄ×Ö·û¶øµ¼ÖÂÊý¾Ý»ìÏý¡£±¾À´ hive Ö§³ÖÔÚ½¨±íµÄʱºòÖ¸¶¨×Ô¶¨Òå·Ö¸ô·ûµÄ£¬µ«¾­¹ý¶à´Î²âÊÔ·¢ÏÖÖ»Ö§³Öµ¥¸ö×Ö·ûµÄ×Ô¶¨Òå·Ö¸ô·û£¬Ïñ¡° @$_$@ ¡±ÕâÑùµÄ·Ö¸ô·ûÊDz»Äܱ»Ö§³ÖµÄ£¬µ«ÊÇÎÒÃÇ¿ÉÒÔͨ¹ý¶Ô·Ö¸ô·ûµÄ¶¨Öƽâ¾öÕâ¸öÎÊÌ⣬ hive µÄÄÚ²¿·Ö¸ô·ûÊÇ¡° \001 ¡±£¬Ö»Òª°Ñ·Ö¸ô·ûÌæ»»³É¡° \001 ¡±¼´¿É¡£

¾­¹ý̽Ë÷ÎÒÃÇ·¢ÏÖÓÐÁ½Ìõ;¾¶½â¾öÕâ¸öÎÊÌâ¡£

a)          ×Ô¶¨Òå outputformat ºÍ inputformat ¡£

        Hive µÄ outputformat/inputformat Óë hadoop µÄ outputformat/inputformat Ï൱ÀàËÆ£¬ inputformat ¸ºÔð°ÑÊäÈëÊý¾Ý½øÐиñʽ»¯£¬È»ºóÌṩ¸ø hive £¬ outputformat ¸ºÔð°Ñ hive Êä³öµÄÊý¾ÝÖØÐ¸ñʽ»¯³ÉÄ¿±ê¸ñʽÔÙÊä³öµ½Îļþ£¬ÕâÖÖ¶Ô¸ñʽ½øÐж¨ÖƵķ½Ê½½ÏΪµ×²ã£¬¶ÔÆä½øÐж¨ÖÆÒ²Ïà¶Ô¼òµ¥£¬ÖØÐ´ InputFormat ÖÐ RecordReader ÀàÖÐµÄ next ·½·¨¼´¿É£¬Ê¾Àý´úÂëÈçÏ£º

    public boolean next(LongWritable key, BytesWritable value)

        throws IOException {

        while ( reader .next(key, text ) ) {

        String strReplace = text .toString().toLowerCase().replace( "@$_$@" , "\001" );

        Text txtReplace = new Text();

        txtReplace.set(strReplace );

        value.set(txtReplace.getBytes(), 0, txtReplace.getLength());

        return true ;

      }

         return false ;

}

        ÖØÐ´ HiveIgnoreKeyTextOutputFormat ÖÐ RecordWriter ÖÐµÄ write ·½·¨£¬Ê¾Àý´úÂëÈçÏ£º

    public void write (Writable w) throws IOException {

      String strReplace = ((Text)w).toString().replace( "\001" , "@$_$@" );

      Text txtReplace = new Text();

      txtReplace.set(strReplace);

      byte [] output = txtReplace.getBytes();

      bytesWritable .set(output, 0, output. length );

      writer .write( bytesWritable );

}

        ×Ô¶¨Òå outputformat/inputformat ºó£¬ÔÚ½¨±íʱÐèÒªÖ¸¶¨ outputformat/inputformat £¬ÈçÏÂʾÀý£º

stored as INPUTFORMAT 'com.aspire.search.loganalysis.hive.SearchLogInputFormat' OUTPUTFORMAT 'com.aspire.search.loganalysis.hive.SearchLogOutputFormat'

b)          ͨ¹ý SerDe(serialize/deserialize) £¬ÔÚÊý¾ÝÐòÁл¯ºÍ·´ÐòÁл¯Ê±¸ñʽ»¯Êý¾Ý¡£

ÕâÖÖ·½Ê½ÉÔ΢¸´ÔÓÒ»µã£¬¶ÔÊý¾ÝµÄ¿ØÖÆÄÜÁ¦Ò²ÒªÈõһЩ£¬ËüʹÓÃÕýÔò±í´ïʽÀ´Æ¥ÅäºÍ´¦ÀíÊý¾Ý£¬ÐÔÄÜÒ²»áÓÐËùÓ°Ïì¡£µ«ËüµÄÓŵãÊÇ¿ÉÒÔ×Ô¶¨Òå±íÊôÐÔÐÅÏ¢ SERDEPROPERTIES £¬ÔÚ SerDe ÖÐͨ¹ýÕâЩÊôÐÔÐÅÏ¢¿ÉÒÔÓиü¶àµÄ¶¨ÖÆÐÐΪ¡£

2¡¢ Êý¾Ýµ¼Èëµ¼³ö

a)          ¶à°æ±¾ÈÕÖ¾¸ñʽµÄ¼æÈÝ

        ÓÉÓÚ hive µÄÓ¦Óó¡¾°Ö÷ÒªÊÇ´¦ÀíÀäÊý¾Ý£¨Ö»¶Á²»Ð´£©£¬Òò´ËËüÖ»Ö§³ÖÅúÁ¿µ¼ÈëºÍµ¼³öÊý¾Ý£¬²¢²»Ö§³Öµ¥ÌõÊý¾ÝµÄдÈë»ò¸üУ¬ËùÒÔÈç¹ûÒªµ¼ÈëµÄÊý¾Ý´æÔÚijЩ²»Ì«¹æ·¶µÄÐУ¬ÔòÐèÒªÎÒÃǶ¨ÖÆÒ»Ð©À©Õ¹¹¦ÄÜ¶ÔÆä½øÐд¦Àí¡£

        ÎÒÃÇÐèÒª´¦ÀíµÄÈÕÖ¾Êý¾Ý´æÔÚ¶à¸ö°æ±¾£¬¸÷¸ö°æ±¾Ã¿¸ö×ֶεÄÊý¾ÝÄÚÈÝ´æÔÚһЩ²îÒ죬¿ÉÄܰ汾 A ÈÕÖ¾Êý¾ÝµÄµÚ¶þ¸öÁÐÊÇËÑË÷¹Ø¼ü×Ö£¬µ«°æ±¾ B µÄµÚ¶þÁÐÈ´ÊÇËÑË÷µÄÖÕ¶ËÀàÐÍ£¬Èç¹ûÕâÁ½¸ö°æ±¾µÄÈÕÖ¾Ö±½Óµ¼Èë hive ÖУ¬ºÜÃ÷ÏÔÊý¾Ý½«»á»ìÂÒ£¬Í³¼Æ½á¹ûÒ²²»»áÕýÈ·¡£ÎÒÃǵÄÈÎÎñÊÇҪʹ¶à¸ö°æ±¾µÄÈÕÖ¾Êý¾ÝÄÜÔÚ hive Êý¾Ý²Ö¿â****´æ£¬ÇÒ±íµÄ input/output ²Ù×÷Äܹ»×îÖÕÓ³Éäµ½ÕýÈ·µÄÈÕÖ¾°æ±¾µÄÕýÈ·×ֶΡ£

        ÕâÀïÎÒÃDz»¹ØÐÄÕⲿ·Ö·±ËöµÄ¹¤×÷£¬Ö»¹ØÐļ¼ÊõʵÏֵĹؼüµã£¬Õâ¸ö¹¦ÄܸÃÔÚÄÄÀïʵÏÖ²ÅÄÜÈà hive ÈϵÃÕâЩ²»Í¬¸ñʽµÄÊý¾ÝÄØ£¿¾­¹ý¶à·½³¢ÊÔ£¬ÔÚÖмäÈκλ·½Ú×öÕâ¸ö°æ±¾ÊÊÅä¶¼½«µ¼Ö¸´ÔÓ»¯£¬×îÖÕÕâ¸ö¹¤×÷»¹ÊÇÔÚ inputformat/outputformat ÖÐÍê³É×îΪÓÅÑÅ£¬±Ï¾¹ inputformat ÊÇÔ´Í·£¬ outputformat ÊÇ×îÖÕ¹éËÞ¡£¾ßÌåÀ´Ëµ£¬ÊÇÔÚÇ°ÃæÌáµ½µÄ inputformat µÄ next ·½·¨ÖкÍÔÚ outputformat µÄ write ·½·¨ÖÐÍê³ÉÕâ¸öÊÊÅ乤×÷¡£

b)          Hive ²Ù×÷±¾µØÊý¾Ý

        Ò»¿ªÊ¼£¬×ÜÊǰѱ¾µØÊý¾ÝÏÈ´«µ½ HDFS £¬ÔÙÓÉ hive ²Ù×÷ hdfs ÉϵÄÊý¾Ý£¬È»ºóÔÙ°ÑÊý¾Ý´Ó HDFS ÉÏ´«»Ø±¾µØÊý¾Ý¡£ºóÀ´·¢ÏÖ´ó¿É²»±ØÈç´Ë£¬ hive Óï¾ä¶¼ÌṩÁË¡° local ¡±¹Ø¼ü×Ö£¬Ö§³ÖÖ±½Ó´Ó±¾µØµ¼ÈëÊý¾Ýµ½ hive £¬Ò²ÄÜ´Ó hive Ö±½Óµ¼³öÊý¾Ýµ½±¾µØ£¬²»¹ýÆäÄÚ²¿¼ÆËãʱµ±È»ÊÇÓà HDFS ÉϵÄÊý¾Ý£¬Ö»ÊÇ×Ô¶¯ÎªÎÒÃÇÍê³Éµ¼Èëµ¼³ö¶øÒÑ¡£

3¡¢ Êý¾Ý´¦Àí

ÈÕÖ¾Êý¾ÝµÄͳ¼Æ´¦ÀíÔÚÕâÀï·´µ¹Ã»ÓÐÊ²Ã´ÌØ±ðÖ®´¦£¬¾ÍÊÇһЩ SQL Óï¾ä¶øÒÑ£¬Ò²Ã»ÓÐʲô¸ßÉîµÄ¼¼ÇÉ£¬²»¹ý»¹ÊÇÁоÙһЩÓï¾äʾÀý£¬ÒÔʾ hive ´¦ÀíÊý¾ÝµÄ·½±ãÖ®´¦£¬²¢Õ¹Ê¾ hive µÄһЩÓ÷¨¡£

a)          Ϊ hive Ìí¼ÓÓû§¶¨Öƹ¦ÄÜ£¬×Ô¶¨Ò幦Äܶ¼Î»ÓÚ hive_contrib.jar °üÖÐ

add jar /opt/hadoop/hive-0.5.0-bin/lib/hive_contrib.jar;

b)          ͳ¼ÆÃ¿¸ö¹Ø¼ü´ÊµÄËÑË÷Á¿£¬²¢°´ËÑË÷Á¿½µÐòÅÅÁУ¬È»ºó°Ñ½á¹û´æÈë±í keyword_20100603 ÖÐ

create table keyword_20100603 as select keyword,count(keyword) as count from searchlog_20100603 group by keyword order by count desc;

c)          ͳ¼ÆÃ¿ÀàÓû§Öն˵ÄËÑË÷Á¿£¬²¢°´ËÑË÷Á¿½µÐòÅÅÁУ¬È»ºó°Ñ½á¹û´æÈë±í device_20100603 ÖÐ

create table device_20100603 as select device,count(device) as count from searchlog_20100603 group by device order by count desc;

d)          ´´½¨±í time_20100603 £¬Ê¹ÓÃ×Ô¶¨ÒåµÄ INPUTFORMAT ºÍ OUTPUTFORMAT £¬²¢Ö¸¶¨±íÊý¾ÝµÄÕæÊµ´æ·ÅλÖÃÔÚ '/LogAnalysis/results/time_20100603' £¨ HDFS ·¾¶£©£¬¶ø²»ÊÇ·ÅÔÚ hive ×Ô¼ºµÄÊý¾ÝĿ¼ÖÐ

create external table if not exists time_20100603(time string, count int) stored as INPUTFORMAT 'com.aspire.search.loganalysis.hive.XmlResultInputFormat' OUTPUTFORMAT 'com.aspire.search.loganalysis.hive.XmlResultOutputFormat' LOCATION '/LogAnalysis/results/time_20100603';

e)          ͳ¼ÆÃ¿Ãë·ÃÎÊÁ¿ TPS £¬°´·ÃÎÊÁ¿½µÐòÅÅÁУ¬²¢°Ñ½á¹ûÊä³öµ½±í time_20100603 ÖУ¬Õâ¸ö±íÎÒÃÇÔÚÉÏÃæ¸Õ¸Õ¶¨Òå¹ý£¬ÆäÕæÊµÎ»ÖÃÔÚ '/LogAnalysis/results/time_20100603' £¬²¢ÇÒÓÉÓÚ XmlResultOutputFormat µÄ¸ñʽ»¯£¬ÎļþÄÚÈÝÊÇ XML ¸ñʽ¡£

insert overwrite table time_20100603 select time,count(time) as count from searchlog_20100603 group by time order by count desc;

f)           ¼ÆËãÿ¸öËÑË÷ÇëÇóÏìӦʱ¼äµÄ×î´óÖµ£¬×îСֵºÍƽ¾ùÖµ

insert overwrite table response_20100603 select max(responsetime) as max,min(responsetime) as min,avg(responsetime) as avg from searchlog_20100603;

g)          ´´½¨Ò»¸ö±íÓÃÓÚ´æ·Å½ñÌìÓë×òÌìµÄ¹Ø¼ü´ÊËÑË÷Á¿ºÍÔöÁ¿¼°ÆäÔöÁ¿±ÈÂÊ£¬±íÊý¾ÝλÓÚ '/LogAnalysis/results/keyword_20100604_20100603' £¬ÄÚÈݽ«ÊÇ XML ¸ñʽ¡£

create external table if not exists keyword_20100604_20100603(keyword string, count int, increment int, incrementrate double) stored as INPUTFORMAT 'com.aspire.search.loganalysis.hive.XmlResultInputFormat' OUTPUTFORMAT 'com.aspire.search.loganalysis.hive.XmlResultOutputFormat' LOCATION '/LogAnalysis/results/keyword_20100604_20100603';

h)          ÉèÖñíµÄÊôÐÔ£¬ÒÔ±ã XmlResultInputFormat ºÍ XmlResultOutputFormat Äܸù¾Ý output.resulttype µÄ²»Í¬ÄÚÈÝÊä³ö²»Í¬¸ñʽµÄ XML Îļþ¡£

alter table keyword_20100604_20100603 set tblproperties ('output.resulttype'='keyword');

i)            ¹ØÁª½ñÌì¹Ø¼ü´Êͳ¼Æ½á¹û±í£¨ keyword_20100604 £©Óë×òÌì¹Ø¼ü´Êͳ¼Æ½á¹û±í£¨ keyword_20100603 £©£¬Í³¼Æ½ñÌìÓë×òÌìͬʱ³öÏֵĹؼü´ÊµÄËÑË÷´ÎÊý£¬½ñÌìÏà¶Ô×òÌìµÄÔöÁ¿ºÍÔöÁ¿±ÈÂÊ£¬²¢°´ÔöÁ¿±ÈÂʽµÐòÅÅÁУ¬½á¹ûÊä³öµ½¸Õ¸Õ¶¨ÒåµÄ keyword_20100604_20100603 ±íÖУ¬ÆäÊý¾ÝÎļþÄÚÈݽ«Îª XML ¸ñʽ¡£

insert overwrite table keyword_20100604_20100603 select cur.keyword, cur.count, cur.count-yes.count as increment, (cur.count-yes.count)/yes.count as incrementrate from keyword_20100604 cur join keyword_20100603 yes on (cur.keyword = yes.keyword) order by incrementrate desc;

j)             

 

4¡¢ Óû§×Ô¶¨Ò庯Êý UDF

²¿·Öͳ¼Æ½á¹ûÐèÒªÒÔ CSV µÄ¸ñʽÊä³ö£¬¶ÔÓÚÕâÀàÎļþÌåÈ«ÊÇÓÐЧÄÚÈݵÄÎļþ£¬²»ÐèÒªÏñ XML Ò»Ñù°üº¬ version £¬ encoding µÈÐÅÏ¢µÄÎļþÍ·£¬×îÊʺÏÓà UDF(user define function) ÁË¡£

UDF º¯Êý¿ÉÖ±½ÓÓ¦ÓÃÓÚ select Óï¾ä£¬¶Ô²éѯ½á¹¹×ö¸ñʽ»¯´¦ÀíÖ®ºó£¬ÔÙÊä³öÄÚÈÝ¡£×Ô¶¨Òå UDF ÐèÒª¼Ì³Ð org.apache.hadoop.hive.ql.exec.UDF £¬²¢ÊµÏÖ evaluate º¯Êý£¬ Evaluate º¯ÊýÖ§³ÖÖØÔØ£¬»¹Ö§³Ö¿É±ä²ÎÊý¡£ÎÒÃÇʵÏÖÁËÒ»¸öÖ§³Ö¿É±ä×Ö·û´®²ÎÊýµÄ UDF £¬Ö§³Ö°Ñ select µÃ³öµÄÈÎÒâ¸öÊýµÄ²»Í¬ÀàÐÍÊý¾Ýת»»Îª×Ö·û´®ºó£¬°´ CSV ¸ñʽÊä³ö£¬ÓÉÓÚ´úÂë½Ï¼òµ¥£¬ÕâÀï¸ø³öÔ´ÂëʾÀý£º

    public String evaluate(String... strs) {

       StringBuilder sb = new StringBuilder();

       for ( int i = 0; i < strs. length ; i++) {

           sb.append(ConvertCSVField(strs[i])).append( ',' );

       }

       sb.deleteCharAt(sb.length()-1);

       return sb.toString();

}

         ÐèҪעÒâµÄÊÇ£¬ÒªÊ¹Óà UDF ¹¦ÄÜ£¬³ýÁËʵÏÖ×Ô¶¨Òå UDF Í⣬»¹ÐèÒª¼ÓÈë°üº¬ UDF µÄ°ü£¬Ê¾Àý£º

add jar /opt/hadoop/hive-0.5.0-bin/lib/hive_contrib.jar;

È»ºó´´½¨ÁÙʱ·½·¨£¬Ê¾Àý£º

CREATE TEMPORARY FUNCTION Result2CSv AS ¡®com.aspire.search.loganalysis.hive. Result2CSv';

         ʹÓÃÍê±Ï»¹Òª drop ·½·¨£¬Ê¾Àý£º

DROP TEMPORARY FUNCTION Result2CSv;

5¡¢   Êä³ö XML ¸ñʽµÄͳ¼Æ½á¹û

Ç°Ãæ¿´µ½²¿·ÖÈÕ־ͳ¼Æ½á¹ûÊä³öµ½Ò»¸ö±íÖУ¬½èÖú XmlResultInputFormat ºÍ XmlResultOutputFormat ¸ñʽ»¯³É XML Îļþ£¬¿¼Âǵ½´´½¨Õâ¸ö±íÖ»ÊÇΪÁ˵õ½ XML ¸ñʽµÄÊä³öÊý¾Ý£¬ÎÒÃÇÖ»ÐèʵÏÖ XmlResultOutputFormat ¼´¿É£¬Èç¹û»¹ÒªÖ§³Ö select ²éѯ£¬ÔòÎÒÃÇ»¹ÐèҪʵÏÖ XmlResultInputFormat £¬ÕâÀïÎÒÃÇÖ»½éÉÜ XmlResultOutputFormat ¡£

Ç°Ãæ½éÉܹý£¬¶¨ÖÆ XmlResultOutputFormat ÎÒÃÇÖ»ÐèÖØÐ´ write ¼´¿É£¬Õâ¸ö·½·¨½«»á°Ñ hive µÄÒÔ ¡¯\001¡¯ ·Ö¸ôµÄ¶à×Ö¶ÎÊý¾Ý¸ñʽ»¯ÎªÎÒÃÇÐèÒªµÄ XML ¸ñʽ£¬±»¼ò»¯µÄʾÀý´úÂëÈçÏ£º

    public void write(Writable w) throws IOException {

           String[] strFields = ((Text) w).toString().split( "\001" );

           StringBuffer sbXml = new StringBuffer();

           if ( strResultType .equals( "keyword" )) {

    sbXml.append( "<record><keyword>" ).append(strFields[0]).append(

    "</keyword><count>" ).append(strFields[1]).append(           "</count><increment>" ).append(strFields[2]).append(

    "</increment><rate>" ).append(strFields[3]).append(

"</rate></result>" );

           }

           Text txtXml = new Text();

           byte [] strBytes = sbXml.toString().getBytes( "utf-8" );

           txtXml.set(strBytes, 0, strBytes. length );

           byte [] output = txtXml.getBytes();

           bytesWritable .set(output, 0, output. length );

           writer .write( bytesWritable );

    }

        ÆäÖÐµÄ strResultType .equals( "keyword" ) Ö¸¶¨¹Ø¼ü´Êͳ¼Æ½á¹û£¬Õâ¸öÊôÐÔÀ´×ÔÒÔÏÂÓï¾ä¶Ô½á¹ûÀàÐ͵ÄÖ¸¶¨£¬Í¨¹ýÕâ¸öÊôÐÔÎÒÃÇ»¹¿ÉÒÔÓÃͬһ¸ö outputformat Êä³ö¶àÖÖÀàÐ͵Ľá¹û¡£

        alter table keyword_20100604_20100603 set tblproperties ('output.resulttype'='keyword');

        ×Ðϸ¿´¿´ write º¯ÊýµÄʵÏÖ±ã¿É·¢ÏÖ£¬ÆäʵÕâÀïÖ»Êä³öÁË XML ÎļþµÄÕýÎÄ£¬¶ø XML µÄÎļþÍ·ºÍ½áÊø±êÇ©ÔÚÄÄÀïÊä³öÄØ£¿ËùÐÒÎÒÃDzÉÓõÄÊÇ»ùÓÚ outputformat µÄʵÏÖ£¬ÎÒÃÇ¿ÉÒÔÔÚ¹¹Ô캯ÊýÊä³ö version £¬ encoding µÈÎļþÍ·ÐÅÏ¢£¬ÔÚ close() ·½·¨ÖÐÊä³ö½áÊø±êÇ©¡£

        ÕâÒ²ÊÇÎÒÃÇΪʲô²»Ê¹Óà UDF À´Êä³ö½á¹ûµÄÔ­Òò£¬×Ô¶¨Òå UDF º¯Êý²»ÄÜÊä³öÎļþÍ·ºÍÎļþ⣬¶ÔÓÚ XML ¸ñʽµÄÊý¾ÝÎÞ·¨Êä³öÍêÕû¸ñʽ£¬Ö»ÄÜÊä³ö CSV ÕâÀàËùÓÐÐж¼ÊÇÓÐЧÊý¾ÝµÄÎļþ¡£

Îå¡¢           ×ܽá

        Hive ÊÇÒ»¸ö¿ÉÀ©Õ¹ÐÔ¼«Ç¿µÄÊý¾Ý²Ö¿â¹¤¾ß£¬½èÖúÓÚ hadoop ·Ö²¼Ê½´æ´¢¼ÆËãÆ½Ì¨ºÍ hive ¶Ô SQL Óï¾äµÄÀí½âÄÜÁ¦£¬ÎÒÃÇËùÒª×öµÄ´ó²¿·Ö¹¤×÷¾ÍÊÇÊäÈëºÍÊä³öÊý¾ÝµÄÊÊÅ䣬ǡǡÕâÁ½²¿·Ö IO ¸ñʽÊÇǧ±äÍò»¯µÄ£¬ÎÒÃÇÖ»ÐèÒª¶¨ÖÆÎÒÃÇ×Ô¼ºµÄÊäÈëÊä³öÊÊÅäÆ÷£¬ hive ½«ÎªÎÒÃÇ͸Ã÷»¯´æ´¢ºÍ´¦ÀíÕâЩÊý¾Ý£¬´ó´ó¼ò»¯ÎÒÃǵŤ×÷¡£±¾ÎĵÄÖØÐÄÒ²ÕýÔÚÓÚ´Ë£¬Õⲿ·Ö¹¤×÷ÏàÐÅÿһ¸ö×öÊý¾Ý·ÖÎöµÄÅóÓѶ¼»áÃæ¶ÔµÄ£¬Ï£Íû¶ÔÄúÓÐÒæ¡£

        ±¾ÎĽéÉÜÁËÒ»´ÎÏ൱¼òµ¥µÄ»ùÓÚ hive µÄÈÕ־ͳ¼ÆÊµÕ½£¬¶Ô hive µÄÔËÓû¹´¦ÓÚÒ»¸öÏà¶Ô½ÏdzµÄ²ãÃæ£¬Ä¿Ç°ÉÐÄÜÂú×ãÐèÇó¡£¶ÔÓÚһЩ½Ï¸´ÔÓµÄÊý¾Ý·ÖÎöÈÎÎñ£¬ÒÔÉÏËù½éÉܵľ­ÑéºÜ¿ÉÄÜÊDz»¹»Óõģ¬ÉõÖÁÊÇ hive ×ö²»µ½µÄ£¬ hive »¹ÓÐºÜ¶à½ø½×¹¦ÄÜ£¬ÏÞÓÚÆª·ù±¾ÎÄδÄÜÉæ¼°£¬´ýÈÕºó½áºÏ¾ßÌåÈÎÎñÔÙÏêϸ²ûÊö¡£

        ÈçÄú¶Ô±¾ÎÄÓÐÈκν¨Òé»òÖ¸½Ì£¬ÇëÆÀÂÛ£¬Ð»Ð»¡£


ת×Ô£ºhttp://blog.csdn.net/sutine/archive/2010/06/07/5653137.aspx

 
 
  • ±êÇ©£ºhive 
  • ·¢±íÆÀÂÛ£º
    ÔØÈëÖС£¡£¡£

     
     
     

    ÃÎÏè¶ùÍøÕ¾ ÃηÉÏèµÄµØ·½ http://www.dreamflier.net
    ÖлªÈËÃñ¹²ºÍ¹úÐÅÏ¢²úÒµ²¿TCP/IPϵͳ ±¸°¸ÐòºÅ£ºÁÉICP±¸09000550ºÅ

    Powered by Oblog.