SMAQ: A Storage, MapReduce and Query Stack for Big Data (Translation)
[ 2011/3/24 13:14:00 | By: 梦翔儿 ]
 

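"Big data" is data that has grown large enough that it can no longer be processed with conventional methods. In the past, the creators of web search engines were the first to face this problem. Today, social networks, mobile applications, sensors of all kinds, and scientific research create petabytes of data every day. To meet the challenge of processing data at this scale, Google created MapReduce. Google's work, together with Hadoop, which was created at Yahoo, hatched a complete ecosystem of big data processing tools.

As MapReduce has grown in popularity, a stack for big data processing has gradually taken shape, made up of a storage layer, MapReduce, and a query layer (SMAQ for short). SMAQ systems are typically open source, distributed, and run on commodity hardware.

Just as LAMP (Linux, Apache, MySQL and PHP) changed the landscape of web application development, SMAQ will open up a much broader field for big data processing. Just as LAMP was a key enabler of Web 2.0, SMAQ systems will underpin a new era of innovative, data-driven products and services.

Although Hadoop-based architectures dominate, the SMAQ model also covers a large number of other systems, including the mainstream NoSQL databases. This article describes the SMAQ stack and the big data processing tools that fit into it today.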

MapReduce

MapReduce was created by Google to build its web page index. The MapReduce framework is now the workhorse behind most big data processing. The key idea is to split a query over a data set into pieces and run those pieces in parallel on many nodes. This distributed approach solves the problem of data that is too large to fit on a single machine.

To understand how MapReduce works, look first at the two phases its name describes. In the map phase, the input data is processed item by item and transformed into an intermediate result set. In the reduce phase, those intermediate results are reduced to the summarized result we want.

The usual example for MapReduce is counting how many times each distinct word appears in a document. In the map phase, each word is extracted and given a count of 1. In the reduce phase, the counts for identical words are added up.

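If this looks like a complicated way of doing something simple, that is MapReduce. For MapReduce to do its job, the map and reduce phases must obey certain constraints that allow the work to be parallelized. Translating a query into one or more MapReduce steps is not an intuitive process. To address this, higher-level abstractions have been developed, which we discuss in the section on querying below.

Solving a problem with MapReduce usually takes three operations:

Loading the data: in data warehousing terms, this is better described as extract, transform, load (ETL). To be processed with MapReduce, data must be extracted from its source, structured as needed, and loaded into a storage layer that MapReduce can access.

MapReduce: reads data from the storage layer, processes it, and returns the results to the storage layer.

Extracting the results: once processing is complete, the results in the storage layer still have to be queried and presented before they are useful to people.

Many SMAQ systems have features of their own that are designed mainly to simplify these three operations.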
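
To make the word count and the three operations above concrete, here is a minimal in-memory sketch in plain Java. It is not Hadoop code: the class WordCountSketch and its methods are hypothetical names introduced for illustration, the "storage layer" is just a list of lines held in memory, and the grouping step stands in for what a real framework does between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch; none of these names come from Hadoop.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Grouping step: collect the intermediate values by key, sorted by word.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: add up the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Run map over every line, group the pairs, then reduce each group.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> loaded = List.of("the quick fox", "the lazy dog"); // 1. load the data
        Map<String, Integer> counts = countWords(loaded);               // 2. MapReduce
        System.out.println(counts);                                     // 3. extract the results
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

Because map looks at one line at a time and reduce looks at one word at a time, either phase could be spread over many machines without changing the result, which is the constraint the text refers to.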

 Hadoop MapReduce

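Hadoop is the dominant open source MapReduce implementation. Funded by Yahoo, it was created by Doug Cutting in 2006 and reached web-scale data processing capacity in 2008.

The Hadoop project is now hosted by Apache. Through continued effort, it and its many subprojects together make up a complete SMAQ stack.

Because it is implemented in Java, Hadoop's MapReduce implementation is accessible from the Java language. Creating a MapReduce job usually involves writing functions that carry out the computation for the map and reduce phases. The data to be processed must be loaded into Hadoop's distributed file system.

Taking word count as the example, the map function looks like this (taken from the Hadoop MapReduce documentation, showing only the key steps):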

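// Imports needed by the enclosing source file
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the job's outer class: emits (word, 1) for every token in a line
public static class Map
      extends Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  context.write(word, one);
            }
      }
}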

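The corresponding reduce function is:

// Imports needed by the enclosing source file
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's outer class: sums the counts emitted for each word
public static class Reduce
      extends Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                  sum += val.get();
            }
            context.write(key, new IntWritable(sum));
      }
}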

Running a MapReduce job with Hadoop involves the following steps:

1. Define the MapReduce phases in a Java program

2. Load the data into the file system

3. Submit the job for execution

4. Retrieve the results from the file system

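Written directly against the Java API, Hadoop MapReduce jobs can be complex and demand a lot of involvement from programmers. A large ecosystem has grown up around Hadoop to make loading and processing data more straightforward.

Other implementations

MapReduce has been implemented in many other programming languages and systems; a detailed list is available in Wikipedia's entry for MapReduce. In particular, several NoSQL databases have integrated MapReduce, and we describe them later.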

 Storage

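MapReduce needs storage both to fetch its data and to store its results. Unlike a traditional database, the input data for MapReduce is not relational. The input is stored in chunks that can be divided among different nodes and is then supplied to the map phase as key-value pairs. The data does not need a schema and may be unstructured. It must, however, be distributable, so that it can be served to different processing nodes.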
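
As a rough illustration of this idea, the sketch below turns schema-free text into key-value records and cuts them into chunks that could each be handed to a different node. It is an assumption-laden toy, not how HDFS or Hadoop actually splits files: the class ChunkSketch, its methods, and the choice of a character offset as the key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these names are not part of Hadoop or HDFS.
class ChunkSketch {

    // Turn raw text into (character offset, line) key-value records, skipping empty lines.
    static Map<Long, String> toRecords(String text) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : text.split("\n", -1)) {
            if (!line.isEmpty()) {
                records.put(offset, line);
            }
            offset += line.length() + 1; // +1 for the newline that split() removed
        }
        return records;
    }

    // Cut the records into chunks of at most recordsPerChunk entries,
    // each of which could be given to a different processing node.
    static List<Map<Long, String>> toChunks(Map<Long, String> records, int recordsPerChunk) {
        if (recordsPerChunk < 1) {
            throw new IllegalArgumentException("recordsPerChunk must be at least 1");
        }
        List<Map<Long, String>> chunks = new ArrayList<>();
        Map<Long, String> current = new LinkedHashMap<>();
        for (Map.Entry<Long, String> record : records.entrySet()) {
            current.put(record.getKey(), record.getValue());
            if (current.size() == recordsPerChunk) {
                chunks.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        Map<Long, String> records = toRecords("the quick fox\nthe lazy dog\nthe end\n");
        List<Map<Long, String>> chunks = toChunks(records, 2);
        for (int node = 0; node < chunks.size(); node++) {
            System.out.println("node " + node + " <- " + chunks.get(node));
        }
    }
}
```

Nothing here depends on the lines having a schema; the only requirement is that the input can be cut into independent pieces.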

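The design and features of the storage layer matter not only because of its interface with MapReduce, but also because they directly determine how easily data can be loaded and how easily results can be queried and presented.

Hadoop Distributed File System

The standard storage mechanism used by Hadoop is HDFS. As a core part of Hadoop, HDFS has the following features, described in detail in the HDFS design document:

Fault tolerance: assuming that failure is the norm allows HDFS to run on commodity hardware

Streaming data access: HDFS is written with batch processing in mind, so it emphasizes high throughput rather than random access to data

Extreme scalability: HDFS scales to petabytes of data; Facebook, for example, runs such an installation in production

Portability: Hadoop is portable across operating systems

Write once: by assuming a file will not change after it is written, HDFS simplifies replication and improves data throughput

Locality of computation: given the volume of data, it is usually faster to move the program close to the data, and HDFS has features to support this

HDFS provides an interface similar to that of a standard file system. Unlike a traditional database, HDFS can only store and retrieve data, not index it. Simple random access to the data is not possible. However, higher-level abstractions have been created to provide finer-grained functionality on top of Hadoop, such as HBase.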

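HBase, the Hadoop database

One way of making HDFS more usable is HBase. Modeled after Google's BigTable database, HBase is a column-oriented database designed to store massive amounts of data. It belongs to the NoSQL family of databases, like Cassandra and Hypertable.

HBase uses HDFS as its underlying storage system, so it can also store large volumes of data across many fault-tolerant distributed nodes. Like other column-store databases, HBase provides REST and Thrift based access APIs.

Because it builds an index, HBase can offer fast random access to its contents for simple queries. For complex operations, HBase acts as both a source and a sink for Hadoop MapReduce. HBase therefore lets systems interact with MapReduce as a database, rather than through the lower-level HDFS.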

 Hive

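Data warehousing, or storing data in a way that makes reporting and analysis easier, is an important application area for SMAQ systems. Hive, originally developed at Facebook, is a data warehouse framework built on top of Hadoop. Like HBase, Hive provides a table-based abstraction over HDFS and simplifies the loading of structured data. In contrast to HBase, Hive can only run MapReduce jobs and is suited to batch data analysis. As described in the section on querying below, Hive provides a SQL-like query language for executing MapReduce jobs.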

 Cassandra and Hypertable

Cassandra and Hypertable are both column-store databases in the BigTable mold, similar to HBase.

Cassandra, an Apache project, originated at Facebook. It is now used by many large-scale web sites, including Twitter, Facebook, Reddit and Digg. Hypertable originated at Zvents and is now also an open source project.

Both databases offer interfaces to Hadoop MapReduce that let them act as a source and a sink for Hadoop MapReduce jobs. At a higher level, Cassandra offers integration with the Pig query language (see the section on querying), and Hypertable has been integrated with Hive.

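NoSQL database implementations of MapReduce

The storage solutions mentioned so far all rely on Hadoop for MapReduce. Some NoSQL databases have built-in MapReduce support of their own for running parallel computation over their stored data. Unlike the multi-component SMAQ architectures of Hadoop-based systems, they offer a self-contained system comprising storage, MapReduce and query all in one.

Whereas Hadoop-based systems are usually aimed at batch-oriented analysis, NoSQL stores are usually aimed at real-time applications. In these databases MapReduce tends to be a secondary feature that complements other query mechanisms. In Riak, for example, MapReduce jobs are usually subject to a 60-second timeout, while Hadoop generally assumes a job may run for minutes or hours.

The following NoSQL databases all have MapReduce capabilities:

CouchDB: a distributed database that offers semi-structured, document-based storage. Its key features include strong replication support and the ability to make distributed updates. Queries in CouchDB are implemented by using JavaScript to define the map and reduce phases of a MapReduce process.

MongoDB: very similar to CouchDB, but with more emphasis on performance and weaker support for distributed updates, replication, and versioning. MapReduce operations are also specified in JavaScript.

Riak: also similar to the two databases above, but with a stronger focus on high availability. MapReduce operations can be specified in JavaScript or Erlang.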

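Integration with relational databases

In many applications, the primary source data lives in a relational database such as MySQL or Oracle. MapReduce typically uses this data in two ways:

Using the relational database as a source (for example, a list of friends in a social network)

Feeding MapReduce results back into the relational database (for example, a list of product recommendations based on friends' interests)

It is important to understand how MapReduce interacts with relational databases. At the simplest level, by combining SQL export commands with HDFS operations, delimited text files can serve as the import and export format between a traditional relational database and a Hadoop system. Beyond that, more sophisticated tools exist.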

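The Sqoop tool is designed to import data from relational databases into Hadoop. It was developed by Cloudera, a vendor of an enterprise-focused Hadoop platform. Sqoop is database-agnostic because it uses Java's JDBC database API. Entire tables can be imported, or a query can be used to restrict the data that is imported.

Sqoop also offers the ability to export the results of MapReduce from HDFS back into a relational database. Because HDFS is a file system, Sqoop takes delimited text as input and must convert it into the corresponding SQL commands before the data can be inserted into the database.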
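
The general idea of turning delimited text into SQL inserts can be sketched in a few lines of plain Java. This is not Sqoop's implementation or API: the class DelimitedToSqlSketch, the table name recommendations, and the column names user_id and product_id are all made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; this is not Sqoop's implementation or API.
class DelimitedToSqlSketch {

    // Split one delimited line into fields, keeping trailing empty fields.
    static List<String> parseLine(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    // Build a parameterized INSERT statement for the given table and columns.
    static String insertTemplate(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<String> columns = List.of("user_id", "product_id"); // assumed column names
        String sql = insertTemplate("recommendations", columns); // assumed table name
        for (String line : List.of("42\t1001", "42\t1002")) {
            // Each parsed field would be bound to one "?" placeholder.
            System.out.println(sql + "  <- " + parseLine(line, "\t"));
        }
    }
}
```

In a real program the parsed fields would be bound to the placeholders through JDBC's PreparedStatement rather than concatenated into the SQL string, which avoids quoting problems in the data.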

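For Hadoop systems, similar functionality is available through the cascading.jdbc and cascading-dbmigrate tools of the Cascading API.

Integration with streaming data sources

Alongside relational databases, streaming data sources (such as web server logs and sensor output) are the most common sources of data for big data systems. Cloudera's Flume project aims to provide a convenient tool for integrating streaming data sources with Hadoop. Flume collects data from the machines in a cluster and feeds it continuously into HDFS. Facebook's Scribe server offers similar functionality.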

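Commercial SMAQ solutions

Several massively parallel processing (MPP) databases have built-in MapReduce support. MPP databases have a distributed architecture made up of independent nodes running in parallel. Their main application is data warehousing and analytics, and they can be queried with SQL.

Greenplum: based on the open source PostgreSQL DBMS, it runs on clusters of distributed hardware. As a complement to SQL, MapReduce allows faster and larger-scale analysis of data in Greenplum, cutting query times by several orders of magnitude. Greenplum MapReduce permits mixing data held in the database with external data sources. MapReduce operations can be expressed as functions in Perl or Python.

Aster Data's nCluster data warehouse system also offers MapReduce support. MapReduce operations are invoked through Aster Data's SQL-MapReduce technology, which lets SQL queries be combined with MapReduce jobs defined in source code in a variety of languages (C#, C++, Java, R or Python).

Other data warehousing solutions have chosen to provide connectors to Hadoop rather than integrating MapReduce functionality internally.

Vertica: a column-oriented database that provides a connector for Hadoop.

Netezza: recently acquired by IBM. It has partnered with Cloudera to improve its interoperability with Hadoop. Although it solves similar problems, it falls outside our definition of the SMAQ model, because it is neither open source nor runs on commodity hardware.

Although a Hadoop-based system can be built entirely from open source software, integrating such a system still takes some effort. Cloudera aims to make Hadoop a better fit for enterprise applications, and it already offers a unified Hadoop distribution in its Cloudera Distribution for Hadoop (CDH).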

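Query

As the Java code above shows, defining the map and reduce functions of a MapReduce job in a programming language is neither intuitive nor convenient. To address this, SMAQ systems introduce a higher-level query layer that simplifies both MapReduce operations and the retrieval of results.

Many organizations that use Hadoop have already written their own internal wrappers around Hadoop's API to make operations more convenient. Some of these have become open source projects or commercial products.

Query layers typically offer more than features for specifying the computation. They also support storing and retrieving data and simplify the execution flow on the MapReduce cluster.

Pig

Developed by Yahoo and now part of the Hadoop project, Pig provides a high-level query language called Pig Latin for describing and running MapReduce jobs. It is intended to make Hadoop more accessible to developers familiar with SQL, and it offers an interactive interface in addition to a Java API. Pig integration is currently available for the Cassandra and HBase databases. Below is the word count example from above written in Pig, including both the data loading and storing steps ($0 refers to the first field of a record).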
input = LOAD 'input/sentences.txt' USING TextLoader();
words = FOREACH input GENERATE FLATTEN(TOKENIZE($0));
grouped = GROUP words BY $0;
counts = FOREACH grouped GENERATE group, COUNT(words);
ordered = ORDER counts BY $0;
STORE ordered INTO 'output/wordCount' USING PigStorage();

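Pig is very expressive, and it lets developers write custom functionality through UDFs (User Defined Functions). These UDFs are written in Java. Although it is easier to understand and use than the MapReduce API, it does require users to learn a new language. It resembles SQL in some ways, but it differs from SQL enough that people familiar with SQL find it hard to reuse their knowledge.

Hive

As mentioned earlier, Hive is an open source data warehouse built on top of Hadoop. Created by Facebook, it offers a query language very similar to SQL, as well as a web interface that supports simple query-building. This makes it a good fit for non-developer users who are familiar with SQL.

Compared with Pig and Cascading, which require compilation, one of Hive's strengths is that it supports ad-hoc queries. For mature business intelligence systems, Hive is a more natural starting point because it provides a friendlier interface for non-technical users. Cloudera's Hadoop distribution integrates Hive and, through the HUE project, provides a higher-level user interface that lets users submit queries and monitor the execution of MapReduce jobs.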

Cascading, the API Approach

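Cascading provides a wrapper around Hadoop's MapReduce API to make it easier to use from Java applications. It is simply a thin layer intended to make integrating MapReduce into a larger system simpler. Cascading's features include:

A data processing API that aims to simplify the definition of MapReduce jobs

An API that controls the execution of MapReduce jobs on a Hadoop cluster

Access from JVM-based scripting languages such as Jython, Groovy, or JRuby

Integration with data sources other than HDFS, including Amazon S3 and web servers

Validation mechanisms for testing MapReduce processes

Cascading's key feature is that it lets developers assemble MapReduce jobs as a flow by connecting a chosen set of pipes. This makes it well suited to integrating Hadoop into a larger system. Cascading itself does not provide a high-level query language, but a derivative open source project called Cascalog does just that. Using the Clojure JVM language, Cascalog implements a query language similar to Datalog. Though powerful, Cascalog is likely to remain a niche query language, because it offers neither a SQL-like language as Hive does nor the procedural style of Pig. Below is the word count example written in Cascalog: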
      (defmapcatop split [sentence]
            (seq (.split sentence "\\s+")))
      (?<- (stdout) [?word ?count] 
            (sentence ?s) (split ?s :> ?word)
            (c/count ?count))
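
For comparison, the "connect pipes into a flow" idea that Cascading is built around can be imitated in plain Java with function composition. This sketch does not use the Cascading API, and the class PipeSketch and its methods are hypothetical names chosen for illustration; it only shows how two small processing steps can be joined into one reusable flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch of connecting pipes into a flow; this is not the Cascading API.
class PipeSketch {

    // Pipe 1: split each line into words.
    static List<String> tokenize(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Pipe 2: count the occurrences of each word, sorted by word.
    static Map<String, Long> count(List<String> words) {
        Map<String, Long> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    // Connect the two pipes: the output of tokenize becomes the input of count.
    static Function<List<String>, Map<String, Long>> wordCountFlow() {
        Function<List<String>, List<String>> first = PipeSketch::tokenize;
        return first.andThen(PipeSketch::count);
    }

    public static void main(String[] args) {
        System.out.println(wordCountFlow().apply(List.of("a b a", "b c")));
        // prints {a=2, b=2, c=1}
    }
}
```

The flow is an ordinary value, so a larger application can build it once and apply it wherever it is needed, which is the integration benefit the text attributes to Cascading.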

Search with Solr

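An important component of large-scale data systems is data retrieval and summarization. Database layers such as HBase provide simple access to data but lack sophisticated search capabilities. To solve the search problem, the open source search and indexing platform Solr is often used together with a NoSQL database. Solr uses Lucene search technology to provide a self-contained search server product. For example, consider a social network database: MapReduce could compute each person's influence according to some suitable metric, and this score would be written back to the database. Indexing the data with Solr then makes it possible to run operations on the social network such as finding the most influential people.

Originally developed at CNET and now an Apache project, Solr has evolved from a plain text search engine into one that supports faceted navigation and results clustering. Solr can also manage large volumes of data stored across distributed servers. This makes it an ideal solution for searching big data and an important component for building business intelligence systems.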

Conclusion

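MapReduce, and Hadoop's implementation in particular, provides a powerful way to distribute computation across commodity servers. Combined with distributed storage and user-friendly query mechanisms, they form a SMAQ architecture that brings big data processing within reach of small teams and even individual developers.

It is now cheap to analyze data in depth or to create data products that depend on complex computation. The results have already had a far-reaching effect on the landscape of data analysis and data warehousing, lowering the barrier to entry and fostering a new generation of products, services and ways of organizing. This trend is explored in more depth in Mike Loukides' "What is Data Science?" report.

The arrival of Linux gave innovative developers power with nothing more than a Linux server sitting on a desk. SMAQ has just as much potential to raise the efficiency of data centers, encourage innovation at the edges of organizations, and open a new era of inexpensively built data-driven businesses.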

This article is a translation of The SMAQ stack for big data

Original English article: http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html

SMAQ stands for storage, MapReduce and query.

When reposting, please credit the translator: phylips@bmy

Source: http://duanple.blog.163.com/blog/static/709717672011016103028473/

 
 
  • Tags: SMAQ, big data