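Introduction
Hadoop is a flexible open-source Java framework for running large-scale data processing on networks of commodity hardware. It grew out of the MapReduce and Google File System (GFS) technologies originally developed at Google, and it has become increasingly popular because it is efficient, reliable, and scalable. Hadoop is now a top-level Apache project. Many companies, including IBM, Google, Yahoo!, and Facebook, support and use it, and it has become the de facto industry-standard framework for large-scale data processing.
What does Hadoop mean for cloud computing? One goal of cloud computing is to provide highly available computing resources at the lowest possible cost. Hadoop can handle thousands of nodes and petabytes of data, and it automatically takes care of job scheduling, partial failures, and load balancing, which makes it well suited to this goal.
Making full use of computing resources depends on optimizing performance across CPU, memory, and I/O (disk and network). Hadoop improves performance automatically in some respects, and it also exposes interfaces that let users tune performance for their own applications. This article introduces the most important configurable Hadoop parameters and a method for analyzing and tuning performance.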
Setting up the environment
Steps for deploying the Hadoop environment
Before doing any performance tuning, you need to build a Hadoop cluster. The steps are as follows:
Prepare the cluster nodes and install a Linux OS, JDK 1.6, and ssh on them. Make sure sshd is running on every node.
Visit The Apache Software Foundation site and download a stable Hadoop release.
Choose your NameNode (NN), JobTracker (JT), and Secondary NameNode (SNN); the remaining nodes act as DataNodes (DN) and TaskTrackers (TT). This article assumes host001 is the NN, host002 the JT, and host003 the SNN.
Set up ssh so that the NN, JT, and SNN can log in to every DN and TT without a password.
Unpack the downloaded Hadoop release on every node; $HADOOP_HOME below denotes the unpack location.
On the NN, change to the $HADOOP_HOME directory and edit the configuration files:
Add host003 to $HADOOP_HOME/conf/masters.
Add the IP address or host name of every DN/TT node to $HADOOP_HOME/conf/slaves, one host per line. Note: if you use host names, configure the /etc/hosts file so that every node in the cluster can resolve every host name.
Add the following property to $HADOOP_HOME/conf/core-site.xml to set the NN IP/port:
<property>
<name>fs.default.name</name>
<value>hdfs://host001:9000</value>
</property>
Add the following property to $HADOOP_HOME/conf/mapred-site.xml to set the JT IP/port:
<property>
<name>mapred.job.tracker</name>
<value>host002:9001</value>
</property>
Note: if you use Hadoop 0.21.0, this property is named mapreduce.jobtracker.address instead.
If the cluster nodes have more than one network interface, add the following property to $HADOOP_HOME/conf/hdfs-site.xml so that each DataNode reports the IP address of the interface you intend:
<property>
<name>dfs.datanode.dns.interface</name>
<value>eth1</value>
<description>The name of the Network Interface from which a data node
should report its IP address.
</description>
</property>
Copy all of the configuration files mentioned above from the NN to the $HADOOP_HOME/conf/ directory on every other node in the cluster.
On the NN, change to the $HADOOP_HOME/bin directory.
Format the NN with the command $./hadoop namenode -format.
Run the start-all.sh script to start the Hadoop daemons.
For more details, see Hadoop Common. Note: if you choose Hadoop 0.21.0, you must use the current JDK (tracked by JIRA HADOOP-6941).
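The copy step above can be scripted. The following is a minimal POSIX sh sketch, not part of the original procedure: the function names `run`, `list_hosts`, and `sync_conf` and the `DRY_RUN` switch are hypothetical, and it assumes the password-less ssh set up earlier, `scp` on the NN, the same $HADOOP_HOME path on every node, and one host per line in conf/slaves. Hosts that are not in conf/slaves, such as the JT (host002) and the SNN (host003), are passed as arguments.

```shell
#!/bin/sh
# Sketch: push the edited configuration from the NN to the other nodes.
# Assumption: HADOOP_HOME is the same path on every node.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}

run() {
  # Execute the command, or only print it when DRY_RUN=1.
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}

list_hosts() {
  # Print host names from the given files, dropping comments and blank lines.
  cat "$@" 2>/dev/null | sed -e 's/#.*//' -e 's/[[:space:]]//g' | grep -v '^$'
}

sync_conf() {
  # Targets: hosts given as arguments plus every host listed in conf/slaves.
  for host in "$@" $(list_hosts "$HADOOP_HOME/conf/slaves"); do
    run scp "$HADOOP_HOME"/conf/*-site.xml "$HADOOP_HOME/conf/masters" \
      "$HADOOP_HOME/conf/slaves" "$host:$HADOOP_HOME/conf/"
  done
}
```

On the NN, `sync_conf host002 host003` copies the files to the JT, the SNN, and every DN/TT; setting `DRY_RUN=1` first prints the scp commands for review instead of running them.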
Installing and configuring the nmon performance monitor
nmon is a system administration, tuning, and benchmarking tool that makes it easy to monitor a large amount of important performance information. You can use nmon as the monitoring tool throughout the tuning process. Follow these steps to install and configure nmon and build your own performance monitoring system:
Download the nmon binary package from the nmon for Linux site. Find the version that matches your Linux OS and copy it to every node in the Hadoop cluster. $NMON_HOME below denotes the location of the nmon binary.
Because the NN, JT, and SNN can already reach every other node through password-less ssh, and all map/reduce jobs will be submitted on the JT, choose the JT as the central node that collects all nmon data. Log in to the JT node and perform the following steps.
Create a directory (for example, /home/hadoop/perf_share) on the JT (host002) and share it over NFS with the following commands:
Create the directory: $mkdir /home/hadoop/perf_share
Edit the /etc/exports file to include the following line: /home/hadoop/perf_share *(rw,sync)
Restart the NFS service: $/etc/rc.d/init.d/nfs restart
On every other node, create the same directory and mount the JT's perf_share directory on it:
$mkdir /home/hadoop/perf_share
$mount host002:/home/hadoop/perf_share /home/hadoop/perf_share
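Create the following script (start_nmon.sh) to start nmon on all nodes:
#!/bin/bash
# Hosts to monitor (the DN/TT nodes); adjust to your own host names.
hosts=( shihc008 shihc009 shihc010 shihc011 shihc012 shihc013 shihc014 shihc015
shihc016 shihc017)
# Remove all old data in /home/hadoop/perf_share (rm runs only if cd succeeds)
for host in "${hosts[@]}"
do
ssh "$host" "cd /home/hadoop/perf_share && rm -rf ./*"
done
# Start nmon on all nodes (point this at the binary under $NMON_HOME if it is not /usr/bin/nmon)
for host in "${hosts[@]}"
do
ssh "$host" "/usr/bin/nmon -f -m /home/hadoop/perf_share -s 30 -c 360"
done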
In the final nmon command, -f saves the data to a file instead of displaying it on screen; -m specifies where to save the data; -s 30 captures data every 30 seconds; and -c 360 requests 360 data points (snapshots), so the total collection time is 30 x 360 seconds, or 3 hours.
Download nmonanalyser from the nmonanalyser wiki (this Excel spreadsheet takes nmon output files and generates charts that help with analysis) and use it to analyze the collected monitoring data.
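To size -c for a run of a different length, divide the expected duration by the sampling interval and round up. A small POSIX sh helper for that arithmetic; the function names `nmon_count` and `nmon_cmd` are hypothetical, and the nmon path and output directory follow the script above:

```shell
# Snapshots (-c) needed to cover DURATION seconds when sampling every
# INTERVAL seconds (-s), rounded up so the whole run is covered.
nmon_count() {
  interval=$1
  duration=$2
  echo $(( (duration + interval - 1) / interval ))
}

# Print the nmon command line used in start_nmon.sh for a given run length.
nmon_cmd() {
  interval=$1
  duration=$2
  echo "/usr/bin/nmon -f -m /home/hadoop/perf_share -s $interval -c $(nmon_count "$interval" "$duration")"
}
```

For example, `nmon_count 30 10800` prints 360, the 3-hour window used above, and `nmon_cmd 30 10800` prints the exact command from the script.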
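Configurable Hadoop parameters
Hadoop provides many configuration options that users and administrators can use to set up and tune a cluster. The files core-default.xml, hdfs-default.xml, and mapred-default.xml define many variables, and you can override them in core-site.xml, hdfs-site.xml, and mapred-site.xml. Some variables specify file paths on the system, while others tune Hadoop's internals in depth.
Performance tuning covers four main areas: CPU, memory, disk I/O, and network. This article introduces the parameters most relevant to these four areas; you can use the method described later to study the other parameters in the *-default.xml files.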
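The override rule above (a value in a *-site.xml file wins over the one in the matching *-default.xml) can be checked from the shell. This is a simplified, line-oriented sketch in POSIX sh and awk, not an XML parser and not Hadoop's own Configuration logic: it assumes each <name> and <value> element sits on a single line, as in the examples in this article, and it ignores features such as <final> and variable expansion. The function names are hypothetical.

```shell
# Print the <value> that follows <name>KEY</name> in FILE (first match only).
conf_value() {
  [ -f "$2" ] || return 0
  awk -v key="$1" '
    /<name>/ {
      n = $0
      gsub(/.*<name>[ \t]*|[ \t]*<\/name>.*/, "", n)
      found = (n == key)
    }
    found && /<value>/ {
      v = $0
      gsub(/.*<value>|<\/value>.*/, "", v)
      print v
      exit
    }' "$2"
}

# Look in the site file first, then fall back to the default file.
effective_value() {
  value=$(conf_value "$1" "$2")
  [ -n "$value" ] || value=$(conf_value "$1" "$3")
  printf '%s\n' "$value"
}
```

For example, `effective_value io.sort.mb $HADOOP_HOME/conf/mapred-site.xml mapred-default.xml` (with a copy of mapred-default.xml from your release) prints the value a job would see for io.sort.mb under these simplifications.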
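CPU-related parameters: mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
These determine the maximum number of map and reduce tasks that a TaskTracker runs simultaneously. They are the parameters most closely tied to CPU utilization, and both default to 2. Increasing them appropriately for your cluster raises CPU utilization and therefore performance. For example, suppose each node in the cluster has 4 CPUs that support simultaneous multithreading, with two cores per CPU; the total number of concurrently running processes (daemons plus tasks) should then not exceed 4 x 2 x 2 = 16. Since the DN and TT daemons take two of these, map/reduce tasks can use at most 14, so the best value for each of the two parameters is 7.
Set these parameters in mapred-site.xml.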
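The slot arithmetic above can be written as a small POSIX sh helper. It follows the article's rule of thumb (hardware threads, minus one each for the DN and TT, split evenly between map and reduce slots); the function names are hypothetical, and the result is a starting point to confirm by measurement rather than a guaranteed optimum.

```shell
# Slots per task type = (CPUs x cores per CPU x threads per core - 2) / 2,
# reserving two hardware threads for the DataNode and TaskTracker daemons.
task_slots() {
  cpus=$1
  cores_per_cpu=$2
  threads_per_core=$3
  hw_threads=$(( cpus * cores_per_cpu * threads_per_core ))
  echo $(( (hw_threads - 2) / 2 ))
}

# Emit the two mapred-site.xml properties with the computed value.
slot_properties() {
  slots=$(task_slots "$1" "$2" "$3")
  for kind in map reduce; do
    printf '<property>\n<name>mapred.tasktracker.%s.tasks.maximum</name>\n<value>%s</value>\n</property>\n' "$kind" "$slots"
  done
}
```

`task_slots 4 2 2` prints 7 for the example above, and `task_slots 2 4 2` also prints 7 for the test nodes described later (two quad-core processors with simultaneous multithreading).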
Memory-related parameter: mapred.child.java.opts
This is the main parameter for JVM tuning. The default is -Xmx200m, which gives each child task JVM at most 200 MB of heap. For large jobs you can increase this value, but make sure it does not cause swapping, which severely degrades performance.
Let's look at how this parameter affects total memory use. Suppose the maximum numbers of map and reduce tasks are each set to 7 and mapred.child.java.opts keeps its default. The running tasks then take 2 x 7 x 200 MB = 2800 MB. If each worker node also runs the DN and TT daemons, each using 1 GB by default, the total memory allocated is about 4.8 GB.
Set this parameter in mapred-site.xml.
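The memory estimate above as a POSIX sh helper, so it can be recomputed for other settings. It counts only the configured Java heaps (map slots plus reduce slots times the child -Xmx, plus the daemon heaps) and assumes 1 GB = 1024 MB; it ignores the operating system, the page cache, and JVM overhead beyond the heap, so leave headroom below physical memory. The function name is hypothetical.

```shell
# Estimated heap memory (MB) on a worker node:
# 2 x slots per task type x child heap + daemons x daemon heap.
node_memory_mb() {
  slots_per_type=$1          # value of both *.tasks.maximum parameters
  child_heap_mb=$2           # -Xmx value from mapred.child.java.opts, in MB
  daemon_count=${3:-2}       # DataNode + TaskTracker
  daemon_heap_mb=${4:-1024}  # about 1 GB per daemon by default
  echo $(( 2 * slots_per_type * child_heap_mb + daemon_count * daemon_heap_mb ))
}
```

`node_memory_mb 7 200` prints 4848, about 4.8 GB as in the text. With -Xmx1024m instead, `node_memory_mb 7 1024` prints 16384, which would still fit within the 32 GB per node of the test environment described later.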
Disk I/O-related parameters: mapred.compress.map.output, mapred.output.compress, and mapred.map.output.compression.codec
These parameters control output compression: mapred.compress.map.output compresses map output, mapred.output.compress compresses the final job output, and mapred.map.output.compression.codec selects the codec used for map output. Both compression switches are off by default.
Enabling output compression speeds up disk writes (to local disk and to the Hadoop Distributed File System (HDFS)) and reduces total data transfer time (in the shuffle and HDFS write phases), but the compression and decompression work adds overhead.
In my experience, enabling compression is not effective for operations on random keys/values. I recommend enabling it only when processing large volumes of well-structured data (especially natural-language data).
Set these parameters in mapred-site.xml.
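A quick way to see why compression pays off only for data with structure is to gzip a block of random bytes and a block of repeated text and compare the compressed sizes. This POSIX sh sketch uses the gzip command line tool rather than Hadoop's codec classes, and the repeated sentence is an extreme stand-in for well-structured data; real natural-language text compresses less well than this, but far better than random bytes. The function name is hypothetical.

```shell
# Compressed size as a percentage of the input size.
# Reads the data on stdin; $1 is the input size in bytes.
gzip_pct() {
  compressed=$(gzip -c | wc -c | tr -d ' ')
  echo $(( compressed * 100 / $1 ))
}

SIZE=1000000
random_pct=$(head -c "$SIZE" /dev/urandom | gzip_pct "$SIZE")
text_pct=$(yes "the quick brown fox jumps over the lazy dog" | head -c "$SIZE" | gzip_pct "$SIZE")
echo "random bytes: ${random_pct}%  repeated text: ${text_pct}%"
```

Random bytes typically stay at about 100% of their original size, so compressing them only costs CPU time, while the repeated text shrinks to a small fraction of a percent.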
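io.sort.mb parameter
This parameter sets the size, in MB, of the buffer used for map-side sorting; the default is 100. The larger the value, the fewer spills to disk, which reduces map-side I/O time. Note that increasing this value increases the memory each map task needs: the buffer is allocated inside the task's JVM heap, so it must stay well below the -Xmx value in mapred.child.java.opts.
In my experience, try increasing this value when map output is large and map-side I/O is frequent.
Set this parameter in mapred-site.xml.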
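A rough way to reason about io.sort.mb is to estimate how many spill files one map task writes. This POSIX sh sketch assumes the 0.20 default io.sort.spill.percent of 0.80 (a spill starts when the buffer is 80% full) and ignores the share of the buffer reserved for record metadata (io.sort.record.percent), so real spill counts can be higher. The function name is hypothetical.

```shell
# Approximate spills = ceil(map output MB / (io.sort.mb x 0.80)).
# Integer arithmetic: both sides are scaled by 100 to avoid fractions.
estimate_spills() {
  map_output_mb=$1
  sort_mb=$2
  usable_x100=$(( sort_mb * 80 ))   # 80% of the buffer, times 100
  echo $(( (map_output_mb * 100 + usable_x100 - 1) / usable_x100 ))
}
```

With 256 MB of output per map, `estimate_spills 256 100` prints 4, while `estimate_spills 256 400` prints 1, and a single spill needs no map-side merge pass. A 400 MB buffer, however, also requires a child heap well above the 200 MB default.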
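io.sort.factor parameter
This parameter sets the number of input streams (files) merged at once in map and reduce tasks; the default is 10. The larger the value, the fewer intermediate merge rounds are written to disk, which reduces map/reduce I/O time. Note that if each task is not given enough memory, increasing this value may cause more garbage-collection activity.
In my experience, try increasing this value when there are many spills to disk and I/O time in the sort and shuffle phases is high.
Set this parameter in mapred-site.xml.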
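To see why a larger io.sort.factor reduces disk I/O, count how many merge passes are needed to combine N sorted files when each pass merges at most io.sort.factor of them. This POSIX sh sketch uses a simplified model in which every pass merges full groups; Hadoop's actual merger sizes the first pass differently, so treat the numbers as an approximation. The function name is hypothetical.

```shell
# Number of merge passes to reduce FILES sorted files to one,
# merging at most FACTOR files per group in each pass.
merge_passes() {
  files=$1
  factor=$2
  if [ "$factor" -lt 2 ]; then
    echo "factor must be at least 2" >&2
    return 1
  fi
  passes=0
  while [ "$files" -gt 1 ]; do
    files=$(( (files + factor - 1) / factor ))
    passes=$(( passes + 1 ))
  done
  echo "$passes"
}
```

For 100 spill files, `merge_passes 100 10` gives 2 passes with the default factor, while `merge_passes 100 100` gives 1; each pass avoided saves rereading and rewriting the intermediate data.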
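mapred.job.reduce.input.buffer.percent parameter
This parameter sets the proportion of memory, relative to the maximum heap size and given as a value between 0.0 and 1.0, that may be used to retain map outputs during the reduce phase; the default is 0. When the shuffle ends, the map outputs remaining in memory must be reduced below this threshold before the reduce phase can start. The larger the value, the fewer merges on disk, which reduces local disk I/O time in the reduce phase. Note that if each task is not given enough memory, increasing this value may cause more garbage-collection activity.
In my experience, try increasing this value when map output is large and local disk I/O is frequent during the reduce-side sort.
Set this parameter in mapred-site.xml.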
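Because the parameter is a fraction of the maximum heap, the amount of map output a reduce can keep in memory when the reduce phase starts is that fraction times the -Xmx value. A POSIX sh/awk helper for the calculation; the function name is hypothetical:

```shell
# MB of map output a reduce may keep in memory when the reduce phase starts:
# mapred.job.reduce.input.buffer.percent (0.0-1.0) x child heap size in MB.
reduce_retained_mb() {
  heap_mb=$1
  fraction=$2
  awk -v h="$heap_mb" -v f="$fraction" 'BEGIN { printf "%d\n", h * f }'
}
```

With the defaults (-Xmx200m and 0), `reduce_retained_mb 200 0` prints 0, so no map output stays in memory and all of it goes through disk before the reduce starts; with -Xmx1024m and 0.5, `reduce_retained_mb 1024 0.5` prints 512.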
mapred.local.dir and dfs.data.dir parameters
These two parameters determine where Hadoop stores its data: mapred.local.dir sets where MapReduce intermediate data (map output) is stored, and dfs.data.dir sets where HDFS data is stored. Both accept a comma-separated list of directories.
In my experience, spreading these locations across all disks on each node balances disk I/O and significantly improves disk I/O performance.
Set mapred.local.dir in mapred-site.xml and dfs.data.dir in hdfs-site.xml.
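Spreading data over every disk means listing one directory per disk in these comma-separated values. A POSIX sh helper that builds such a list; the mount points /hadoop/sdb to /hadoop/sde follow the baseline configuration shown later, while the mapred/local and dfs/data subdirectory names and the function name are hypothetical:

```shell
# Join "<mount>/<subdir>" for every mount point into a comma-separated list.
dir_list() {
  subdir=$1
  shift
  list=""
  for mount in "$@"; do
    list="${list:+$list,}$mount/$subdir"
  done
  echo "$list"
}
```

`dir_list mapred/local /hadoop/sdb /hadoop/sdc /hadoop/sdd /hadoop/sde` produces a value for mapred.local.dir that uses all four disks, and `dir_list dfs/data` with the same mount points does the same for dfs.data.dir.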
Network-related parameter: topology.script.file.name
This parameter points to a user-defined script that determines the rack-host mapping used to configure rack awareness. Set it in core-site.xml.
Rack awareness is the most important configuration for improving network performance. I strongly recommend configuring it by following the instructions at
http://hadoop.apache.org/common/docs/current/cluster_setup.html#Hadoop+Rack+Awareness and
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts.
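A topology script receives one or more host names or IP addresses as arguments and must print one rack path per argument. The following POSIX sh sketch looks each argument up in a two-column mapping file and falls back to /default-rack. It is modeled on the common lookup-table approach rather than copied from the pages above, and the mapping-file path and the TOPOLOGY_MAP variable are hypothetical. Make the script executable and point topology.script.file.name at it.

```shell
#!/bin/sh
# Hypothetical mapping file: one "host-or-ip rack-path" pair per line, e.g.
#   10.1.1.11  /rack1
#   host002    /rack2
MAP_FILE=${TOPOLOGY_MAP:-/home/hadoop/conf/topology.data}

resolve_racks() {
  for node in "$@"; do
    rack=$(awk -v n="$node" '$1 == n { print $2; exit }' "$MAP_FILE" 2>/dev/null)
    # Unknown hosts go to the default rack so that a lookup never fails.
    echo "${rack:-/default-rack}"
  done
}

resolve_racks "$@"
```

Each rack path is printed on its own line, in the same order as the arguments.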
mapred.reduce.parallel.copies parameter
This parameter sets the number of threads used to copy map outputs to a reduce; the default is 5. Increasing it can speed up network transfer and the copying of map outputs, but it also increases CPU usage.
In my experience, increasing this value has little noticeable effect; I recommend increasing it only when map output is very large.
Note: the parameter names listed above are from Hadoop 0.20.x; if you use 0.21.0, some names may differ. Besides the Hadoop parameters, there are also system parameters that affect overall performance, such as inter-rack bandwidth.
ÈçºÎµ÷ÓźÍÌá¸ßÐÔÄÜ
½éÉÜÁËÉÏÃæµÄÔ¤±¸ÖªÊ¶Ö®ºó£¬ÏÖÔÚÌÖÂÛÈçºÎµ÷ÓźÍÌá¸ßÐÔÄÜ¡£¿ÉÒÔ°ÑÕû¸ö¹ý³Ì»®·ÖΪÒÔϲ½Öè¡£
²½Öè 1£ºÑ¡Ôñ²âÊÔ»ù×¼
Õû¸ö Hadoop ¼¯ÈºµÄÐÔÄÜÓÉÁ½¸ö·½Ãæ¾ö¶¨£ºHDFS I/O ÐÔÄÜºÍ MapReduce ÔËÐÐʱÐÔÄÜ¡£Hadoop ±¾ÉíÌṩ¼¸¸ö»ù×¼£¬±ÈÈçÓÃÓÚ HDFS I/O ²âÊ﵀ TestDFSIO ºÍ dfsthroughput£¨°üº¬ÔÚ hadoop-*-test.jar ÖУ©¡¢ÓÃÓÚ×ÜÌåÓ²¼þ²âÊ﵀ Sort£¨°üº¬ÔÚ hadoop-*-examples.jar ÖУ©ºÍ Gridmix£¨ËüÄ£ÄâÍø¸ñ»·¾³ÖеĻìºÏ¹¤×÷¸ºÔØ£¬·ÅÔÚ $HADOOP_HOME/src/benchmarks Ŀ¼ÖУ©¡£¿ÉÒÔ¸ù¾Ý×Ô¼ºµÄ²âÊÔÐèÇóÑ¡ÔñÈκλù×¼¡£
ÔÚËùÓÐÕâЩ»ù×¼ÖУ¬µ±ÊäÈëÊý¾ÝºÜ´óʱ£¬Sort ¿ÉÒÔͬʱ·´Ó³ MapReduce ÔËÐÐʱÐÔÄÜ£¨ÔÚ ¡°Ö´ÐÐÅÅÐò¡± ¹ý³ÌÖУ©ºÍ HDFS I/O ÐÔÄÜ£¨ÔÚ ¡°°ÑÅÅÐò½á¹ûдµ½ HDFS¡± ¹ý³ÌÖУ©¡£ÁíÍ⣬Sort ÊÇ Apache ÍÆ¼öµÄÓ²¼þ»ù×¼¡££¨¿ÉÒÔͨ¹ý Hadoop Wiki ÕÒµ½Ïà¹ØÐÅÏ¢¡££©Òò´Ë£¬±¾ÎÄʹÓà Sort ×÷ΪʾÀý²âÊÔ»ù×¼½²½âÐÔÄܵ÷ÓÅ·½·¨¡£
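As an illustration of how Sort is driven, the sketch below generates input with the randomwriter example and then times the sort example, both from hadoop-*-examples.jar. It is not the baseline_test.sh or run_sort_baseline.sh script used in this article. The jar path, the HDFS directory names, the function names, and the DRY_RUN switch are assumptions, and the amount of data randomwriter produces is governed by its own configuration, which is not set here.

```shell
#!/bin/sh
# Assumptions: adjust HADOOP_HOME and the examples jar name to your release.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}
EXAMPLES_JAR=${EXAMPLES_JAR:-$HADOOP_HOME/hadoop-examples.jar}

run() {
  # Execute the command, or only print it when DRY_RUN=1.
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$@"; else "$@"; fi
}

# Generate random input in IN_DIR, sort it into OUT_DIR, report the sort time.
sort_benchmark() {
  in_dir=$1
  out_dir=$2
  run "$HADOOP_HOME/bin/hadoop" jar "$EXAMPLES_JAR" randomwriter "$in_dir" || return 1
  start=$(date +%s)
  run "$HADOOP_HOME/bin/hadoop" jar "$EXAMPLES_JAR" sort "$in_dir" "$out_dir" || return 1
  end=$(date +%s)
  echo "sort elapsed: $(( end - start )) s"
}
```

Run `sort_benchmark rand rand-sort` on the JT, or set DRY_RUN=1 first to print the two hadoop commands without executing them.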
Step 2: Build a baseline
Test environment:
Benchmark: Sort
Input data size: 500 GB
Hadoop cluster size: 10 DN/TT nodes
All nodes are of the same type
Node information:
Linux OS
Two quad-core processors with simultaneous multithreading
32 GB of memory
Five 500 GB disks
Test scripts: the scripts used for the test are shown below (see the Hadoop Wiki for more on running the Sort benchmark). All scripts should be run on the JT node.
Note: put the start_nmon.sh script mentioned above and the following scripts in the directory where the test results will be stored.
baseline_test.sh

run_sort_baseline.sh

»ùÏß²âÊÔʹÓõIJÎÊýÖµ£º
Hadoop ²ÎÊýÖµ£º
mapred.tasktracker.map.tasks.maximum = 2 £¨Ä¬ÈÏÖµ£©
mapred.tasktracker.reduce.tasks.maximum = 2 £¨Ä¬ÈÏÖµ£©
mapred.reduce.parallel.copies = 5 £¨Ä¬ÈÏÖµ£©
mapred.child.java.opts = -Xmx200m £¨Ä¬ÈÏÖµ£©
mapred.job.reduce.input.buffer.percent = 0 £¨Ä¬ÈÏÖµ£©
io.sort.mb = 100 £¨Ä¬ÈÏÖµ£©
io.sort.factor = 10 £¨Ä¬ÈÏÖµ£©
mapred.local.dir = /hadoop/sdb
dfs.data.dir = /hadoop/sdc, /hadoop/sdd, /hadoop/sde
ϵͳ²ÎÊýÖµ£º
»ú¼Ü¼ä´ø¿í = 1 Gb
Baseline test results:
Execution time: 10051 seconds
Resource usage summary:

Detailed charts:
After collecting all the nmon data, you can use nmonanalyser to generate charts. Because nmonanalyser is an Excel spreadsheet, just open it, click analyse nmon data, and select the nmon files; you then get the analyzed charts.
Figure 1. Analyzing nmon data with nmonanalyser

The detailed charts that nmonanalyser generated for the baseline test are as follows:
Figure 2. NameNode charts

Figure 3. JobTracker charts

Figure 4. DataNode/TaskTracker charts

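Step 3: Find the bottlenecks
Study the system bottlenecks carefully, based on the monitoring data and charts. Because most of the workload is assigned to the DN/TT nodes, look first at resource usage on the DN/TT nodes (to save space, only the nmon charts for the DN/TT nodes are shown below).
Studying the baseline monitoring data and charts reveals several bottlenecks: in the map phase, the CPU is underused (below 40% most of the time), and disk I/O is quite frequent.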
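nmonanalyser is the tool used in this article, but the CPU observation above can also be cross-checked directly from the raw .nmon files, which are comma-separated text. The POSIX sh/awk sketch below averages the CPU_ALL snapshots. It assumes the usual CPU_ALL layout (a header line, then data lines of the form CPU_ALL,Tnnnn,User%,Sys%,Wait%,Idle%,...), which you should verify against your own nmon version; the function name is hypothetical.

```shell
# Average CPU busy (User% + Sys%) and I/O wait (Wait%) over all CPU_ALL
# snapshots in one .nmon file; the header line is skipped because its
# second field is not a Tnnnn snapshot tag.
cpu_summary() {
  awk -F, '
    $1 == "CPU_ALL" && $2 ~ /^T[0-9]+$/ { busy += $3 + $4; iowait += $5; n++ }
    END {
      if (n) printf "busy=%.1f wait=%.1f\n", busy / n, iowait / n
      else print "no CPU_ALL data"
    }' "$1"
}
```

For example, `for f in /home/hadoop/perf_share/*.nmon; do echo "$f: $(cpu_summary "$f")"; done` prints one line per node. Low busy figures combined with high wait figures match the pattern described above: underused CPUs and heavy disk I/O.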
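Step 4: Break the bottlenecks
First, try to raise CPU utilization in the map phase. As the description of the Hadoop parameters above pointed out, raising CPU utilization requires increasing mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
In the test environment each node has two quad-core processors with simultaneous multithreading, giving 2 x 4 x 2 = 16 hardware threads; after reserving two for the DN and TT, both parameters can be set to 7.
To make this change, set mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml, restart the cluster, and run baseline_test.sh again (because the configuration lives in mapred-site.xml, the script does not need to change). The modified mapred-site.xml is shown below: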

ÏÂÃæÊǵ÷ÓźóµÄ²âÊÔ½á¹û£º
Ö´ÐÐʱ¼ä£º8599 Ãë
×ÊԴʹÓÃÁ¿»ã×Ü£º
ͼ 5. µ÷ÓźóµÄ DataNode/TaskTracker ͼ±í

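Step 5: A new round of tuning, repeating steps 3 and 4
After increasing the maximum number of map/reduce tasks per TaskTracker, the collected data and charts show that the CPU is now fully used in the map phase.
At the same time, however, disk I/O is still very frequent, so another round of the tune-monitor-analyze cycle is needed.
Repeat these steps until the system has no bottlenecks left and every resource is fully used.
Note that not every tuning change improves performance. If performance drops, restore the previous configuration and try other tuning measures to break the bottleneck. In this test, the final optimized results were:
Execution time: 5670 seconds
System parameter value: inter-rack bandwidth = 1 Gb/s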
Resource usage summary:
Figure 6. DataNode/TaskTracker charts - second round of tuning
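The run times reported so far can be compared with a small POSIX sh/awk helper that prints the reduction in run time and the speedup relative to the baseline; the function name is hypothetical.

```shell
# Percentage reduction in run time and speedup of TUNED relative to BASE seconds.
improvement() {
  awk -v base="$1" -v tuned="$2" 'BEGIN {
    printf "time reduced by %.1f%%, speedup %.2fx\n", (base - tuned) * 100 / base, base / tuned
  }'
}
```

`improvement 10051 8599` prints "time reduced by 14.4%, speedup 1.17x" for the first round, and `improvement 10051 5670` prints "time reduced by 43.6%, speedup 1.77x" for the final configuration.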

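Step 6: Scalability testing and improvement
To further validate the tuning results, increase the cluster size and the input data size while keeping the optimized configuration, to test how well that configuration scales. Specifically, grow the cluster to 30 nodes and the input data to 1.5 TB, then run the test procedure above again.
Space does not permit a detailed description of the tuning process here. The monitoring and analysis method is exactly the same as above, and the main bottleneck found was the network: once the input data reaches the terabyte range, the inter-rack bandwidth becomes insufficient. After increasing the inter-rack bandwidth to 4 Gb/s and keeping all other parameters at their optimized 10-node values, the final execution time was 5916 seconds, quite close to the optimized 10-node result (5670 seconds).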
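Because the cluster and the input grew by the same factor, each node handles about the same amount of data in both runs, and the ratio of the two run times is a weak-scaling efficiency (100% would mean no slowdown at all). A POSIX sh/awk helper for that comparison, treating 1.5 TB as 1500 GB; the function name is hypothetical:

```shell
# Per-node data for two runs and weak-scaling efficiency = T_small / T_large.
# Arguments: data_GB nodes seconds for the small run, then the same for the large run.
weak_scaling() {
  awk -v d1="$1" -v n1="$2" -v t1="$3" -v d2="$4" -v n2="$5" -v t2="$6" 'BEGIN {
    printf "data per node: %.0f GB vs %.0f GB, efficiency %.1f%%\n", d1 / n1, d2 / n2, t1 * 100 / t2
  }'
}
```

`weak_scaling 500 10 5670 1500 30 5916` prints "data per node: 50 GB vs 50 GB, efficiency 95.8%".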
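Conclusion
You now know how to monitor a Hadoop cluster and how to use the monitoring data to analyze system bottlenecks and optimize performance. I hope this helps you make full use of your Hadoop cluster and complete jobs more efficiently. You can use the method described in this article to study Hadoop's configurable parameters further and look for correlations between parameter settings and the characteristics of different jobs.
In addition, this parameter-based tuning is fairly "static", because one set of parameter values is optimal only for one class of jobs. For more flexibility, you should study Hadoop's scheduling algorithms and look for new ways to improve Hadoop performance.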
http://www.uml.org.cn/yunjisuan/201106025.asp