Video: A Google Engineer Explains Cloud Computing
[ 2011/2/13 13:47:00 | By: 梦翔儿 ]
 

Video of a Google engineer explaining cloud computing: http://v.youku.com/v_show/id_XMTEwMDEwOTU2.html

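叶平 is a software engineer at Google. He received his Ph.D. from the Department of Physics at National Taiwan University in 1996, specializing in experimental particle physics. From 1998 he was an assistant research fellow at Academia Sinica, and in 2001 he moved to the High Energy Physics Laboratory at National Taiwan University as a visiting and adjunct assistant professor. Since 1993 he has taken part in particle physics experiments and astroparticle observations at Fermilab (USA), NASA, CERN, and KEK (Japan). He joined Google in the summer of 2007. 叶平 currently leads Google's Traditional Chinese search quality work for Taiwan, as well as its university programs and its Taiwan cloud computing initiative.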

Reposted from: http://www.programmer.com.cn/515/

========

Evolution of Computing:

  • network computing; -- The network is the computer (client/server); separation of functionalities
  • cluster computing; -- Tightly coupled computing resources: CPU, storage, data (usually within a LAN)
  • grid computing; -- Resource sharing across domains; decentralized; open standards
  • utility computing. -- Don't buy computers, lease computing power; ownership model


Next step: Cloud computing

Definition: services and data are put in the cloud, where they are accessible and scalable

Powers of Cloud Computing:

  1. commodity hardware;
  2. infrastructure software:
  • GFS;
  • BigTable;
  • MapReduce

Web Application:

  • Serving;
  • Database;
  • Storage;
  • Data Processing

Google's Solution:

  • Google App Engine;
  • BigTable;
  • GFS;
  • MapReduce

Implementation:

  • Hadoop -- GFS and MapReduce
  • HBase -- BigTable
  • HDFS -- GFS
  • ...

GFS:

  • Files are broken into chunks;
  • Chunks triplicated across three different servers;
  • Metadata managed by Master; (chunkId, path, serverId, etc)
  • Data transfers happen directly between clients and chunk servers
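The four GFS points above can be illustrated with a toy, in-memory sketch. It is not how GFS is implemented: the class and function names (`Master`, `write_file`, `read_file`), the 4-byte chunk size, the round-robin placement, and the server ids are all assumptions made up for the example. What it does show is the split of responsibilities: the master keeps only metadata, while chunk bytes go straight to three chunk servers.

```python
import itertools

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo, not a real GFS value
REPLICAS = 3    # "triplicated" in the notes above


class Master:
    """Holds metadata only: path -> [(chunk_id, [server_id, ...]), ...]."""

    def __init__(self, server_ids):
        if len(server_ids) < REPLICAS:
            raise ValueError("need at least REPLICAS chunk servers")
        self.server_ids = list(server_ids)
        self.files = {}
        self._chunk_ids = itertools.count()

    def allocate(self, path, num_chunks):
        layout = []
        for _ in range(num_chunks):
            chunk_id = next(self._chunk_ids)
            # Round-robin placement (an assumption): REPLICAS distinct servers.
            servers = [self.server_ids[(chunk_id + r) % len(self.server_ids)]
                       for r in range(REPLICAS)]
            layout.append((chunk_id, servers))
        self.files[path] = layout
        return layout


def write_file(master, stores, path, data):
    """stores maps server_id -> {chunk_id: bytes}; it stands in for chunk servers."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    layout = master.allocate(path, len(chunks))
    for (chunk_id, server_ids), chunk in zip(layout, chunks):
        for sid in server_ids:
            stores[sid][chunk_id] = chunk  # data goes straight to chunk servers


def read_file(master, stores, path):
    parts = []
    for chunk_id, server_ids in master.files[path]:
        # Use the first replica that still holds the chunk.
        holder = next(sid for sid in server_ids if chunk_id in stores[sid])
        parts.append(stores[holder][chunk_id])
    return b"".join(parts)


stores = {sid: {} for sid in ("s0", "s1", "s2", "s3")}
master = Master(list(stores))
write_file(master, stores, "/logs/a.txt", b"hello world!")
stores["s1"].clear()  # simulate one chunk server losing its data
print(read_file(master, stores, "/logs/a.txt"))
```

Because every chunk lives on three servers, the read still succeeds after one server's data is wiped.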

GFS used in Google:

  • 200+ clusters;
  • File system clusters of up to 5000+ machines;
  • Pools of 10000+ clients;
  • Read is Faster than Write;

BigTable:

  • 3-dimension: row, column, timestamp;
  • Distributed;
  • Scalable;
  • Self-Managing -- add/remove servers; load balance;
  • System structure: -- BigTable Cell (i.e., cluster)
    • BigTable client library
    • Master; -- metadata operations; load balance
    • Tablet Server; -- serves data
    • Cluster scheduling system -- handle failover, monitoring;
    • GFS -- holds tablet data, logs;
    • Lock service -- hold metadata, master election
  • Currently ~500 cells
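The "3-dimension: row, column, timestamp" data model can be pictured as a map from (row, column, timestamp) to a value. The sketch below is a single-process toy, not the BigTable client library: `TinyTable`, its methods, and the row and column names are hypothetical, and it ignores distribution, tablets, and the master entirely.

```python
class TinyTable:
    """Toy versioned map: (row, column, timestamp) -> value."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        if not eligible:
            return None
        return versions[max(eligible)]


table = TinyTable()
table.put("com.example/index", "contents", 1, "<html>v1</html>")
table.put("com.example/index", "contents", 5, "<html>v2</html>")

latest = table.get("com.example/index", "contents")
as_of_3 = table.get("com.example/index", "contents", timestamp=3)
print(latest, as_of_3)
```

Keeping the timestamp as a third key is what lets a reader ask for the value of a cell as of an earlier point in time.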


Distributed Data Processing:


1. Technical Issues:
  • File Management: where to store files? Distributed; -- Master
  • Granularity: Splitting;
  • Job Allocation: assign which task to which node? prefer local job;
  • Fault-recovery: what if a node crashes? Redundancy, crash-detection, job re-allocation;
2. MapReduce:
  • Map: (in_key, in_value) --> ( (key_j, value_j) | j = 1, 2, ...)
  • Reduce: (key, (value1, value2, ... )) --> (key, f_value)
3. Configuration:
  • 200,000 mappers;
  • 500 reducers;
  • 2000 nodes
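The Map and Reduce signatures above can be made concrete with word counting. This is a minimal single-process sketch, assuming a made-up driver `run_mapreduce` and toy inputs; a real MapReduce run spreads the map and reduce tasks over many nodes and handles the file management, job allocation, and fault-recovery issues listed under item 1.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """(in_key, in_value) -> stream of (key_j, value_j): one (word, 1) per word."""
    for word in in_value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """(key, (value1, value2, ...)) -> (key, f_value): sum the counts."""
    return key, sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    # Shuffle step: group every intermediate value under its key.
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for key, value in map_fn(in_key, in_value):
            groups[key].append(value)
    # Reduce step: one call per distinct key.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


inputs = [("doc1", "the cloud"), ("doc2", "The grid and the cloud")]
counts = run_mapreduce(inputs, map_fn, reduce_fn)
print(counts)
```

The driver only groups values by key between the two phases; everything specific to word counting lives in `map_fn` and `reduce_fn`, which is why the same framework can serve many different jobs.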

Data Playground:

MapReduce + BigTable + GFS

Summary:
Cloud Computing is about scalable web applications and data processing.

Reference:
http://v.youku.com/v_show/id_XMTEwMDEwOTU2.html

 
 