The Phoenix System for MapReduce Programming
http://mapreduce.stanford.edu/
Here are some datasets:
Input Datasets
- Full input datasets for the sample applications: we provide small, medium, and large datasets for each application.
histogram (~512 MB)
MD5: da6e1853d22100b29590c0bb307b0251
linear regression (~212 MB)
MD5: d9ef0440ddb8b425bb9d6c2b89e62ee0
string match (~212 MB)
MD5: 32eb9fdc722e395a02add2b7cde6666d
reverse index (~154 MB)
MD5: 964568f6fca53aa4ae82539d798cd705
word count (~59 MB)
MD5: 903969c78d2dbd44357fdf7cbe750bc7
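To check a downloaded file against the MD5 checksums above, a minimal Python sketch using the standard hashlib module could look like this. The dictionary keys and the helper names md5_of_file and verify are hypothetical. This note does not give the actual download file names on the Phoenix site.

```python
import hashlib

# MD5 checksums copied from the list above; keys are informal dataset names.
EXPECTED_MD5 = {
    "histogram": "da6e1853d22100b29590c0bb307b0251",
    "linear regression": "d9ef0440ddb8b425bb9d6c2b89e62ee0",
    "string match": "32eb9fdc722e395a02add2b7cde6666d",
    "reverse index": "964568f6fca53aa4ae82539d798cd705",
    "word count": "903969c78d2dbd44357fdf7cbe750bc7",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_md5):
    """Hypothetical helper: True if the file's MD5 matches (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify("word_count_download", EXPECTED_MD5["word count"])` (file name hypothetical) returns True only if the download is intact. Reading in 1 MiB chunks keeps memory use flat even for the ~512 MB histogram dataset.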
As it turns out, the original paper is about MapReduce on multi-core systems. This MapReduce working group is also worth a look:
http://graal.ens-lyon.fr/mapreduce/