CS322: (Social and Information) Network Analysis
Autumn 2009
Resources
Datasets
Snap network datasets
Yahoo! Webscope Catalog of datasets
- Note: Jure Leskovec has to apply for any datasets you want, and we must agree not to redistribute them. There may be a delay, so submit requests early.
Coauthorship and Citation Networks
Internet Topology
Wikipedia
Movie Ratings
Who trusts whom data at Trustlet
Mark Newman's pointers
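Many of the network datasets listed above, including a number of SNAP downloads, are distributed as plain-text edge lists. The sketch below assumes that layout (one whitespace-separated "source target" pair of integer node IDs per line, with "#" comment lines at the top) and computes out-degrees. The function names read_edge_list and out_degrees are hypothetical and not part of any listed tool.

```python
import io
from collections import Counter

def read_edge_list(lines):
    """Parse 'source target' integer pairs, skipping blank and '#' comment lines."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]  # extra columns, if any, are ignored
        edges.append((int(src), int(dst)))
    return edges

def out_degrees(edges):
    """Count outgoing edges per node, treating each pair as directed."""
    return Counter(src for src, _ in edges)

# Usage on a tiny inline sample (a real file would be opened with open(path)).
sample = "# Toy directed graph\n# FromNodeId\tToNodeId\n0\t1\n0\t2\n1\t2\n"
edges = read_edge_list(io.StringIO(sample))
print(edges)               # [(0, 1), (0, 2), (1, 2)]
print(out_degrees(edges))  # Counter({0: 2, 1: 1})
```

If a download is gzip-compressed, gzip.open(path, "rt") yields text lines that can be passed straight to read_edge_list.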
Software Tools
a C++ library for working with massive network datasets (Windows, Linux, Mac).
a program for large network analysis (Windows or Linux via Wine).
an exploratory data analysis and visualization tool for graphs and networks.
software framework for information visualization (Linux, MacOSX, Windows).
software for social network analysis (Windows).
graph visualization software
a Python package for the study of the structure of complex networks.
a large-scale network analysis, modeling and visualization toolkit
tools for fitting heavy-tailed distributions to data
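One standard way to fit a heavy-tailed distribution is maximum likelihood. As a minimal sketch, assuming a continuous power law p(x) ∝ x^(-alpha) for x >= x_min with x_min already chosen, the estimate is alpha_hat = 1 + n / sum_i ln(x_i / x_min), where the sum runs over the n observations at or above x_min. This is not the code of any tool listed above; selecting x_min and testing goodness of fit are left out, and powerlaw_alpha_mle is a hypothetical name.

```python
import math

def powerlaw_alpha_mle(data, xmin):
    """MLE of alpha for a continuous power law above a given xmin."""
    tail = [x for x in data if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal xmin; alpha is undefined")
    return 1.0 + len(tail) / log_sum

# Toy check: with xmin = 1, the logs of 1, e, e^2 sum to 3 over n = 3 points,
# so alpha_hat = 1 + 3 / 3 = 2. The value 0.5 is below xmin and is ignored.
print(powerlaw_alpha_mle([0.5, 1.0, math.e, math.e ** 2], xmin=1.0))  # approximately 2.0
```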
Websites
Some websites that may be interesting to analyze:
Similar Courses
from: http://snap.stanford.edu/na09/resources.html
==========
DBLP dataset:
Data characteristics:
- Over 1,200,000 objects
- Over 2,480,000 links
- 12 object attributes
- 6 link attributes
Additional information:
[Figure: citation links omitted]
The PROXIMITY DBLP database presents information on computer science publications listed in the DBLP Computer Science Bibliography. The data in this dataset were derived from a snapshot of the bibliography as of April 12, 2006. The PROXIMITY DBLP dataset maps each entry in the original DBLP data to one of six types of objects representing different types of publications. It includes links from publications to their authors and editors and from papers to the journal, proceedings, or book in which they appear, as well as citation links from one publication to another.
See the README for additional information on the DBLP database.
Acknowledgments:
Please include the following acknowledgment in all publications that describe work using this database:
The PROXIMITY DBLP database is based on data from the DBLP Computer Science Bibliography with additional preparation performed by the Knowledge Discovery Laboratory, University of Massachusetts Amherst.
===============
http://kdl.cs.umass.edu/data/msn/msn-info.html
Similar datasets for large-scale social network research can also be found in the left-hand sidebar of this page:
Databases
HEP-Th
Can-o-sleep
Mobile Social Networks
DBLP
===============
Some datasets for social computing and graph mining:
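1. snap.stanford.edu/na09/resources.html: this site lists many useful datasets, including DBLP data, KDD data, the IMDb database, email networks, blog networks, and more. It also points to practical tools for network analysis and data visualization.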
2. citeseerx.ist.psu.edu/about/metadata: explains how to download the CiteSeer data, which includes coauthorship and citation information. For download instructions, see another post on this blog, "How to download CiteSeer data".
3. Cora dataset download: www.cs.umass.edu/~mccallum/code-data.html. For a more detailed description of the data, see hi.baidu.com/zhudaohui/blog/item/4e6f86fdc4df791e08244d12.html
4. DBLP data download: dblp.uni-trier.de/xml/. The DBLP dump is large; it includes coauthors and dates but generally does not include citations.
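Because the DBLP XML dump carries authors but generally not citations, one way to use it is to build a coauthorship network. A minimal sketch, assuming publication records are elements such as article or inproceedings with author children (the PUB_TAGS set below is an assumed subset, and coauthor_edges is a hypothetical name): stream the file with xml.etree.ElementTree.iterparse and count every pair of authors on the same record. The real dump declares character entities in dblp.dtd, which this sketch does not handle, so it is exercised on a small inline sample.

```python
import io
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed subset of DBLP record types that list <author> children.
PUB_TAGS = {"article", "inproceedings", "incollection", "book",
            "phdthesis", "mastersthesis"}

def coauthor_edges(xml_file):
    """Count coauthorship pairs; a record with k distinct authors adds k*(k-1)/2 pairs."""
    edges = Counter()
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag in PUB_TAGS:
            authors = sorted({a.text for a in elem.findall("author") if a.text})
            for u, v in itertools.combinations(authors, 2):
                edges[(u, v)] += 1  # pairs are stored in sorted order
            elem.clear()  # drop the record's children to limit memory use
    return edges

SAMPLE = """<dblp>
<article key="a1"><author>Alice</author><author>Bob</author><year>2005</year></article>
<inproceedings key="p1"><author>Alice</author><author>Carol</author><author>Bob</author><year>2006</year></inproceedings>
</dblp>"""

edges = coauthor_edges(io.BytesIO(SAMPLE.encode("utf-8")))
print(sorted(edges.items()))
# [(('Alice', 'Bob'), 2), (('Alice', 'Carol'), 1), (('Bob', 'Carol'), 1)]
```

The year of each record could be read in the same loop with elem.findtext("year") to build time-sliced coauthorship networks.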