Data Download

                

 

Data Set for Evolutionary Theme Pattern Discovery:

The Asia Tsunami Data Set was collected and preprocessed by Hang Su and me. It contains 7468 news articles from 10 news sources, published from December 2004 to February 2005. Because of the size of the data, we post it only in Lemur format, which organizes articles as follows:

<DOC docid1> 
Contents
...
</DOC>
<DOC docid2>
...
</DOC>
...
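
As a rough illustration of reading this layout, the Python sketch below splits such a file into (docid, contents) pairs. It assumes that each <DOC docid> and </DOC> tag sits on its own line, as in the listing above, and that the docid contains no spaces. The function names iter_docs and read_gz are hypothetical, and the text encoding of the posted file is an assumption, so undecodable bytes are replaced rather than raising an error.

```python
import gzip
import re
from typing import Iterable, Iterator, Tuple

# Opening tag as shown in the listing above: <DOC docid>
DOC_OPEN = re.compile(r"^<DOC\s+([^>\s]+)>$")
DOC_CLOSE = "</DOC>"


def iter_docs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (docid, contents) for each <DOC docid> ... </DOC> block."""
    doc_id = None
    body = []
    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        match = DOC_OPEN.match(stripped)
        if doc_id is None:
            # Outside a document: only an opening tag is meaningful.
            if match:
                doc_id = match.group(1)
                body = []
        elif stripped == DOC_CLOSE:
            # End of the current document: emit it and reset.
            yield doc_id, "\n".join(body).strip()
            doc_id = None
        else:
            body.append(line)


def read_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Stream documents from a gzip-compressed file (encoding is assumed)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from iter_docs(handle)


# Small in-memory sample in the same layout, used instead of the real file.
sample = """<DOC docid1>
Contents of the first article.
</DOC>
<DOC docid2>
Contents of the second article.
</DOC>
"""
docs = list(iter_docs(sample.splitlines()))
```

For the downloaded archive, read_gz can be called with the local path of the .gz file. It streams one article at a time, so the whole collection does not need to fit in memory.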

You can download the .gz file here.

The KDD Data Set was collected and preprocessed by me. It contains 496 papers from 6 years' KDD proceedings (99~05). The papers are in free-text format, converted from .pdf files with pdf2text.

You can download the .gz file here.

Reference:

Qiaozhu Mei, ChengXiang Zhai, Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining, SIGKDD 2005

For problems or questions about using the data, please contact qmei2 AT uiuc DOT edu.

