CS491/CS591-CXZ Text Data Mining Seminar (Fall 2004)
Instructor: ChengXiang Zhai
Resources
Reading List
- General Text Mining Algorithms
- Clustering
- [Steyvers et al. 04] Probabilistic Author-Topic Models for Information Discovery Authors: Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, Thomas Griffiths, SIGKDD04 ( download)
- [Sarawagi et al. 03] Cross-Training: Learning Probabilistic Mappings Between Topics
Authors: Sunita Sarawagi, Soumen Chakrabarti, Shantanu Godbole, SIGKDD03 ( download)
- [Dhillon et al. 03] Information-Theoretic Co-clustering
Authors: Inderjit Dhillon, Subramanyam Mallela, Dharmendra Modha, SIGKDD03 ( download)
- [Popescul & Ungar 04] Cluster-based Concept Invention for Statistical Relational Learning Authors: Alexandrin Popescul, Lyle Ungar, SIGKDD04 (download)
- Text stream analysis
- [Kleinberg 02] Bursty and hierarchical structure in streams, Jon Kleinberg, SIGKDD02 ( download)
- [Zhu 03] Efficient Elastic Burst Detection in Data Streams, Y. Zhu, SIGKDD03 (download)
- Exploiting external (non-textual) resources
- [Agichtein & Ganti 04] Mining Reference Tables for Automatic Text Segmentation Authors: Eugene Agichtein, Venkatesh Ganti, SIGKDD04 (download)
- [Cohen & Sarawagi 04] Exploiting Dictionaries in Named Entity Extraction: Combining SemiMarkov Extraction Processes and Data Integration Methods Authors: William Cohen, Sunita Sarawagi, SIGKDD04 (download)
- Web Mining
- Summarization
- [Kummamuru et al. 04] A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results, K. Kummamuru, R. Lotlikar, S. Roy (IBM India Research Lab), K. Singal (IIT-Guwahati), R. Krishnapuram (IBM India Research Lab), WWW 2004, p. 658 (download)
- [Hu & Liu 04] Mining and Summarizing Customer Reviews Authors: Minqing Hu, Bing Liu, SIGKDD04 (download)
- [Nadamoto & Tanaka 03] A Comparative Web Browser (CWB) for Browsing and
Comparing Web Pages, Akiyo Nadamoto, Katsumi Tanaka, WWW03 ( download)
- [Dave et al. 03] Mining the peanut gallery: Opinion extraction and
semantic classification of product reviews,
Kushal Dave, Steve Lawrence, David M. Pennock, WWW03 ( download)
- Topic/Theme mining
- [Agrawal et al. 03] Mining Newsgroups Using Networks Arising From
Social Behavior, Rakesh Agrawal, Sridhar Rajagopalan,
Ramakrishnan Srikant, Yirong Xu, WWW03 ( download)
- [Dumais & Horvitz 04] Newsjunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty, E. Gabrilovich (Technion, Microsoft Research), S. Dumais, E. Horvitz (Microsoft Research), WWW 2004, p. 482 (download)
- [Huang et al. 04] LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora, C.-C. Huang, S.-L. Chuang (Academia Sinica), L.-F. Chien (Academia Sinica, National Taiwan University), WWW 2004, p. 184 (download)
- Lexical Relationship Mining
- [Liu et al. 03] Mining Topic-Specific Concepts and Definitions on the
Web, Bing Liu, Chee Wee Chin, Hwee Tou Ng, WWW03
( download)
- [Cui et al. 04] Unsupervised Learning of Soft Patterns for Generating Definitions from Online News, H. Cui, M.-Y. Kan, T.-S. Chua (National University of Singapore), WWW 2004, p. 90 (download)
- [Cimiano et al. 04] Towards the Self-Annotating Web, P. Cimiano, S. Handschuh (University of Karlsruhe), S. Staab (University of Karlsruhe, Ontoprise GmbH), WWW 2004, p. 462 (download)
- [Etzioni et al. 04] Web-Scale Information Extraction in KnowItAll (Preliminary Results), O. Etzioni, M. Cafarella, D. Downey, S. Kok, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, A. Yates (University of Washington), WWW 2004, p. 100 (download)
- Biology Literature Mining
- Biology entity extraction
- [Rost 04] Protein Names Precisely Peeled Off Free Text, Sven Mika (Columbia University), Burkhard Rost (CUBIC/C2B2/NESG, Dept. of Biochemistry and Molecular Biophysics, Columbia University), ISMB 2004 (pdf)
- [Tuason et al. 04] Biological Nomenclatures: A Source of Lexical Knowledge and Ambiguity, O. Tuason, L. Chen, H. Liu, J.A. Blake, and C. Friedman; Pacific Symposium on Biocomputing 9:238-249 (2004) (pdf)
- [Hanisch et al. 03] Playing Biology's Name Game: Identifying Protein Names in Scientific Text, D. Hanisch, J. Fluck, HT. Mevissen, R. Zimmer; Pacific Symposium on Biocomputing 8:403-414 (2003) (pdf)
- [Yu & Agichtein 03] Extracting Synonymous Gene and Protein Terms from Biological Literature, Hong Yu and Eugene Agichtein, ISMB 2003 (pdf)
- [Liu & Friedman 03] Mining Terminological Knowledge in Large Biomedical Corpora, H. Liu, C. Friedman; Pacific Symposium on Biocomputing 8:415-426 (2003) (pdf)
- [Narayanaswamy et al. 03] A Biological Named Entity Recognizer, M. Narayanaswamy, K. E. Ravikumar, K. Vijay-Shanker; Pacific Symposium on Biocomputing 8: (MS Word file)
- Beyond entities
- [Schwartz & Hearst 03] A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text, A.S. Schwartz, M.A. Hearst; Pacific Symposium on Biocomputing 8:451-462 (2003) (pdf)
- [Yeh et al. 03] Evaluation of Text Data Mining for Database Curation: Lessons Learned from the KDD Challenge Cup, Alexander Yeh, Lynette Hirschman, Alexander Morgan, ISMB 2003 (pdf)
- [Srinivasan & Libbus 04] Mining MEDLINE for Implicit Links between Dietary Substances and Diseases, Padmini Srinivasan (University of Iowa), Bisharah Libbus (National Library of Medicine), ISMB 2004 (pdf)
Useful Web Sites
A roadmap to text mining and web mining