CS410 Text Information Systems (Spring 2008)

Instructor: ChengXiang Zhai





As the amount of online textual information (e.g., web pages, email, news articles, office documents, and scientific literature) grows explosively, it is increasingly important to develop tools that help us manage and exploit it. Web search engines, such as Google, Yahoo!, and MSN, are good examples of such tools, and they are now an essential part of everyday life. In this course, you will learn the underlying technologies of these and other powerful tools for managing text information. You will study the basic principles and algorithms for managing text information and gain hands-on project experience by extending existing tools or developing completely new ones.

Unlike structured information, which is typically managed with a relational database, textual information is unstructured and poses special challenges because of the difficulty of precisely understanding natural language and users' information needs. In this course, we will introduce a variety of techniques for accessing and mining text information. The course emphasizes basic principles and practically useful algorithms. Topics to be covered include text analysis, retrieval models (e.g., Boolean, vector space, and probabilistic), text categorization, text filtering, clustering, retrieval system design and implementation, and applications to web information management.

The course is lecture-based. Grading is based on regular assignments, a midterm examination, and a final course project. The assignments generally involve implementing an algorithm and experimenting with it on real text data. The final course project gives students hands-on experience developing novel text information management tools. Group work is encouraged.