CS410 Text Information Systems (Spring 2008)
Instructor: ChengXiang Zhai
Announcements:
- The project presentation schedule is now available here. Please also read the
presentation guidelines.
- The survey form for the Facebook extra-credit assignment is available here.
Please print it out, finish the form, and turn it in by 2pm, April 16, Wednesday.
- Assignment #5 is available here. It's due April 18, Fri.
- Solutions to the midterm are available here, along with the
grading criteria and grade distributions.
- Assignment #4 is available here. It's due April 9, Wed.
- Please check out this note for coverage of NDCG and beta-gamma thresholding.
- An extra-credit assignment about testing a Facebook application is available here (Microsoft Word, ppt).
- A sample midterm from last year and its solutions are available (sample midterm; solutions). These questions are meant to give you
some idea of what the exam questions might look like. The coverage of topics in this sample midterm
may differ from that of your midterm exam.
- There was a typo in the M-step formula on slides 12-13. This has now been corrected.
A PDF version of this presentation is also available for the convenience of
Linux users who can't easily view ppt files.
- The project proposal is due March 26.
- A review list for the midterm exam is available here.
- Assignment #3 is available here. It's due March 12, Wed.
- Solutions to Homework #1 are available here.
- Assignment #2 is available here. It's due Feb 27, Wed.
- The first assignment is available here. It's due Feb 13, Wed.
- The class will meet on Wednesdays and Fridays, 2-3:15pm in Siebel Center, room 1105.
The first class will be on Jan 16, Wednesday.
- Learn how Web search engines work and gain the skills to develop better search engines yourself.
- Get hands-on project experience with developing real-world applications, such as intelligent
software tools for personalized search, organization website enhancement, customer service
email management and mining, or scientific literature summarization and mining.
- Open the door to the increasing number of job positions in Search Technology companies such as
Google, Yahoo!, and Microsoft.
- ...
As the amount of online textual information (e.g., web pages, email,
news articles, office documents, and scientific literature)
grows explosively, it is increasingly important to develop tools
to help us manage and exploit the huge amount of information. Web search engines,
such as Google, Yahoo!, and MSN, are good examples of such tools, and they
are now an essential part of everyone's life.
In this course, you will learn the underlying technologies of these
and other powerful tools for managing text information. You will be able
to learn the basic principles and algorithms for managing text information as well as
obtain hands-on project experience with extending existing tools or
developing completely new tools.
Unlike structured information, which is typically managed with a relational database, textual information is unstructured and poses special challenges due to the difficulty in precisely understanding natural language and users' information needs. In this course, we will introduce a variety of techniques for accessing and mining text information.
The course emphasizes basic principles and practically useful algorithms.
Topics to be covered include text analysis, retrieval models (e.g., Boolean, vector space, and probabilistic), text categorization, text filtering, clustering, retrieval system design and implementation, and applications to web information management.
The course is lecture-based. Grading is based on regular assignments, a midterm examination, and a final
course project. The assignments generally involve implementing an algorithm and experimenting with it on real text data. The final course project gives students hands-on experience in developing
novel text information management tools. Group work is encouraged.