CS410 Text Information Systems (Spring 2015)

Instructor: ChengXiang Zhai


Basic Information


Textbooks and readings


Students should come with good programming skills. CS225 or CS400 or an equivalent course is required. Knowledge of basic probability and statistics is a plus. If you are not sure whether you have the right background, please contact the instructor.


The course is lecture-based with a midterm examination. There are individual and group assignments, which often involve using a retrieval toolkit to implement an algorithm and/or experiment with real text data.

Course Policy and Grading

  1. Attendance
  2. Attendance is mandatory, but use common sense if you are sick or run into an emergency. If you cannot attend a class, you must send (or ask someone to send) an explanation to the instructor no later than 24 hours after the class. For example, if you cannot attend a class on Wednesday, you need to send a message before 2:00pm the next day (i.e., Thursday). Note that attending the lectures is often your only chance to learn certain material, as you may not find it in any textbook or other reading.

  3. Assignments
  4. The assignments are designed to ensure that every student has a deep and precise understanding of the major algorithms and gains hands-on experience with a retrieval toolkit. Students are therefore generally required to complete them independently unless it is a group assignment. Assignments may be of two flavors or a mixture of them: 1) short written problem sets to test your understanding of the material; 2) experimentation and machine problems to provide an opportunity to work on a toolkit or experiment with algorithms.

    Discussion with others is allowed to the extent of helping understand the material. The course newsgroup may be a good place for discussions. The purpose of student collaboration is to facilitate learning, not to circumvent it. The actual solution must be done by each student alone, and the student should be ready to reproduce their solution upon request. If any substantial discussion happens, everyone involved must write down the names of the people they discussed with and the nature or topic of the discussion. In any case, you must exercise academic integrity. See the University Policy on Academic Integrity, especially the section on plagiarism.

    Late submission of an assignment results in a reduced grade for the assignment, unless an extension has been granted by the instructor. An assignment is worth full credit at the beginning of class on the due date (later if an extension has been granted). It is worth at most 90% credit for the next 24 hours. It is worth at most 75% credit for the following 24 hours. It is worth at most 50% credit after that. Except in exceptional cases, assignments will generally not be accepted two weeks after the due date, which means that if your assignment is turned in 14 days after the due date, it will not be graded and you will receive zero credit for it. If you need an extension, please ask for it by sending email to the instructor as soon as the need for it is known. Extensions that are requested promptly will be granted more liberally.
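The late-submission schedule above can be read as a simple step function of how late a submission is. The following is only a sketch of that reading, not an official grading tool: the function name `max_credit_fraction` and the example dates are hypothetical, the `due` argument is assumed to already include any extension granted, and treating a submission exactly 14 days late as zero credit is an assumption about the boundary case.

```python
from datetime import datetime, timedelta


def max_credit_fraction(due: datetime, submitted: datetime) -> float:
    """Return the maximum fraction of credit an assignment can earn.

    `due` is the beginning of class on the due date, assumed to already
    include any extension granted by the instructor.
    """
    late = submitted - due
    if late <= timedelta(0):
        return 1.0   # on time: full credit
    if late <= timedelta(hours=24):
        return 0.90  # first 24 hours after the deadline
    if late <= timedelta(hours=48):
        return 0.75  # following 24 hours
    if late < timedelta(days=14):
        return 0.50  # after that, up to two weeks
    return 0.0       # 14 days late or more: not graded (boundary is an assumption)


# Hypothetical deadline, for illustration only.
due = datetime(2015, 3, 3, 14, 0)
for hours_late in (0, 1, 30, 120, 14 * 24):
    submitted = due + timedelta(hours=hours_late)
    print(hours_late, max_credit_fraction(due, submitted))
```

These values are upper bounds ("at most"), so the actual grade is the assignment's score capped by the returned fraction; exceptional cases remain at the instructor's discretion.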

  5. Clarifications about homework collaboration and homework question answering
  6. There is a difference between asking questions about the course content and asking questions about homework:

  7. Midterm examination
  8. There will be a midterm exam in the second half of the semester. The purpose is to ensure that students have a good understanding of all the basic concepts and techniques. For students on campus, it will be a closed-book exam in the classroom lasting 75 minutes; the examination for online students will be managed by the department based on our standard procedure for taking an exam remotely for online education programs.
  9. In-class short quizzes
  10. Multiple in-class short quizzes may be given at the end of some randomly chosen classes. The quiz questions should be very easy to answer if you have really paid attention to what has been discussed in all the lectures. Besides being one factor contributing to grading, quizzes also serve as a way to keep track of class attendance.
  11. The course project
  12. The purpose of the course project is twofold: (1) to give the students opportunities to apply what has been learned from the course to solve some real-world text information management and analysis problems; (2) to allow the students to explore ideas and techniques for text information management and analysis by working on a real problem. Teamwork is allowed and encouraged. There will be a number of "instructor-designed" project topics available for you to choose from, but you are also very welcome, indeed encouraged, to come up with any interesting topic on your own. You will be asked to give a short presentation of your course project and submit a 4-6 page written project report at the end of the semester. See Project Page for details.

  13. The extra "1 hour" literature review
  14. Every student who takes the course for 4 credit hours is required to finish a literature review on a topic in the scope of the course. The topic will be selected by the student with the approval of the instructor. Often the selected topic would be related to the course project that the student is involved in, but it does not have to be. It can also be a topic covered in the lectures. You must decide on a topic for the literature review by March 19, 2015 at the latest, and finish the literature review by April 14, 2015.

    In the case of multiple students working on the same project as a team, they can each choose to focus on a different sub-topic or a topic not related to the course project to finish a separate review. They can also work together as a group to finish a much more comprehensive review. However, the literature review does not have to be tied to the course project. That is, you can work with one group of people to complete a literature review on topic X, while working with another group to complete a course project on topic Y.

    A group literature review must include a clear statement about the work division so as to show that everyone has indeed contributed significantly to the combined review. Such a group review is also expected to be much longer than an individual review, though there is no strict length requirement. In particular, if k people work together, you are not required to write a report k times as long as an individual review would be. Indeed, your report is expected to be shorter than that because redundancy is removed. However, your report should show "sufficient" work by each of you, where "sufficient" is defined as "reading at least 6 papers" and "writing at least 3~4 pages" by each person. Please list specific names when you post the topic of your group survey.

    The goal of your literature review is to synthesize a set of papers about a topic. Note that this is different from a simple list of paragraphs covering each paper; instead, you should try to organize the papers you've read into a structure and connect them so that you can discuss their similarities and differences. Your review should help a reader see a clear overall picture of the papers that you reviewed. It is unnecessary to cover many details of any paper, as a reader can and will read the original paper if he or she is interested. That is, your review is mostly meant to provide an entry point to the relevant literature. Picture it as the first reading that someone would look at in order to learn about research work on the topic, and try to write your review for such a reader.

    Decide whether you would like to do a broad shallow survey or a narrow deep survey. A broad shallow survey can cover many papers (e.g., more than 10 or even 20 papers), but only briefly mentions what is in each paper. A narrow one can cover just 6~10 papers, but with a more detailed description of each paper. Either strategy is acceptable. In the first case, you can read broadly, but you don't need to read each paper in detail; in the second, you will read fewer papers, but you will also need to understand each paper in more detail. You may make this decision based on which strategy would help you most in finishing your project or based on your own preferences.

    Select a set of "seed" papers first. You can find them by doing a general literature search on the Web and focusing on papers that are highly cited. Then you can check which papers have cited them. Focus on newer papers on your topic. It's better to read the newest papers first and then work backward to relatively older papers. This way your survey will reflect the most recent progress, making it more useful. If you find any existing survey or review of the topic or a related topic, read it and try to build on top of it, rather than repeat what has already been surveyed.

    Start reading papers as soon as possible. It takes some time to read a paper, especially if the paper is complicated. So you should act early to ensure you will be able to read at least 6 papers in detail or 10 papers in a shallow way by the end of the semester.

    If you aren't sure about which papers to focus on, please email the instructor for a discussion.

    Check out this presentation for the typical structure of a literature survey paper.

    As another option, your literature review may also be on a topic covered in our lectures. In such a case, your literature review will be more like a short tutorial introduction to a topic, or a relatively self-contained review of a topic that we covered. For example, you may write a short introduction to the Vector Space Retrieval Model. You may use the lecture slides as the basis, but you should find and add some additional readings to enrich the content. For example, you may find a few resources where a reader can find more detailed explanations of some concepts or methods. The length is flexible as long as you have at least covered thoroughly the slides of the lecture(s) on that topic. Such a literature review may be very useful to you (as well as your peers) for preparing for the midterm exam. Your literature review is due on Tuesday, April 14, 2015, 11:59pm. Submit your literature review by posting it at this literature review wiki page (there will be instructions there on how to post it).

  15. Grading
  16. Grading will be based on the following weighting scheme:

    For students taking the course for 4 credit hours, this weighting scheme is only applied to 75 points out of the 100 points. The remaining 25 points are based on the literature review, which will be graded as "pass or fail", contributing either 25 or 0 points to your final grade.
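As a rough illustration of the 4-credit-hour rule above, the sketch below scales a score computed under the regular weighting scheme (assumed here to be on a 0-100 scale) down to 75 points and then adds the pass/fail literature review. The function name `final_score_4_credit` and its parameters are hypothetical, and the linear scaling is one reading of the rule rather than an official formula.

```python
def final_score_4_credit(weighted_score: float, review_passed: bool) -> float:
    """Combine the regular weighted score with the literature review.

    `weighted_score` is the score under the regular weighting scheme,
    assumed to be on a 0-100 scale. It contributes up to 75 points;
    the literature review contributes either 25 points (pass) or 0 (fail).
    """
    if not 0.0 <= weighted_score <= 100.0:
        raise ValueError("weighted_score must be between 0 and 100")
    review_points = 25.0 if review_passed else 0.0
    return 0.75 * weighted_score + review_points


# Hypothetical student: 80/100 on the regular scheme, review passed.
print(final_score_4_credit(80.0, True))
```

Under this reading, failing the literature review caps the final score at 75 points regardless of performance on the other components.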