CS410 Text Information Systems (Spring 2013)

Instructor: ChengXiang Zhai



Basic Information

Administrative

Textbooks and readings

Prerequisites

Students should come with good programming skills. CS225 or CS400 or an equivalent course is required. Knowledge of basic probability and statistics is a plus. If you are not sure whether you have the right background, please contact the instructor.

Format

The course is lecture-based with a midterm examination. There are individual and group assignments, which often involve using a retrieval toolkit to implement an algorithm and/or experiment with real text data.

Course Policy and Grading



  Assignments

    The assignments are designed to ensure that every student gains a deep and precise understanding of the major algorithms and hands-on experience with a retrieval toolkit. Students are therefore generally required to complete them independently unless an assignment is designated as a group assignment. Discussion with others is allowed to the extent of helping understand the material. The course newsgroup may be a good place for discussions. The purpose of student collaboration is to facilitate learning, not to circumvent it. The actual solution must be done by each student alone, and the student should be ready to reproduce their solution upon request. If any substantial discussion happens, everyone involved must write down the names of the people they discussed with and the nature or topic of the discussion. In any case, you must exercise academic integrity. See the University Policy on Academic Integrity, especially the section on plagiarism.

    Late submission of an assignment results in a reduced grade for the assignment, unless an extension has been granted by the instructor. An assignment is worth full credit at the beginning of class on the due date (later if an extension has been granted). It is worth at most 90% credit for the next 24 hours and at most 50% credit for the following 24 hours. It is worth 25% credit after that. Assignments turned in more than seven days after the due date will not be graded and will receive zero credit. If you need an extension, please email the instructor as soon as you know you need it. Extensions that are requested promptly will be granted more liberally.
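The late-submission schedule above can be sketched as a small function. This is only an illustrative sketch, not an official grading tool: the function name `late_credit_cap`, its parameters, and the example dates are hypothetical. It also assumes that each 24-hour boundary is inclusive, that the 25% tier is a cap like the other tiers, and that the seven-day cutoff is measured from the extended deadline when an extension was granted. The policy text does not settle any of these points.

```python
from datetime import datetime, timedelta
from typing import Optional


def late_credit_cap(due: datetime,
                    submitted: datetime,
                    extension: Optional[timedelta] = None) -> float:
    """Return the maximum fraction of credit a submission can earn.

    Assumption: an extension simply shifts the deadline, and every
    later threshold is measured from that shifted deadline.
    """
    deadline = due + (extension or timedelta(0))
    lateness = submitted - deadline

    if lateness <= timedelta(0):
        return 1.0   # on time: full credit
    if lateness <= timedelta(hours=24):
        return 0.9   # first 24 hours late: at most 90%
    if lateness <= timedelta(hours=48):
        return 0.5   # following 24 hours: at most 50%
    if lateness <= timedelta(days=7):
        return 0.25  # after that, up to one week late: 25%
    return 0.0       # more than seven days late: not graded


# Hypothetical due date, used only to illustrate the tiers.
due = datetime(2013, 2, 1, 9, 30)
for hours_late in (0, 5, 30, 72, 200):
    cap = late_credit_cap(due, due + timedelta(hours=hours_late))
    print(f"{hours_late:>3} hours late -> cap {cap:.2f}")
```

The function returns a cap on credit, not the grade itself; the grade for the work would be scaled or limited by that cap.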



  Clarifications about homework collaboration and homework question answering

    There is a difference between asking questions about the course content and asking questions about homework:

  Midterm examination

    There will be a midterm exam in the second half of the semester. The purpose is to ensure that students have a good understanding of all the basic concepts and techniques. For students on campus, it will be a closed-book exam held in the classroom and lasting 75 minutes; the examination for online students will be managed by the department according to its standard procedure for taking exams remotely in online education programs.
  The course project

    The purpose of the course project is twofold: (1) to give the students opportunities to apply what has been learned from the course to solve some real-world text information management and analysis problems; (2) to allow the students to explore ideas and techniques for text information management and analysis by working on a real problem. Teamwork is allowed and encouraged. There will be a number of "instructor-designed" project topics available for you to choose from, but you are also very welcome, indeed encouraged, to come up with an interesting topic of your own. You will be asked to give a short presentation of your course project and submit a written project report of 4-6 pages at the end of the semester. See Project Page for details.

  The extra "1 hour" literature review

    Every student who takes the course for 4 credit hours is required to finish a literature review on a topic within the scope of the course. The topic will be selected by the student with the approval of the instructor. Often the selected topic is related to the course project that the student is involved in, but it does not have to be. Since the project proposal is due in the middle of the semester, you should plan to finalize the topic for your literature review around that time.

    When multiple students work on the same project as a team, each can choose a different sub-topic, or a topic not related to the course project, and finish a separate review. They can also work together as a group to finish a much more comprehensive review. Such a group literature review must include a clear statement about the division of work so as to show that everyone has indeed contributed significantly to the combined review. A group review is also expected to be much longer than an individual review, though there is no strict length requirement. In particular, if k people work together, you are not required to write a report k times as long as an individual review would be. Indeed, your report is expected to be shorter than that because redundancy is removed. However, your report should show "sufficient" work by each of you, where "sufficient" is defined as "reading at least 6 papers" and "writing at least 3 to 4 pages" per person. Please list specific names when you post the topic of your group survey.

    The goal of your literature review is to synthesize a set of papers about a topic. Note that this is different from a simple list of paragraphs covering each paper; instead, you should try to organize the papers you've read into a structure and connect them so that you can discuss their similarities and differences. Your review should help a reader see a clear overall picture of the papers that you reviewed. It is unnecessary to cover many details of any paper, as an interested reader can and will read the original paper. That is, your review mostly serves as an entry point to the relevant literature. Picture it as the first reading that someone would look at in order to learn about research work on the topic, and try to write your review for such a reader.

    Decide whether you would like to do a broad shallow survey or a narrow deep survey. A broad shallow survey can cover many papers (e.g., more than 10 or even 20 papers), but only briefly mention what is in each paper. A narrow one can cover just 6~10 papers, but with more detailed description of each paper. Either strategy is acceptable. In the first case, you can read broadly, but you don't need to read each paper in detail; in the second, you will read fewer papers, but you will also need to understand each paper in more detail. You may make this decision based on which strategy would help you most in finishing your project or based on your own preferences.

    Select a set of "seed" papers first. You can find them by doing a general literature search on the Web and focusing on highly cited papers. Then you can check which papers have cited them. Focus on newer papers on your topic. It's better to read the newest papers first and work backward to relatively older papers. This way your survey will reflect the most recent progress, making it more useful. If you find any existing survey or review of the topic or a related topic, read it and try to build on top of it, rather than repeat what has already been surveyed.

    Start reading papers as soon as possible. It takes some time to read a paper, especially if the paper is complicated. So you should act early to ensure you will be able to read at least 6 papers in detail or 10 papers in a shallow way by the end of the semester.

    If you aren't sure about which papers to focus on, please email the instructor for a discussion.

    Check out this presentation for the typical structure of a literature survey paper. Your literature review is due on Thursday, May 9, 2013, 11:59pm. Submit your literature review by posting it at this literature review wiki page (there are instructions there on how to post it).

  Grading

    Grading will be based on the following weighting scheme:

    For students taking the course for 4 credit hours, this weighting scheme is only applied to 75 points out of the 100 points. The remaining 25 points are based on the literature review, which will be graded as "pass or fail", contributing either 25 or 0 points to your final grade.
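The adjustment for 4-credit-hour students can be written out as a short calculation. This is a minimal sketch under stated assumptions: the function name `final_score_four_credit` and its parameters are hypothetical, and `base_score` stands for the 0-100 score already computed from the weighting scheme, which is taken as an input here rather than reconstructed.

```python
def final_score_four_credit(base_score: float, review_passed: bool) -> float:
    """Combine the regular course score with the pass/fail literature review.

    base_score: score out of 100 from the regular weighting scheme
                (assumed to be computed elsewhere).
    review_passed: whether the literature review was graded "pass".
    """
    if not 0.0 <= base_score <= 100.0:
        raise ValueError("base_score must be between 0 and 100")
    # The regular scheme is scaled to 75 of the 100 points ...
    scaled = 0.75 * base_score
    # ... and the literature review contributes either 25 or 0 points.
    review_points = 25.0 if review_passed else 0.0
    return scaled + review_points


print(final_score_four_credit(80.0, True))   # 85.0
print(final_score_four_credit(80.0, False))  # 60.0
```

For example, a student with 80 out of 100 under the regular scheme ends up with 60 + 25 = 85 if the review passes, and 60 if it fails.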