Information Retrieval

CS 6501 Fall 2014

HW2—Simulated User Study:Finding Health-related Information in Online Medical Forums

Posted: October 17th, 2014

This assignment simulates the process in which commercial search engines record users' search behaviors for search engine optimization and personalization.

In this assignment, you are asked to use this system to search, browse and judge results' relevance quality. With the provided system, you are supposed to behave similarly as what you usually do in Google search.

To perform this task, please first take a look at the topics and corresponding discussions we have crawled in MP1, and design some information needs that are mostly interesting and meaningful to you.

We have created a search engine based on the online medical forum discussions you have crawled in MP1. The system can be found here. The login username is your computing ID and we have created a password for everyone of you. Please check the comments on your collab assignment page of HW2 to find the password.

Screen Shot of the Forum Search System
Figure 1. Screen Shot of Annotation System

Figure 1 illustrates the interface and basic functions supported in the system, which is very similar to an ordinary search engine system. After submitting the query, you just follow what you usually do in a web search engine, e.g., examine the top ranked documents, click the results, go to the next page or reformulate the query if necessary. For each query, the system will return top 100 ranked documents, but you do not necessarily need to examine all of them. For each returned document, the system will present a text snippet highlighting the matched query terms. By clicking the "read more" link, you will find the full text content of that discussion post. After examining the returned documents (either the short snippet or full post content), please provide your judgment through the dropdown menu. Once you clicked the "Confirm" button, the annotated will be recorded and you can no longer change it.

For example, the information need I have is "What are the major symptoms of depression in early stage?" To find related information from online discussions, I would first formalize a query "symptoms of depression" and by examining the returned results, I get some useful information, like "insomnia or early waking". I will label the document containing that information as "Excellent" or "Good" and the others that I have examined but do not match this query as "Bad". Then I would reformulate multiple queries to further explore my information need, e.g., "depression in early stage," and "early signs of depression".

For each of your query, please carefully study the returned documents (the more the better), and judge those you have examined into 5 grades:

4 - Perfect               
3 - Excellent            
2 - Good          
1 - Fair            
0 - Bad           

For each information need, you will stop further exploration by either your information need has been fully satisfied; or you get frustrated and decide to give up further attempt. If it is the former case, please list the answer, e.g., text segments, you have found; otherwise, try to briefly explain why you are frustrated, e.g., no relevant result has been returned after several attempts, keyword queries are insufficient for you to describe this information need, or due the lack of domain knowledge it is too difficult for you to come up with proper keywords.

Submission

The assignment consists of 100 total points.

Each student is required to design 10 different information needs related to medical and health topics (10pts each), and only use the provided search engine to perform the corresponding search task. For each information need, please report the queries you have issued according to the order they were submitted to the search engine. During the search process, please label the corresponding returned results you have examined. If you have found a satisfying answer, extract the text segment containing the answer; otherwise give your explanation why this information need has not been satisfied after the search.

Take the aforementioned case as example again, you should report something like this:

Information Need: "What are the major symptoms of depression in early stage?"
Query 1: "symptoms of depression"
Query 2: "depression in early stage"
Query 3: "early signs of depression"
Answer: feeling sad, crying spells, change in eating habits, insomnia or early waking, sleeping "too much," lack of energy / motivation, thoughts of death, feeling suicidal, loss of interest in life / things that were once pleasurable, feelings of guilt / shame / worthlessness, feelings of "unreality," irritability, physical complaints (lightheadedness, headache, muscle aches, digestive problems, etc)

For the unsatisfactory search task, use the label Explanation to explain why you think the search was unsatisfactory, i.e., replace Answer with Explanation.

You may reformulate as many queries as you prefer, until your information need has been satisfied or you get frustrated and want to stop.

Please report your information seeking process in above format. We will also check your confirmed annotations recorded in the system, the answers you claimed and the explanations you provided, please make sure your report is consistent with your logged behaviors.

The report should be in PDF format.

NOTE: please keep in mind that your annotations will be used as relevance labels for creating our last machine problem, in which you are asked to improve this medical forum search engine. Therefore, accurate annotation is very important!

Deadline

The deadline for this assignment is 11:59pm, Tuesday, October 28th.