Research Summary for 2010-2011

My research lies at the intersection of text mining and machine learning. I aim to develop general and effective text mining techniques for understanding users' intentions from the content they generate and from their behavior. (Slides) (Poster)
  1. Latent Aspect Rating Analysis

    To support a deeper and more detailed understanding of user reviews, I propose and study a novel text mining problem called Latent Aspect Rating Analysis (LARA). LARA analyzes the opinions expressed in each review at the level of topical aspects, in order to discover each reviewer's latent rating on each aspect as well as the relative weight the reviewer places on each aspect when forming the overall judgment.
    A unified framework is introduced that incorporates topic modeling techniques to identify the aspect segments and to infer the latent aspect weights and ratings jointly.
  2. Online Forum Discussion Structure Modeling

    The reply structure of online threaded discussions conveys important information that helps both users and automated algorithms digest and retrieve relevant information hidden in the discussions.
    A probabilistic method is proposed to formalize and learn such reply structures; a variety of features are introduced to capture the structural dependencies among reply relations.
  3. Latent Topical Structure Modeling

    Recognizing and modeling document structure is a fundamental problem in text mining research. Because natural language text is intrinsically cohesive and coherent, modeling and discovering the latent topical structure within documents benefits many text analysis tasks.
    A Structural Topic Model is proposed to model and analyze both the latent topics and the topical transition structure in text documents. The learned structure proves effective in real applications where correctly recognizing document structure is crucial, such as sentence annotation for information extraction and sentence ordering for multi-document summarization.
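
To make the idea of a topical transition structure (item 3 above) more concrete, here is a minimal sketch. It is not the Structural Topic Model itself: it assumes, for illustration only, that each sentence carries a single topic, that topics follow a first-order Markov chain, and that the transition and word probabilities are already known. The topic names, the probabilities, and the function names are all hypothetical. The sketch recovers the most likely topic sequence for a short document with standard Viterbi decoding.

```python
import math

def sentence_log_likelihood(sentence, word_probs, floor=1e-6):
    """Log-probability of a bag of words under one topic (unseen words get a small floor)."""
    return sum(math.log(word_probs.get(w, floor)) for w in sentence)

def viterbi_topics(sentences, topics, start, trans, emit):
    """Most likely topic per sentence under a first-order Markov chain over topics.

    Assumes at least one sentence and strictly positive start/transition probabilities.
    """
    # best[t][k]: best log-score of any topic path that ends in topic k at sentence t
    best = [{k: math.log(start[k]) + sentence_log_likelihood(sentences[0], emit[k])
             for k in topics}]
    back = []  # back[t-1][k]: best predecessor of topic k at sentence t
    for sent in sentences[1:]:
        scores, pointers = {}, {}
        for k in topics:
            prev = max(topics, key=lambda j: best[-1][j] + math.log(trans[j][k]))
            scores[k] = (best[-1][prev] + math.log(trans[prev][k])
                         + sentence_log_likelihood(sent, emit[k]))
            pointers[k] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final topic.
    path = [max(topics, key=lambda k: best[-1][k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy parameters (made up for illustration, not learned from data).
topics = ["intro", "method", "result"]
start = {"intro": 0.8, "method": 0.1, "result": 0.1}
trans = {
    "intro":  {"intro": 0.3, "method": 0.6, "result": 0.1},
    "method": {"intro": 0.1, "method": 0.4, "result": 0.5},
    "result": {"intro": 0.1, "method": 0.2, "result": 0.7},
}
emit = {
    "intro":  {"problem": 0.4, "we": 0.3, "study": 0.3},
    "method": {"model": 0.5, "we": 0.2, "propose": 0.3},
    "result": {"accuracy": 0.5, "improves": 0.3, "model": 0.2},
}
sentences = [
    ["we", "study", "problem"],
    ["we", "propose", "model"],
    ["model", "improves", "accuracy"],
]

print(viterbi_topics(sentences, topics, start, trans, emit))
```

On this toy input the decoded sequence is intro, method, result. In a real setting the transition and word distributions would be learned from data rather than fixed by hand, and a decoded sequence of this kind is what a task such as sentence ordering could then exploit.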

Publications

  1. Hongning Wang, Yue Lu and ChengXiang Zhai. Latent Aspect Rating Analysis without Aspect Keyword Supervision. The 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'2011), p618-626. (PDF) (slides)
  2. Hongning Wang, Chi Wang, ChengXiang Zhai and Jiawei Han. Learning Online Discussion Structures by Conditional Random Fields. The 34th Annual International ACM SIGIR Conference (SIGIR'2011), p435-444. (PDF) (slides)
  3. Hongning Wang, Duo Zhang and ChengXiang Zhai. Structural Topic Model for Latent Topical Structure Analysis. The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT'2011), p1526-1535. (PDF)