Kuiyu's Tier 2 Grant
- 1 Title
- 2 Feedback after first round (one of 15 shortlisted) in May 2011
- 3 My response for the Aug 2011 resubmission
- 4 Extract (no graphics) of 10 page write-up
ROSE (Reputable Opinion Social Environment) via Connectors, Mavens, and Salesmen
Feedback after first round (one of 15 shortlisted) in May 2011
The proposal reflects a great deal of thought and preparation, including time spent trying to make the ideas clear. It builds on an existing system, ROSE, that has gathered 10M reviews and analyzed them in various ways. It is now proposed to invite members and create a social network devoted to sharing reviews of products. There are many things to like about this proposal, but also a number of concerns that lowered the evaluations. One concern, raised by an outside reviewer, was that the work was too close to commercial. The committee does not believe that to be the case, and in fact encourages an eventual commercialization of research supported by Tier-2 grants (although the commercialization itself cannot be so supported).
Social networking is considered by many to be the "next big thing." The proposal addresses several aspects of social networking, with an emphasis on product reviews. One concern is that the management of reviews is not as "out of hand" as the proposal claims. Amazon, for example, has ways to rate reviews and organize them. The contention that the reviews written by your friends are somehow more reliable than those written by random people sounds dubious, although there are probably certain products and kinds of people (e.g., clothing for teenagers) where the opinions of friends matter more than others.
The attempt to identify and counteract the work of "trolls" who write false reviews seems important. We would have liked to see more about that issue.
On the other hand, the proposal seemed to emphasize sentiment analysis more than appears justified. While sentiment analysis is an important and challenging problem in certain fields (e.g., news articles), it seems less important for product reviews, which are usually accompanied by a numerical rating that tells you the sentiment of the review.
At the heart of the proposal is the detection of unreliable reviews using a scoring system, christened the CMS score, following Malcolm Gladwell's idea of connectors, mavens and salesmen. As admitted in the proposal, the scoring scheme may be viewed as an extension of Kleinberg's HITS algorithm. This is the most innovative part of the proposal. There is a problem with this methodology, however. While hubs and authorities always converge, the same is not true of connectors and mavens, which involve both positive and negative edges. In particular, consider a "network" with only two participants, A and B. A votes down B, and nothing else happens. There are two stable solutions: either A is a bad connector, misidentifying a good maven B, or A is a good connector, correctly identifying a bad maven B. Note that in HITS, with only positive edges, there is no problem; A is a hub and B is an authority.
Finally, is there enough motivation to get people to join this network? Will they write enough reviews (previous work concentrated on mining sources of existing reviews)? Will invitation-only work? Things like Facebook and Gmail only became important when they allowed everyone in, despite the risks that presents.
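The two-participant example can be reproduced with a small sketch. This is not the proposal's CMS scoring: it assumes a plain HITS-style mutual update in which a down vote is simply an edge of weight -1, and the names `hits_style` and `_normalize` are hypothetical. The "hub" score stands in for the connector score and the "authority" score for the maven score.

```python
def _normalize(scores):
    """Scale scores so the largest magnitude is 1 (leave all-zero vectors alone)."""
    largest = max(abs(s) for s in scores)
    return [s / largest for s in scores] if largest else scores


def hits_style(weights, hub_init, steps=20):
    """HITS-style mutual update on a signed vote matrix.

    weights[i][j] is the vote cast by i on j: +1 up, -1 down, 0 none.
    Assumption: negative edges are fed into the usual HITS update unchanged.
    """
    n = len(weights)
    hub = [float(x) for x in hub_init]
    auth = [0.0] * n
    for _ in range(steps):
        # Authority (maven) score: weighted sum of the hubs voting on j.
        auth = [sum(weights[i][j] * hub[i] for i in range(n)) for j in range(n)]
        # Hub (connector) score: weighted sum of the authorities i voted on.
        hub = [sum(weights[i][j] * auth[j] for j in range(n)) for i in range(n)]
        hub, auth = _normalize(hub), _normalize(auth)
    return hub, auth


up_vote = [[0, 1], [0, 0]]     # A votes B up
down_vote = [[0, -1], [0, 0]]  # A votes B down

print(hits_style(up_vote, [1, 1]))     # A is the hub, B the authority
print(hits_style(down_vote, [1, 1]))   # A good connector, B bad maven
print(hits_style(down_vote, [-1, 1]))  # A bad connector, B good maven
```

With the single down vote, both sign assignments are fixed points of the update, so the starting guess alone decides which of the two solutions is reported.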
My response for the Aug 2011 resubmission
- negative vote weights: The reviewer correctly pointed out that in the simple example of one person downvoting another, the system does not converge to a unique solution. We will therefore evaluate various algorithmic and policy approaches for dealing with negative weights. One method is to reverse the direction of every negative edge, i.e., "A downvotes B" becomes "B upvotes A". We will then have to devise strategies to prevent people from gaming the system by downvoting everyone else.
- Due to the recent proliferation of professional review writers (the "water army"), netizens in China have all but lost faith in online reviews. Typically, they have to read an excessively large number of reviews to get a feel for the real sentiment on the ground.
- The invitation + CMS system was designed from scratch to counter trolls. Content-based analysis has also been added to deal with this problem.
- The invitation-only system may not scale as fast as an open system. But given the prevalence of professional review writers (in the hundreds of thousands), we think this is one of the key differentiators that will make ROSE effective.
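The edge-reversal idea from the first point above can be sketched in a few lines. This is only an illustration under assumed data structures: votes are hypothetical `(voter, target, weight)` tuples, and `reverse_negative_votes` and `incoming_weight` are made-up helper names, not part of ROSE.

```python
def reverse_negative_votes(votes):
    """Turn every down vote "A -> B" into an up vote "B -> A".

    votes: iterable of (voter, target, weight), weight > 0 for an up vote
    and < 0 for a down vote. Returns {(source, target): weight} with only
    non-negative weights, so a positive-edge algorithm like HITS applies.
    """
    edges = {}
    for voter, target, weight in votes:
        if weight < 0:
            # Flip direction and sign of the negative edge.
            voter, target, weight = target, voter, -weight
        edges[(voter, target)] = edges.get((voter, target), 0) + weight
    return edges


def incoming_weight(edges):
    """Total up-vote weight received by each participant."""
    totals = {}
    for (_, target), weight in edges.items():
        totals[target] = totals.get(target, 0) + weight
    return totals


# The two-participant case: A downvotes B becomes B upvotes A.
print(reverse_negative_votes([("A", "B", -1)]))

# The gaming risk: by downvoting everyone, A collects three "up votes".
spree = [("A", other, -1) for other in ("B", "C", "D")]
print(incoming_weight(reverse_negative_votes(spree)))
```

The second call shows why the transformation needs an accompanying policy: without one, indiscriminate downvoting is rewarded.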
Extract (no graphics) of 10 page write-up
In July 2010, a large number of essays warning people of the perils of consuming deep-sea fish oil began to surface on major online forums and blog sites in China, many with shocking headlines like “deep-sea fish oil no better than underground oil”. This caused quite a stir among netizens, triggering numerous online discussions, which eventually led to a massive boycott of one particular product, “Yili” brand “QQ Infant Milk”. Naturally, “QQ Infant Milk” sales went downhill, which prompted Yili to file a police report. A two-month police investigation revealed that the entire incident was actually masterminded by Yili's rival, Mengniu Dairy. Apparently, the marketing director of Mengniu's competing product, “Star of the Future”, had spent close to RMB 280,000 to fund the month-long smear campaign, which employed tens of thousands of online writers to fabricate reviews and posts at the measly rate of RMB 0.5 per post.
In this particular incident, the victim, Yili Dairy, was lucky, because the four perpetrators (two online marketing firm owners and Mengniu's director) were arrested on October 16, 2010. Every day, countless entities, including celebrities, brands, and products, are under subtle attack by professional shills (people paid to write positive or negative online reviews). To escape detection, more moderate forms of smearing are employed, e.g., posting fabricated doubts about the reliability of a brand or product. Recent incidents include the online smear battle between the juggernaut Tencent and the anti-virus software firm 360, with the latter amassing 100,000 positive posts.
In fact, this phenomenon is so prevalent that there is a term for shills in China, the “water army” (网络水军), who are paid to fabricate pseudo-original articles and postings on a regular basis, currently at RMB 0.3 to RMB 0.5 per posting. According to some estimates, the water army in China is anywhere from tens to hundreds of thousands strong. To some, the water army is even a lucrative profession that pays an average of RMB 2,000 per month, and many stay-at-home white-collar workers join the water army part-time just to earn some extra income.
The problem of fake reviews is not limited to China, but it is especially serious there because of the economic disparity between the haves and have-nots, which led to the rise of the water army. Elsewhere in the world, major review sites like Tripadvisor, which has 45 million reviews of 500,000 hotel properties, are also starting to feel the pinch, and have taken steps to deal with fake reviews by red-flagging dishonest hotels that employ shills.
There were 457 million Internet users in China by the end of 2010. As a result, online social media in China have accumulated a massive amount of valuable peer reviews on almost anything. Unfortunately, due to the proliferation of fake reviews, Chinese netizens no longer trust simplistic online ratings, but instead have to rely on reading tons of actual reviews (to manually weed out the fake ones) in order to form their own trusted opinion.
Ironically, even before the rise of the water army, Chinese netizens already had to read a lot of reviews; now they have to read even more in order to form a reliable and trusted opinion of a product or service!
We are unaware of any site that provides effective automatic filtering of online reviews based on trustworthiness. Sites like Amazon.com, which rely solely on peer review ratings, are susceptible to manipulation. Popular forums like stackoverflow.com, which allow users to vote each post up or down, still require heavy human intervention and moderation to weed out fake posts. Google Products currently returns only an unordered list of product reviews, and it does not filter fake reviews.
We are also unaware of any site that provides a concise and convincingly useful summary of reviews. The vast majority of review sites, like reviewgist.com and kakaku.com (the top Japanese review site), show only user-submitted numerical product ratings in the overall product report card, based on generic product characteristics like service quality, value for money, etc. To participate in these existing review sites, users have to painstakingly submit online surveys on each of the generic product features.
In the realm of social review sharing, micro-review social-network add-ons like Blippr allow users to share reviews with friends, but do not provide any aggregation or product feature extraction functionality other than a percentage score, i.e., you know your friends collectively rated something as 100 and strangers rated it as 87 out of 100, but you don’t know the rationale behind the rating unless you painstakingly read through some reviews. Thus, the problem of fake and useless reviews is again prominent.
All in all, existing social network and review sites do not help to reduce the frustration and time needed to read tons of reviews in order to separate the real reviews from the fakes, i.e., users won’t trust the system ratings unless they have actually read a significant number of reviews. In fact, 77% of people spend an average of 12 hours online researching consumer electronics products, according to a 2006 report by the Yahoo Consumer Electronics of America study.
ROSE (Reputable Opinion Social Environment) aims to design a trusted online social ecosystem where people can readily share (consume and contribute) reputable opinions and reviews. In particular, ROSE will analyze and score each review based on its trustworthiness. That way, users can save significant time by reading a summary of the most trustworthy reviews. To achieve this, we assemble the best of graph analysis, content analysis, and socio-economic incentives to create a working ROSE social network overlay with three objectives:
1. Build a social network abstraction layer for trusted opinion sharing:
   1. ROSE allows users to share opinions with their online social network of friends;
   2. ROSE will serve as a practical testbed for evaluating our algorithms and policies;
   3. ROSE will piggyback on existing social networks like Facebook and Twitter as a plugin;
   4. ROSE will include a simple review search engine for searching products and reviews.
2. Investigate multiple graph-based strategies for filtering out fake reviews as follows:
   1. Explore different online membership policies and incentives to defeat the water army;
   2. Design incentives to encourage members to participate by contributing reviews and rating reviews minimally (least intrusively), e.g., using vote-up and vote-down buttons;
   3. Examine how multiple graphs can be combined effectively. We will have to deal with at least 3 types of graphs, namely, social-network graphs (friends), voting graphs (of content), and referral graphs (recruitment downline).
3. Investigate content analysis strategies for fine-grained opinion analysis as follows:
   1. Natural language processing (NLP) and statistical approaches to extract fine-grained product features from online reviews.
   2. Writing style analysis to capture the most common templates used by the water army.
   3. Aggregation approaches to present a concise and comprehensive graphical and text summary of opinions at various resolutions such as feature, sentence, and review level.
   4. Ranking algorithms to score the usefulness of each review, which will allow users to see just the most useful top K reviews for each product feature, e.g., top K reviews from my friends, top K reviews from domain experts, top K reviews from the populace.
   5. Authorship analysis to identify authors behind shady posts.
We aim to be the leading Chinese sentiment social network analysis research group in the world. Our research focus will be on Chinese language content, although our algorithms can be applied to any language with little or no modification.
Various graph-based methods have been introduced to detect untrustworthy users, mainly for peer-to-peer applications. For example, EigenTrust computes a global trust value for each user by aggregating the local trust values that users assign to one another, based on transitive trust, i.e., trust whomever your friends trust. StereoTrust first figures out the group that a stranger belongs to, and then uses the trustworthiness of the group to infer the trustworthiness of the stranger.
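Transitive trust of the EigenTrust kind can be illustrated with a short power iteration. The sketch below is a simplified, centralised version written for illustration only: the function name `eigentrust`, the uniform pre-trust vector, and the damping value of 0.15 are assumptions, and the local trust numbers are made up.

```python
def eigentrust(local_trust, damping=0.15, steps=50):
    """Simplified, centralised EigenTrust-style iteration.

    local_trust[i][j] is how much user i trusts user j (negative values
    are clipped to zero). Returns one global trust value per user.
    Assumption: the pre-trust vector is uniform over all users.
    """
    n = len(local_trust)
    pretrust = [1.0 / n] * n
    # Row-normalise local trust so each user's opinions sum to 1.
    rows = []
    for row in local_trust:
        clipped = [max(v, 0.0) for v in row]
        total = sum(clipped)
        # A user who trusts nobody falls back to the pre-trust vector.
        rows.append([v / total for v in clipped] if total else list(pretrust))
    trust = list(pretrust)
    for _ in range(steps):
        # "Trust whomever your friends trust": weight each opinion by the
        # current trust in the user who holds it, then blend in pre-trust.
        trust = [
            (1 - damping) * sum(rows[i][j] * trust[i] for i in range(n))
            + damping * pretrust[j]
            for j in range(n)
        ]
    return trust


# Users 0 and 1 trust each other; user 2 trusts both but is trusted by nobody.
scores = eigentrust([[0, 2, 0], [1, 0, 0], [1, 1, 0]])
print(scores)
```

In this toy network, user 2 ends up with only the small share contributed by the pre-trust term, while users 0 and 1 split the rest equally.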
Content-based methods have also been proposed, but mostly for detecting non-reviews, e.g., advertisements or off-topic posts. Moreover, detecting a fake review is significantly harder than detecting a useless review, because fake reviews are carefully crafted by professionals solely to deceive users. A recent study on fake hotel reviews reveals that they tend to use a third-person writing style and typically include fabricated stories, e.g., "my colleague bought this phone and she likes it very much".
Policy-based approaches typically use very simplistic heuristics, e.g., imposing a time delay before each newly registered user is allowed to post a review, or requiring members to earn points before they can post. These policies are useful as an initial deterrent to the novice, but will not withstand an onslaught by the professional water army, each member of which is known to control hundreds of active forum accounts.
ROSE is the first framework to systematically combine the above three approaches (graph analysis, content analysis, and behavioural policies) to solve the problem of detecting fake reviews.
- The PI, Dr. Kuiyu CHANG, is the leading Chinese Sentiment Analysis expert outside of China, and has been working on a Chinese opinion search engine since 2005. One of his Ph.D. students from China applied to NTU specifically to work under him on Chinese Sentiment Analysis. He was also a collaborator on the A*Star PSF-funded project on Social Network, which studied online content networks related to terrorism. Dr. Chang is the winner of two international best paper awards, and has worked in a successful U.S. start-up that was acquired in 6 months for US$70m. As a result, he has extensive hands-on experience leading web software development projects. He is currently developing a sentiment analysis plugin for the Twitter social network, funded by a MoE Tier-1 grant.
- Co-investigator Dr. Jin-Cheon Na is the leading Sentiment Analysis expert in Singapore (with over 10 refereed papers in the area). He has worked extensively on information extraction, which provides various essential Natural Language Processing methods for content analysis.
- Co-investigator Dr. Weihong Huang is a world-renowned economist and expert in chaos theory. He will use coupled map lattice theory to investigate i) how the members of the social network interact with each other via up/down votes to effectively self-police the system; ii) how the network forms and grows; iii) the dynamic stability of the social network.
- Co-investigator Dr. Jie Zhang is a rising star on trust in social and vehicular networks, and has published extensively (over 15 refereed papers) on trust propagation in social networks. He is also the winner of 4 international best paper awards.
- Visiting Professor Dr. C. Lee Giles (IEEE/ACM Fellow) from Penn State University will provide invaluable insights and guidance from his experience in building the world’s first citation engine CiteSeer.
- Industrial collaborator Dr. Alvin Chan from Brandtology Private Limited will share his expertise in creating large scale crawlers and opinion analysis infrastructure. He will also help provide abundant data for our project, in addition to seeking out venues for commercialization.
The project will be hosted at NTU's Centre for Advanced Information Systems (CAIS), which is a leading research centre in the field. CAIS counts among its 60+ members (20+ professors and 40+ Ph.D. students) one Nanyang Assistant Professor. The PI has several servers housed in CAIS that are running the Search Engine component of ROSE. Lastly, this project is aligned with 3 out of the 5 NTU peaks of excellence, and thus will be strongly supported by NTU at the president/provost level:
- New Media: No existing review social network goes as far as ROSE in promoting trust and sharing from inception via its invitation-only and CMS reputation system.
- The New Silk Road: Upon completion of the ROSE testbed, it could very well become China’s window to the world for trusted sentiment and opinion mining. Businesses and individuals alike can use ROSE to research and share trustworthy opinions and trends in the Chinese market.
- Innovation Asia: Our project is one of the few in Singapore that is keenly focused on applied research with tangible outcomes. With Dr. Alvin Chan from the leading online sentiment analysis firm Brandtology, ROSE is poised to make a major real-world impact. We are negotiating with a number of VC firms (CNC Capital, Hejun VC, Integral-group) and software parks in China (Beijing ZhongGuanCun and Hangzhou Software Park), and also government agencies including IDA Shanghai, on deploying ROSE in China.
7 Preliminary Studies
7.1 ROSE Portal
The ROSE search engine (component 8 of Figure 1) was first developed in 2005 and is currently in its fifth revision, as shown at http://rose.mosuma.com and in the figure below. In fact, approximately 60% of the 9 components in Figure 1 (components 1, 2, 3, 4, 5, and 8) have already been prototyped. What remains to be developed are the social network analyzer (component 6), the user portal (component 7), and the social network plugin (component 9). Extensive manpower is also needed to annotate and generate training data (to be fed into components 5 and 6) and to maintain the ontology (component 3) for comprehensive coverage. We currently have 27 million Chinese reviews on 500,000 products from 4 product genres: hotels, restaurants, mobile phones, and stocks (Shanghai Stock Exchange). In search engine related research, we have previously studied indexing of movie review documents based on sentiments, and the effectiveness of web search results for genre and sentiment classification. We have also looked at sentiment-based search in digital libraries, and fuzzy search.
7.2 Graph Analysis
We have over 20 related publications in this area (full list in CVs). We have previously evaluated controversial users (spammers) in Wikipedia networks. We have also explored how information networks evolved in Wikipedia and whether users follow experts in Wikipedia. In terms of social network visualization, we have previously developed tools to visualize two related semantic networks. We have studied extensively the problem of trust and reputation in online marketplaces and social networks, including ways to share semantic web trust ratings, designs to promote honesty in e-marketplaces, and a social-network-based approach to personalized recommendation. We have also studied how to detect influence between blog posts. We have recently designed a credibility model, based on theories developed in sociology, political science and information science, for evaluating the credibility of messages in a way that is user-specific and sensitive to the social network in which the user resides. It combines different types of information credibility, including cluster credibility, public credibility, experienced credibility, and role-based credibility. The cluster credibility distinguishes the third-party reporting of media content from users in the local social community or in different clusters of the social network.
7.3 Content Analysis
We have over 20 publications in this area (full list in CVs). We have previously proposed an unsupervised machine learning method to automatically construct a hierarchical product concept model from online Chinese product reviews. We have also explored various methods to mine Chinese sentiments, including a machine-learning-based approach and an NLP approach. We have investigated Chinese sentence representation using an adjacency matrix. Recently, we have also investigated sentiment detection in micro-blogs. For general sentiment analysis, we have looked at aspect-based sentiment analysis of movie reviews, as well as a comparison of sentiment expression in movie reviews between genres, linguistic approaches to sentiment analysis, and the use of negation phrases in product review classification. We have also looked at review selection approaches to generate feature-based ratings.
Recently, we looked at the problem of implicit feature extraction, e.g., inferring a feature like “weight” from an implicit opinion word like “heavy”, and proposed an association rule approach to extract implicit features. Our study of implicit feature mining is one of the first of its kind in the field of opinion mining. We have also successfully made use of the difference in term distribution between a domain-specific corpus and a generic corpus to automatically extract product features from a review. Our method, called Global Domain Topic Relevance (GDTR), achieved better precision and recall (figure on the left) compared to established methods published in KDD and WWW.
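The idea of contrasting term distributions between a domain-specific and a generic corpus can be illustrated with a toy score. The sketch below is not the GDTR formula, which is not given here; it assumes a simple smoothed ratio of relative frequencies, whitespace tokenisation (Chinese text would need a word segmenter first), and made-up English sentences. The name `domain_relevance` is hypothetical.

```python
from collections import Counter


def domain_relevance(domain_docs, generic_docs, smoothing=1.0):
    """Score each domain term by how much more frequent it is in the
    domain corpus than in the generic corpus.

    Assumption: score = smoothed P(term | domain) / P(term | generic),
    a stand-in for illustration rather than the published GDTR measure.
    """
    domain = Counter(w for doc in domain_docs for w in doc.split())
    generic = Counter(w for doc in generic_docs for w in doc.split())
    vocab_size = len(set(domain) | set(generic))
    domain_total = sum(domain.values()) + smoothing * vocab_size
    generic_total = sum(generic.values()) + smoothing * vocab_size
    return {
        term: ((count + smoothing) / domain_total)
        / ((generic[term] + smoothing) / generic_total)
        for term, count in domain.items()
    }


phone_reviews = [
    "battery life is great",
    "the battery drains fast",
    "screen is bright",
]
generic_text = [
    "the weather is great",
    "the market is fast today",
]

scores = domain_relevance(phone_reviews, generic_text)
# Terms that stand out in the review corpus are candidate product features.
for term, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(term, round(score, 2))
```

Words that are common everywhere, such as "the" and "is", score below 1, while domain-specific terms such as "battery" score well above it and surface as candidate features.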
- 《新京报》, 蒙牛"雇佣"网络"打手"诽谤伊利"QQ星儿童奶", http://media.people.com.cn/GB/40606/13008777.html, 2010
- 三晋都市报, 网络公关催生网络黑社会：三大帮派愚弄网民, http://tech.qq.com/a/20101030/000086_2.htm, 2010
- Kaleeh Sakakeeny, Buying Fake Reviews on Trip Advisor, http://technorati.com/lifestyle/travel/article/buying-fake-reviews-on-trip-advisor/, 2011
- S.D. Kamvar and M.T. Schlosser and H. Garcia-Molina, The eigentrust algorithm for reputation management in p2p networks, WWW, 2003
- Xin Liu and Anwitaman Datta and Krzysztof Rzadca and Ee-Peng Lim, StereoTrust: A Group Based Personalized Trust Model, ACM CIKM (Conference on Information and Knowledge Management), 2009
- Myle Ott and Yejin Choi and Claire Cardie and Jeffrey T. Hancock, Finding Deceptive Opinion Spam by Any Stretch of the Imagination, ACL, 2011
- Ceren Budak and Divyakant Agrawal and Amr El Abbadi, Where The Blogs Tip: Connectors, Mavens, Salesmen and Translators of the Blogosphere, KDD Workshop on Social Media Analytics, 2010
- Malcolm Gladwell, The Tipping Point: How Little Things Can Make a Big Difference, 2000
- Jon M Kleinberg, Authoritative sources in a hyperlinked environment, 604--632, 1999
- Guang Qiu and Kangmiao Liu and Jiajun Bu and Chun Chen and Zhiming Kang, Extracting opinion topics for Chinese opinions using dependence grammar, ADKDD (Workshop on Data mining and audience intelligence for advertising), 2007
- Li Zhuang and Feng Jing and Xiaoyan Zhu, Movie Review Mining and Summarization, CIKM, 2006
- Yunqing Xia and Ruifeng Xu and Kam-Fai Wong and Fang Zheng, The Unified Collocation Framework for Opinion Mining, International Conference on Machine Learning and Cybernetics, 2007
- Bo Pang and Lillian Lee and Shivakumar Vaithyanathan, Thumbs up? Sentiment Classification using Machine Learning Techniques, EMNLP, 2002
- Chao Zhou and Guang Qiu and Kangmiao Liu and Jiajun Bu and Ming-cheng Qu and Chun Chen, SOPING: a Chinese Customer Review Mining System, SIGIR, 2008
- Guang Qiu and Can Wang and Jiajun Bu and Kangmiao Liu and Chun Chen, Incorporating the Syntactic Knowledge in Opinion Mining in User-generated Content, WWW, 2008
- Peter D. Turney, Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL, 2002
- Soo-Min Kim and Eduard Hovy, Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text, SST '06 Proceedings of the Workshop on Sentiment and Subjectivity in Text, 2006
- J.-C. Na and T. T. Thet and A. H. Nasution and F.M.A. Hassan, Sentiment-Based Digital Library of Movie Review Documents Using Fedora, Accepted
- J.-C. Na and T. T. Thet, Effectiveness of Web Search Results for Genre and Sentiment Classification, 709--726, 2009
- J.-C. Na and C. Khoo and S. Chan and N. B. Hamzah, Sentiment-Based Search in Digital Libraries, Joint Conference on Digital Libraries (JCDL), 143--144, 2005
- H. J. Wu and J.-C. Na and C. Khoo, A Hybrid Approach To Fuzzy Name Search Incorporating Language-Based and Text-Based Principles, 3--19, 2007
- E.-P. Lim and A.T. Kwee and N.L. Ibrahim and A. Sun and A. Datta and Kuiyu Chang and Maureen, Visualizing and Exploring Evolving Information Networks in Wikipedia, ICADL (International Conference on Asia-Pacific Digital Libraries), 50--60, 2010
- Yi Zhang and Aixin Sun and A. Datta and Kuiyu Chang and E.-P. Lim, Do Wikipedians Follow Domain Experts? A Domain-specific Study on Wikipedia Contribution, JCDL (ACM/IEEE Joint Conference on Digital Libraries), 119--128, 2010
- E.-P. Lim and Maureen and N.L. Ibrahim and Aixin Sun and A. Datta and Kuiyu Chang, SSnetViz: A Visualization Engine for Heterogeneous Semantic Social Networks, ICEC (11th International Conference on Electronic Commerce), 2009
- Jie Zhang and R. Cohen, A Comprehensive Approach for Sharing Semantic Web Trust Ratings, 302--319, 2007
- Jie Zhang and R. Cohen, Design of a Mechanism for Promoting Honesty in E-Marketplaces, 22nd Conference on Artificial Intelligence (AAAI), 2007
- A. Seth and Jie Zhang, A Social Network Based Approach to Personalized Recommendation of Participatory Media Content, International Conference on Weblogs and Social Media (ICWSM), 2008
- K.-W. Tan and J.-C. Na and Y.-L. Theng, Influence Detection between Blog Posts through Blog Features, Content Analysis, and Community Identity, 2011
- A. Seth and J. Zhang and R. Cohen, Bayesian Credibility Modeling for Personalized Recommendation in Participatory Media, Proceedings of UMAP, 2010
- Bin Shi and Kuiyu Chang, Generating a Concept Hierarchy for Sentiment Analysis, IEEE International Conference on Systems, Man, and Cybernetics, 1284, 2008
- Bin Shi and Kuiyu Chang, Mining Chinese Reviews, ICDM Workshop on Data Mining for Design and Marketing, 585--589, 2006
- Zhen Hai and Kuiyu Chang and Qingbao Song and Jung-Jae Kim, A Statistical NLP Approach for Feature and Sentiment Identification from Chinese Reviews, CIPS-SIGHAN CLP (First Joint Conference on Chinese Language Processing), 105--112, 2010
- Bin Shi and Kuiyu Chang, An Adjacency Matrix Approach for Extracting User Sentiments, Data Mining for Design and Marketing, 251--276, 2009
- Guangxia Li and Steven Chu Hong Hoi and Kuiyu Chang, Micro-blogging Sentiment Detection by Collaborative Online Learning, IEEE ICDM (International Conference on Data Mining), 2010
- T.T. Thet and J.-C. Na and C. Khoo, Aspect-Based Sentiment Analysis of Movie Reviews on Discussion Boards, 2010
- J.-C. Na and T. T. Thet and C. Khoo, Comparing Sentiment Expression in Movie Reviews from Four Online Genres, 317--338, 2010
- T. T. Thet and J.-C. Na and C. Khoo and S. Shakthikumar, Sentiment Analysis of Movie Reviews on Discussion Boards using a Linguistic Approach, Workshop on Topic-Sentiment Analysis for Mass Opinion Measurement, 81--84, 2009
- J.-C. Na and C. Khoo and H. J. Wu, Use of Negation Phrases in Automatic Sentiment Classification of Product Review, 180--191, 2005
- C. Long and Jie Zhang and X. Zhu, A Review Selection Approach for Accurate Feature Rating Estimation, International Conference on Computational Linguistics (COLING), 2010
- C. Long and Jie Zhang and M. Huang and X. Zhu and M. Li and B. Ma, Specialized Review Selection for Feature Rating Estimation, IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2009
- Zhen Hai and Kuiyu Chang and Jung-Jae Kim, Feature Identification via Co-occurrence Association Rule Mining, ACL CICLing (Conference on Intelligent Text Processing and Computational Linguistics), 393--404, 2011