Kuiyu's Tier 2 Grant

Title
ROSE (Reputable Opinion Social Environment) via Connectors, Mavens, and Salesmen

Feedback after the first round (one of 15 shortlisted), May 2011
The proposal reflects a great deal of thought and preparation, including time spent trying to make the ideas clear. It builds on an existing system, ROSE, that has gathered 10M reviews and analyzed them in various ways. It is now proposed to invite members and create a social network devoted to sharing reviews of products. There are many things to like about this proposal, but also a number of concerns that lowered the evaluations. One concern, raised by an outside reviewer, was that the work was too close to commercial. The committee does not believe that to be the case, and in fact encourages an eventual commercialization of research supported by Tier-2 grants (although the commercialization itself cannot be so supported).

Social networking is considered by many to be the "next big thing." The proposal addresses several aspects of social networking, with an emphasis on product reviews. One concern is that the management of reviews is not as "out of hand" as the proposal claims. Amazon, for example, has ways to rate reviews and organize them. The contention that the reviews written by your friends are somehow more reliable than those written by random people sounds dubious, although there are probably certain products and kinds of people (e.g., clothing for teenagers) where the opinions of friends matter more than others.

The attempt to identify and counteract the work of "trolls" who write false reviews seems important. We would have liked to see more about that issue.

On the other hand, the proposal seemed to emphasize sentiment analysis more than appears justified. While sentiment analysis is an important and challenging problem in certain fields (e.g., news articles), it seems less important for product reviews, which are usually accompanied by a numerical rating that tells you the sentiment of the review.

At the heart of the proposal is the detection of unreliable reviews using a scoring system, christened the CMS score, following Malcolm Gladwell's idea of connectors, mavens and salesmen. As admitted in the proposal, the scoring scheme may be viewed as an extension of Kleinberg's HITS algorithm. This is the most innovative part of the proposal. There is a problem with this methodology, however. While hubs and authorities always converge, the same is not true of connectors and mavens, which involve both positive and negative edges. In particular, consider a "network" with only two participants, A and B. A votes down B, and nothing else happens. There are two stable solutions: either A is a bad connector, misidentifying a good maven B, or A is a good connector, correctly identifying a bad maven B. Note that in HITS, with only positive edges, there is no problem; A is a hub and B is an authority.
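The two-participant example can be checked numerically. The sketch below is only an illustration, assuming the plain HITS update rules are applied unchanged to a signed vote matrix (+1 for an up vote, -1 for a down vote). It is not the CMS scoring scheme itself, whose details are not given here, and the names `hits_signed` and `normalize` are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def hits_signed(weights, hub_init, iterations=20):
    """HITS-style updates on a signed vote matrix (illustrative assumption).

    weights[i][j] is +1 if i votes j up, -1 if i votes j down, 0 otherwise.
    hub_init is the starting guess for the hub ("connector") scores.
    Returns (hub_scores, authority_scores).
    """
    n = len(weights)
    hubs = list(hub_init)
    auths = [0.0] * n
    for _ in range(iterations):
        # Authority ("maven") score: signed votes received, weighted by voter's hub score.
        auths = normalize(
            [sum(weights[i][j] * hubs[i] for i in range(n)) for j in range(n)]
        )
        # Hub ("connector") score: signed votes cast, weighted by target's authority score.
        hubs = normalize(
            [sum(weights[i][j] * auths[j] for j in range(n)) for i in range(n)]
        )
    return hubs, auths


# Participant 0 is A, participant 1 is B.
up_vote = [[0, 1], [0, 0]]     # A votes B up (ordinary HITS case)
down_vote = [[0, -1], [0, 0]]  # A votes B down (the reviewer's example)

print(hits_signed(up_vote, [1.0, 1.0]))
print(hits_signed(down_vote, [1.0, 1.0]))   # start by assuming A is a good connector
print(hits_signed(down_vote, [-1.0, 1.0]))  # start by assuming A is a bad connector
```

With the up vote and the usual all-positive start, A ends up as the hub and B as the authority, both with non-negative scores. With the down vote, the two starting guesses settle on opposite answers (good connector and bad maven, or bad connector and good maven), and both are stable under the update rules, which is the ambiguity described above.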

Finally, is there enough motivation to get people to join this network? Will they write enough reviews (previous work concentrated on mining sources of existing reviews)? Will invitation-only work? Things like Facebook and Gmail only became important when they allowed everyone in, despite the risks that presents.

My response for the Aug 2011 resubmission

 * Negative vote weights: The reviewer correctly pointed out that in the simple example of one person down-voting another, the system fails to converge. We will therefore evaluate various algorithmic and policy approaches for dealing with negative weights. One method is to reverse the direction of every negative edge, i.e., "A down-votes B" is changed to "B up-votes A". We will then have to devise strategies to prevent people from gaming the system by down-voting everyone else.


 * Due to the recent proliferation of professional review writers (the "water army"), netizens in China have all but lost faith in online reviews. Typically, they have to read an excessive number of reviews in order to get a feel for the real sentiment on the ground.


 * The invitation + CMS system was designed from scratch to counter trolls. Content-based analysis has also been added to deal with this problem.
 * The invitation-only system may not scale as fast as an open system. But given the prevalence of professional review writers (in the hundreds of thousands), we think this is one of the key differentiators that will make ROSE effective.
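The edge-reversal idea in the first bullet above can be sketched as follows. This is a minimal illustration under assumed conventions: votes are `(voter, target, weight)` tuples, and the helper names `reverse_negative_votes` and `incoming_weight` are hypothetical.

```python
def reverse_negative_votes(votes):
    """Rewrite every down vote "A -> B" (negative weight) as an up vote "B -> A".

    votes is a list of (voter, target, weight) tuples. The result contains
    only non-negative weights, so a positive-edge algorithm such as HITS
    can be applied to it.
    """
    rewritten = []
    for voter, target, weight in votes:
        if weight < 0:
            rewritten.append((target, voter, -weight))  # flip direction and sign
        else:
            rewritten.append((voter, target, weight))
    return rewritten


def incoming_weight(votes):
    """Total vote weight received by each target."""
    totals = {}
    for _, target, weight in votes:
        totals[target] = totals.get(target, 0) + weight
    return totals


# The reviewer's two-participant example: A down-votes B.
print(reverse_negative_votes([("A", "B", -1)]))

# Gaming risk: a troll T who down-votes everyone collects incoming up votes.
votes = [("T", "A", -1), ("T", "B", -1), ("T", "C", -1), ("A", "B", 1)]
print(incoming_weight(reverse_negative_votes(votes)))
```

In the second example the troll ends up with the largest incoming weight after the rewrite, which is why the reversal needs to be paired with a policy that limits or discounts indiscriminate down-voting.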

1 Background
In July 2010, a large number of essays warning people of the perils of consuming deep-sea fish oil began to surface on major online forums and blog sites in China, many with shocking headlines like “deep-sea fish oil no better than underground oil”. This caused quite a stir among netizens, triggering numerous online discussions, which eventually led to a massive boycott of one particular product, “Yili” brand “QQ Infant Milk”. Naturally, “QQ Infant Milk” sales went downhill, which prompted Yili to file a police report. A two-month police investigation revealed that the entire incident was actually masterminded by Yili's rival, Mengniu Dairy. Apparently, the marketing director of Mengniu's competing product, “Star of the Future”, had spent close to 280,000 RMB to fund the month-long smear campaign, which employed tens of thousands of online writers to fabricate reviews and posts at the measly rate of RMB0.5 per post [1].

In this particular incident, the victim, Yili Dairy, was lucky, because the four perpetrators (two online marketing firm owners and Mengniu's director) were arrested on October 16, 2010. Every day, countless entities, including celebrities, brands, and products, are under subtle attack by professional shills (people paid to write positive/negative online reviews). To escape detection, more moderate smearing has been employed, e.g., people posting fabricated doubts about the reliability of a brand/product. Recent incidents include the online smear battle between juggernaut Tencent and anti-virus software firm 360, with the latter amassing 100,000 positive posts.

In fact, this phenomenon is so prevalent that there is a term for shills in China, the “water army” (网络水军), who are paid to fabricate pseudo-original articles/postings on a regular basis, currently at RMB0.3 to RMB0.5 per posting. According to some estimates, the water army in China is anywhere from tens to hundreds of thousands strong. To some, the water army is even a lucrative profession that pays an average of 2000 RMB per month [2], and many stay-at-home white-collar workers join the water army on a part-time basis just to earn some extra income.

The problem of fake reviews is not limited to China, but it is especially serious there due to the economic disparity between the haves and have-nots, which led to the rise of the water army. Elsewhere in the world, major review sites like Tripadvisor, which has 45 million reviews of 500,000 hotel properties, are also starting to feel the pinch, and have taken steps to deal with fake reviews by red-flagging dishonest hotels that employ shills [3].

There were 457 million Internet users in China by the end of 2010. As a result, online social media in China have accumulated a massive amount of valuable peer reviews on almost anything. Unfortunately, due to the proliferation of fake reviews, Chinese netizens no longer trust simplistic online ratings, but instead have to rely on reading tons of actual reviews (to manually weed out the fake ones) in order to form their own trusted opinion.

Ironically, even before the rise of the water army, Chinese netizens already had to read a lot of reviews; now they have to read even more in order to form a reliable and trusted opinion of a product or service!

We are unaware of any site that provides effective automatic filtering of online reviews based on trustworthiness. Sites like Amazon.com, which rely solely on peer review ratings, are susceptible to manipulation. Popular forums like stackoverflow.com, which allow users to up/down vote each post, still require heavy human intervention and moderation to weed out fake posts. Google Products currently returns only an unordered list of product reviews, and it does not filter fake reviews.

We are also unaware of any site that provides a concise and convincingly useful summary of reviews. The vast majority of review sites, like reviewgist.com and kakaku.com (the top Japanese review site), show only user-submitted numerical product ratings in the overall product report card, based on generic product characteristics like service quality, value-for-money, etc. To participate in these existing review sites, users have to painstakingly fill out online surveys on each of the generic product features.

In the realm of social review sharing, micro-review social-network add-ons like Blippr allow users to share reviews with friends, but do not provide any aggregation or product feature extraction functionality other than a percentage score, i.e., you know your friends collectively rated something as 100 and strangers rated it as 87 out of 100, but you don’t know the rationale behind the rating unless you painstakingly read through some reviews. Thus, the problem of faked and useless reviews is again prominent.

All in all, existing social network and/or review sites do not help to reduce the frustration and time needed to read tons of reviews in order to distil the real reviews from the fakes, i.e., users won’t trust the system ratings unless they have actually read a significant number of reviews. In fact, 77% of people spend an average of 12 hours online researching consumer electronics products, according to a 2006 report by the Yahoo Consumer Electronics of America study.

2 Aims
ROSE (Reputable Opinion Social Environment) aims to design a trusted online social ecosystem where people can readily share (consume and contribute) reputable opinions and reviews. In particular, ROSE will analyze and score each review based on its trustworthiness. That way, users can save significant time by reading a summary of the most trustworthy reviews. To achieve this, we assemble the best of graph analysis, content analysis, and socio-economic incentives to create a working ROSE social network overlay with three objectives:

 1. Build a social network abstraction layer for trusted opinion sharing:
    1. ROSE allows users to share opinions with their online social network of friends;
    2. ROSE will serve as a practical testbed for evaluating our algorithms and policies;
    3. ROSE will piggyback on existing social networks like Facebook and Twitter as a plugin;
    4. ROSE will include a simple review search engine for searching products and reviews.
 2. Investigate multiple graph-based strategies for filtering out fake reviews as follows:
    1. Explore different online membership policies and incentives to defeat the water army;
    2. Design incentives to encourage members to participate by contributing reviews and rating reviews minimally (least intrusively), e.g., using vote-up and vote-down buttons;
    3. Examine how multiple graphs can be combined effectively. We will have to deal with at least 3 types of graphs, namely, social-network graphs (friends), voting graphs (of content), and referral graphs (recruitment downline).
 3. Investigate content analysis strategies for fine-grained opinion analysis as follows:
    1. Natural language processing (NLP) and statistical approaches to extract fine-grained product features from online reviews.
    2. Writing style analysis to capture the most common templates used by the water army.
    3. Aggregation approaches to present a concise and comprehensive graphical and text summary of opinions at various resolutions such as feature, sentence, and review level.
    4. Ranking algorithms to score the usefulness of each review, which will allow users to see just the most useful top K reviews for each product feature, e.g., top K reviews from my friends, top K reviews from domain experts, top K reviews from the populace.
    5. Authorship analysis to identify authors behind shady posts.

We aim to be the leading Chinese sentiment social network analysis research group in the world. Our research focus will be on Chinese-language content, although our algorithms can be applied to any language with little or no modification.
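The top-K presentation described in the aims above can be sketched as follows. This is a hedged illustration only: it assumes each review already carries a usefulness score produced by some ranking algorithm, and the field names (`feature`, `source`, `score`, `text`) and the function `top_k_reviews` are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_reviews(reviews, k):
    """Return the k highest-scoring reviews for every (feature, source) pair.

    reviews is a list of dicts with the assumed keys "feature" (product
    feature discussed), "source" (e.g. "friends", "experts", "populace"),
    "score" (usefulness, higher is better) and "text".
    """
    grouped = defaultdict(list)
    for review in reviews:
        grouped[(review["feature"], review["source"])].append(review)
    return {
        key: heapq.nlargest(k, items, key=lambda r: r["score"])
        for key, items in grouped.items()
    }


reviews = [
    {"feature": "battery", "source": "friends", "score": 0.9, "text": "Lasts two days."},
    {"feature": "battery", "source": "friends", "score": 0.4, "text": "OK."},
    {"feature": "battery", "source": "populace", "score": 0.7, "text": "Drains fast."},
    {"feature": "screen", "source": "experts", "score": 0.8, "text": "Sharp display."},
]

for key, items in top_k_reviews(reviews, k=1).items():
    print(key, [r["text"] for r in items])
```

Grouping by source keeps the friends, experts, and populace views separate, so a flood of low-trust reviews from one group cannot crowd out the others.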

3 Significance
Various graph-based methods have been introduced to detect untrustworthy users, mainly for peer-to-peer applications. For example, EigenTrust [4] computes a trust value for every user from the local trust values that users assign to one another, based on transitive trust, i.e., trust whomever your friends trust. StereoTrust [5] first determines the group that a stranger belongs to, and then uses the trustworthiness of the group to infer the trustworthiness of the stranger.
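The transitive-trust idea can be illustrated with a short power iteration. This is a simplified sketch in the spirit of EigenTrust, not the full algorithm of [4] (for example, it omits pre-trusted peers); the function name `transitive_trust` and the sample matrix are assumptions made for illustration.

```python
def transitive_trust(local_trust, iterations=50):
    """Aggregate pairwise local trust into one trust score per user.

    local_trust[i][j] >= 0 is how much user i trusts user j. Each row is
    normalized to sum to 1, then trust is repeatedly propagated along the
    edges ("trust whomever your friends trust").
    """
    n = len(local_trust)
    normalized = []
    for row in local_trust:
        total = sum(row)
        # A user who trusts nobody is treated as trusting everyone equally.
        normalized.append([v / total for v in row] if total > 0 else [1.0 / n] * n)

    trust = [1.0 / n] * n  # start from uniform trust
    for _ in range(iterations):
        trust = [
            sum(normalized[i][j] * trust[i] for i in range(n)) for j in range(n)
        ]
    return trust


# User 0 trusts users 1 and 2 equally, user 1 trusts user 2, user 2 trusts user 0.
local_trust = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
print(transitive_trust(local_trust))
```

In this small example, user 1 receives only half of user 0's trust and so ends up with the lowest score, while users 0 and 2 end up tied.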

Content-based methods have also been proposed, but mostly for detecting non-reviews, e.g., advertisements or off-topic posts. Moreover, detecting a faked review is significantly harder than detecting a useless review, because faked reviews are carefully crafted by professionals solely to deceive the user. A recent study on fake hotel reviews reveals that they tend to use a third-person writing style and typically include fabricated stories, e.g., "my colleague bought this phone and she likes it very much" [6].

Policy-based approaches typically use very simplistic heuristics, e.g., imposing a time delay before each newly registered user is allowed to post a review, or requiring members to earn points before they can post. These policies are useful as an initial deterrent to the novice, but will not withstand an onslaught by the professional water army, each member of which is known to control hundreds of active forum accounts.

ROSE is the first framework to systematically combine the above 3 approaches: graph-analysis, content-analysis, and behavioural policies to solve the problem of detecting fake reviews.

5 Investigators

 * The PI, Dr. Kuiyu CHANG, is the leading Chinese Sentiment Analysis expert outside of China, and has been working on a Chinese opinion search engine since 2005. One of his Ph.D. students from China applied to NTU specifically to work under him on Chinese Sentiment Analysis. He was also a collaborator on the A*Star PSF funded project on Social Network, which studied online content networks related to terrorism. Dr. Chang is the winner of two international best paper awards, and has worked in a successful U.S. start-up that was acquired in 6 months for US$70m. As a result, he has extensive hands-on experience leading web software development projects. He is currently developing a sentiment analysis plugin for the Twitter social network, funded by a MoE Tier-1 grant.
 * Co-investigator Dr. Jin-Cheon Na is the leading Sentiment Analysis expert in Singapore (with over 10 refereed papers in the area). He has worked extensively on information extraction, which provides various essential Natural Language Processing methods for content analysis.
 * Co-investigator Dr. Weihong Huang is a world-renowned economist and expert in chaos theory. He will use coupled map lattice theory to investigate i) how the members of the social network interact with each other via up/down votes to effectively self-police the system; ii) how the network forms and grows; iii) the dynamic stability of the social network.
 * Co-investigator Dr. Jie Zhang is a rising star on trust in social and vehicular networks, and has published extensively (over 15 refereed papers) on trust propagation in social networks. He is also the winner of 4 international best paper awards.
 * Visiting Professor Dr. C. Lee Giles (IEEE/ACM Fellow) from Penn State University will provide invaluable insights and guidance from his experience in building the world’s first citation engine CiteSeer.
 * Industrial collaborator Dr. Alvin Chan from Brandtology Private Limited will share his expertise in creating large-scale crawlers and opinion analysis infrastructure. He will also help provide abundant data for our project, in addition to seeking out avenues for commercialization.

6 Environment
The project will be hosted at NTU's Centre for Advanced Information Systems (CAIS), which is a leading research centre in the field. CAIS counts among its 60+ members (20+ professors and 40+ Ph.D. students) one Nanyang Assistant Professor. The PI has several servers housed in CAIS that are running the Search Engine component of ROSE. Lastly, this project is aligned with 3 out of the 5 NTU peaks of excellence, and thus will be strongly supported by NTU at the president/provost level:

 * New Media: No existing review social network goes to the same extent as ROSE in promoting trust and sharing from inception via its invitation-only and CMS reputation system.
 * The New Silk Road: Upon completion of the ROSE testbed, it could very well become China’s window to the world for trusted sentiment and opinion mining. Businesses and individuals alike can use ROSE to research and share trustworthy opinions and trends in the Chinese market.
 * Innovation Asia: Our project is one of the few in Singapore that is keenly focused on applied research with tangible outcomes. With Dr. Alvin Chan from the leading online sentiment analysis firm Brandtology, ROSE is poised to make a major real-world impact.

We are negotiating with a number of VC firms (CNC Capital, Hejun VC, Integral-group) and software parks in China (Beijing ZhongGuanCun and Hangzhou Software Park), and also government agencies including IDA Shanghai, on deploying ROSE in China.

7.1 ROSE Portal
The ROSE search engine (component 8 of Figure 1) was first developed in 2005 and is currently in its fifth revision, as shown at http://rose.mosuma.com and in the figure below. In fact, approximately 60% of the 9 components in Figure 1 (components 1, 2, 3, 4, 5, and 8) have already been prototyped. What remains to be developed includes the social network analyzer (component 6), the user portal (component 7), and the social network plugin (component 9). Extensive manpower is also needed to annotate and generate training data (to be fed into components 5 and 6) and to maintain the ontology (component 3) for comprehensive coverage. We currently have 27 million Chinese reviews on 500,000 products from 4 product genres: hotels, restaurants, mobile phones, and stocks (Shanghai Stock Exchange). In search engine related research, we have previously studied indexing of movie review documents based on sentiments [18], and the effectiveness of web search results for genre and sentiment classification [19]. We have also looked at sentiment-based search in digital libraries [20], and fuzzy search [21].

7.2 Graph Analysis
We have over 20 related publications in this area (full list in CVs). We have previously evaluated controversial users (spammers) in Wikipedia networks [21]. We have also explored how information networks evolve in Wikipedia [22] and whether users follow experts in Wikipedia [23]. In terms of social network visualization, we have previously developed tools to visualize two related semantic networks [24]. We have studied extensively the problem of trust and reputation in online marketplaces and social networks, including ways to share semantic web trust ratings [25], designs to promote honesty in E-marketplaces [26], and a social-network-based approach to personalized recommendation [27]. We have also studied how to detect influence between blog posts [28]. We have recently designed a credibility model [29], based on theories developed in sociology, political science, and information science, for evaluating the credibility of messages; the model is user-specific and sensitive to the social network in which the user resides. It combines different types of information credibility, including cluster credibility, public credibility, experienced credibility, and role-based credibility. The cluster credibility distinguishes the third-party reporting of media content from users in the local social community or in different clusters of the social network.

7.3 Content Analysis
We have over 20 publications in this area (full list in CVs). We have previously proposed an unsupervised machine learning method to automatically construct a hierarchical product concept model from online Chinese product reviews [30]. We have also explored various methods to mine Chinese sentiments, including a machine-learning-based approach [31] and an NLP approach [32]. We have investigated Chinese sentence representation using an adjacency matrix [33]. Recently, we have also investigated sentiment detection in micro-blogs [34]. For general sentiment analysis, we have looked at aspect-based sentiment analysis of movie reviews [35], as well as a comparison of sentiment expression in movie reviews between genres [36], linguistic approaches to sentiment analysis [37], and the use of negation phrases in product review classification [38]. We have also looked at review selection approaches to generate feature-based ratings [39][40].

Recently, we have looked at the problem of implicit feature extraction, e.g., extracting features like “weight” from implicit review terms like “heavy”, and proposed an association rule approach to extract implicit features [41]. Our study in implicit feature mining is one of the first of its kind in the field of opinion mining. We have also successfully made use of the difference in term distribution between domain-specific and generic corpora to automatically extract product features from a review. Our method, called Global Domain Topic Relevance (GDTR), achieved better precision and recall (figure on the left) compared to established methods published in KDD and WWW.
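The intuition of comparing term distributions between a domain-specific and a generic corpus can be sketched as below. This is not GDTR itself, whose formulation is not given here; it is a simple smoothed frequency ratio chosen for illustration, and the function name `domain_relevance`, the smoothing constant, and the toy English tokens are all assumptions.

```python
from collections import Counter


def domain_relevance(domain_tokens, generic_tokens, smoothing=1.0):
    """Score each domain term by how much more frequent it is in the
    domain corpus than in the generic corpus (smoothed frequency ratio).

    Terms with a high ratio are candidate product features; terms that are
    equally common everywhere (e.g. function words) score close to or below 1.
    """
    domain_counts = Counter(domain_tokens)
    generic_counts = Counter(generic_tokens)
    vocab_size = len(set(domain_counts) | set(generic_counts))
    domain_total = len(domain_tokens) + smoothing * vocab_size
    generic_total = len(generic_tokens) + smoothing * vocab_size

    scores = {}
    for term, count in domain_counts.items():
        p_domain = (count + smoothing) / domain_total
        p_generic = (generic_counts[term] + smoothing) / generic_total
        scores[term] = p_domain / p_generic
    return scores


# Toy corpora (hypothetical): phone reviews versus general text.
domain_tokens = "battery screen battery the phone the screen battery".split()
generic_tokens = "the a the of the news the a phone".split()

scores = domain_relevance(domain_tokens, generic_tokens)
for term in sorted(scores, key=scores.get, reverse=True):
    print(term, round(scores[term], 2))
```

On the toy data, the feature-like terms rank above the function word "the", which is common in both corpora; a real system would also need word segmentation for Chinese text and a much larger generic corpus.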