Web 2.0 technologies have attracted an increasing number of active online writers and viewers. A deeper understanding of when customers review and what motivates them to write online reviews is of both theoretical and practical significance. In this paper, we present a novel methodological framework, which combines theoretical modeling and text-mining technologies, to study the relationships among customers’ review promptness, their review opinions, and their review motivations. We first study the dynamics of customers’ online “purchase-review” behavior; then, we introduce the LDA method to mine customers’ opinions from their review text; finally, we propose a theoretical model to explore some of the motivations for publishing reviews online. The analytical and experimental results with real data from a Chinese B2C website demonstrate that the behavior dynamics of customers’ online reviews are influenced by multidimensional motivations, some of which can be observed from review behaviors such as review promptness.
Online customer review is a review made by a customer who has purchased a product or service online. It is a form of customers’ feedback on e-commerce and online shopping sites. Now, online review has become an important channel for both consumers and producers to provide product information and recommendations from a customer’s perspective [
To understand the exact information from the massive and various online reviews, several interesting questions need to be answered properly. First, it is surprising that some B2C websites like
In the literature, the classic characterization of motivations as broadly extrinsic and intrinsic was used to discuss the motivations that contribute to online communities [
In addition, for those valuable reviews published on a B2C website, people may have further interests in exploring what is talked about in those reviews. However, the task of exploring information (opinions) from online reviews may become more obvious and thus serious to the phenomenon of the so-called
Certain top reviewed items in
B2C system | Product | # of reviews |
---|---|---|
| | Kindle Keyboard 3G | 36,112 |
| | Kindle Fire, Wi-Fi, | 15,692 |
| | The Hunger Games | 5,524 |
| | | |
| | TP-LINK TL-WR841N 300M | 100,687 |
| | TP-LINK WR340G+ 54M | 64,308 |
| | Philips HQ912 | 57,997 |
In the literature, quite a few research results have shown evidence of the existence of review motivations [
The rest of this paper is organized as follows. Section
In general, a typical online shopping experience has several steps: first, people buy a product online; then, they experience the product delivery and quality (function); finally, a review motivation is generated and the review contents are posted online (see Figure
The birth of an online review.
As we can see from Figure
The whole research process is shown in Figure
The research framework.
Human behavior dynamics deals with the effects of multiple causal forces in human behavior, including network interactions, groups, social movements, and historical transitions, among many other concerns [
Since online communities bring together individuals with shared interest in joint action or sustained interaction, a very recent work presented by Johnson et al. [
Although online review has become popular in B2C systems, little effort has been undertaken to examine the dynamic aspects of online opinion formation. It is valuable to mention that Wu and Huberman studied the dynamics of online opinion formation by analyzing the temporal evolution of very large sets of users’ views [
Previously, there were also some research findings about the motivations of posting reviews online [
From the observations of actual review behavior, even if reviewers only write occasional reviews, they give us some immediate reasons why they might want to write a review [
H1: Rating polarity reflects an extreme attitude and has a positive impact on review promptness.
Social exchange theory is one of the basic theories of social economy [
According to the theory, if one person provides advice based on his or her knowledge, then he or she expects certain types of social rewards, such as approval, respect, or increased status in the eyes of the other individuals [
In current online shopping websites, membership and level-management strategies are used to give customers not only a reputation within online review communities but also incentives to post online reviews. In general, customers with high membership levels enjoy a higher level of service, such as product discounts. Fu and Wang argued that, in practice, shopping sites that adopt incentives and membership-level management strategies may prompt reviewers to post more positive online reviews [
H2: Membership reflects extrinsic motivations, which have a negative impact on users’ review promptness.
Opinion mining, also known as sentiment analysis [
Opinion summarization [
Feature extraction involves simplifying the amount of resources required to describe a large set of data accurately with a method of early expert annotation [
In this study, we move away from mining opinions only and seek to explain how the review motivations would affect customers’ behavior dynamics and review contents by the observations of actual review behavior. Due to the nature of virtual communities, the “online attractiveness” of reviewers, such as the online social status of a reviewer, plays a role in source credibility [
H3: Product-related review contents (quality and function) have a positive impact on review promptness.
In addition, Cho et al. showed that the performance of e-commerce platform (online shopping system) could also be an object of review [
H4: Service-related review contents (cost and service) have a negative impact on review promptness.
Based on the hypotheses, a theoretical model is shown in Figure
The theoretical model.
The data of customer reviews used here were extracted from the
Each observation contains the collection date, the product ID, the retail price on
Information extracted in an online review
Extracted data | Description | Notation |
---|---|---|
MEMBERSHIP_LEVEL | Customer membership level | |
PURCHASE_TIME | The time stamp of the purchase/transaction | |
SCORE | Customer’s rating | |
REVIEW | Customer’s review contents | |
REVIEW_TIME | The time stamp of the review | |
In this work, we define the time interval between a user’s two actions of purchasing product online and publishing review online as his/her review promptness. Review promptness may reflect the initiative and efforts made by reviewers to post reviews [
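As an illustration, the definition of review promptness can be sketched in Python as the difference between the two time stamps of a record. The time stamp format, the choice of days as the unit, and the function name `review_promptness_days` are assumptions made for this sketch; the source does not state how the site formats its time stamps.

```python
from datetime import datetime

# Assumed time stamp layout; the actual format used by the site is not given.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def review_promptness_days(purchase_time: str, review_time: str) -> float:
    """Return the purchase-to-review time interval in days."""
    purchased = datetime.strptime(purchase_time, TIME_FORMAT)
    reviewed = datetime.strptime(review_time, TIME_FORMAT)
    interval = reviewed - purchased
    if interval.total_seconds() < 0:
        # A review cannot precede its purchase; treat this as bad data.
        raise ValueError("review time precedes purchase time")
    return interval.total_seconds() / 86400.0


# Hypothetical record using the field names of the extracted data.
record = {
    "PURCHASE_TIME": "2012-03-01 10:00:00",
    "REVIEW_TIME": "2012-03-04 22:00:00",
}
ti = review_promptness_days(record["PURCHASE_TIME"], record["REVIEW_TIME"])
```

A smaller value corresponds to a more prompt review.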
Taking the data set as a whole, the descriptive information is summarized in Table
Descriptive statistics for the data set.
Online product | # of reviews | Minimum text length | Maximum text length | Average text length | | | |
---|---|---|---|---|---|---|---|
ACER | 1248 | 14 | 382 | 72.7 | 0 | 178 | 18.0 |
ASUS | 254 | 21 | 334 | 72.1 | 0 | 168 | 18.5 |
DELL | 4018 | 6 | 381 | 66.7 | 0 | 104 | 14.8 |
HP | 414 | 11 | 303 | 59.1 | 0 | 173 | 24.5 |
iPad 2 | 18549 | 7 | 365 | 50.9 | 0 | 180 | 23.5 |
Macbook | 289 | 23 | 329 | 57.4 | 0 | 50 | 8.0 |
SamSung 110 | 408 | 19 | 378 | 64.7 | 0 | 128 | 13.7 |
Thinkpad | 1813 | 11 | 363 | 65.1 | 0 | 86 | 13.0 |
SamSung 530 | 427 | 21 | 307 | 69.5 | 0 | 164 | 15.0 |
SONY | 344 | 22 | 421 | 67.3 | 0 | 59 | 8.7 |
Teclast P85 | 6740 | 8 | 393 | 65.7 | 0 | 175 | 16.4 |
To study the distribution of
Therefore, all the “purchase-review” time intervals as well as their frequency in the data set will generate a data series of
To verify the assumption that the time interval between two consecutive customers’ behaviors, that is, purchase and review, follows a power-law distribution, an analysis is mainly made by using a linear regression and the least-squares method to fit the power-law function curve. Let
For the experimental data set, the fitted power-law distribution function is
Unbalance distribution of the time interval.
Distribution of the time interval
Power-law distribution of the time interval
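The fitting procedure described above, a linear regression by least squares on log-transformed data, can be sketched as follows. This is a minimal illustration rather than the exact procedure used in the study: the function name `fit_power_law`, the exclusion of zero-length intervals (whose logarithm is undefined), and the synthetic data are all assumptions.

```python
import math
from collections import Counter


def fit_power_law(intervals):
    """Fit frequency = c * interval**(-alpha) by least squares on log-log axes.

    Returns (alpha, c, r_squared).
    """
    # Count how often each interval occurs; drop zeros because log(0) is undefined.
    counts = Counter(t for t in intervals if t > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive intervals")
    xs = [math.log(t) for t in counts]
    ys = [math.log(counts[t]) for t in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / ss_tot
    return -slope, math.exp(intercept), r_squared


# Synthetic intervals whose frequencies follow roughly 1000 * t**(-1.5).
synthetic = []
for t in range(1, 21):
    synthetic.extend([t] * round(1000 * t ** -1.5))

alpha, c, r_squared = fit_power_law(synthetic)
```

On the synthetic data the recovered exponent should be close to the 1.5 used to generate it. A straight line on log-log axes is only suggestive of a power law, so the goodness of fit should be read with that caveat.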
The review contents are written in natural language without any information tags, which makes them a valuable but unstructured source to mine. In this study, we are interested in finding clusters of words/topics in text. To that end, we introduce the LDA method [
Figure
Representation of LDA.
In the LDA method (as shown in Figure a word a document is a sequence of a topic
LDA is a popular topic modeling tool to learn a set of topics and feature words from Segment
Word segmentation is the problem of dividing a string of written language into its component words. In general, noise phrases, stop words, and meaningless symbols are removed from the data set after word segmentation. In this work, we simply keep the useful word segments, most of which are nouns. Conduct LDA method on
Given a collection of unlabeled text documents, the LDA model seeks to discover hidden topics as distributions over the words in a fixed vocabulary. It is assumed that these topics are specified before any document is generated. Thus, for any document in the corpus, the generative process has two stages. First, a topic distribution vector, modeled as a Dirichlet random variable, is drawn at random to determine the topics appearing in the document. Then, for each word that is to appear in the document, a single topic is randomly selected from the distribution vector [
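The two-stage generative process can be sketched in a few lines of Python. This is an illustration of the standard LDA generative story, not the inference procedure used to learn topics from reviews; the vocabulary, the two topic-word distributions, the Dirichlet parameters, and the function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_probs, vocabulary, alpha, length, rng):
    """Generate one document from topics that are fixed in advance."""
    # Stage 1: draw the document's topic distribution vector.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Stage 2: pick a topic for this word position, then a word from that topic.
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        word = rng.choices(vocabulary, weights=topic_word_probs[topic])[0]
        words.append(word)
    return theta, words


# Hypothetical vocabulary and two hypothetical topics (rows sum to 1).
vocabulary = ["quality", "screen", "price", "delivery"]
topic_word_probs = [
    [0.5, 0.4, 0.05, 0.05],  # a product-oriented topic
    [0.05, 0.05, 0.4, 0.5],  # a service-oriented topic
]
rng = random.Random(0)
theta, words = generate_document(topic_word_probs, vocabulary, [0.5, 0.5], 8, rng)
```

Fitting LDA reverses this process: given only the observed words, it estimates the topic-word distributions and the per-document topic vectors.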
Initially, we use LDA to mine 20 topics. Some samples are shown in Table
Sample topics generated directly by LDA.
ID | Top 10 sample words and their probability |
---|---|
1 | |
2 | |
3 | |
4 | |
5 | |
| | |
20 | 性 |
However, not all the topics in Table Inspect and annotate topics.
Two of the authors manually inspected the resulting topics [
Annotated topics.
Annotated topic | Featured words |
---|---|
Quality | Quality, appearance, brand, and so forth |
Function | Function, experience, operation, and so forth |
Cost | Price, discount, gift, and so forth |
Service | Logistic, package, delivery, and so forth |
In order to simplify the calculation, we can say
Someone might argue that
One purpose of this work is to analyze the latent correlations between review promptness and the reviewed topics. As mentioned before, different people may use various words to express the same topic when describing their online shopping experience, which leads to a sparse word distribution and makes customers’ common concerns harder to analyze.
To address this problem, we map the review of
For example,
The dependent variable is the review time interval (TI), measured as the difference between the review time and the purchase time. In order to test our hypotheses, we take the logarithm of the dependent variable TI.
The independent variables review score and membership level are mapped onto scores between “1” (low) and “5” (high). The review contents are binary variables, coded “1” if the review discloses information on the topic and “0” otherwise. All the variables used are summarized in Table
Variables and explanations.
Type | Variable | Notation | Explanation |
---|---|---|---|
Dependent | Time interval | TI | Time interval between purchase and review behavior. |
Independent | Rating | Rating | The score rated by the reviewer. |
Independent | Membership | Member | Membership of an online reviewer. |
Independent | map(Quality) | Quality | 1 for product quality feature reviewed, otherwise 0. |
Independent | map(Function) | Function | 1 for product function feature reviewed, otherwise 0. |
Independent | map(Cost) | Cost | 1 for cost feature reviewed, otherwise 0. |
Independent | map(Service) | Service | 1 for service feature reviewed, otherwise 0. |
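To make the variable construction concrete, the following sketch turns one review record into the variables listed above. It rests on several assumptions: the featured-word sets are only the few English glosses named in the annotated-topics table (the full lists are not given), the review is assumed to be already segmented into tokens, and the handling of zero-length intervals is a placeholder because the source does not say how they are treated before taking the logarithm.

```python
import math

# Partial, illustrative featured-word sets taken from the annotated-topics table.
FEATURED_WORDS = {
    "Quality": {"quality", "appearance", "brand"},
    "Function": {"function", "experience", "operation"},
    "Cost": {"price", "discount", "gift"},
    "Service": {"logistic", "package", "delivery"},
}


def build_row(rating, member, tokens, ti_days):
    """Build one observation: Rating, Member, four binary topic flags, log(TI)."""
    token_set = set(tokens)
    row = {"Rating": rating, "Member": member}
    for topic, featured in FEATURED_WORDS.items():
        # 1 if the review mentions any featured word of the topic, otherwise 0.
        row[topic] = 1 if token_set & featured else 0
    # log(0) is undefined; zero intervals are left as None in this sketch.
    row["log_TI"] = math.log(ti_days) if ti_days > 0 else None
    return row


row = build_row(rating=5, member=3, tokens=["price", "delivery", "fast"], ti_days=2.0)
```

Here the hypothetical review mentions "price" and "delivery", so only the Cost and Service flags are set.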
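Finally, we use a linear specification for the review promptness estimation:

$$\log(\mathrm{TI}) = \beta_0 + \beta_1\,\mathrm{Rating} + \beta_2\,\mathrm{Rating}^2 + \beta_3\,\mathrm{Member} + \beta_4\,\mathrm{Member}^2 + \beta_5\,\mathrm{Quality} + \beta_6\,\mathrm{Function} + \beta_7\,\mathrm{Cost} + \beta_8\,\mathrm{Service} + \varepsilon$$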
Initially, a correlation analysis including all of the variables used in estimations was conducted. Correlation values are shown in Table
Correlation analysis
| | TI | Rating | Member | Quality | Function | Cost | Service |
---|---|---|---|---|---|---|---|
TI | | | | | | | |
Rating | | | | | | | |
Member | | | | | | | |
Quality | | | | | | | |
Function | | | | | | | |
Cost | | | | | | | |
Service | | | | | | | |
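A pairwise correlation matrix of this kind can be computed as sketched below. The use of the Pearson coefficient is an assumption, since the source does not name the correlation measure, and the toy columns are hypothetical values chosen only to exercise the code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """Return a nested dict m[a][b] with the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy data, not taken from the study.
columns = {
    "Rating": [1, 2, 3, 4, 5],
    "Member": [2, 4, 6, 8, 10],
    "Cost": [1, 0, 1, 0, 1],
}
matrix = correlation_matrix(columns)
```

Low off-diagonal values among the independent variables suggest that multicollinearity is not a major concern for the regression.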
The maximum correlation index was about
The results of the regression analysis for the model are shown in Table
Results of regression analysis (
Variable | Coefficient | Std. err. | p value | 95% conf. interval | |
---|---|---|---|---|---|
Constant | −0.61110 | 0.77463 | 0.43022 | | |
Rating | 0.89000 | 0.36704 | 0.01536 | | |
Rating2 | −0.10130 | 0.04369 | 0.02048 | | |
Member | 0.57535 | 0.08093 | | | |
Member2 | −0.05447 | 0.01275 | | | |
Quality | −0.02886 | 0.05024 | 0.56567 | | |
Function | 0.05975 | 0.04428 | 0.17738 | | |
Cost | −0.12621 | 0.04435 | 0.00445 | | |
Service | −0.44380 | 0.05342 | < | | |
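As a worked illustration of the quadratic terms, the sketch below computes where the fitted curve turns for Rating and Member, assuming that "Rating2" and "Member2" denote the squared variables. For a fitted curve b1·x + b2·x², the turning point is at x = −b1 / (2·b2); the function name `turning_point` is introduced here for illustration only.

```python
def turning_point(linear, quadratic):
    """Return the x at which linear*x + quadratic*x**2 reaches its extremum."""
    if quadratic == 0:
        raise ValueError("no turning point without a quadratic term")
    return -linear / (2.0 * quadratic)


# Coefficients as reported in the regression table.
rating_peak = turning_point(0.89000, -0.10130)  # roughly 4.39
member_peak = turning_point(0.57535, -0.05447)  # roughly 5.28
```

Because both quadratic coefficients are negative, the fitted log interval rises up to the turning point and falls after it. Under these assumptions the turning point for Rating lies inside the 1 to 5 scale, while the one for Member lies slightly above 5, so within the scale the fitted interval increases with membership level at a decreasing rate.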
As this study proposed, “Rating2” (
Table
The final test results are included in Table
Summary of results.
Hypothesis | Description | Result |
---|---|---|
H1 | People will rate a relative high score after long purchase time. | Supported |
H2 | People with high membership level will | Rejected |
H3 | Longer time interval: review contents are more about product. | Rejected |
H4 | Longer time interval: review contents are more about Cost and Service. | Supported |
H1 is supported since people tend to give a relatively high score after a long purchase-review interval. If the product is acceptable, then there is nothing special to review, leading to more random comments. Moreover, the U-shape of
The finding that H2 is supported means that people with a high membership level tend to publish late reviews. Customers with a low membership level have much to gain extrinsically (membership promotion; credit exchange) from publishing reviews quickly. Customers with a high membership level gain little (either intrinsically or extrinsically) from a quick response. So, customer loyalty rewards, such as membership levels, are effective in encouraging consumers to review their online shopping experiences. However, some people do not want to take the time to write high-quality reviews for information sharing but are willing to publish quick reviews for the rewards.
It is interesting that customers with a longer purchase-review interval tend to write more text. After a long purchase-review interval, people say little about the product, since sufficient information may already have been released by others. So, H3 is rejected, whereas the finding that H4 is supported means that people have fewer comments about the product but more to say about the service, sharing their service experience after they have used the product.
In this paper, we present a methodological framework to study the review promptness and some motivations of online reviewers. The analytical and experimental results with real data from a B2C website of
The frequency of time intervals between consumers’ purchasing a good online and their publishing reviews follows a power-law distribution, providing new evidence for the study of human behavior online.
The observations of actual review behavior, such as review quality, promptness, and attitude, are mostly consistent with reviewers’ motivations: If a consumer’s “purchase-review” time interval is relatively short, the customer’s evaluation contents are service related. On the contrary, a relatively long time interval means that the experience with a product/service is more complete and careful; thus, a customer may provide reviews about the function of the product.
These implications can help B2C sellers to manage consumers’ relationships and adjust online marketing strategies accordingly.
We should note some limitations of this work. First, Gilbert and Karahalios showed that the power-law curve governs Amazon’s review community [
The authors declare that they have no competing interests.