Peer-to-peer (P2P) lending has attracted increasing attention recently. As an emerging micro-finance platform, P2P lending removes intermediaries, reduces transaction costs, and increases the benefits of both borrowers and lenders. However, P2P lending investment faces two major challenges: the lack of historical observations of loans from the same borrower, and the ambiguity of the estimated distribution of loan returns. To address these difficulties, this paper proposes a data-driven robust portfolio optimization model with relative entropy constraints, built on an “instance-based” credit risk assessment framework. The model uses a nonparametric kernel approach to estimate the expected return and risk of P2P loans when historical data for the same borrower are unavailable. Furthermore, we construct a robust mean–variance optimization problem based on the relative entropy method for P2P loan investment decisions. We validate the proposed model on a real-world dataset from a notable P2P lending platform, Prosper. Empirical results indicate that our model provides better investment performance than the existing model.

Peer-to-peer lending, an emerging form of online micro-finance, brings borrowers and lenders together virtually and lets them lend to and borrow from each other directly. P2P lending platforms remove traditional financial intermediaries, reduce transaction costs, and increase the benefits of both borrowers and lenders; they therefore improve the efficiency of the financial market. However, traditional financial intermediaries can use collateral, certified accounts, and other means to enhance the creditworthiness of borrowers. In their absence, the information asymmetry between borrowers and lenders is severe, and the credit risk of P2P loan investment is very high.

Credit risk in P2P lending refers to the potential monetary loss arising from a borrower’s default on a loan. Efficient and reasonable investment in P2P loans needs to be based on a reliable assessment of the credit risk distribution. Estimating this distribution is very challenging because historical return (or loss) data for the loan awaiting investment are difficult to obtain. In other words, historical yield data for the same borrower are usually unavailable. Moreover, even if the distribution of loan returns (or losses) is approximated from the limited available data or from expert knowledge, the approximation is usually not accurate; this is known as the distribution ambiguity (probability measure uncertainty) problem. In this paper, we formulate a data-driven robust portfolio optimization model based on an “instance-based” credit risk assessment method for investment decisions in P2P lending.

To help personal lenders mitigate risk, current online P2P lending platforms have taken some risk-reducing measures, such as filtering out high-risk borrowers whose FICO score is below a threshold, assigning a preliminary rating to each loan, and providing investors with the risk level of each loan. Thus, each loan is assigned a grade, such as AA, A, B, C, D, E, or NR, and loans with the same grade are considered to have the same risk level. These rating-based models are more suitable for traditional banks and lending institutions, since they can grant a large number of loans to diversify their investments. Individual investors, however, possess only a small amount of funds; they need more refined risk assessment methods and investment strategies.

As in bond investment, P2P investors can fund a portion of each loan rather than the whole. Therefore, investors can decide which loans to invest in and, at the same time, determine the amount invested in each loan. This mechanism allows investors to construct a credit portfolio to mitigate risk.

Markowitz [

For P2P lending investment, as mentioned above, such procedures face at least two major challenges, i.e., the deficiency of loans’ historical observations and the ambiguity problem of estimated loans’ distribution (probability measure uncertainty problem). Thus, this paper proposes a data-driven robust model of portfolio optimization based on relative entropy constraints combined with an instance-based credit risk assessment method.

Specifically, we use the “instance-based” credit risk assessment method proposed by Guo et al. [

Our work is somewhat related to the paper by Guo et al. [

The rest of this paper is organized as follows. Section

In order to assess risk and assist investment decisions making in P2P lending, researchers have done many studies: Emekter et al. [

The above studies investigate the factors that determine credit risk and analyze the performance of P2P loans; however, they do not propose a mechanism that assists individual investors in allocating funds across loans effectively and making optimal investment decisions.

To help personal lenders mitigate risk, popular online P2P platforms, such as Lending Club and Prosper, have developed credit scoring systems that assess the creditworthiness of each borrower using data mining or machine learning techniques. There is a large body of literature on credit rating using data mining techniques, for example, linear discriminant analysis (LDA) [

In the portfolio selection problem, full knowledge of the assets’ distribution is usually assumed in order to determine the optimal portfolio. In most real-life applications, however, the distribution must be approximated, and the approximation is not necessarily accurate; this is known as the distribution ambiguity (probability measure uncertainty) problem.

The robust optimization algorithm is an attractive way to solve the portfolio selection problem under distribution ambiguity. As the exact parameters are unavailable, Natarajan et al. [

Since relative entropy has the ability to measure the difference between two probability distributions (probability measures), it can be used to construct the uncertainty set for robust optimization. In the studies of Hansen and Sargent [
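To make the role of relative entropy concrete, the following minimal Python sketch computes the relative entropy (Kullback-Leibler divergence) between two discrete distributions and checks whether a candidate distribution lies within a given radius of a nominal one. The function names, the radius `eta`, and the example distributions are illustrative assumptions, not notation or data from the studies cited here.

```python
import math

def relative_entropy(q, p):
    """Relative entropy D(q || p) between two discrete distributions."""
    if len(q) != len(p):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for q_i, p_i in zip(q, p):
        if q_i == 0.0:
            continue            # the term 0 * log(0 / p) is taken as 0
        if p_i == 0.0:
            return math.inf     # q puts mass where p has none
        total += q_i * math.log(q_i / p_i)
    return total

def in_uncertainty_set(q, p, eta):
    """True if q lies within relative-entropy radius eta of the nominal p."""
    return relative_entropy(q, p) <= eta

# Hypothetical nominal and candidate distributions over four return scenarios.
nominal = [0.25, 0.25, 0.25, 0.25]
candidate = [0.40, 0.30, 0.20, 0.10]
print(relative_entropy(candidate, nominal))
print(in_uncertainty_set(candidate, nominal, eta=0.2))
```

A larger radius admits distributions that deviate more from the nominal estimate, so the radius acts as a knob for how conservative the resulting robust problem is.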

In recent years, data-driven methods have been studied extensively. In this framework, it is assumed that investors possess only historical data on asset returns. Bertsimas et al. [ $\chi^{2}$ test, Anderson-Darling test, and some other testing tools to construct uncertainty sets and take the worst case of each set to formulate the robust optimization. They assume that the uncertainty sets are defined by certain structures and sizes based on the data points available. In contrast, the structure of the uncertainty set in our study is not predefined, and we consider the uncertainty of the mean, covariance, and distribution jointly. Kang et al. [

Using historical data to evaluate future performance and potential loss is conventional. However, unlike bond or stock investment, historical yield data for the same P2P borrower are usually unavailable. Thus, assessing the risk of a new loan is very challenging. In this section, we briefly introduce the instance-based credit risk assessment model proposed by Guo et al. [

In this instance-based assessment framework, the expected return of each loan is estimated as a weighted average of historical observations of other borrowers’ closed loans. Specifically, for a new loan

The weighted returns of the past loans are treated as historical observations of a new loan. Following this line of thought, and taking variance as the risk measure, the weighted variance of past loans is used to assess the new loan’s risk, that is,

The absolute deviation between two loans’ default probabilities is used to measure the similarity; the smaller the absolute deviation, the greater the similarity, and, therefore, the larger the weight. In particular, the absolute deviation of default probabilities between loans $i$ and $j$ is $d_{ij} = |p_{i} - p_{j}|$, where $p_{i}$ and $p_{j}$ are the default probabilities of loans $i$ and $j$, respectively.

Kernel regression is a nonparametric statistical method for investigating the nonlinear relation between random variables, and it is based on kernel density estimation. We first introduce the preliminaries of kernel estimation.

Given_{j},

(a)

There is a range of commonly used kernel functions, such as uniform, triangular, biweight, triweight, and Gaussian [

Many studies show that the choice of kernel function does not affect the estimation significantly; however, the choice of the bandwidth is a vital issue [
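For reference, the kernel functions named above can be written down in their standard textbook forms. The sketch below is illustrative only; the function names and the `kernel_density` helper are assumptions introduced here, and each kernel is normalized so that it integrates to one.

```python
import math

def uniform(u):
    return 0.5 if abs(u) <= 1.0 else 0.0

def triangular(u):
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0

def biweight(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def triweight(u):
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density(x, samples, h, kernel=gaussian):
    """Kernel density estimate at x with bandwidth h."""
    if h <= 0.0:
        raise ValueError("bandwidth must be positive")
    return sum(kernel((x - s) / h) for s in samples) / (len(samples) * h)

# Hypothetical sample: the same data smoothed with two bandwidths.
samples = [0.02, 0.05, 0.06, 0.11, 0.30]
print(kernel_density(0.05, samples, h=0.02))
print(kernel_density(0.05, samples, h=0.20))
```

Evaluating the same point with a small and a large bandwidth shows why the bandwidth matters more than the kernel shape: it controls how far each observation's influence spreads.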

In the following, we introduce the kernel regression model proposed by Nadaraya [^{2}-valued. With the sample set, _{j},_{j})

For the instance-based credit risk modeling, the set of historical observations is represented as $\{(p_{j}, R_{j})\}$, where $p_{j}$ and $R_{j}$ are the default probability and return rate of the $j$th past loan.

Comparing (
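The instance-based estimate described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Nadaraya-Watson weights with a Gaussian kernel applied to the difference in default probabilities, and the function name, the bandwidth value, and the three historical loans are hypothetical.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def instance_based_estimate(p_new, history, h):
    """Estimate the expected return and variance of a new loan.

    p_new   : default probability of the new loan
    history : list of (default_probability, realized_return) for closed loans
    h       : kernel bandwidth
    """
    raw = [gaussian_kernel((p_new - p_j) / h) for p_j, _ in history]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("bandwidth too small: all kernel weights vanished")
    weights = [k / total for k in raw]   # normalized, so they sum to one
    mean = sum(w * r for w, (_, r) in zip(weights, history))
    var = sum(w * (r - mean) ** 2 for w, (_, r) in zip(weights, history))
    return mean, var, weights

# Hypothetical closed loans: (default probability, realized return rate).
history = [(0.05, 0.08), (0.10, 0.06), (0.30, -0.40)]
mean, var, weights = instance_based_estimate(0.06, history, h=0.05)
print(mean, var, weights)
```

Past loans whose default probability is close to that of the new loan receive most of the weight, so the estimate is driven by the most similar instances.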

Similar to bond investment, P2P lenders can invest a portion of each loan. Thus, P2P loan investment decisions can be transformed into a credit portfolio optimization problem. This section introduces the portfolio optimization model for investment decisions in P2P lending, which accounts for the uncertainty of the distribution of the loans. We start from the classical mean-variance optimization model proposed by Markowitz [

In the classical mean-variance optimization model, the optimal asset allocation strategy is identified by solving the tradeoff between risk and return according to the investor’s risk preference. A portfolio that invests in $n$ assets is represented by a weight vector $w \in \mathbb{R}^{n}$, where each weight denotes the proportion of wealth allocated to an asset. Then the return and risk of the portfolio become $w^{\top}\mu$ and $w^{\top}\Sigma w$, where $\mu \in \mathbb{R}^{n}$ and $\Sigma \in \mathbb{R}^{n\times n}$ are the expected return and the covariance matrix of the assets’ returns under the probability measure (or probability distribution) $P$; $\mathcal{W} \subseteq \mathbb{R}^{n}$ denotes the set of feasible portfolios and

In reality, the assumption that the expected return

The investors might consider a set of probability measures, i.e., an uncertainty set, to cover a range of scenarios based on their assessments, and then use robust optimization to obtain approximate optimal strategies for the worst scenarios within the uncertainty set. In this paper, we define

Let

Yam et al. [

In the Section

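The value of the portfolio remains at its initial value, i.e., $\sum_{i=1}^{n} w_{i} = 1$, where $w_{i}$ is the proportion of wealth allocated to loan $i$.

Short-selling is forbidden; thus $w_{i} \geq 0$.

For each loan, the amount that a lender can invest is no more than the amount the borrower requests, $a_{i}$; thereby, $M w_{i} \leq a_{i}$, where $M$ is the total investment amount that the investor has available.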
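The three constraints above, together with the portfolio return and risk, can be checked numerically with a short sketch. The code below is illustrative only: the function names, the tolerance, and the three-loan example (expected returns, covariance matrix, requested amounts, and budget) are assumptions chosen for demonstration, and it evaluates a given allocation rather than solving the optimization problem.

```python
def portfolio_return(weights, mu):
    """Expected portfolio return: sum of weight times expected return."""
    return sum(w * m for w, m in zip(weights, mu))

def portfolio_variance(weights, cov):
    """Portfolio variance as the quadratic form of weights and covariance."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))

def is_feasible(weights, requested, budget, tol=1e-9):
    """Check the budget, no-short-selling, and per-loan cap constraints."""
    fully_invested = abs(sum(weights) - 1.0) <= tol
    no_short_selling = all(w >= -tol for w in weights)
    within_request = all(w * budget <= a + tol
                         for w, a in zip(weights, requested))
    return fully_invested and no_short_selling and within_request

# Hypothetical three-loan example.
mu = [0.06, 0.08, 0.05]
cov = [[0.01, 0.0, 0.0],
       [0.0, 0.04, 0.0],
       [0.0, 0.0, 0.0025]]
requested = [6000.0, 3000.0, 2500.0]   # amount each borrower requests
budget = 10000.0                       # total amount the investor has available
weights = [0.5, 0.3, 0.2]

print(portfolio_return(weights, mu))
print(portfolio_variance(weights, cov))
print(is_feasible(weights, requested, budget))
```

An allocation such as `[0.7, 0.3, 0.0]` fails the check in this example because it would place more in the first loan than its borrower requests.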

In this section, we investigate the validity of the robust mean-variance portfolio optimization model in P2P lending using a real-world dataset from a notable P2P lending platform, Prosper. All numerical experiments are performed in MATLAB on a PC.

The dataset for the empirical study is from Prosper, a notable P2P lending platform in the United States. It consists of 17,001 loans, including 3,039 default loans and 13,908 completed loans, whose issue dates fall within the period from November 2005 to March 2014.

Using the data, a credit scoring model is learnt to transform the loan attributes into the default probability. The loan attributes are as follows: the borrower’s FICO score which reflects borrower’s creditworthiness, the borrower’s number of inquiries in the past six months, the monetary amount of the loan, the homeownership status of the borrower, the debt-to-income ratio of the borrower, the borrower’s current delinquencies representing the number of accounts delinquent, and the borrower’s number of public records in the past 10 years (Row 1-7 in Table

Description of variables.

Variable | Description |
---|---|
$x_{1}$ | FICO score of the borrower |
$x_{2}$ | The number of inquiries of the borrower in the last 6 months |
$x_{3}$ | The monetary amount of the loan |
$x_{4}$ | The homeownership status of the borrower (0 = rent, 1 = own) |
$x_{5}$ | The debt-to-income ratio of the borrower |
$x_{6}$ | The number of accounts delinquent |
$x_{7}$ | The number of public records in the past 10 years |
$y$ | Dependent variable (0 = completed, 1 = default) |

There exist many credit scoring models to predict the default probability of a loan, such as: Xgboost model [

We randomly divide the dataset into two parts, one containing 40% of all loans for determining the optimal bandwidth

In this paper, we propose a robust credit portfolio optimization model for investment decisions in P2P lending. In order to show its effectiveness, we compare it with a benchmark model proposed by Guo et al. [

IOM is the instance-based model proposed by Guo et al. [

RIOM is the robust instance-based model in this study. Expected return and risk of each loan are also assessed based on the “instance-based” assessment framework. However, we use the robust model of credit portfolio optimization based on relative entropy method, Equation (

We compare the two models by the following procedure:

Train the credit risk assessment model with the training set, and use the trained model to predict the expected return (

For each model, feed the predicted expected return vector

Compare the return rate of the two models.

As mentioned before, we select the Gaussian kernel,

The curve of CV (h).
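A cross-validation curve of this kind can be produced by a leave-one-out procedure. The sketch below is a minimal illustration under stated assumptions, not the authors' MATLAB code: it assumes that CV(h) is the mean squared leave-one-out prediction error of a Gaussian-kernel regression of return on default probability, and the function names, the candidate grid, and the synthetic data are hypothetical.

```python
import math

def gaussian_kernel(u):
    # The normalizing constant cancels in the regression ratio, so it is omitted.
    return math.exp(-0.5 * u * u)

def loo_prediction(i, data, h):
    """Predict the return of observation i from all other observations."""
    p_i = data[i][0]
    num = den = 0.0
    for j, (p_j, r_j) in enumerate(data):
        if j == i:
            continue
        k = gaussian_kernel((p_i - p_j) / h)
        num += k * r_j
        den += k
    return num / den if den > 0.0 else None

def cv_score(data, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = []
    for i, (_, r_i) in enumerate(data):
        pred = loo_prediction(i, data, h)
        if pred is None:
            return math.inf   # no neighbor received any weight
        errors.append((r_i - pred) ** 2)
    return sum(errors) / len(errors)

def select_bandwidth(data, grid):
    """Return the candidate bandwidth with the smallest CV score."""
    return min(grid, key=lambda h: cv_score(data, h))

# Synthetic (default probability, return) pairs, for illustration only.
data = [(i * 0.05, 0.1 - 0.5 * i * 0.05) for i in range(11)]
grid = [0.01, 0.05, 0.2, 1.0]
for h in grid:
    print(h, cv_score(data, h))
print(select_bandwidth(data, grid))
```

Plotting `cv_score` against the candidate bandwidths gives a curve whose minimizer is taken as the bandwidth; a finer grid or a continuous optimizer could replace the coarse grid used here.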

To apply the robust credit portfolio optimization method to obtain the optimal investment strategy in problems (

Table

Rate of return from the optimal portfolio on the Prosper dataset.

Subset | IOM | RIOM |
---|---|---|
1 | 0.0501 | |
2 | 0.0550 | |
3 | 0.0540 | |
4 | 0.0564 | |
5 | 0.0627 | |
6 | 0.0543 | |
7 | 0.0532 | |
8 | 0.0605 | |
9 | 0.0593 | |
10 | 0.0546 | |
11 | 0.0637 | |
12 | 0.0567 | |
13 | 0.0468 | |
14 | 0.0519 | |
15 | 0.0544 | |
16 | 0.0357 | |
17 | 0.0588 | |
18 | 0.0607 | |
19 | 0.0544 | |
20 | 0.0625 | |
Average | 0.0553 | |

In order to test and verify that the conclusions obtained from the above experiments are stable, we consider different investment amounts and required returns as input parameters for portfolio selection and keep other conditions unchanged. As summarized in Table

Investors’ choices of input parameters for portfolio selection.

Set | Investment amount | Required rate |
---|---|---|
1 | $10,000 | 5.0% |
2 | $10,000 | 5.5% |
3 | $10,000 | 6.0% |
4 | $15,000 | 5.0% |
5 | $15,000 | 5.5% |
6 | $15,000 | 6.0% |
7 | $20,000 | 5.0% |
8 | $20,000 | 5.5% |
9 | $20,000 | 6.0% |

The computational results for each parameter pair are summarized in Table

Investment performances of input parameters for portfolio selection.

Subset | IOM | RIOM | IOM | RIOM | IOM | RIOM | IOM | RIOM | IOM | RIOM | IOM | RIOM | IOM | RIOM | IOM | RIOM | IOM | RIOM |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.0598 | | 0.0601 | | 0.0502 | | 0.0501 | | 0.0558 | | 0.0520 | | 0.0594 | | 0.0544 | | 0.0691 | |
2 | 0.0500 | | 0.0601 | | 0.0675 | | 0.0550 | | 0.0517 | | 0.0551 | | 0.0504 | | 0.0664 | | 0.0661 | |
3 | 0.0441 | | 0.0491 | | 0.0735 | | 0.0540 | | 0.0598 | | 0.0631 | | 0.0503 | | 0.0554 | | 0.0647 | |
4 | 0.0525 | | 0.0658 | | 0.0636 | | 0.0564 | | 0.0512 | | 0.0553 | | 0.0566 | | 0.0617 | | 0.0518 | |
5 | 0.0532 | | 0.0631 | | 0.0513 | | 0.0627 | | 0.0566 | | 0.0616 | | 0.0576 | | 0.0547 | | 0.0610 | |
6 | 0.0634 | | 0.0564 | | 0.0717 | | 0.0543 | | 0.0570 | | 0.0585 | | 0.0584 | | 0.0528 | | 0.0516 | |
7 | 0.0613 | | 0.0547 | | 0.0551 | | 0.0532 | | 0.0528 | | 0.0620 | | 0.0481 | | 0.0485 | | 0.0460 | |
8 | 0.0529 | | 0.0505 | | 0.0685 | | 0.0605 | | 0.0545 | | 0.0645 | | 0.0545 | | 0.0628 | | 0.0592 | |
9 | 0.0548 | | 0.0550 | | 0.0559 | | 0.0593 | | 0.0507 | | 0.0574 | | 0.0535 | | 0.0561 | | 0.0574 | |
10 | 0.0474 | | 0.0472 | | 0.0499 | | 0.0546 | | 0.0528 | | 0.0622 | | 0.0514 | | 0.0582 | | 0.0689 | |
11 | 0.0597 | | 0.0602 | | 0.0661 | | 0.0637 | | 0.0562 | | 0.0498 | | 0.0531 | | 0.0569 | | 0.0572 | |
12 | 0.0644 | | 0.0541 | | 0.0624 | | 0.0567 | | 0.0529 | | 0.0574 | | 0.0551 | | 0.0536 | | 0.0618 | |
13 | 0.0635 | | 0.0709 | | 0.0532 | | 0.0468 | | 0.0637 | | 0.0504 | | 0.0555 | | 0.0636 | | 0.0616 | |
14 | 0.0593 | | 0.0626 | | 0.0634 | | 0.0519 | | 0.0568 | | 0.0614 | | 0.0577 | | 0.0541 | | 0.0572 | |
15 | 0.0523 | | 0.0485 | | 0.0571 | | 0.0544 | | 0.0577 | | 0.0633 | | 0.0597 | | 0.0536 | | 0.0595 | |
16 | 0.0549 | | 0.0684 | | 0.0508 | | 0.0357 | | 0.0642 | | 0.0573 | | 0.0593 | | 0.0616 | | 0.0551 | |
17 | 0.0549 | | 0.0549 | | 0.0538 | | 0.0588 | | 0.0674 | | 0.0615 | | 0.0535 | | 0.0487 | | 0.0696 | |
18 | 0.0546 | | 0.0512 | | 0.0560 | | 0.0607 | | 0.0585 | | 0.0687 | | 0.0599 | | 0.0576 | | 0.0507 | |
19 | 0.0492 | | 0.0572 | | 0.0657 | | 0.0544 | | 0.0434 | | 0.0589 | | 0.0581 | | 0.0472 | | 0.0623 | |
20 | 0.0554 | | 0.0413 | | 0.0596 | | 0.0625 | | 0.0562 | | 0.0698 | | 0.0518 | | 0.0601 | | 0.0638 | |
Average | 0.0554 | | 0.0566 | | 0.0598 | | 0.0553 | | 0.0560 | | 0.0595 | | 0.0552 | | 0.0564 | | 0.0597 | |

Performance comparison.

In conclusion, the optimal portfolio identified by the robust optimization model in this study is more efficient than that of the existing model, and its performance is more robust and stable.

In this paper, we formulate a data-driven robust portfolio optimization model with relative entropy constraints, based on an instance-based credit risk assessment framework, for investment decisions in P2P lending. This P2P lending investment decision model has at least three advantages. First, it provides a more refined measure of P2P loans’ risk and gives investors a more intuitive, quantified risk estimate, instead of just labelling each loan with a credit grade. Second, the model can estimate each loan’s expected return and risk when historical observations of the same borrower are unavailable. Finally, the model accounts for the distribution ambiguity (probability measure uncertainty) of the loans and uses relative entropy to model this uncertainty, so that the optimal allocation strategy remains efficient and feasible under a range of actual scenarios. Numerical experiments indicate that the P2P lending investment decision model using robust optimization with relative entropy constraints performs better than the existing model.

The data used in this paper were downloaded from the Prosper website: https://www.prosper.com/invest/download.aspx.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

The research is supported by the National Natural Science Foundation of China (Grants nos. 71471027, 71731003, and 71873103), the National Social Science Foundation of China (Grant no. 16BTJ017), National Natural Science Foundation of China Youth Project (Grant no. 71601041), Liaoning Economic and Social Development Key Issues (Grant no. 2015lslktzdian-05), and Liaoning Provincial Social Science Planning Fund Project (Grant no. L16BJY016). The authors acknowledge the organizations mentioned above.