Automated mortgage origination delay detection from textual conversations

Authors:

Highlights:

• Delays in mortgage loan condition clearing are extremely expensive and damaging.

• A proprietary industry dataset of loan conversation transcripts is collected.

• Text mining models are developed using transcripts to predict delays.

• Delays are highly predictable, and interpretable models assist remediation efforts.

Abstract

For modern mortgage firms, the process of setting up and verifying a new loan, known as origination, is complex and multifaceted. The literature notes that this process is rife with delays that can stunt the firm's business opportunities, but no modern analytical techniques have been developed to address the problem. In this paper, we propose using text analytics and machine learning to predict likely delays. In collaboration with a large national mortgage firm, we derive a large dataset of transcripts from employees' communications pertaining to potential loans. We first use information retrieval to generate an initial list of “seed terms,” the terms most associated with loans that were delayed. We then use an array of machine learning approaches to build predictive models based on these seed terms. We find that these approaches are comparable in performance to less interpretable state-of-the-art approaches that use word embeddings. The resulting models offer interpretable, high-performing solutions that mitigate the risk of delays through early risk detection.
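The two-stage idea in the abstract (select seed terms associated with delayed loans, then build features from them) can be illustrated with a minimal sketch. The abstract does not state which retrieval score the authors use, so the smoothed log ratio of document frequencies below is an assumption chosen for simplicity. The function names `tokenize`, `seed_terms`, and `seed_features` and the toy transcripts are hypothetical and do not come from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def seed_terms(transcripts, delayed, top_k=5):
    # Rank terms by a smoothed log ratio of document frequencies:
    # share of delayed transcripts containing the term versus
    # share of on-time transcripts containing it.
    # ASSUMPTION: this score is illustrative, not the paper's method.
    n_delayed = sum(delayed)
    n_ontime = len(delayed) - n_delayed
    df_delayed, df_ontime = Counter(), Counter()
    for text, is_delayed in zip(transcripts, delayed):
        target = df_delayed if is_delayed else df_ontime
        target.update(set(tokenize(text)))  # count each term once per transcript
    scores = {}
    for term in set(df_delayed) | set(df_ontime):
        p_delayed = (df_delayed[term] + 1) / (n_delayed + 2)
        p_ontime = (df_ontime[term] + 1) / (n_ontime + 2)
        scores[term] = math.log(p_delayed / p_ontime)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def seed_features(text, seeds):
    # One count per seed term; this vector is what a classifier would consume.
    counts = Counter(tokenize(text))
    return [counts[s] for s in seeds]


# Hypothetical toy data: 1 = loan was delayed, 0 = loan closed on time.
transcripts = [
    "still waiting on missing appraisal",
    "borrower has not sent missing paystub",
    "all documents received and approved",
    "conditions cleared and approved today",
]
delayed = [1, 1, 0, 0]

seeds = seed_terms(transcripts, delayed, top_k=3)
features = [seed_features(t, seeds) for t in transcripts]
```

Because each feature corresponds to a named term, a linear or tree-based model fitted on `features` can be inspected term by term, which is the kind of interpretability the abstract contrasts with word embeddings.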

Keywords: Mortgages, Loan origination, Condition clearing, Text analytics, Machine learning, Predictive analysis

Review history: Received 23 June 2020, Revised 26 October 2020, Accepted 27 October 2020, Available online 31 October 2020, Version of Record 30 November 2020.

Official paper URL: https://doi.org/10.1016/j.dss.2020.113433