Neural network embeddings on corporate annual filings for portfolio selection

Authors:

Highlights:

• Neural network embeddings can capture semantic changes in large documents.

• Semantic changes in 10-K reports are associated with firms’ future performance.

• The PV-DM version of Doc2Vec outperforms PV-DBOW and the average of Word2Vec.

• Stability associated with momentum and unchanged 10-K reports is rewarded in the marketplace.

• A portfolio based on cosine similarity earns statistically significant 3-factor alpha.
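The cosine-similarity measure named in the last highlight can be illustrated with a minimal sketch. It assumes each firm already has one embedding vector per filing year (for example, from a Doc2Vec model), and it ranks firms by how little their 10-K changed between two consecutive years. The firm names, toy vectors, and the `rank_by_similarity` helper are hypothetical and are not taken from the paper; the authors' actual portfolio construction may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(embeddings):
    """Rank firms from most to least similar year-over-year filing.

    embeddings maps a firm identifier to a pair
    (previous_year_vector, current_year_vector).
    """
    scores = {
        firm: cosine_similarity(prev, curr)
        for firm, (prev, curr) in embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings, for illustration only.
toy = {
    "FIRM_A": ([1.0, 0.0, 2.0], [1.0, 0.0, 2.0]),  # unchanged filing
    "FIRM_B": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal vectors
}
ranking = rank_by_similarity(toy)
```

A similarity near 1 indicates a filing whose embedding barely moved from the previous year; lower values indicate larger semantic change.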

Keywords: Neural network embedding, Document embedding, Machine learning, Asset pricing, Inattention, Annual reports

Review history: Received 26 June 2020, Revised 19 September 2020, Accepted 23 September 2020, Available online 25 September 2020, Version of Record 30 September 2020.

Official paper URL: https://doi.org/10.1016/j.eswa.2020.114053