Learning variable-length representation of words

Authors:

Highlights:

• A method for learning variable-length representations (embeddings) of words.

• Provides a means of compressing the word vectors.

• The proposed algorithm uses fewer dimensions for words with consistent contexts (words with specific meanings); see the sketch after this list.

• Variable-length embeddings can potentially help reduce bias (over-fitting) on certain datasets.

• The proposed approach outperforms fixed-length embeddings, as well as transformation-based approaches that rely on regularization and binarization, on standard word-semantics datasets.

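As a rough illustration of the idea behind these highlights (not the authors' actual algorithm), the Python sketch below truncates fixed-length word vectors so that words with more consistent contexts keep fewer dimensions. The embeddings, the consistency scores, and all names such as `target_dims` are hypothetical placeholders.

```python
# Illustrative sketch only: assign each word a vector length based on how
# consistent its contexts are, then truncate a fixed-length embedding.
# The data and scoring scheme are made up for demonstration purposes.
import numpy as np

rng = np.random.default_rng(0)

# Toy fixed-length embeddings: a vocabulary of 4 words, 8 dimensions each.
vocab = ["bank", "cat", "run", "the"]
embeddings = {w: rng.normal(size=8) for w in vocab}

# Hypothetical context-consistency scores in [0, 1]: higher means the word
# occurs in more uniform contexts (i.e., has a more specific meaning).
consistency = {"bank": 0.3, "cat": 0.9, "run": 0.5, "the": 0.1}

def target_dims(score, d_min=2, d_max=8):
    """Map a consistency score to a vector length: more consistent words get fewer dimensions."""
    return int(round(d_max - score * (d_max - d_min)))

# Variable-length (compressed) embeddings: truncate each word's vector.
variable_embeddings = {w: embeddings[w][: target_dims(consistency[w])] for w in vocab}

for w in vocab:
    print(w, len(variable_embeddings[w]))
```

Under this toy scheme, a high-consistency word like "cat" ends up with a short vector, while a broadly used word like "the" keeps nearly all of its dimensions, which is the compression behavior the highlights describe.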
Abstract:


Keywords: Word embedding, Compression and sparsity, Lexical semantics

Article history: Received 6 August 2019, Revised 13 January 2020, Accepted 23 February 2020, Available online 27 February 2020, Version of Record 5 March 2020.

Paper URL: https://doi.org/10.1016/j.patcog.2020.107306