CCGIR: Information retrieval-based code comment generation method for smart contracts

Authors:

Highlights:

Abstract

A smart contract is a computer program intended to automatically execute, control, or document legally relevant events and actions according to the terms of a contract. About 10% of the security vulnerabilities in smart contracts are caused by the misuse of code that lacks comments. Therefore, effective automatic code comment generation methods for smart contracts are needed. In this study, we propose CCGIR, an information retrieval-based code comment generation method for smart contracts. Since code clones are common in smart contract development, CCGIR uses information retrieval to find the most similar code in the code repository and reuses its comment, measuring similarity from three aspects: the semantic, lexical, and syntactic similarity of smart contract code. As our experimental subject, we select a corpus containing 57,676 unique code-comment pairs drawn from 40,932 real-world smart contracts. We then conduct empirical studies to evaluate the effectiveness of the proposed method. Experimental results show that CCGIR can outperform nine state-of-the-art baselines in terms of three performance measures. Moreover, we perform a human study, which further verifies that CCGIR can generate higher-quality comments. Finally, we find that CCGIR also achieves promising performance on two other code comment generation tasks (i.e., code comment generation for Java and code comment generation for Python). Given the simplicity and effectiveness of the proposed method, we recommend that researchers use it as a baseline when evaluating novel code comment generation methods.
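The retrieve-and-reuse idea in the abstract can be illustrated with a minimal sketch. This is not CCGIR's implementation: the abstract does not specify how each similarity is computed, so the sketch swaps in simple stand-ins, all of which are assumptions: cosine similarity over identifier subword counts for "semantic" similarity, Jaccard similarity of token sets for lexical similarity, and `difflib.SequenceMatcher` over an abstracted token sequence for syntactic similarity. The names `retrieve_comment`, `KEYWORDS`, the tokenizer regex, the equal weights, and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical tokenizer: identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")
# Small, illustrative (not exhaustive) set of Solidity keywords and built-in types.
KEYWORDS = {"function", "returns", "return", "public", "external", "internal",
            "private", "view", "pure", "payable", "require", "if", "else", "for",
            "while", "emit", "uint256", "address", "bool", "mapping", "memory", "storage"}


def tokenize(code: str) -> list[str]:
    return TOKEN_RE.findall(code)


def subwords(tokens: list[str]) -> Counter:
    """Split identifiers such as balanceOf or _balances into lowercase subwords."""
    words = []
    for tok in tokens:
        if re.match(r"[A-Za-z_]", tok):
            parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tok)
            words += [p.lower() for p in parts]
    return Counter(words)


def shape(tokens: list[str]) -> list[str]:
    """Keep keywords and punctuation; abstract other names to ID and numbers to NUM."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif re.match(r"[A-Za-z_]", tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def semantic_sim(a: list[str], b: list[str]) -> float:
    """Crude stand-in for a learned semantic similarity: cosine over subword counts."""
    ca, cb = subwords(a), subwords(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def lexical_sim(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of the two token sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def syntactic_sim(a: list[str], b: list[str]) -> float:
    """Similarity of the abstracted token sequences (a rough proxy for structure)."""
    return SequenceMatcher(None, shape(a), shape(b)).ratio()


def retrieve_comment(query: str, corpus: list[tuple[str, str]],
                     weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> str:
    """Return the comment of the corpus entry with the highest weighted similarity."""
    if not corpus:
        raise ValueError("corpus is empty")
    q = tokenize(query)

    def score(pair: tuple[str, str]) -> float:
        c = tokenize(pair[0])
        return (weights[0] * semantic_sim(q, c)
                + weights[1] * lexical_sim(q, c)
                + weights[2] * syntactic_sim(q, c))

    return max(corpus, key=score)[1]


# Toy corpus of (code, comment) pairs; purely illustrative.
CORPUS = [
    ("function transfer(address to, uint256 amount) public returns (bool) "
     "{ balances[msg.sender] -= amount; balances[to] += amount; return true; }",
     "Transfers tokens to a specified address."),
    ("function balanceOf(address owner) public view returns (uint256) { return balances[owner]; }",
     "Gets the balance of the specified address."),
    ("function pause() public onlyOwner { paused = true; }",
     "Pauses the contract."),
]
QUERY = "function balanceOf(address account) external view returns (uint256) { return _balances[account]; }"

if __name__ == "__main__":
    print(retrieve_comment(QUERY, CORPUS))
```

Run on the toy corpus, the query is a near clone of the `balanceOf` entry, so that entry's comment is reused. A real system would replace the stand-in scores with stronger representations (for example, learned code embeddings or AST-based comparison) and tune the weights on a validation set.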

Keywords: Code comment generation, Smart contract, Information retrieval, Empirical study, Human study

Article history: Received 12 August 2021, Revised 28 October 2021, Accepted 2 December 2021, Available online 9 December 2021, Version of Record 15 December 2021.

Paper URL: https://doi.org/10.1016/j.knosys.2021.107858