
Finding Code-Clone Snippets in Large Source-Code Collection by ccgrep

Abstract : Finding code snippets that are the same as or similar to a query snippet is one of the fundamental activities in software maintenance. Code clone detectors find such snippets, but they report every code clone pair in the target, which is generally far more than users need. In this paper, we propose ccgrep, a token-based pattern matching tool built on the notion of code clone pairs. The user simply inputs a code snippet as a query, specifies the target source code, and gets the matched code snippets as the result. The query and each result snippet form a clone pair. Special tokens in the query (named meta-tokens) give the user precise control over the matching. ccgrep works on source code in C, C++, Java, and Python, on Windows or Unix, with practical scalability and performance. The evaluation results show that ccgrep is effective in finding intended code snippets in large open source software.
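The idea of token-based matching with a wildcard token can be illustrated with a minimal Python sketch. This is not ccgrep's implementation or its query syntax: the tokenizer, the keyword list, the `$ANY` meta-token, and the function names are all assumptions made up for this illustration. The sketch treats identifiers in the query as renameable (each query identifier must map to the same target identifier throughout a match), compares all other tokens literally, and lets `$ANY` stand for any single token.

```python
import re

# Hypothetical tokenizer: meta-tokens, identifiers, integers, or any single symbol.
TOKEN_RE = re.compile(r"\$\w+|[A-Za-z_]\w*|\d+|\S")
ANY = "$ANY"  # hypothetical meta-token (not ccgrep syntax): matches any one token
KEYWORDS = {"if", "else", "for", "while", "return", "int"}  # tiny illustrative set


def tokenize(code):
    """Split source text into a flat list of tokens, ignoring whitespace."""
    return TOKEN_RE.findall(code)


def is_identifier(tok):
    """True for names that may be renamed; keywords are compared literally."""
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None and tok not in KEYWORDS


def match_at(query, target, start):
    """Check whether the query tokens match the target tokens at `start`."""
    binding = {}  # query identifier -> target identifier
    for q, t in zip(query, target[start:start + len(query)]):
        if q == ANY:
            continue
        if is_identifier(q) and is_identifier(t):
            # Each query identifier must map to one target identifier.
            if binding.setdefault(q, t) != t:
                return False
        elif q != t:
            return False
    return True


def find_clones(query_code, target_code):
    """Return every target token window that matches the query."""
    query, target = tokenize(query_code), tokenize(target_code)
    return [
        " ".join(target[i:i + len(query)])
        for i in range(len(target) - len(query) + 1)
        if match_at(query, target, i)
    ]


if __name__ == "__main__":
    print(find_clones("x = $ANY ;", "a = 1 ; b = c ; d = e + f ;"))
    # ['a = 1 ;', 'b = c ;']
```

In this sketch the query `x = $ANY ;` matches both `a = 1 ;` and `b = c ;` but not `d = e + f ;`, because `$ANY` covers only one token. Each returned snippet pairs with the query in the sense the abstract describes.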
Document type : Conference papers

Contributor : Hal Ifip
Submitted on : Wednesday, June 9, 2021 - 8:55:10 AM
Last modification on : Wednesday, June 9, 2021 - 8:57:07 AM
Long-term archiving on : Friday, September 10, 2021 - 6:08:32 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Katsuro Inoue, Yuya Miyamoto, Daniel German, Takashi Ishio. Finding Code-Clone Snippets in Large Source-Code Collection by ccgrep. 17th IFIP International Conference on Open Source Systems (OSS), May 2021, Lahti/virtual event, Finland. pp. 28-41, ⟨10.1007/978-3-030-75251-4_3⟩. ⟨hal-03254061⟩


