Measuring code similarity is fundamental for many software engineering tasks,
e.g., code search, refactoring and reuse. However, most existing techniques
focus only on syntactic similarity, while measuring functional
similarity remains a challenging problem. In this paper, we propose a
novel approach that encodes code control flow and
data flow into a semantic matrix in which each element is a
high dimensional sparse binary feature vector, and we design a new deep
learning model that measures code functional similarity based
on this representation.
By concatenating the hidden representations learned from a code pair,
the model transforms the problem of detecting functionally similar code
into binary classification, and can thus effectively learn patterns shared by
functionally similar code with very different syntax.
We have implemented our approach, DeepSim, for Java programs and evaluated its
recall, precision and time performance on two large datasets of functionally
similar code. The experimental results show that DeepSim significantly
outperforms existing state-of-the-art techniques, such as DECKARD, RtvNN, CDLH,
and two baseline deep neural network models.
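
The pairwise formulation described above can be illustrated with a minimal sketch. This is not DeepSim's architecture: the matrix size, the single tanh layer, the random untrained weights, and the names `encode` and `similarity_score` are all assumptions made for illustration. It only shows the general idea of encoding each semantic matrix separately, concatenating the two hidden vectors, and scoring the pair with a binary classifier.

```python
import math
import random

def encode(semantic_matrix, weights):
    """Map a matrix of binary feature vectors to a hidden vector.

    Hypothetical encoder: flattens the matrix and applies one tanh layer.
    """
    flat = [bit for row in semantic_matrix for cell in row for bit in cell]
    return [math.tanh(sum(w * x for w, x in zip(w_row, flat)))
            for w_row in weights]

def similarity_score(matrix_a, matrix_b, enc_weights, clf_weights, bias=0.0):
    """Score a code pair as a binary classification problem."""
    h_a = encode(matrix_a, enc_weights)
    h_b = encode(matrix_b, enc_weights)
    pair = h_a + h_b  # concatenate the two hidden representations
    logit = sum(w * h for w, h in zip(clf_weights, pair)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: probability-like score

# Toy input: 2x2 matrices whose elements are 3-bit binary feature vectors.
# Real feature vectors would be high dimensional and sparse.
matrix_a = [[[1, 0, 0], [0, 1, 0]],
            [[0, 0, 0], [1, 0, 1]]]
matrix_b = [[[1, 0, 0], [0, 0, 1]],
            [[0, 1, 0], [1, 0, 1]]]

rng = random.Random(0)  # fixed seed; weights are untrained placeholders
hidden_size, input_size = 4, 12
enc_weights = [[rng.uniform(-1, 1) for _ in range(input_size)]
               for _ in range(hidden_size)]
clf_weights = [rng.uniform(-1, 1) for _ in range(2 * hidden_size)]

score = similarity_score(matrix_a, matrix_b, enc_weights, clf_weights)
print(f"similarity score: {score:.3f}")
```

In a trained model the encoder and classifier weights would be learned jointly from labeled pairs, and a threshold on the score would yield the similar/dissimilar decision. Because the hidden vectors are concatenated, swapping the order of the pair may change the score in this sketch.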
Tue 6 Nov