DeepSim: Deep Learning Code Functional Similarity (ESEC/FSE 2018 - Research Papers)

Sun 4 - Fri 9 November 2018, Lake Buena Vista, Florida, United States

Who

Gang Zhao, Jeff Huang

Track

ESEC/FSE 2018 Research Papers


When

Tue 6 Nov 2018, 13:52 - 14:15 (GMT-05:00), Horizons 6-9F, session: Deep Learning. Chair(s): David Rosenblum

Abstract

Measuring code similarity is fundamental to many software engineering tasks,
e.g., code search, refactoring, and reuse. However, most existing techniques
focus only on syntactic similarity, while measuring code functional
similarity remains a challenging problem. In this paper, we propose a
novel approach that encodes code control flow and
data flow into a semantic matrix in which each element is a
high-dimensional sparse binary feature vector, and we design a new deep
learning model that measures code functional similarity based
on this representation.
By concatenating the hidden representations learned from a code pair,
this new
model transforms the problem of detecting functionally similar code
into binary classification, which can effectively learn patterns shared by
functionally similar code even when its syntax differs substantially.
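The Python sketch below illustrates the general shape of the idea only; it is not DeepSim's implementation. It assumes a toy encoding in which each cell of an n x n matrix holds a sparse binary feature vector (stored as a set of active bit indices), a hypothetical FEATURE_DIM of 8, a simple count-pooling step, one randomly initialized tanh encoder shared by both fragments, and a logistic output layer over the concatenated pair [h1; h2]. Training is omitted entirely, and the names semantic_matrix, pool, and PairClassifier are hypothetical.

```python
import math
import random

# Hypothetical width of each sparse binary feature vector (not the paper's value).
FEATURE_DIM = 8


def semantic_matrix(n, cells):
    """Build an n x n matrix of sparse binary feature vectors.

    `cells` maps (row, col) -> active feature indices; in this sketch they stand
    in for control-flow / data-flow facts between program elements. Each cell is
    stored sparsely as a frozenset of the indices whose bit is 1.
    """
    matrix = [[frozenset() for _ in range(n)] for _ in range(n)]
    for (i, j), feats in cells.items():
        matrix[i][j] = frozenset(f for f in feats if 0 <= f < FEATURE_DIM)
    return matrix


def pool(matrix):
    """Count, for each feature index, how many cells have that bit set."""
    counts = [0.0] * FEATURE_DIM
    for row in matrix:
        for cell in row:
            for f in cell:
                counts[f] += 1.0
    return counts


class PairClassifier:
    """Shared encoder + logistic layer over the concatenated pair (untrained)."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        # The same encoder weights are applied to both code fragments.
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(FEATURE_DIM)]
                      for _ in range(hidden)]
        # Output weights cover the concatenation [h1; h2], hence 2 * hidden.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)]
        self.b_out = 0.0

    def encode(self, matrix):
        x = pool(matrix)
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w_enc]

    def predict(self, m1, m2):
        pair = self.encode(m1) + self.encode(m2)  # list concatenation = [h1; h2]
        z = sum(w * v for w, v in zip(self.w_out, pair)) + self.b_out
        return 1.0 / (1.0 + math.exp(-z))  # P(pair is functionally similar)


code_a = semantic_matrix(3, {(0, 1): [0, 2], (1, 2): [5]})
code_b = semantic_matrix(3, {(0, 2): [0, 2], (2, 1): [5]})
model = PairClassifier()
p = model.predict(code_a, code_b)
print(f"P(functionally similar) = {p:.3f}")  # weights are untrained, so the value is arbitrary
```

Two caveats about this toy design: the count pooling discards which cells the features came from, so code_a and code_b above encode identically, and because the hidden vectors are concatenated in a fixed order, predict(a, b) and predict(b, a) generally differ. A model meant for real use would need to preserve more of the matrix structure and, if symmetry matters, could for instance average both orders (an assumption, not taken from the paper).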

We have implemented our approach, DeepSim, for Java programs and evaluated its
recall, precision, and running time on two large datasets of functionally
similar code. The experimental results show that DeepSim significantly
outperforms existing state-of-the-art techniques, such as DECKARD, RtvNN, and CDLH,
as well as two baseline deep neural network models.

Gang Zhao

Jeff Huang

Texas A&M University