In software engineering, NLP techniques are needed to find and reuse duplicate or similar development knowledge that is stored in development documentation as development tasks. To understand duplicate and similar development documentation, we will discuss different NLP techniques, including descriptive statistics, topic analysis, similarity algorithms such as N-grams, Jaccard, or LSI, and machine learning algorithms such as decision trees or support vector machines (SVM). These techniques are used to better understand the characteristics of duplicate development tasks, their lexical relations (syntactic and semantic), and their classification and prediction. We found that duplicate tasks share conceptual information and tend to be created by inexperienced developers. By tuning different features to predict duplicate development tasks with a gradient or a Fidelity loss function, a system can identify duplicate tasks with 100% accuracy.
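As a rough illustration of the similarity techniques mentioned above, the following minimal sketch compares two task descriptions by the Jaccard index of their word N-grams. The function names, the whitespace tokenization, and the sample task texts are assumptions made for this example and are not taken from the talk.

```python
def word_ngrams(text, n=2):
    """Return the set of word n-grams in a text (naive whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b|; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def task_similarity(task_a, task_b, n=2):
    """Similarity of two task descriptions based on shared word n-grams."""
    return jaccard(word_ngrams(task_a, n), word_ngrams(task_b, n))


if __name__ == "__main__":
    # Hypothetical task descriptions, for illustration only.
    print(task_similarity("fix login page crash", "fix login page error"))
```

A score close to 1 would suggest a likely duplicate and a score close to 0 an unrelated task. Any cut-off for flagging duplicates would have to be tuned on real task data.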
Passionate Software Engineer, Architect and Researcher!
Sun 4 Nov
Xueqing Liu University of Illinois at Urbana-Champaign, USA, Chi Wang Microsoft, USA, Yue Leng University of Illinois at Urbana-Champaign, USA, ChengXiang Zhai University of Illinois at Urbana-Champaign, USA