In software engineering, natural language processing (NLP) techniques
are needed to find and reuse duplicate or similar development knowledge
that is stored in development documentation in the form of development
tasks. To understand duplicate and similar development documentation,
we discuss different NLP techniques: descriptive statistics, topic
analysis, similarity algorithms such as N-grams, the Jaccard coefficient,
or latent semantic indexing (LSI), and machine learning algorithms such
as decision trees or support vector machines (SVMs). These techniques
help us better understand the characteristics and the lexical relations
(syntactic and semantic) of duplicate development tasks, and help us
classify and predict them. We found that duplicate tasks share
conceptual information and are more often created by inexperienced developers.
By tuning different features to predict duplicate development tasks
with a gradient or a fidelity loss function, a system can identify
duplicate tasks with 100% accuracy.
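N-grams and the Jaccard coefficient appear among the similarity measures discussed above. The following is only a minimal sketch of how the two can be combined to flag candidate duplicate tasks. It assumes plain-text task descriptions, word bigrams, and an illustrative similarity threshold of 0.5; none of these settings come from the study. The function names `word_ngrams`, `jaccard`, and `likely_duplicates` are hypothetical.

```python
import re


def word_ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Lower-case the text, split it into word tokens, and return its set of word n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def likely_duplicates(tasks: dict[str, str], n: int = 2, threshold: float = 0.5):
    """Return (id1, id2, score) for each task pair whose n-gram Jaccard score meets the threshold.

    The threshold of 0.5 is an illustrative assumption, not a value from the study.
    """
    grams = {tid: word_ngrams(text, n) for tid, text in tasks.items()}
    ids = sorted(grams)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = jaccard(grams[a], grams[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical example tasks.
    tasks = {
        "T1": "Fix crash when saving the user profile",
        "T2": "Fix crash when saving user profile",
        "T3": "Add dark mode to the settings page",
    }
    print(likely_duplicates(tasks))
```

For these example tasks, T1 and T2 share 4 of 7 distinct bigrams, so they score about 0.57 and are reported as a candidate pair. T3 shares no bigrams with either task. A lexical measure like this captures only surface (syntactic) overlap. Catching duplicates that share concepts but not wording would need a semantic technique such as LSI or a trained classifier.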
Sun 4 Nov
Xueqing Liu University of Illinois at Urbana-Champaign, USA, Chi Wang Microsoft, USA, Yue Leng University of Illinois at Urbana-Champaign, USA, ChengXiang Zhai University of Illinois at Urbana-Champaign, USA