Source code is bimodal: it combines a formal, algorithmic channel and a natural language channel of identifiers and comments. In this work, we model the bimodality of code with name flows, an assignment flow graph augmented to track identifier names. Conceptual types are logically distinct types that do not always coincide with program types. Passwords and URLs are example conceptual types that can share the program type string. Our tool, RefiNym, is an unsupervised method that mines a lattice of conceptual types from name flows and reifies them into distinct nominal types. For string, RefiNym finds and splits conceptual types originally merged into a single type, reducing the number of same-type variables per scope from 8.7 to 2.2 while eliminating 21.9% of scopes that have more than one same-type variable in scope. This makes the code more self-documenting and frees the type system to prevent a developer from inadvertently assigning data across conceptual types.
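As a rough illustration of the idea (not RefiNym's actual output or algorithm), the sketch below shows what reifying two conceptual types into distinct nominal types could look like. The names `Password`, `Url`, `fetch`, and `try_fetch` are hypothetical and chosen only for this example. Both values would originally share the program type string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Password:
    """Hypothetical nominal type for the 'password' conceptual type."""
    value: str


@dataclass(frozen=True)
class Url:
    """Hypothetical nominal type for the 'URL' conceptual type."""
    value: str


def fetch(target: Url) -> str:
    # A static checker such as mypy would reject fetch(Password(...)).
    # The isinstance check surfaces the same mistake at runtime.
    if not isinstance(target, Url):
        raise TypeError(f"expected Url, got {type(target).__name__}")
    return f"GET {target.value}"


def try_fetch(arg) -> str:
    # Helper for demonstration: report a cross-type assignment instead of crashing.
    try:
        return fetch(arg)
    except TypeError as err:
        return f"rejected: {err}"
```

With plain strings, passing a password where a URL is expected would go unnoticed; with the two wrapper types, the type system (or here, the runtime check) can flag the cross-type flow.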
Tue 6 Nov (displayed time zone: Guadalajara, Mexico City, Monterrey)
13:30 - 15:00
On Accelerating Source Code Analysis At Massive Scale
RefiNym: Using Names to Refine Types
Darwinian Data Structure Selection
Michail Basios (University College London), Lingbo Li (University College London, UK), Fan Wu (University College London, UK), Leslie Kanthan (University College London, UK), Earl T. Barr
Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity