Static program analysis is a powerful technique that reasons about a program’s behavior without executing it. To balance the precision and efficiency of an analyzer, developers often manually tune the analyzer’s parameters for a specific program. However, this task can be tedious and time-consuming. To automate the search for the optimal parameters for a program, researchers employ machine learning (ML) techniques that learn the relationship between a program and its optimal parameters from existing data and encode that relationship in an ML model.

The existing data, or training set, plays an important role in the correctness of an ML model. In this work, we investigate whether automatically generated programs are adequate for training an ML model that determines SPF’s configurations for a given Java method. To do this, we compare the performance of a model trained on real programs with that of a model trained on synthetic programs. Our results indicate that, while synthetic programs alone are inadequate for training a model, adding them to a training set of real programs improves the classification power of the resulting model.

