Date: 03/09/2018
8:30 am – 5:30 pm
University of Zurich, Department of Informatics (IFI)
The topic of the CHOOSE forum 2018 is Software Engineering and Machine Learning.
To seed discussions, we sampled the space with six high-profile talks from both academia and industry. Confirmed speakers for this year are: Earl T. Barr, Georgios Gousios, Marc Brockschmidt, Veselin Raychev, Maxim Podkolzine, and Prem Devanbu.
The CHOOSE Forum 2018 is organized by the Zurich Empirical Software Engineering Team (ZEST) at the University of Zurich, on behalf of CHOOSE.
|08:30 – 08:45|Registration|
|08:45 – 09:00|Welcome and introduction|
|09:00 – 09:45|“Why is Software Natural?” – Prof. Dr. Prem Devanbu, University of California, Davis, USA|
|09:45 – 10:00|Short Break|
|10:00 – 10:45|“Big Data in Software Engineering” – Dr. Georgios Gousios, Delft University of Technology, The Netherlands|
|10:45 – 11:15|Coffee Break|
|11:15 – 12:00|“More Productive Software Engineers Through Deep Learning” – Dr. Marc Brockschmidt, Microsoft Research, Cambridge, UK|
|12:00 – 12:20|CHOOSE General Assembly|
|12:20 – 13:30|Lunch|
|13:30 – 14:15|“Machine Learning at JetBrains” – Maxim Podkolzine, JetBrains, Munich, Germany|
|14:15 – 14:30|Short Break|
|14:30 – 15:15|“Finding Program Vulnerabilities by Learning from Code Changes” – Dr. Veselin Raychev, DeepCode.ai, Switzerland|
|15:15 – 15:45|Coffee Break|
|15:45 – 16:30|“Bimodal Software Engineering” – Dr. Earl T. Barr, University College London, UK|
|16:30 – 17:15|Panel|
|17:15 – 17:30|Closing|
Earl T. Barr
Bimodal Software Engineering
Source code is bimodal: it combines a formal algorithmic channel and a natural language channel of identifiers and comments. To date, most work has focused exclusively on a single channel. This is a missed opportunity because the two channels interact: the natural language often explains or summarizes the algorithmic channel, so information in one channel can be used to improve analyses of the other channel. A canonical bimodal fact is an identifier named “secret” (NL channel) printed to the console (AL channel). To exploit such bimodal facts, one must overcome two challenges: finding cross-channel synchronisation points and handling noise, in the form of ambiguity in the NL channel and imprecision in the AL channel. Thus, bimodality is a natural fit for machine learning. I will present RefiNym, a bimodal analysis that models code with *name-flows*, a dataflow graph augmented to track identifier names. Conceptual types are logically different types that do not always coincide with program types. Passwords and URLs are example conceptual types that can share the program type String. RefiNym is an unsupervised method that mines a lattice of conceptual types from name-flows and reifies those conceptual types into distinct nominal types. For the String type, we show that RefiNym reduces the co-occurrence of disparate conceptual types in the same scope by 42%, thereby making it harder for a developer to inadvertently introduce an unintended flow.
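As a toy illustration of the idea (not RefiNym’s actual algorithm), one can think of name-flows as a graph over String-typed identifiers, with candidate conceptual types emerging as its connected components; every identifier and flow below is invented:

```python
from collections import defaultdict

# Toy "name-flows": assignments between String-typed identifiers,
# e.g. `password = secret` yields the pair ("secret", "password").
flows = [
    ("secret", "password"),
    ("password", "pwd"),
    ("base_url", "url"),
    ("url", "endpoint"),
]

def conceptual_types(flows):
    """Group identifiers into candidate conceptual types: here, simply the
    connected components of the undirected name-flow graph. (The real
    system mines a lattice of types and handles noise with learning.)"""
    graph = defaultdict(set)
    for a, b in flows:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        groups.append(comp)
    return groups

print(conceptual_types(flows))
# Two groups: a password-like cluster and a URL-like cluster,
# even though every identifier shares the program type String.
```

The point of the sketch is only the separation step: disparate conceptual types that share one program type end up in distinct groups, which could then be reified as distinct nominal types.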
Earl T. Barr is a senior lecturer (associate professor) at University College London. He received his Ph.D. at UC Davis in 2009. Earl’s research interests include bimodal software engineering, testing and analysis, and computer security. His recent work focuses on automated software transplantation, applying game theory to software process, and using machine learning to solve programming problems. Earl dodges vans and taxis on his bike commute in London.
Big Data in Software Engineering
The (not so) secret sauce behind the explosive developments in Machine Learning we are currently witnessing is the availability of vast, organized datasets. In Software Engineering, perhaps the best-known one is GHTorrent. GHTorrent monitors the GitHub event timeline, collects all the data offered through its API, and offers it to researchers in various formats. Hundreds of researchers use it both as an index of everything happening on GitHub and as a source of data to apply Machine/Deep Learning to.
In this talk, we present the workings and evolution of GHTorrent, along with selected examples of how it has been used as a data source for ML research.
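As a minimal, hedged sketch of the kind of collection GHTorrent performs (not its actual implementation), the snippet below flattens one GitHub timeline event into an analysis-ready record; the sample payload is invented, but its fields follow the public GitHub event schema:

```python
import json

# Invented sample event; in a live setting one would page through the
# public event stream at https://api.github.com/events instead.
sample_event = json.dumps({
    "id": "123",
    "type": "PushEvent",
    "actor": {"login": "octocat"},
    "repo": {"name": "octocat/Hello-World"},
    "created_at": "2018-03-09T08:30:00Z",
})

def flatten(event_json):
    """Turn one raw GitHub timeline event into a flat record, the kind of
    row a researcher might feed into an ML pipeline."""
    e = json.loads(event_json)
    return {
        "event_id": e["id"],
        "event_type": e["type"],
        "actor": e["actor"]["login"],
        "repo": e["repo"]["name"],
        "created_at": e["created_at"],
    }

record = flatten(sample_event)
print(record["event_type"], record["repo"])  # PushEvent octocat/Hello-World
```

GHTorrent itself does far more (deduplication, dependent-entity retrieval, relational and MongoDB exports), but the flattening step above is the basic shape of turning the event firehose into research data.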
Georgios Gousios is an assistant professor at the Software Engineering Group, Delft University of Technology, where he leads the group’s Software Analytics lab. He works in the fields of software ecosystems, software testing, distributed software development processes and research infrastructures. His research has been published in top venues (ICSE, FSE, TSE), where he has received 4 distinguished paper awards. He is the main author of the GHTorrent data collection and curation framework (for which he received a foundational contribution award), the Alitheia Core repository mining platform and various widely used tools and datasets. Currently, he is leading the Unified Call Graph project, which analyzes package repositories at a fine level of detail, and the CodeFeedr project, which delivers real-time software analytics. Dr. Gousios holds a PhD with distinction in Software Engineering (mining software repositories) from the Athens University of Economics and Business (AUEB) and an MSc with distinction (software engineering) from the University of Manchester.
More Productive Software Engineers Through Deep Learning
Deep Learning has been the crucial step forward in perceptual tasks such as the understanding of images, speech and natural language. So far, it has had less impact on the practice of Software Engineering, where it competes with a wide variety of mature, existing methods based on logic and deduction. I will discuss how these two worlds can be combined, and present early results that point the way towards practical tools. Finally, I will report on user experience problems arising from the use of Deep Learning in Software Engineering tools.
Marc Brockschmidt is a Senior Researcher in the Machine Intelligence group at Microsoft Research Cambridge (UK). He obtained his PhD studying formal methods that can automatically prove termination of Java programs. Surprisingly, this worked substantially less well than manually proving termination of Java programs. He thus moved on to study how computers can learn the skills that make humans better at programming than machines.
Finding Program Vulnerabilities by Learning from Code Changes
I will present a new machine learning approach for finding code vulnerabilities. The key idea is to learn program analysis rules from large datasets of program changes (e.g., as found on GitHub), technically achieved by carefully balancing the right semantic program abstractions with clustering-based methods (e.g., hierarchical clustering). I will show several analysis rules our learning system discovered automatically, some of which are missed by existing state-of-the-art checkers. I will also briefly discuss a more elaborate version of this approach developed at DeepCode (deepcode.ai), a startup that built the first AI-based code review system, whose product is also available online.
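As a toy sketch of the clustering step (not DeepCode’s actual system), the snippet below groups invented, already-abstracted code changes by set similarity, so that a recurring fix pattern surfaces as a cluster; the feature names are illustrative only:

```python
def jaccard(a, b):
    """Set similarity in [0, 1]."""
    return len(a & b) / len(a | b)

# Each code change is abstracted into a set of semantic features, e.g.
# "added a null check before a dereference". All features are invented.
changes = [
    {"add_null_check", "before_deref"},
    {"add_null_check", "before_deref", "in_loop"},
    {"add_bounds_check", "before_index"},
    {"add_bounds_check", "before_index", "in_loop"},
]

def cluster(changes, threshold=0.4):
    """Greedy single-link clustering: each change joins the first cluster
    containing a sufficiently similar change, otherwise starts a new one.
    (Real systems use proper hierarchical clustering on far more data.)"""
    clusters = []
    for c in changes:
        for cl in clusters:
            if any(jaccard(c, other) >= threshold for other in cl):
                cl.append(c)
                break
        else:
            clusters.append([c])
    return clusters

for cl in cluster(changes):
    # The intersection of a cluster is a candidate analysis rule.
    print(len(cl), sorted(set.intersection(*cl)))
```

The features common to a cluster act as a candidate rule ("a null check tends to be added before this kind of dereference"), which is the raw material a learning system can refine into an analysis check.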
Veselin Raychev is CTO of DeepCode AG, an ETH spin-off that builds an AI-based code review system. He obtained his PhD from ETH Zurich in 2016 on the topic of “Learning from Large Codebases”. Veselin’s recent work focuses on precise program analysis that scales to huge codebases and on using such analysis to learn how open-source projects fix defects and improve their code.
Machine Learning at JetBrains
JetBrains is one of the world’s leading software tools companies, traditionally relying on strict computer science algorithms to analyze code. In this talk, I’ll share how we started to apply Machine Learning in various contexts and what challenges we faced along the way.
Maxim Podkolzine is a senior software engineer at JetBrains who has been building “intelligent” tools for developers for 10 years, and a machine learning enthusiast.
Why is Software Natural?
Sometime during the summer of 2011, several of us at UC Davis were quite puzzled and shocked to discover that software is “natural”, viz., repetitive and predictable, as natural language corpora are; in fact, much, much more so! By now, this early experiment has been replicated many times, in many ways, and various applications of naturalness have been developed. But why is this? Is it just because of programming language syntax? Or is it due to something else, like conscious programmer choice? How can we study this question? Are there other “natural” corpora (other than software) that behave similarly? While these questions are purely scientific, investigating them can and will lead to actionable insights.
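The underlying measurement can be sketched with a toy n-gram experiment (the corpora and model below are illustrative only, vastly smaller than in the actual studies): train a language model on a token stream and measure how predictable held-out tokens are; a repetitive, “natural” stream yields lower cross-entropy than a shuffled one.

```python
import math
import random
from collections import Counter

def bigram_cross_entropy(train_tokens, test_tokens):
    """Average negative log2 probability per test token under a bigram
    model with add-one smoothing. Lower means more predictable."""
    vocab = set(train_tokens) | set(test_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    unigrams = Counter(train_tokens)
    total = 0.0
    for prev, cur in zip(test_tokens, test_tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
        total -= math.log2(p)
    return total / (len(test_tokens) - 1)

# A highly repetitive "code-like" token stream.
code = "for ( i = 0 ; i < n ; i ++ ) { sum += a [ i ] ; }".split() * 20

random.seed(0)
shuffled = code[:]
random.shuffle(shuffled)

# The repetitive stream is far more predictable than its shuffle.
print(bigram_cross_entropy(code[:300], code[300:]) <
      bigram_cross_entropy(shuffled[:300], shuffled[300:]))  # True
```

This is the shape of the original experiment: fit a model to one corpus, evaluate it on held-out text, and compare the resulting cross-entropies across corpora (code versus English, code versus scrambled code, and so on).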
Prem Devanbu received his B.Tech from the Indian Institute of Technology in Chennai, India, before you were born, and his PhD from Rutgers University in 1994. After spending nearly two decades at Bell Labs and its various offshoots, he escaped New Jersey traffic to join the CS faculty at UC Davis in late 1997. For about 15 years now, he has been working on ways to exploit the copious amounts of available open-source project data to bring more joy, meaning, and fulfillment to the lives of programmers.
Registration for the CHOOSE Forum 2018 can be completed here.