The topic of the CHOOSE forum 2017 is Software and Data Engineering.
Software plays a pivotal role in almost all aspects of our lives. We trust software to accomplish complex and vital tasks for us, such as managing our finances, diagnosing diseases, flying airplanes or driving cars. These systems manipulate and generate an unprecedented amount of data. In such a scenario, software cannot be understood without its data, and data becomes valuable only thanks to the software analyzing it.
Tutorials’ slides available here.
Automated Search Based Test Case Design at Facebook Scale
This talk will describe recent work at Facebook on Search Based Software Engineering (SBSE) for automated test case design.
Mark Harman is an engineering manager at Facebook London, where he manages a team working on Search Based Software Engineering (SBSE) at Facebook scale. He is also a part-time professor of Software Engineering in the Department of Computer Science at University College London, where he directed the CREST centre for ten years (2006-2017) and was Head of Software Systems Engineering (2012-2017). He is known for work on source code analysis, software testing, app store analysis and empirical software engineering. He co-founded the field of SBSE, which has grown rapidly, with over 1,700 scientific publications from authors spread over more than 40 countries. SBSE research and practice is now the primary focus of his work in both the industrial and scientific communities.
Software Productivity Decoded: How Data Science helps to Achieve More
Many companies are looking into understanding and improving the productivity of individual software developers as well as software teams. In this talk, I will motivate the need for data analytics in software teams and describe how data scientists work in large software companies, helping software teams to derive actionable insights. I will then show how data from software development can be used to learn more about the productivity of organizations, teams, and individuals and help them to become more effective in building software.
Thomas Zimmermann is a Senior Researcher in the Research in Software Engineering group at Microsoft Research, Redmond, USA. His research interests include software productivity, software analytics, recommender systems, and games research. He is best known for his research on systematic mining of software repositories to conduct empirical studies and to build tools to support developers and managers. His work has received several awards, including Ten Year Most Influential Paper awards at ICSE’14 and MSR’14, ’15, and ’17, five ACM SIGSOFT Distinguished Paper Awards, and a CHI Honorable Mention. He currently serves as Program Co-Chair for ICSME 2017. He is Co-Editor in Chief of the Empirical Software Engineering journal and serves on the editorial boards of several journals, including the IEEE Transactions on Software Engineering. He received his PhD in 2008 from Saarland University in Germany. His homepage is http://thomas-zimmermann.com.
Daring to Do Projects Others Do Not Dare to Dream
Somewhere between forward engineering of compilers for new languages and reverse engineering of legacy software systems, there is an interesting area of industrial research and development: reverse engineering legacy languages and building tools for their automated retirement or upgrade. This talk will go through a typical project of compiler (re)development for a legacy language, facing the usual challenges: the lack of documentation, dealing with ancient paradigms and mindsets, large-scale software analysis and transformation, refactoring while preserving undefined semantics, inferring the original developers’ intent by code analysis, etc. Reasonably obfuscated examples will be provided from various projects in current and recent development by the presenter and his colleagues.
Born in 1965, Darius has a master’s degree and a PhD from the Université Libre de Bruxelles. His focus is legacy modernization, articulated around compilers for legacy languages. Darius is the founder and CEO of Raincode (www.raincode.com), the main designer and implementer of its core technology, and an acclaimed speaker in academic and industrial circles.
Ahmed E. Hassan
Are we drinking too much Big Data and Machine Learning Kool-Aid?!
Is it always better to add more projects to our studies? Should we be using the fanciest Machine Learning (ML) techniques? With the ease of access to data about large projects (e.g., PROMISE and MSR Data Showcase), and the ease of access to advanced ML toolkits (e.g., R, Weka, and Scikit-learn), we are seeing more and more papers going for bigger and fancier (with reviewers quite often demanding so). In this talk, I will take a critical look at how bigger and fancier often leads to way less!
I will discuss how this blind rush to use as much data and as fancy ML as possible often puts the validity of our empirical findings at risk, even when combined with careful qualitative analysis of some of the studied projects.
Throughout the talk, I will provide concrete examples of such risks while proposing best practices that the software analytics community needs to follow to avoid such risks.
Ahmed E. Hassan is the Canada Research Chair (CRC) in Software Analytics, and the NSERC/BlackBerry Software Engineering Chair at the School of Computing at Queen’s University, Canada. Dr. Hassan serves on the editorial boards of the IEEE Transactions on Software Engineering, Springer Journal of Empirical Software Engineering, and PeerJ Computer Science. He spearheaded the organization and creation of the Mining Software Repositories (MSR) conference and its research community. Early tools and techniques developed by Dr. Hassan’s team are already integrated into products used by millions of users worldwide. Dr. Hassan’s industrial experience includes helping architect the BlackBerry wireless platform, and working for IBM Research at the Almaden Research Lab and the Computer Research Lab at Nortel Networks. Dr. Hassan is the named inventor of patents in several jurisdictions around the world, including the United States, Europe, India, Canada, and Japan. More information at: http://sail.cs.queensu.ca/
Inferring procedure specifications for automated testing
Procedure specifications are useful in many software development tasks. As one example, in automatic test case generation they can guide testing, act as test oracles able to reveal bugs, and identify illegal inputs. Whereas formal specifications are seldom available in practice, it is standard practice for developers to document their code with semi-structured comments in formats such as Doxygen, Javadoc, RDoc, and Sphinx. These comments express the procedure specification with a mix of predefined tags and natural language. We present Toradocu, an approach that combines natural language parsing, pattern matching, and semantic similarity to translate Javadoc comments into executable procedure specifications written as Java expressions. The tool achieves better accuracy than other state-of-the-art tools. We supplied the automatically derived specifications to Randoop, an automated test case generation tool. The specifications enabled Randoop to generate test cases that reveal more defects and produce fewer false alarms.
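To make the idea of an executable specification concrete, here is a minimal, hand-written sketch in Java. It is not Toradocu's output or API: the class `SpecSketch`, the method `elementAt`, the interface `IndexedRead`, and the helper `conformsToSpec` are all hypothetical names chosen for illustration. It assumes that a `@throws` comment such as "if index is negative" is read as the Java condition `index < 0`, and shows how that condition could serve as a test oracle.

```java
public class SpecSketch {

    /** Hypothetical functional shape of the method under test. */
    public interface IndexedRead {
        int read(int[] items, int index);
    }

    /**
     * Returns the element at the given position.
     *
     * @param items the array to read from
     * @param index the position to read
     * @throws IllegalArgumentException if index is negative
     */
    public static int elementAt(int[] items, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index is negative");
        }
        return items[index];
    }

    // Hand-written stand-in for a derived specification (an assumption, not
    // tool output): the @throws text "if index is negative" as a Java condition.
    public static boolean throwsCondition(int index) {
        return index < 0;
    }

    // Oracle: true when the documented exception is thrown exactly in the
    // cases where the condition derived from the comment holds.
    public static boolean conformsToSpec(IndexedRead method, int[] items, int index) {
        boolean threwDocumented = false;
        try {
            method.read(items, index);
        } catch (IllegalArgumentException e) {
            threwDocumented = true;
        } catch (RuntimeException e) {
            // Some other exception: the documented one was not thrown.
            threwDocumented = false;
        }
        return threwDocumented == throwsCondition(index);
    }

    public static void main(String[] args) {
        int[] items = {10, 20, 30};
        // Implementation that follows its comment.
        System.out.println(conformsToSpec(SpecSketch::elementAt, items, 1));   // true
        System.out.println(conformsToSpec(SpecSketch::elementAt, items, -1));  // true
        // Variant that omits the documented check: the oracle flags it.
        System.out.println(conformsToSpec((a, i) -> a[i], items, -1));         // false
    }
}
```

In this sketch the same condition plays two of the roles mentioned above: it acts as an oracle that flags the variant without the check, and it marks negative indices as inputs for which an exception is expected behavior rather than a false alarm.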
Alessandra Gorla is an assistant research professor at the IMDEA Software Institute in Madrid, Spain. She received her Bachelor’s and Master’s degrees in computer science from the University of Milano-Bicocca in Italy. She completed her Ph.D. in informatics at the Università della Svizzera italiana (USI) in Lugano, Switzerland, in 2011. Before joining the IMDEA Software Institute in December 2014, she was a postdoctoral researcher in the software engineering group at Saarland University in Germany and a visiting researcher at Google. Alessandra regularly serves as a program committee member of top-tier software engineering conferences. Her research interests are in malware detection for mobile applications, automatic software repair, software testing and analysis.