Musings of a Tech Transfer Enthusiast
You Missed the Mother of All Demos
Gergő Balogh demonstrated his prototype, CodeMetropolis, which visualizes software systems as cities using the MineCraft engine (actually at SCAM, a co-located event). While this work is obviously influenced by earlier work on CodeCities, as well as the Source Viewer 3D project, it was a great example of how taking research one step further towards application really creates a wow factor. During this presentation everyone completely ignored their urgent emails and many formerly anti-social participants suddenly learned how to tweet! :)
You Missed srcML's Coming Out Party
For anyone working in software maintenance, srcML should be a familiar name. Since its inception over ten years ago it has had great success, being used in tens if not hundreds of projects (including several ABB projects), parsing millions and millions of LOCs, and even winning the ICPC's most influential paper this past year. Unfortunately, it is not as popular as it should be (some still insist in writing their own C or Java parser!), and so this year Jonathan Maletic and Michael Collard are getting serious about publicizing this well-tested project. Armed with their ~$800K infrastructure grant, they will be continually improving srcML over the next three years, working with the talented developer Michael Decker (full time on srcML!) to improve both the code and the community.
You Missed a Great TraceLab Talk
For those of you who haven't heard of TraceLab, it's "an experimental workbench for designing, constructing, and executing traceability experiments", such as evaluating feature location tools. While I was already familiar with TraceLab, I was very happy to hear the tone and content of their distinguished paper presentation. This talk was not only focused on the contributions of TraceLab, but on the need for increasing the reproducibility of results in software engineering. Their clear presentation of how difficult it currently is to reproduce feature location evaluation results, and how TraceLab can help fix this, gives me hope that I'll see better comparisons between evaluations in future work.
You Didn't Beat Michele Lanza in Football
ICSM has an annual tradition of hosting a football game. While this may seem odd for a research conference it's a great way to meet and interact with other researchers outside of coffee breaks and formal networking. More importantly, it's a ton of fun! For those of you that will join us next year, I've heard rumor that Michele Lanza's team has *never* lost an ICSM match, and this year was no exception. Will worthy challengers appear in 2014?
You Missed Meeting Many "Applied Innovators"
ICSM is a haven for what my friend Brian Robinson calls applied innovation. A working definition of applied innovation is work that is both interesting to academics (i.e., innovative, containing new ideas) and developers (i.e., it actually works and saves them time or money). While I've used Arie van Deursen (above) as an example, as he is a full professor at Delft and co-founder of two software spinoffs, I could have chosen many other participants at ICSM to make the same point (e.g., Jurgen Vinju and his group, creators of Rascal). The average participant at ICSM is, in my experience, more likely to have written working software that one can leverage, founded a company that is interacting with real customers, or applied their work on industrial source code. In my opinion, these applied innovators are exactly the type of people that are starting to bridge the large gap between research and practice, and I'm happy to have spent a week learning from them!
I hope you'll join us next year for SCAM or ICSM 2014.
In the past three weeks we have collected anonymous usage data from developers (with their knowledge and permission!!) using our search tool in Visual Studio. Learning from the failed Usage Data Collection effort in Eclipse, which perhaps had too broad a scope, our data collection is tightly focused on improving search, and nothing else. Thus, after reviewing even three weeks of data we have several useful conclusions. Primarily, that developers who utilize recommendations (shown above) when searching code are more satisfied with the results and execute 20% less failed queries than those not using recommendations.
As you can see in pane (b) above we collected data for 25 days. We collected 204 log files from 44 unique users (excluding all Sando developers). Interestingly, as a comparison, Eclipse's Usage Data Collection collected only 453 new users' data over a 14 day period in 2008, even though they were potentially collecting from all Eclipse users.
During this 25 day period we collected anonymous metrics about 363 user queries executed on local projects via Sando. As seen in pane (a) 90 utilized the recommendations provided by Sando. Of those 90, 41 were utilized pro-actively (e.g., by selecting an entry from a dropdown) and 49 were triggered automatically (e.g., when the user searched for a word that did not exist in the code base). Of the 41 active queries 25 were "lookup"-style searches, such as using the autocomplete to help remember the name of a method, and 16 were exploratory-style searches, such as searching for phrase like "open file".
Using the data from these 363 queries we tried to gauge user satisfaction level using a few different metrics. First, we measured the query failure rate. We considered a query to fail if a user did not review any of the results, as indicated via UI events. As you can see in pane (e) queries that utilized recommendations had a 20% lower failure rate, which certainly leads to higher user satisfaction.
We also used both the short-click and long-click satisfaction measures, as are often used in web search. The intuition is that if a user clicks and views a result and then immediately returns to the result list that result was not helpful (a short-click), where as if a user clicks and views a result for a long period before (if ever) returning that result was helpful. As you can see in pane (d) queries with and without recommendations provided about the same rate of long-clicks, which is a positive metric, whereas queries with recommendations showed a significantly lower short-click rate, which is good as short-clicks are a negative satisfaction indicator.
As I mentioned at the beginning of this post, we only collect information aimed at helping us improve code search. So what's useful about the information I've presented today? Well, in the short term, Sando users should start using recommendations, immediately. They will be more satisfied with the results and fail 20% less. While we intend to do our part to improve the recommendation UI, possibly the most useful thing about this data is that it gives Sando users field evidence to support feature adoption. And when you think about it...how often have your other software tools given you evidence that a new feature actually makes you better? ;)
For those of you unfamiliar with academic research on software engineering, let me explain. Research in software engineering has generally consisted of "...inventing new tools or processes, describing their application to toy problems in academic journals, and then wondering why practitioners ignored them."
This state of affairs is a tragedy, because as you may remember from your college days, professors are often among the smartest, hardest working people you will meet... they just need some redirecting.
This is where you come in. If you work in the software industry on testing, verification, or validation please submit a brief writeup of your tool to the ICST Tools Track (3-8 pages). Accepted tools get 25 minute presentation slots in the tools track, which is a great place to promote cutting edge testing software to bright grad students (i.e., future project leads), get feedback from professors steeped in the latest testing techniques, and even find potential collaborators who can improve your tool as part of their research work. Join us as we bridge the divide between research and practice!
Why Industry Professionals Should Submit:
I realize that presenting at academic conferences is not the norm in industry, yet perhaps it should become more accepted. Here I've compiled a list of reasons why you should showcase your testing software at the ICST Tools Track:
Why Graduate Students Should Submit:
I realize that tools tracks are often overlooked by academics and graduate students, who focus on the more prestigious research track, but there are several reasons why this approach is short-sighted.
Why Both Should Attend:
Academics and practitioners have more in common than most think. Here are some reasons for submitting to ICST that speak to both parties:
Plan to Attend!
I hope I've convinced you of the value of applying to the ICST Tools Track and the larger mission of reuniting software engineering research with practice. I'd encourage you not only to apply yourself, but to encourage others to apply. I'd especially encourage you to bug your favorite tool vendor about presenting at ICST. I've listed a few of my favorites below... feel free to encourage them to join us and add your own suggestions in the comments.
Tools and teams I'd love to have attend:
When using code search tools, such as Find in Files, developers don't want to waste mental energy crafting the perfect search query, they just want to find relevant code. Previous versions of the Sando Code Search Extension helped by providing users with conceptual autocompletion. Even with this help we found users creating ill-advised queries (including very smart, experienced developers). For instance, users would perform literal searches (e.g., document.Add) without adding the required quotes (e.g., "document.Add"). Or they'd enter a (previously) unsupported query format, like a wildcard query (e.g., custom*Document). To make it more natural for developers to interact with Sando we expanded our query interpretation code in today's release. Developers can enter a keyword query (e.g., open file), a literal query (e.g., File.Open), or even a wildcard query (e.g., Document*Create) and expect Sando to take care of the details of returning relevant results.
Please give Sando's new query interpretation code a spin and let us know what you think! If you like this new direction we'll look into even better query interpretation in the future.
I read a few articles and a book about what companies usually look for in a full-time candidate and was wondering whether I could fit the bill with the research project in place of the usual internship.
Recently I gave a guest lecture at NCSU for their software engineering course. One of the students I chatted with afterwards will be using this summer to work on his research thesis. This particular thesis work involves significant software development. Having just heard my talk on the importance of gaining software development experience prior to graduation he was wondering how to get the most out of his summer work. Here is my advice for both this student and any other student taking on summer research.
Open Source Your Work
Academia affords students great opportunities for opening up their work. While some specific DoD grants may have restrictions, most typical grants (e.g., NSF or industrial grants) allow or even encourage open sourcing your work. As a student, open sourcing your research code will have two major benefits. First, your code will become much cleaner. While many developers may have some bad habits they relapse into when committing to a closed repository, when they write code they know will be public the code is simply better. This will not only make for a more pleasant summer but will also help you as you begin to build a programmer's portfolio, or a set of publicly available projects that you have written. This second benefit of open sourcing your code, while not a new idea, this is a surprisingly simple way of gaining credibility with potential employees. For instance, two of the three interns that made it past the first round at ABB this year provided us with links to their publicly available code (note: the third provided a live demo of their work).
Collaborate if at all Possible
While generating a sizable project alone can certainly be impressive working with others is an important part of experience that many students are lacking. If at all possible, collaborate with other researchers and students on your code. This can be as simple as building upon an existing framework or a set of libraries that another student has generated. For instance, during my time at UDel summer undergrad students would often start from an existing tool (e.g., a search tool) and improve a single component of that tool (e.g., an abbreviation expander), investigating the effect of that component's improvement on the overall performance. This forced them to not only understand that existing component, but to also learn how that component interacted with the larger system. In most research labs there is ample opportunity to build upon others' work.
I have a planned my schedule for the summer and I notice that I could spare time in a day for projects/ work outside my thesis.
One point I want to warn against is losing focus. While a summer may seem like a long time, it is short. So short, in fact, that at ABB we communicate with upcoming interns for months in advance to hammer out a well-scoped idea so that when they join us for the summer they can have a chance at finishing something significant. Thus, I advise students to focus on a single project. When doing this, it is important that this project have a significant coding component, as this student has, so that, should he/she have any extra time they can continue to polish and refactor and improve this code base ad infinitum. If a student can finish the summer with a well-polished research prototype, with clean code and a working demo, in addition to an (eventually) completed thesis, that student will certainly have good job prospects.
Enjoy the Summer...
I hope this post helps you as you organize your summer. To summarize, I recommend open sourcing as much as you can, working with others whenever possible, and focusing on a single project. Following these guidelines I'm sure that you can have a polished, well-narrated demo video posted on youtube by the end of the summer (add a link in the comments!). For those of you pursing PhDs in software engineering that enjoy both research AND developing software tools, consider applying to our internship program next year.
Many software engineers (i.e., those that actually program on a daily basis) are unaware of the dedicated sub-field of software engineering researchers (like these from Microsoft) whose mission is to help make the daily grind of writing software better. While software engineering researchers have historically had limited practical impact, there are some notable companies and tools that were born out of software engineering research, and many brilliant, driven individual researchers who want to have impact. I hope that raising awareness of this research among software engineers will encourage more feedback to the software engineering research community, ultimately leading to more useful output.
To that end, I'm posting a short 3 minute video that provides a quick overview of a typical software engineering research project at ABB. I hope this video gives you a better sense of what we're working on, our balance between theoretical and practical impact, and what types of technologies we're investing in. If you're a software engineer we'd love to hear what you think about our research directions, how we could improve, or even just what problems are currently slowing you down. Feel free to leave comments or even contact me directly.
Make it work with 2012! :) -naspinski, from reddit
Well, the people have spoken, and we've (finally) delivered. The Sando code search extension is now available in VS2012 (in addition to VS2010), thanks in large part to Kosta Damevski. As we release this version I realize that not all readers are familiar with Sando's Raison d'être. For those of you new to Sando here's the top few reasons why we've spent the last year creating it.
Above I've quickly described a few reasons that we expect Sando to perform better than available regex-based tools and I've used a few scenarios to explain why. However, its important to know that Sando is not primarily based on my personal insights. Sando is built upon the huge body of code search research, started by Andrian Marcus's thesis work and so ably continued by researchers like Denys Poshyvanyk, Dawn Lawrie and David Binkley, Lori Pollock, Emily Hill, and many others. Thus, you can download and use Sando, assured that it's providing you with high-quality search results influenced by cutting edge advances in software engineering research.
Sando is available as a Visual Studio extension for VS2010 and VS 2012
The ABB team is headed out to FSE next week! While we regularly work with academics (e.g., Jonathan Maletic, Michael Collard, Lori Pollock, Thomas Fritz, Kostadin Damevski, Emerson Murphy-Hill, Gregg Rothermel, Myra Cohen, Mary Jean Harrold...) it's often over Skype, and so we are looking forward to meeting the academic community in person. Please stop by our table, where we'll have ABB tumblers to give away, or grab us during a coffee break. We are always looking for next year's interns, possibilities for collaborations, sabbatical opportunities, or students graduating on the next cycle... so don't be shy!
One of the best ways you can get to know what our research group is up to is to visit us at one of our three tool demo sessions. They are:
10:30-11:45 Research Tool Demos, Session Chair: Brian Robinson, ABB Corporate Research
Automating Adaptive Maintenance Changes with SrcML and LINQ
Vinay Augustine,ABB Corporate Research
13:00-14:15 Research Tool Demos
Practical Change Impact Analysis Based on Static Program Slicing for Industrial Software Systems
Mithun Acharya and Brian Robinson, ABB Corporate Research
Sando: An Extensible Local Code Search Framework
David Shepherd, ABB, Inc.Kostadin Damevski, Virginia State University
Bartosz Ropski, Autodesk, Inc. Krakow, Poland, Thomas Fritz, University of Zurich
If you'd like to read about some of these projects ahead of time you can check out this video on Sando, visit the srcML.Net code repository, or review one of Mithun's papers on slicing industrial scale programs. See you soon!
Who says Microsoft's the only company interested in pushing the boundaries of software engineering forward? Not ABB, as I'm happy to announce the second running of the ABB Software Research Grant Program. These grants, ranging from 20K to 80K, are focused on several topic areas with at least two (shown below) being highly relevant for software engineering researchers. Researchers whose work has focused on mining software repositories or working with task context should consider applying!
Important Notes on ABB Software Research Grant Program
Note that this program is only in its second year of running and its organizers are still in the process of balancing ABB's interests with universities' interests. This means that the IP policy outlined for the grants may be too restrictive for some universities to participate, especially for US universities. If this is the case for you I recommend sending email to the provided contact on the grant page (so they can get a sense on whether this is a problem for many or only a few). Also note that this is my personal blog and all thoughts expressed here are my own and not representative of ABB's official policies.
David Shepherd leverages software engineering research to create useful additions to the IDE.