Musings of a Tech Transfer Enthusiast
Ever since graduate school I have been a man between two worlds. I value publishing, but authoring one more paper does not thrill me as it does my academic friends. I love writing tools that make developers' lives easier, but achieving wide commercial success for a tool does not drive me like it did my friends at Tasktop (i.e., a software tool vendor). Until recently, people like me, equally interested in research and practical impact, were doomed to operate on the margins. We could work as research developers implementing other people's ideas or as software engineers desperately trying to align assigned projects with our interests. Fortunately, more and more companies are waking up to the value of those who can both dream and engineer, and applied researchers are beginning to thrive (and be hired!).
To the best of my knowledge there are at least two corporations that are clearly recruiting and rewarding applied researchers: ABB, through its Corporate Research centers, and Microsoft, through its Tools for Software Engineers group. As I currently work for ABB Corporate Research I'll focus on that group, but kudos to Microsoft, and especially Wolfram Schulte, for recognizing the need for applied innovation. It has always bothered me that so much great software engineering research gets lost in the publication machine; groups like mine, Wolfram's, and hopefully others I'm not yet aware of are working to remedy that situation.
At ABB Corporate Research the software engineering group's mandate is to create tools and processes that improve software quality or increase developer productivity (or, hopefully, both). In the past few years we have, for instance, implemented a state-of-the-art search tool for developers and used gamification to motivate software developers to adopt best practices. Yet more interesting than what we have done so far (we are just getting started) is how we have done it. By explicitly valuing both innovative ideas *and* execution in terms of development, we now have a rare combination of working software tools, engaged internal users, and publishable results. On a given day I'm just as likely to receive an email that says "I am using <your tool> and finding it extremely useful and found it much simpler and faster than <the respective default tool> present in Visual studio… Thank you for sharing this tool", a paper acceptance notification, or a grant collaboration request from an academic. And this is precisely the balance we are trying to achieve.
So, if you are like me and you enjoy both research and impact, consider joining us here in sunny Raleigh, NC, because we are hiring. In addition to being a major East Coast tech hub and a great place to live with a low cost of living, the RTP area is known for its great pig pickin's!
You Missed the Mother of All Demos
Gergő Balogh demonstrated his prototype, CodeMetropolis, which visualizes software systems as cities using the Minecraft engine (actually at SCAM, a co-located event). While this work is obviously influenced by earlier work on CodeCities, as well as the Source Viewer 3D project, it was a great example of how taking research one step further towards application really creates a wow factor. During this presentation everyone completely ignored their urgent emails, and many formerly anti-social participants suddenly learned how to tweet! :)
You Missed srcML's Coming Out Party
For anyone working in software maintenance, srcML should be a familiar name. Since its inception over ten years ago it has had great success, being used in tens if not hundreds of projects (including several ABB projects), parsing millions and millions of LOC, and even winning ICPC's Most Influential Paper award this past year. Unfortunately, it is not as popular as it should be (some still insist on writing their own C or Java parser!), and so this year Jonathan Maletic and Michael Collard are getting serious about publicizing this well-tested project. Armed with their ~$800K infrastructure grant, they will be continually improving srcML over the next three years, working with the talented developer Michael Decker (full time on srcML!) to improve both the code and the community.
You Missed a Great TraceLab Talk
For those of you who haven't heard of TraceLab, it's "an experimental workbench for designing, constructing, and executing traceability experiments", such as evaluating feature location tools. While I was already familiar with TraceLab, I was very happy to hear the tone and content of their distinguished paper presentation. This talk was not only focused on the contributions of TraceLab, but on the need for increasing the reproducibility of results in software engineering. Their clear presentation of how difficult it currently is to reproduce feature location evaluation results, and how TraceLab can help fix this, gives me hope that I'll see better comparisons between evaluations in future work.
You Didn't Beat Michele Lanza in Football
ICSM has an annual tradition of hosting a football game. While this may seem odd for a research conference, it's a great way to meet and interact with other researchers outside of coffee breaks and formal networking. More importantly, it's a ton of fun! For those of you who will join us next year, I've heard a rumor that Michele Lanza's team has *never* lost an ICSM match, and this year was no exception. Will worthy challengers appear in 2014?
You Missed Meeting Many "Applied Innovators"
ICSM is a haven for what my friend Brian Robinson calls applied innovation. A working definition of applied innovation is work that is both interesting to academics (i.e., innovative, containing new ideas) and to developers (i.e., it actually works and saves them time or money). While I've used Arie van Deursen (above) as an example, as he is a full professor at Delft and co-founder of two software spinoffs, I could have chosen many other participants at ICSM to make the same point (e.g., Jurgen Vinju and his group, creators of Rascal). The average participant at ICSM is, in my experience, more likely to have written working software that one can leverage, founded a company that interacts with real customers, or applied their work to industrial source code. In my opinion, these applied innovators are exactly the type of people who are starting to bridge the large gap between research and practice, and I'm happy to have spent a week learning from them!
I hope you'll join us next year for SCAM or ICSM 2014.
In the past three weeks we have collected anonymous usage data from developers (with their knowledge and permission!!) using our search tool in Visual Studio. Learning from the failed Usage Data Collection effort in Eclipse, which perhaps had too broad a scope, our data collection is tightly focused on improving search, and nothing else. Thus, even after reviewing only three weeks of data we have reached several useful conclusions, primarily that developers who utilize recommendations (shown above) when searching code are more satisfied with the results and execute 20% fewer failed queries than those not using recommendations.
As you can see in pane (b) above we collected data for 25 days. We collected 204 log files from 44 unique users (excluding all Sando developers). Interestingly, as a comparison, Eclipse's Usage Data Collection collected only 453 new users' data over a 14 day period in 2008, even though they were potentially collecting from all Eclipse users.
During this 25-day period we collected anonymous metrics about 363 user queries executed on local projects via Sando. As seen in pane (a), 90 utilized the recommendations provided by Sando. Of those 90, 41 were utilized proactively (e.g., by selecting an entry from a dropdown) and 49 were triggered automatically (e.g., when the user searched for a word that did not exist in the code base). Of the 41 active queries, 25 were "lookup"-style searches, such as using the autocomplete to help remember the name of a method, and 16 were exploratory-style searches, such as searching for a phrase like "open file".
Using the data from these 363 queries we tried to gauge user satisfaction using a few different metrics. First, we measured the query failure rate. We considered a query to have failed if the user did not review any of its results, as indicated via UI events. As you can see in pane (e), queries that utilized recommendations had a 20% lower failure rate, which suggests higher user satisfaction.
We also used the short-click and long-click satisfaction measures often used in web search. The intuition is that if a user clicks and views a result and then immediately returns to the result list, that result was not helpful (a short-click), whereas if a user clicks and views a result for a long period before (if ever) returning, that result was helpful (a long-click). As you can see in pane (d), queries with and without recommendations provided about the same rate of long-clicks, a positive metric, whereas queries with recommendations showed a significantly lower rate of short-clicks, which is good, as short-clicks are a negative satisfaction indicator.
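To make these click metrics concrete, here is a minimal Python sketch of how short- and long-clicks might be classified from a UI event log. The event format and the 30-second dwell threshold are my own assumptions for illustration; this is not Sando's actual telemetry code.

```python
# Hypothetical sketch of short-/long-click classification (not Sando's code).
SHORT_CLICK_SECS = 30  # assumed dwell-time cutoff separating short from long clicks

def classify_clicks(events):
    """events: list of (timestamp_secs, kind), where kind is
    'click_result' or 'return_to_list'. Returns (short, long) counts."""
    short = long_ = 0
    for i, (t, kind) in enumerate(events):
        if kind != 'click_result':
            continue
        # when (if ever) did the user return to the result list?
        returned = next((t2 for t2, k2 in events[i + 1:]
                         if k2 == 'return_to_list'), None)
        if returned is not None and returned - t < SHORT_CLICK_SECS:
            short += 1   # quick bounce back: the result was unhelpful
        else:
            long_ += 1   # long dwell (or never returned): the result helped
    return short, long_

log = [(0, 'click_result'), (5, 'return_to_list'),     # short-click
       (10, 'click_result'), (120, 'return_to_list')]  # long-click
print(classify_clicks(log))  # -> (1, 1)
```

A per-query failure rate then falls out of the same log: a query with zero clicks of either kind is the kind of "failed" query counted in pane (e).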
As I mentioned at the beginning of this post, we only collect information aimed at helping us improve code search. So what's useful about the information I've presented today? Well, in the short term, Sando users should start using recommendations, immediately. They will be more satisfied with the results and fail 20% less often. While we intend to do our part to improve the recommendation UI, possibly the most useful thing about this data is that it gives Sando users field evidence to support feature adoption. And when you think about it...how often have your other software tools given you evidence that a new feature actually makes you better? ;)
Are you an aimless poker?
Are you a random scroller?
Are you a hopeless grepper?
An unranked searcher?
If you recognize any of these MO's from your own work, stop. Come by my 1pm talk on Friday and learn what I've learned from the best. The search and navigation strategies I'll demonstrate should shave 20-40% off the time it takes to complete an average task. And bring your laptop; we'll have fun squashing a bug together!
For those of you unfamiliar with academic research on software engineering, let me explain. Research in software engineering has generally consisted of "...inventing new tools or processes, describing their application to toy problems in academic journals, and then wondering why practitioners ignored them."
This state of affairs is a tragedy, because as you may remember from your college days, professors are often among the smartest, hardest working people you will meet... they just need some redirecting.
This is where you come in. If you work in the software industry on testing, verification, or validation please submit a brief writeup of your tool to the ICST Tools Track (3-8 pages). Accepted tools get 25 minute presentation slots in the tools track, which is a great place to promote cutting edge testing software to bright grad students (i.e., future project leads), get feedback from professors steeped in the latest testing techniques, and even find potential collaborators who can improve your tool as part of their research work. Join us as we bridge the divide between research and practice!
Why Industry Professionals Should Submit:
I realize that presenting at academic conferences is not the norm in industry, yet perhaps it should become more accepted. Here I've compiled a list of reasons why you should showcase your testing software at the ICST Tools Track:
Why Graduate Students Should Submit:
I realize that tools tracks are often overlooked by academics and graduate students, who focus on the more prestigious research track, but there are several reasons why this approach is short-sighted.
Why Both Should Attend:
Academics and practitioners have more in common than most think. Here are some reasons for submitting to ICST that speak to both parties:
Plan to Attend!
I hope I've convinced you of the value of applying to the ICST Tools Track and the larger mission of reuniting software engineering research with practice. I'd encourage you not only to apply yourself, but to encourage others to apply. I'd especially encourage you to bug your favorite tool vendor about presenting at ICST. I've listed a few of my favorites below... feel free to encourage them to join us and add your own suggestions in the comments.
Tools and teams I'd love to have attend:
This summer the Sando team has invested significant effort in making Sando more usable for all Visual Studio developers. One of the smaller, fun features we worked on is ensuring that Sando looks great in every theme. For instance, here's what Sando looks like in the default VS2012 theme (with some quality issues due to the limitations of animated gifs):
And here's what Sando looks like in the dark, Sublime-like theme:
Our latest release, which quietly went out just today, can be downloaded from Visual Studio Gallery. We'd love to hear any early feedback prior to our release posts!
Because of my time at Tasktop (i.e., an ISV focused on developer productivity) I'm a big believer in dogfooding the software tools that I am developing. Thus, if you find me working in Visual Studio you'll notice that I'm using Sando (a software search tool) to develop Sando. However, this can lead to some confusion during testing, as developers wonder whether they are testing the installed extension or the extension under development, a situation complicated by possible interactions with other installed extensions. For those developing Visual Studio Extensions my team has developed a simple strategy for avoiding any confusion and eliminating unwanted interactions.
The answer in a nutshell: change the target of your VS shortcut to...
"C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\devenv.exe" /RootSuffix ExtDev
In the figure to the left I have illustrated the normal process for deploying extensions to the Experimental Instance (i.e., the Visual Studio instance where you can test your extension). When you compile an extension in Visual Studio it is deployed to the Experimental Instance (provided "Deploy VSIX content to experimental instance for debugging" is checked). In cases where you are only developing a single extension and are not dogfooding, working this way may be sufficient.
However, when dogfooding, note that there is a potential for confusion. When you execute a "Reset Visual Studio 2012 Experimental Instance" command all installed extensions are copied from your normal VS instance into the Experimental instance, as shown on the right side of the figure. This can cause two versions of your extension to exist in the Experimental Instance (although only one will run at a time). Furthermore, if any of the installed extensions interact with your extension you'll notice unexpected errors during testing. In my opinion, when developing VS extensions, it's best to isolate them in a clean Experimental Instance to avoid any configuration headaches.
A Simple Solution
To avoid any configuration issues we develop our extensions in a special instance of Visual Studio, which we label the "ExtDev" instance. We install all extensions that we use to develop, including those we are dogfooding, into this instance, and we develop our extensions here. By working in this way, we keep our default Visual Studio instance clean (see upper left of figure). Thus, the only way for extensions to be deployed to the Experimental Instance is through a compilation in our "ExtDev" instance. Note that we still often run the "Reset" command but, because the default Visual Studio instance has no extensions installed, no extensions are copied to the Experimental Instance, allowing us to isolate our extension(s) in the Experimental Instance.
As mentioned above, it is trivial to begin working this way. Simply add "/RootSuffix ExtDev" to the target of your VS shortcut and always use this shortcut to open Visual Studio. One gotcha is that you can no longer open projects directly, by using the project-specific shortcuts, but we feel this is a small price to pay for bringing sanity to extension development. One additional limitation is that extensions can only be installed into the "ExtDev" instance through the extension manager (Tools > Extension Manager), and not by clicking on a .vsix file as that will install into the default Visual Studio.
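To summarize the setup, the three instances boil down to three launch commands (shown here as shortcut targets; the install path assumes a default VS2012 installation and will differ on your machine):

```shell
:: Development ("ExtDev") instance: all everyday extensions, including
:: dogfooded ones, are installed here; extension development happens here.
"C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\devenv.exe" /RootSuffix ExtDev

:: Experimental instance: compiled extensions are deployed here for testing.
"C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\devenv.exe" /RootSuffix Exp

:: Default instance: kept extension-free, so "Reset" copies nothing across.
"C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\devenv.exe"
```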
We hope this tip helps bring some sanity to your manual testing process, as it has for us. We also hope you'll visit us again, as we plan to share more tips and best practices for developing Visual Studio Extensions. As we're learning as we go, please feel free to suggest additions/changes in the comments.
ABB has just published the RFP for its 2013 university grant program, which allocates grants of 50-80 KUSD to academic researchers. Eight-page proposals are due on September 30th, on a variety of topics both within software engineering and beyond. I have extracted my three favorite topics from the RFP and am posting them here, hoping to inspire some great proposals.
Topic 6: Bug Reproduction Productivity Techniques and Tools
Bugs and security vulnerabilities in industrial software systems are notoriously difficult to reproduce, at least partially due to the complex software and hardware systems within which individual products operate. Critical memory leak bugs, library version mismatches, and even straightforward bugs can take hours if not days to reproduce. ABB seeks solutions to reduce the time necessary to reproduce software bugs, including (but not limited to) automatic provisioning of test configurations from a bug report, remote debugging tools, remote data collection tools or strategies, or even well-defined studies around incentivizing/training end-users to provide high-quality bug reports.
Topic 9: Testing Tools and Techniques for IEC 61131-3 Languages
The IEC 61131-3 languages are the standard for industrial automation, used by domain experts for a variety of large automation projects, such as controlling the lighting in commercial complexes or precisely coordinating large-scale chemical processes. Because IEC 61131-3 sub-languages are both graphical (e.g., function block diagrams) and textual (e.g., structured text), there exists no clear unifying strategy or tooling for testing. ABB seeks the creation of a testing framework and appropriate tooling for IEC 61131-3 languages targeting both graphical and textual sub-languages. While innovative additions, such as the inclusion of live programming environments for creating test cases (e.g., http://arxiv.org/pdf/1212.1796.pdf), are encouraged, proposals will be evaluated primarily on their ability to seamlessly test all sub-languages.
Topic 11: Fine-Grained Developer Productivity Metrics Suite
While it is unlikely that a single metric can be produced to fairly assess a developer's individual productivity, certain types of behavior can be measured (e.g., the amount of structured versus unstructured navigation, as was studied in "How effective developers investigate source code: An exploratory study"). ABB seeks the definition of key developer behaviors and an accompanying suite of metrics as well as ultra-low impact approaches to automatically collecting that data, preferably in the IDE but possibly by analyzing relevant artifacts.
The announcement is available here and the full list of topics here.
As developers, most of us use an IDE because it takes care of the messy details of development, allowing us to focus our full attention on higher-level tasks. Most common IDE features work well, yet unfortunately there is one tool embedded within modern IDEs that is failing us: the search tool.*
Recent studies show that about 88% of developers' searches fail, but because search tools are bundled within an otherwise well-performing IDE most developers do not recognize the problem, and some even deny it exists. Today's post serves to dispel this myth: three authors of search tools, myself included, present the case for upgrading your current search tooling.
*In this article we focus on local search tools. These tools are used to search your on-disk project. Examples include 'Find-in-Files' in Visual Studio, 'File Search' in Eclipse, or even grep. We are not discussing web-based searches of code repositories, such as GrepCode or GitHub's search.
Entrian Source Search (Visual Studio)
No matter how good we are as developers, we can't hold the whole source tree in our heads at once - we need tools to help navigate it. IDEs are a big help - the F12 key on my keyboard (Visual Studio's Go to Definition command) is well worn. But the tools that IDEs provide don't have the power, coverage, or speed that a full-text indexed search engine gives you. As one Source Search user puts it: "Our products are complex enough that not everything lives in C#: there's XML, PowerShell scripts, WiX config, custom build actions in the .csproj files... only a full-text search engine will do what I want."
Anyone remember when the Yahoo Directory was how you found things on the internet? Life without Google, yow... but that's what developing without a good search tool is like. Imagine the leap from Yahoo Directory to Google, but for your coding experience. Obviously speed is a big part of that - anything that breaks your train of thought, interrupts your flow, is a bad thing, so instantaneous search is a big help. But it's also about being able to express what you're looking for, both more exactly ("it's in the Renderer directory, in a .cpp file that I've modified today") and more vaguely ("it's something like InterpolateSomething(), and there's a comment that talks about quaternion rotation").
I believe that a good source code search engine will one day be one of the tools of the trade that we all take for granted, like syntax coloring or networked source control - yes, you could develop without those things (and I'm just about old enough to remember when that was normal!) but you'd feel like you were working with one hand tied behind your back. Better tools make us better developers, and powerful code search is one of those tools that you quickly wonder how you ever did without.
In my experience good code search is essential in an IDE and can be a massive productivity boost. This is especially true when working with large codebases; I wrote InstaSearch out of my own need to find code in million-LOC projects. In a way this is similar to desktop search, but there are aspects specific to source code. You have to be able to search for the words within variable names independent of the naming convention (camel case, underscore delimited) and to search inside specific code projects. And, of course, it helps a lot if the search is fast, since you can then tweak your search and see the results change immediately.
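As an illustration of that convention-independent matching, here is a small Python sketch that splits identifiers into searchable terms regardless of naming style. The splitting rules are my own illustration, not InstaSearch's actual tokenizer.

```python
import re

def split_identifier(name):
    """Split an identifier into lowercase terms, independent of naming style."""
    parts = []
    for chunk in name.split('_'):  # handle underscore-delimited names
        # handle camelCase/PascalCase, acronym runs (HTTP), and digits
        parts += re.findall(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+', chunk)
    return [p.lower() for p in parts]

print(split_identifier('openFileDialog'))    # -> ['open', 'file', 'dialog']
print(split_identifier('open_file_dialog'))  # -> ['open', 'file', 'dialog']
print(split_identifier('HTTPServer'))        # -> ['http', 'server']
```

Index these terms instead of (or alongside) the raw identifiers and a query for "open file" finds both `openFileDialog` and `open_file_dialog`.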
The two main use cases for search in IDEs that I'm noticing are: 1) I am looking for an exact string, 2) I don't know exactly what I'm looking for but I want to find code related to a few words. The first one is for looking up all occurrences of some text such as a constant or a method name. The second one is for discovering new code and finding where a particular functionality is implemented. I think code search tools need to support both these use cases to be effective. One way to enhance code search even further is to take advantage of static code information such as classes and methods.
David C. Shepherd
Sando Code Search Tool (Visual Studio)
In the past I have presented arguments as to why searches fail, why an information retrieval approach is better than a regex-based approach, and even pointed out the obvious superiority of ranked results over a flat list. Today, however, I want to guide you through a few searches in your own IDE, as many developers (myself included!) do not truly understand the depth of the issue until they experience it in their own code base. So please humor me by opening your IDE and trying out the following three searches:
Ready for an Upgrade?
Today we have presented our case for why you should upgrade your search tooling, and I hope we have convinced you. If our arguments have hit home here is a list of known search tools available for popular IDEs that are based on information retrieval technology. Enjoy!
When using code search tools, such as Find in Files, developers don't want to waste mental energy crafting the perfect search query, they just want to find relevant code. Previous versions of the Sando Code Search Extension helped by providing users with conceptual autocompletion. Even with this help we found users creating ill-advised queries (including very smart, experienced developers). For instance, users would perform literal searches (e.g., document.Add) without adding the required quotes (e.g., "document.Add"). Or they'd enter a (previously) unsupported query format, like a wildcard query (e.g., custom*Document). To make it more natural for developers to interact with Sando we expanded our query interpretation code in today's release. Developers can enter a keyword query (e.g., open file), a literal query (e.g., File.Open), or even a wildcard query (e.g., Document*Create) and expect Sando to take care of the details of returning relevant results.
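To make the idea concrete, here is a small Python sketch of this style of query interpretation. The classification rules below are assumptions for illustration; they are not Sando's actual implementation.

```python
import re

def interpret_query(raw):
    """Classify a raw query as 'literal', 'wildcard', or 'keyword'."""
    q = raw.strip()
    if len(q) > 1 and q.startswith('"') and q.endswith('"'):
        return ('literal', q.strip('"'))   # user quoted it explicitly
    if '*' in q or '?' in q:
        return ('wildcard', q)             # wildcard characters present
    # punctuation such as '.' or '(' suggests pasted code, so treat the
    # query as a literal search even though the user omitted the quotes
    if re.search(r'[.()\[\]:;=<>]', q):
        return ('literal', q)
    return ('keyword', q.lower())          # plain conceptual search

print(interpret_query('open file'))        # -> ('keyword', 'open file')
print(interpret_query('document.Add'))     # -> ('literal', 'document.Add')
print(interpret_query('Document*Create'))  # -> ('wildcard', 'Document*Create')
```

A real implementation would route each branch to a different underlying query type (e.g., phrase, wildcard, or ranked keyword queries, respectively).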
Please give Sando's new query interpretation code a spin and let us know what you think! If you like this new direction we'll look into even better query interpretation in the future.
David Shepherd leverages software engineering research to create useful additions to the IDE.