This spring I'll be presenting at several code camps and as well as online. Please see "Upcoming Talks" below for details. While I've blogged about code search many times in the past, this post contains new developer data, links to non-Sando search tools, and my latest thoughts on code search.
How many times do you Google per day? I'd estimate 20+ times for myself, but across all Internet users the average is much lower, at 2.46 Googles per day (5,922,000,000 Googles per day/2,405,518,375 Internet users). It's amazing how low this average is!
Now, for you developers out there, how many times do you search your code per day, perhaps looking for a feature's implementation while fixing a bug... once or twice? Nope. We collected anonymous activity from 60 developers for a total of 1750 developer days and saw that developers interact with known search tools in the Visual Studio IDE 34+ times per day! On average! Extrapolating from the Google example, I wonder if some developers are searching a hundred times per day on occasion...
Unfortunately, while developers are searching, empirical studies show they aren't finding. One study of developers performing maintenance tasks showed that 88% of developers' searches fail! To me, this is no surprise. Current search tooling inside modern IDEs is regex-based, and bound to fail for developers with less-than-perfect memories. Why don't developers upgrade search tools that fail so often, especially one that they interact with 34+ times per day?
The answer is no secret: until recently, there were no better search options. But now there are, and it is time for developers to remove constant failure from their day. In an upcoming set of talks, I'll detail why Lucene-based search tools work better for most developers. I'll talk about the different types of searches (e.g., symbol lookups vs exploratory searches), when to use built-in tooling (e.g., NavigateTo), and when to leverage advanced techniques (e.g., Sando or Entrian Source Search). Finally, I'll even provide advice on when uber-scalable server-based solutions (e.g., OpenGrok) are more appropriate.
This past week I attended CSMR-WCRE (soon to be called SANER) where I enjoyed Andi Marcus's acceptance talk for the most influential paper from 2004. As I listened to his talk, I realized that his work on applying IR techniques to more effectively search source code was originally presented at the second conference I ever attended. It's funny how this work is still influencing my work with Sando today, yet at the same time a bit sad how far code search has to go to gain mainstream acceptance.
Fortunately, while attending CSMR-WCRE, enjoying some great code search talks from both academia and industry, it hit me. Code search is extremely well-positioned to make a real impact in the next few years. Here's why:
Researchers are focusing on human aspects
In the early days of code search research, like in most fields, there was an appropriate but strong focus on better algorithms, higher precision and recall, and automated evaluations. Yet, because of the great progress on these fronts researchers like Sonia Haiduc, Emily Hill, and myself are starting to focus on the human aspects of code search, like query re-formulations, query recommendations, and search-appropriate UI. While human aspects may seem soft to hardcore algorithmic researchers consider this: a UI/UX overhaul for Sando in 2013 had a lot to do with the 10x increase in downloads over 2012. I believe progress on these long-neglected research areas may push code search over the top.
Researchers have the infrastructure to make rapid progress
One of the long known problems with code search tools was the difficulty of evaluating new search algorithms and, even harder, comparing two competing search algorithms. As documented in Bogdan and Denys's 2012 survey, only 30 papers out of 89 included an evaluation that compared the new approach against any existing approach. Fast forward to 2014 and we now have at least two infrastructures for evaluating code search algorithms, one allowing us to quickly prototype new algorithms and test them in-lab and one allowing us to further validate them via field data.
First, we have TraceLab, a framework for creating and carrying out search algorithm evaluations in a reproducible manner. I've seen TraceLab used, for instance, to show the effect of including and excluding commonly used components like splitters and stemmers into code search algorithms. Thus, for researchers who develop any improvement on top of existing approaches, evaluating their work against the state-of-the-art is as simple as implementing a TraceLab component.
Second, once researchers determine a new top algorithm in the lab using TraceLab, we also have a framework for evaluating that new algorithm in the field, Sando. Assuming that the algorithm can be plugged into Sando, we've recently shown that anonymous activity stream metrics correlate with user satisfaction, and thus by watching these metric we can determine whether the new algorithm improves performance in the field too.
I believe that this infrastructure will dramatically reduce the time-to-market for new ideas coming out of research. Each new idea, quickly tested in TraceLab, validated in the field by Sando, can impact users in a matter of months instead of years.
Calling all students!
So, PhD students looking for a topic, you may have thought that code search research was quieting down. I would argue the opposite; the research done in the next five years will likely be what appears in practice. There are obvious needs in terms of better query recommendation/expansion algorithms, more software specific synonym information, ideas for dealing with abbreviation-heavy domains, an improved code search UI design (i.e., lists vs graphs), and even the performant use of structural information to improve code search results. The infrastructure's available, the low-hanging fruit is available.. come join us for some of the most exciting work of the next five years!
Ever since graduate school I have been a man between two worlds. I value publishing, but authoring one more paper does not thrill me as it does my academic friends. I love writing tools that make developers' lives easier, but achieving wide commercial success for that tool does not drive me like it did my friends at Tasktop (i.e., a software tool vendor). Until recently, people like me, equally interested in research and practical impact were doomed to operate on the margins. We could work as a research developer implementing other people's ideas or as a software engineer desperately trying to align assigned projects to our interests. Fortunately, more and more companies are waking up to the value of those that can both dream and engineer, and applied researchers are beginning to thrive (and be hired!).
To the best of my knowledge there are at least two corporations that are clearly recruiting and rewarding applied researchers, ABB, through its Corporate Research centers, and Microsoft, through its Tools for Software Engineers group. As I currently work for ABB Corporate Research I'll focus on that group, but kudos to Microsoft, and especially Wolfram Schulte, for recognizing the need for applied innovation. It's always bothered me that so much great software engineering research gets lost in the publication machine; yet groups like mine, Wolfram's, and hopefully others I'm not aware of yet are working to remedy that situation.
At ABB Corporate Research the software engineering group's mandate is to create tools and processes that improve software quality or increase developer productivity (or hopefully both). In the past few years we have, for instance, implemented a state-of-the-art search tool for developers and used gamification to motivate software developers to adopt best practices. Yet, more interesting than what we have done so far (we are just getting started) is how we have done it. By explicitly valuing both innovative ideas *and* execution in terms of development we now have a rare combination of working software tools, engaged internal users, and publishable results. In a given day I'm just as likely to receive an email that says "I am using <your tool> and finding it extremely useful and found it much simpler and faster than <the respective default tool> present in Visual studio… Thank you for sharing this tool", a paper acceptance notification, or a grant collaboration request from an academic. And this is precisely the balance we are trying to achieve.
So, if you are like me, and you enjoy both research and impact, consider joining us here in sunny Raleigh, NC, because we are hiring. In addition to being a major tech hub for the East Coast, a great place to live, with a low cost-of-living, the RTP area is known for having great pig pickin's!
You Missed the Mother of All Demos
Gergő Balogh demonstrated his prototype, CodeMetropolis, which visualizes software systems as cities using the MineCraft engine (actually at SCAM, a co-located event). While this work is obviously influenced by earlier work on CodeCities, as well as the Source Viewer 3D project, it was a great example of how taking research one step further towards application really creates a wow factor. During this presentation everyone completely ignored their urgent emails and many formerly anti-social participants suddenly learned how to tweet! :)
You Missed srcML's Coming Out Party
For anyone working in software maintenance, srcML should be a familiar name. Since its inception over ten years ago it has had great success, being used in tens if not hundreds of projects (including several ABB projects), parsing millions and millions of LOCs, and even winning the ICPC's most influential paper this past year. Unfortunately, it is not as popular as it should be (some still insist in writing their own C or Java parser!), and so this year Jonathan Maletic and Michael Collard are getting serious about publicizing this well-tested project. Armed with their ~$800K infrastructure grant, they will be continually improving srcML over the next three years, working with the talented developer Michael Decker (full time on srcML!) to improve both the code and the community.
You Missed a Great TraceLab Talk
For those of you who haven't heard of TraceLab, it's "an experimental workbench for designing, constructing, and executing traceability experiments", such as evaluating feature location tools. While I was already familiar with TraceLab, I was very happy to hear the tone and content of their distinguished paper presentation. This talk was not only focused on the contributions of TraceLab, but on the need for increasing the reproducibility of results in software engineering. Their clear presentation of how difficult it currently is to reproduce feature location evaluation results, and how TraceLab can help fix this, gives me hope that I'll see better comparisons between evaluations in future work.
You Didn't Beat Michele Lanza in Football
ICSM has an annual tradition of hosting a football game. While this may seem odd for a research conference it's a great way to meet and interact with other researchers outside of coffee breaks and formal networking. More importantly, it's a ton of fun! For those of you that will join us next year, I've heard rumor that Michele Lanza's team has *never* lost an ICSM match, and this year was no exception. Will worthy challengers appear in 2014?
You Missed Meeting Many "Applied Innovators"
ICSM is a haven for what my friend Brian Robinson calls applied innovation. A working definition of applied innovation is work that is both interesting to academics (i.e., innovative, containing new ideas) and developers (i.e., it actually works and saves them time or money). While I've used Arie van Deursen (above) as an example, as he is a full professor at Delft and co-founder of two software spinoffs, I could have chosen many other participants at ICSM to make the same point (e.g., Jurgen Vinju and his group, creators of Rascal). The average participant at ICSM is, in my experience, more likely to have written working software that one can leverage, founded a company that is interacting with real customers, or applied their work on industrial source code. In my opinion, these applied innovators are exactly the type of people that are starting to bridge the large gap between research and practice, and I'm happy to have spent a week learning from them!
I hope you'll join us next year for SCAM or ICSM 2014.
In the past three weeks we have collected anonymous usage data from developers (with their knowledge and permission!!) using our search tool in Visual Studio. Learning from the failed Usage Data Collection effort in Eclipse, which perhaps had too broad a scope, our data collection is tightly focused on improving search, and nothing else. Thus, after reviewing even three weeks of data we have several useful conclusions. Primarily, that developers who utilize recommendations (shown above) when searching code are more satisfied with the results and execute 20% less failed queries than those not using recommendations.
As you can see in pane (b) above we collected data for 25 days. We collected 204 log files from 44 unique users (excluding all Sando developers). Interestingly, as a comparison, Eclipse's Usage Data Collection collected only 453 new users' data over a 14 day period in 2008, even though they were potentially collecting from all Eclipse users.
During this 25 day period we collected anonymous metrics about 363 user queries executed on local projects via Sando. As seen in pane (a) 90 utilized the recommendations provided by Sando. Of those 90, 41 were utilized pro-actively (e.g., by selecting an entry from a dropdown) and 49 were triggered automatically (e.g., when the user searched for a word that did not exist in the code base). Of the 41 active queries 25 were "lookup"-style searches, such as using the autocomplete to help remember the name of a method, and 16 were exploratory-style searches, such as searching for phrase like "open file".
Using the data from these 363 queries we tried to gauge user satisfaction level using a few different metrics. First, we measured the query failure rate. We considered a query to fail if a user did not review any of the results, as indicated via UI events. As you can see in pane (e) queries that utilized recommendations had a 20% lower failure rate, which certainly leads to higher user satisfaction.
We also used both the short-click and long-click satisfaction measures, as are often used in web search. The intuition is that if a user clicks and views a result and then immediately returns to the result list that result was not helpful (a short-click), where as if a user clicks and views a result for a long period before (if ever) returning that result was helpful. As you can see in pane (d) queries with and without recommendations provided about the same rate of long-clicks, which is a positive metric, whereas queries with recommendations showed a significantly lower short-click rate, which is good as short-clicks are a negative satisfaction indicator.
As I mentioned at the beginning of this post, we only collect information aimed at helping us improve code search. So what's useful about the information I've presented today? Well, in the short term, Sando users should start using recommendations, immediately. They will be more satisfied with the results and fail 20% less. While we intend to do our part to improve the recommendation UI, possibly the most useful thing about this data is that it gives Sando users field evidence to support feature adoption. And when you think about it...how often have your other software tools given you evidence that a new feature actually makes you better? ;)
Are you an aimless poker?
Are you a random scroller?
Are you a hopeless grepper?
An unranked searcher?
If you recognize any of these MO's from your own work, stop. Come by my 1pm talk on Friday and learn what I've learned from the best. The search and navigation strategies I'll demonstrate should shave off 20-40% of the time it takes to complete an average task. And bring you're laptop, we'll have fun squashing a bug together!
For those of you unfamiliar with academic research on software engineering, let me explain. Research in software engineering has generally consisted of "...inventing new tools or processes, describing their application to toy problems in academic journals, and then wondering why practitioners ignored them."
This state of affairs is a tragedy, because as you may remember from your college days, professors are often among the smartest, hardest working people you will meet... they just need some redirecting.
This is where you come in. If you work in the software industry on testing, verification, or validation please submit a brief writeup of your tool to the ICST Tools Track (3-8 pages). Accepted tools get 25 minute presentation slots in the tools track, which is a great place to promote cutting edge testing software to bright grad students (i.e., future project leads), get feedback from professors steeped in the latest testing techniques, and even find potential collaborators who can improve your tool as part of their research work. Join us as we bridge the divide between research and practice!
Why Industry Professionals Should Submit:
I realize that presenting at academic conferences is not the norm in industry, yet perhaps it should become more accepted. Here I've compiled a list of reasons why you should showcase your testing software at the ICST Tools Track:
Why Graduate Students Should Submit:
I realize that tools tracks are often overlooked by academics and graduate students, who focus on the more prestigious research track, but there are several reasons why this approach is short-sighted.
Why Both Should Attend:
Academics and practitioners have more in common than most think. Here are some reasons for submitting to ICST that speak to both parties:
Plan to Attend!
I hope I've convinced you of the value of applying to the ICST Tools Track and the larger mission of reuniting software engineering research with practice. I'd encourage you not only to apply yourself, but to encourage others to apply. I'd especially encourage you to bug your favorite tool vendor about presenting at ICST. I've listed a few of my favorites below... feel free to encourage them to join us and add your own suggestions in the comments.
Tools and teams I'd love to have attend:
This summer the Sando team has invested significant effort in making Sando more usable for all Visual Studio developers. One of the smaller, fun features we worked on is ensuring that Sando looks great in every theme. For instance, here's what Sando looks like in the default VS2012 theme (with some quality issues due to the limitations of animated gifs):
And here's what Sando looks like in the dark, Sublime-like theme:
Our latest release, which quietly went out just today, can be downloaded from Visual Studio Gallery. We'd love to hear any early feedback prior to our release posts!
Because of my time at Tasktop (i.e., an ISV focused on developer productivity) I'm a big believer in dogfooding the software tools that I am developing. Thus, if you find me working in Visual Studio you'll notice that I'm using Sando (a software search tool) to develop Sando. However, this can lead to some confusion during testing, as developers wonder whether they are testing the installed extension or the extension under development, a situation complicated by possible interactions with other installed extensions. For those developing Visual Studio Extensions my team has developed a simple strategy for avoiding any confusion and eliminating unwanted interactions.
The answer in a nutshell: change the target of your VS shortcut to...
C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE\devenv.exe" /RootSuffix ExtDev
In the figure to the left I have illustrated the normal process for deploying extensions to the Experimental Instance (i.e., the Visual Studio instance where you can test your extension). When you compile an extension in Visual Studio it is deployed to the Experimental Instance (iff: "Deploy VSIX content to experimental instance for debugging" is checked). In cases where you are only developing a single extension and are not dogfooding, working this way may be sufficient.
However, when dogfooding, note that there is a potential for confusion. When you execute a "Reset Visual Studio 2012 Experimental Instance" command all installed extensions are copied from your normal VS instance into the Experimental instance, as shown on the right side of the figure. This can cause two versions of your extension to exist in the Experimental Instance (although only one will run at a time). Furthermore, if any of the installed extensions interact with your extension you'll notice unexpected errors during testing. In my opinion, when developing VS extensions, it's best to isolate them in a clean Experimental Instance to avoid any configuration headaches.
A Simple Solution
To avoid any configuration issues we develop our extensions in a special instance of Visual Studio, which we label the "ExtDev" instance. We install all extensions that we use to develop, including those we are dogfooding, into this instance and we develop our extensions here. By working in this way, we keep our default Visual Studio instance clean (see upper left of figure). Thus, the only way for extensions to be deployed to the Experimental Instance is through a compliation in our "ExtDev" instance. Note that we still often run the "Reset" command but, because the default Visual Studio Instance has no extensions installed no extensions are copied to the Experimental Instance, allowing us to isolate our extension(s) in the Experimental Instance.
As mentioned above, it is trivial to begin working this way. Simply add "/RootSuffix ExtDev" to the target of your VS shortcut and always use this shortcut to open Visual Studio. One gotcha is that you can no longer open projects directly, by using the project-specific shortcuts, but we feel this is a small price to pay for bringing sanity to extension development. One additional limitation is that extensions can only be installed into the "ExtDev" instance through the extension manager (Tools > Extension Manager), and not by clicking on a .vsix file as that will install into the default Visual Studio.
We hope this tip helps bring some sanity to your manual testing process, as it has for us. We also hope you'll visit us again, as we plan to share more tips and best practices for developing Visual Studio Extensions. As we're learning as we go, please feel free to suggest additions/changes in the comments.
ABB has just published its RFP for the 2013 university grant program which allocates grants of 50-80 KUSD to academic researchers. Eight page proposals are due on September 30th on a variety of topics, both related to software engineering and other topics. I have extracted my three favorite topics from the RFP and am posting them here, hoping to inspire some great proposals.
Topic 6: Bug Reproduction Productivity Techniques and Tools
Bugs and security vulnerabilities in industrial software systems are notoriously difficult to reproduce, at least partially due to the complex software and hardware systems within which individual products operate. Critical memory leak bugs, library version mismatches, and even straightforward bugs can take hours if not days to reproduce. ABB seeks solutions to reduce the time necessary to reproduce software bugs, including (but not limited to) automatic provisioning of test configurations from a bug report, remote debugging tools, remote data collection tools or strategies, or even well-defined studies around incentivizing/training end-users to provide high-quality bug reports.
Topic 9: Testing Tools and Techniques for IEC 61131-3 Languages
The IEC 61131-3 languages are the standard for industrial automation, used by domain experts for a variety of large automation projects, such as controlling the lighting in commercial complexes or precisely coordinating large-scale chemical processes. Because IEC 61131-3 sub-languages are both graphical (e.g., function block diagrams) and textual (e.g., structured text) there exists no clear unifying strategy or tooling for
testing. ABB seeks the creation of a testing framework and appropriate tooling for IEC 61131-3 languages targeting both graphical and textual sub-languages. While innovative additions, such as the inclusion of live programming environments for creating test cases (e.g., http://arxiv.org/pdf/1212.1796.pdf) are encouraged, proposals will be evaluated primarily on their ability to seamlessly test all sub-languages.
Topic 11: Fine-Grained Developer Productivity Metrics Suite
While it is unlikely that a single metric can be produced to fairly assess a developer's individual productivity, certain types of behavior can be measured (e.g., the amount of structured versus unstructured navigation, as was studied in "How effective developers investigate source code: An exploratory study"). ABB seeks the definition of key developer behaviors and an accompanying suite of metrics as well as ultra-low impact approaches to automatically collecting that data, preferably in the IDE but possibly by analyzing relevant artifacts.
The announcement is available here and the full list of topics here.
David Shepherd leverages software engineering research to create useful additions to the IDE.