Musings of a Tech Transfer Enthusiast
This spring I'll be presenting at several code camps as well as online. Please see "Upcoming Talks" below for details. While I've blogged about code search many times in the past, this post contains new developer data, links to non-Sando search tools, and my latest thoughts on code search.
How many times do you Google per day? I'd estimate 20+ times for myself, but across all Internet users the average is much lower, at 2.46 Googles per day (5,922,000,000 Googles per day/2,405,518,375 Internet users). It's amazing how low this average is!
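For anyone who wants to check that average, here is the division as a tiny Python snippet. The two inputs are the figures quoted above; nothing else is assumed.

```python
# Figures quoted in the paragraph above.
googles_per_day = 5_922_000_000
internet_users = 2_405_518_375

# Average number of Google searches per Internet user per day.
average = googles_per_day / internet_users
print(f"{average:.2f} searches per user per day")
```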
Now, for you developers out there, how many times do you search your code per day, perhaps looking for a feature's implementation while fixing a bug... once or twice? Nope. We collected anonymous activity from 60 developers for a total of 1750 developer days and saw that developers interact with known search tools in the Visual Studio IDE 34+ times per day! On average! Extrapolating from the Google example, I wonder if some developers are searching a hundred times per day on occasion...
Unfortunately, while developers are searching, empirical studies show they aren't finding. One study of developers performing maintenance tasks showed that 88% of developers' searches fail! To me, this is no surprise. Current search tooling inside modern IDEs is regex-based, and bound to fail for developers with less-than-perfect memories. Why don't developers upgrade search tools that fail so often, especially one that they interact with 34+ times per day?
The answer is no secret: until recently, there were no better search options. But now there are, and it is time for developers to remove constant failure from their day. In an upcoming set of talks, I'll detail why Lucene-based search tools work better for most developers. I'll talk about the different types of searches (e.g., symbol lookups vs exploratory searches), when to use built-in tooling (e.g., NavigateTo), and when to leverage advanced techniques (e.g., Sando or Entrian Source Search). Finally, I'll even provide advice on when uber-scalable server-based solutions (e.g., OpenGrok) are more appropriate.
In the past three weeks we have collected anonymous usage data from developers (with their knowledge and permission!!) using our search tool in Visual Studio. Learning from the failed Usage Data Collection effort in Eclipse, which perhaps had too broad a scope, our data collection is tightly focused on improving search, and nothing else. Thus, after reviewing even three weeks of data we have several useful conclusions. The primary one is that developers who use recommendations (shown above) when searching code are more satisfied with the results and execute 20% fewer failed queries than those not using recommendations.
As you can see in pane (b) above we collected data for 25 days. We collected 204 log files from 44 unique users (excluding all Sando developers). Interestingly, as a comparison, Eclipse's Usage Data Collection collected only 453 new users' data over a 14 day period in 2008, even though they were potentially collecting from all Eclipse users.
During this 25 day period we collected anonymous metrics about 363 user queries executed on local projects via Sando. As seen in pane (a), 90 utilized the recommendations provided by Sando. Of those 90, 41 were utilized proactively (e.g., by selecting an entry from a dropdown) and 49 were triggered automatically (e.g., when the user searched for a word that did not exist in the code base). Of the 41 active queries, 25 were "lookup"-style searches, such as using the autocomplete to help remember the name of a method, and 16 were exploratory-style searches, such as searching for a phrase like "open file".
Using the data from these 363 queries we tried to gauge user satisfaction level using a few different metrics. First, we measured the query failure rate. We considered a query to fail if a user did not review any of the results, as indicated via UI events. As you can see in pane (e) queries that utilized recommendations had a 20% lower failure rate, which certainly leads to higher user satisfaction.
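The failure-rate definition above (a query fails if the user reviews none of its results) can be sketched in a few lines of Python. The `QueryRecord` type, its field names, and the sample log are hypothetical and invented for illustration; they are not Sando's actual log format, and the numbers are not our study data.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    # Hypothetical log entry; field names are assumptions for this sketch.
    used_recommendation: bool
    results_reviewed: int  # how many results the user opened or expanded


def failure_rate(records):
    """Fraction of queries for which the user reviewed no results."""
    if not records:
        return 0.0
    failed = sum(1 for r in records if r.results_reviewed == 0)
    return failed / len(records)


# Invented sample data, only to exercise the function.
log = [
    QueryRecord(True, 2),
    QueryRecord(True, 0),
    QueryRecord(True, 1),
    QueryRecord(True, 1),
    QueryRecord(False, 0),
    QueryRecord(False, 0),
    QueryRecord(False, 3),
    QueryRecord(False, 1),
]

with_rec = [r for r in log if r.used_recommendation]
without_rec = [r for r in log if not r.used_recommendation]
print(failure_rate(with_rec), failure_rate(without_rec))
```

Splitting the log by `used_recommendation` before computing the rate is what lets the two groups be compared side by side.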
We also used both the short-click and long-click satisfaction measures, as are often used in web search. The intuition is that if a user clicks and views a result and then immediately returns to the result list, that result was not helpful (a short-click), whereas if a user clicks and views a result for a long period before (if ever) returning, that result was helpful (a long-click). As you can see in pane (d), queries with and without recommendations produced about the same rate of long-clicks, which is a positive metric, whereas queries with recommendations showed a significantly lower short-click rate, which is good as short-clicks are a negative satisfaction indicator.
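As a rough illustration of the short-click and long-click idea, the sketch below labels a click by how long the user stayed on a result before returning to the list. The two thresholds and the "neutral" middle bucket are assumptions made for this example; the post does not state the cutoffs we actually used.

```python
# Assumed thresholds, chosen only for illustration.
SHORT_CLICK_SECONDS = 5.0
LONG_CLICK_SECONDS = 30.0


def classify_click(dwell_seconds):
    """Label a result click by dwell time.

    dwell_seconds is None when the user never returned to the result list,
    which is treated as a long-click.
    """
    if dwell_seconds is None or dwell_seconds >= LONG_CLICK_SECONDS:
        return "long"
    if dwell_seconds <= SHORT_CLICK_SECONDS:
        return "short"
    return "neutral"


for dwell in (2.0, 10.0, 45.0, None):
    print(dwell, classify_click(dwell))
```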
As I mentioned at the beginning of this post, we only collect information aimed at helping us improve code search. So what's useful about the information I've presented today? Well, in the short term, Sando users should start using recommendations, immediately. They will be more satisfied with the results and see a 20% lower query failure rate. While we intend to do our part to improve the recommendation UI, possibly the most useful thing about this data is that it gives Sando users field evidence to support feature adoption. And when you think about it...how often have your other software tools given you evidence that a new feature actually makes you better? ;)
For those of you unfamiliar with academic research on software engineering, let me explain. Research in software engineering has generally consisted of "...inventing new tools or processes, describing their application to toy problems in academic journals, and then wondering why practitioners ignored them."
This state of affairs is a tragedy, because as you may remember from your college days, professors are often among the smartest, hardest working people you will meet... they just need some redirecting.
This is where you come in. If you work in the software industry on testing, verification, or validation please submit a brief writeup of your tool to the ICST Tools Track (3-8 pages). Accepted tools get 25 minute presentation slots in the tools track, which is a great place to promote cutting edge testing software to bright grad students (i.e., future project leads), get feedback from professors steeped in the latest testing techniques, and even find potential collaborators who can improve your tool as part of their research work. Join us as we bridge the divide between research and practice!
Why Industry Professionals Should Submit:
I realize that presenting at academic conferences is not the norm in industry, yet perhaps it should become more accepted. Here I've compiled a list of reasons why you should showcase your testing software at the ICST Tools Track:
Why Graduate Students Should Submit:
I realize that tools tracks are often overlooked by academics and graduate students, who focus on the more prestigious research track, but there are several reasons why this approach is short-sighted.
Why Both Should Attend:
Academics and practitioners have more in common than most think. Here are some reasons for submitting to ICST that speak to both parties:
Plan to Attend!
I hope I've convinced you of the value of applying to the ICST Tools Track and the larger mission of reuniting software engineering research with practice. I'd encourage you not only to apply yourself, but to encourage others to apply. I'd especially encourage you to bug your favorite tool vendor about presenting at ICST. I've listed a few of my favorites below... feel free to encourage them to join us and add your own suggestions in the comments.
Tools and teams I'd love to have attend:
When using code search tools, such as Find in Files, developers don't want to waste mental energy crafting the perfect search query; they just want to find relevant code. Previous versions of the Sando Code Search Extension helped by providing users with conceptual autocompletion. Even with this help we found users (including very smart, experienced developers) creating ill-advised queries. For instance, users would perform literal searches (e.g., document.Add) without adding the required quotes (e.g., "document.Add"). Or they'd enter a (previously) unsupported query format, like a wildcard query (e.g., custom*Document). To make it more natural for developers to interact with Sando we expanded our query interpretation code in today's release. Developers can enter a keyword query (e.g., open file), a literal query (e.g., File.Open), or even a wildcard query (e.g., Document*Create) and expect Sando to take care of the details of returning relevant results.
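To give a feel for what "query interpretation" means, here is a minimal Python sketch that sorts a raw query into the three kinds mentioned above. The heuristics (quotes or code-like punctuation mean literal, an asterisk means wildcard, everything else is a keyword query) are assumptions for illustration and are not Sando's actual rules.

```python
import re


def classify_query(query):
    """Guess whether a raw query is a keyword, literal, or wildcard query."""
    q = query.strip()
    # Explicitly quoted text is always treated as a literal.
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return "literal"
    # An asterisk signals a wildcard query such as Document*Create.
    if "*" in q:
        return "wildcard"
    # Code-like punctuation (assumed set) suggests the user pasted a snippet.
    if re.search(r"[.()\[\]<>:;=]", q):
        return "literal"
    return "keyword"


for q in ["open file", "File.Open", "Document*Create", '"document.Add"']:
    print(q, "->", classify_query(q))
```

The point of the sketch is that the user never has to declare the query type; the tool infers it from the shape of the text.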
Please give Sando's new query interpretation code a spin and let us know what you think! If you like this new direction we'll look into even better query interpretation in the future.
Here's an example search using FindInFiles (left) and Sando (right):
Last year we released an early version of Sando, a free, open source code search extension for Visual Studio that is based on the latest advances in code search research. Because of the modest success of this relatively unpolished tool (1300+ downloads) we have taken a number of measures to refine and improve it, making it more appropriate for wider usage. Among other things, we have refactored the code into two independent projects, updated the core search algorithm based on user study feedback, improved the quality of the test suite, shortened indexing time by 2-3x, and added autocompletion. In this post we will highlight several of the new features of Sando 0.4, which searches C, C++, and C# code.*
* The ReallyReallyReallySimpleRoguelike project will be used to illustrate several usage scenarios of Sando. This game "...has a really simple goal - pickup a sword and kill the monster(s)."
When searching an unfamiliar codebase, or even an unfamiliar part of a familiar codebase, creating an effective query can be tricky because it is difficult to guess what terms are actually used in the code. Let's consider searching for the concept of picking up a weapon in the Roguelike C# game. Using the default FindInFiles search engine, searches for "pick*weapon" or alternatives such as "grab*weapon" fail to return any results, and a search for "weapon" returns a large number of results that would be tedious to sort through.
Sando can help you complete your query when you are only partially sure of what you should be searching for. In the example to the left the user has typed in "weapon" and Sando has proposed several autocomplete suggestions. A quick scan suggests that the #3 result, "Add Weapon", is likely a good candidate.
Executing the search for "Add Weapon" finds several relevant methods, including the AddWeapon method in the Player class, which is most directly responsible for picking up a weapon. This search also returns several related fields and methods such as field Player.WeaponSlot.
Exclude Terms (e.g., Don't Search Test Files)
When executing a search on a code base it is easy to miss the relevant items in a search result because they are overwhelmed by unrelated results. When trying to search for the concept of reading input from the keyboard in RRRSimpleRoguelike my results were flooded with XML-reading code. To eliminate these results from my search I added '-xml' to my query. Another popular use case for this feature is to exclude tests from search results.
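The exclusion idea can be sketched as follows in Python: terms prefixed with '-' are pulled out of the query, and any candidate containing one of them is dropped. The function names, the substring matching, and the sample identifiers are all hypothetical; Sando's real implementation works against its index rather than raw strings.

```python
def split_query(query):
    """Separate a query into included terms and '-'-prefixed excluded terms."""
    include, exclude = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:].lower())
        else:
            include.append(token.lower())
    return include, exclude


def matches(text, include, exclude):
    """Keep text that has at least one included term and no excluded term."""
    lowered = text.lower()
    if any(term in lowered for term in exclude):
        return False
    return any(term in lowered for term in include)


include, exclude = split_query("read input -xml")
# Invented candidate names, for illustration only.
candidates = ["ReadKeyboardInput", "XmlReader.ReadElement", "DrawMap"]
print([c for c in candidates if matches(c, include, exclude)])
```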
Limit by Filetype (e.g., Only Search .h Files)
Many projects are a combination of languages (e.g., C++ backend with a C# GUI) and searches meant to explore the C# code can include results from C++ code. In the example to the left we have searched Sando's code base itself for the term "theme". The relevant C# results are overwhelmed by the results from the C++ code. In fact only a single C# result is shown (as the second result).
To eliminate unwanted filetypes, and to enable developers to search only header files, we have implemented a filetype search. In the example on the left we have further scoped the search to only include C# files, thus eliminating the irrelevant C++ results from our search.
Exact Matching (e.g., Find This String Literal)
As we saw during our user study, developers often want to search for exact strings. By adding quotes to any search string in Sando developers can search for specific literals (or any line snippet) that exists in code. Because Sando is an indexed searcher the results are near-instant, whereas using FindInFiles for this type of search will cause a delay while each file in the search scope is scanned.
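The speed difference comes from doing the expensive work once, up front. The toy Python sketch below contrasts a scan of every line with a lookup in a small inverted index that narrows the search to a few candidate lines before confirming the exact match. It is a simplified stand-in and not Sando's index; the file contents and function names are invented.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[A-Za-z_]\w*")


def build_index(files):
    """Map each lowercase token to the (file, line number) pairs containing it."""
    index = defaultdict(set)
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for token in TOKEN.findall(line):
                index[token.lower()].add((name, lineno))
    return index


def scan_search(files, literal):
    """Find-in-files style: inspect every line of every file."""
    return [
        (name, lineno)
        for name, text in files.items()
        for lineno, line in enumerate(text.splitlines(), start=1)
        if literal in line
    ]


def indexed_search(files, index, literal):
    """Use the index to pick candidate lines, then confirm the exact literal."""
    tokens = [t.lower() for t in TOKEN.findall(literal)]
    if not tokens:
        return scan_search(files, literal)
    candidates = set.intersection(*[index.get(t, set()) for t in tokens])
    return [
        (name, lineno)
        for name, lineno in sorted(candidates)
        if literal in files[name].splitlines()[lineno - 1]
    ]


# Invented file contents, for illustration only.
files = {
    "Player.cs": "public void AddWeapon(Weapon w) {\n    document.Add(w);\n}",
    "Map.cs": "void Draw() {\n    Console.Write(tile);\n}",
}
index = build_index(files)
print(indexed_search(files, index, "document.Add"))
```

Both functions return the same hits; the indexed version simply touches far fewer lines once the index exists.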
Got All That? If Not, Consult the Tooltips
As you use Sando you'll notice that, as part of polishing it, we've added a few tooltips to guide your usage. Both the search box and the [?] icon include helpful tips on how to use Sando effectively.
Try Our Free, Open Source Search Extension
We think that the combination of new features, dramatically improved indexing performance, and polish leads to a much better search experience, but we'd love to hear what you think! Please don't hesitate to drop us a line on CodePlex or, better yet, rate Sando on Visual Studio Gallery.
Download Sando from Visual Studio Gallery
Many software engineers (i.e., those that actually program on a daily basis) are unaware of the dedicated sub-field of software engineering researchers (like these from Microsoft) whose mission is to help make the daily grind of writing software better. While software engineering researchers have historically had limited practical impact, there are some notable companies and tools that were born out of software engineering research, and many brilliant, driven individual researchers who want to have impact. I hope that raising awareness of this research among software engineers will encourage more feedback to the software engineering research community, ultimately leading to more useful output.
To that end, I'm posting a short 3 minute video that provides a quick overview of a typical software engineering research project at ABB. I hope this video gives you a better sense of what we're working on, our balance between theoretical and practical impact, and what types of technologies we're investing in. If you're a software engineer we'd love to hear what you think about our research directions, how we could improve, or even just what problems are currently slowing you down. Feel free to leave comments or even contact me directly.
Make it work with 2012! :) -naspinski, from reddit
Well, the people have spoken, and we've (finally) delivered. The Sando code search extension is now available in VS2012 (in addition to VS2010), thanks in large part to Kosta Damevski. As we release this version I realize that not all readers are familiar with Sando's raison d'être. For those of you new to Sando, here are the top few reasons why we've spent the last year creating it.
Above I've quickly described a few reasons that we expect Sando to perform better than available regex-based tools and I've used a few scenarios to explain why. However, it's important to know that Sando is not primarily based on my personal insights. Sando is built upon the huge body of code search research, started by Andrian Marcus's thesis work and so ably continued by researchers like Denys Poshyvanyk, Dawn Lawrie and David Binkley, Lori Pollock, Emily Hill, and many others. Thus, you can download and use Sando, assured that it's providing you with high-quality search results influenced by cutting edge advances in software engineering research.
Sando is available as a Visual Studio extension for VS2010 and VS2012
When fixing bugs most software developers start with a search. I know. I've watched them. They read the bug report, reproduce the problem (sometimes), and then search for relevant code. This is when it gets ugly. Developers either wander semi-randomly around the Solution Explorer (Roomba Style), struggle to use Find in Files with increasing levels of anger (William Foster Style), or, as I've even seen one user do, step through almost the entire program in the debugger (Slowskys Style). Please, for goodness' sake, next time you search try Sando Style. Our newly released version of Sando (0.3) is a free, open source extension for VS2010 that makes searching your C#, C, and C++ code as easy as using Google.
VS Gallery Download: http://bit.ly/SandoDownload
Getting Started With Sando
After installing Sando the main view will appear in your Visual Studio, as shown to the left. The UI consists of the main input box (upper left), the search button (upper right), the results view (bottom center) and the normally folded advanced options (middle center). To use Sando simply open a solution, type in a search term, and press search (or hit return).
Once you search, you'll see results like those shown to the left. Notice how the method OpenFile is shown first, because it matches both search terms, whereas the lower ranked results only match one of the two terms. Each result is shown with a representative icon that communicates both the element type (e.g., method, class, or plaintext) and the accessibility level if relevant (e.g., public or private).
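The ordering described above, where a result matching both terms outranks results matching only one, can be approximated with a simple term-overlap score. The Python sketch below is a deliberately simplified assumption: it splits camel-case names into words and counts matches, whereas Sando's real ranking is more sophisticated. The sample method names other than OpenFile are invented.

```python
import re


def split_identifier(name):
    """Split a camel-case identifier into lowercase words, e.g. OpenFile -> open, file."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def rank(query, names):
    """Order names by how many query terms they contain, best first."""
    terms = set(query.lower().split())
    scored = []
    for name in names:
        score = len(terms & set(split_identifier(name)))
        if score:
            scored.append((score, name))
    # Higher scores first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


names = ["CloseFile", "OpenFile", "OpenConnection", "DrawMap"]
print(rank("open file", names))
```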
See a result you like but not ready to commit to opening it? No problem, after the search is executed just press the down arrow to scroll through results, expanding each result to view a partial snippet when it is selected. In the picture to the left OpenFile has been selected and you can view the first few lines of this method. To open the method in the VS editor simply hit return or double click on the result.
Getting Fancy with Sando
So far we've discussed the very basics of how to use Sando. However, the entire Sando team dog-foods Sando (i.e., we use it every day ourselves) and so it's got a few advanced features baked in, which I'll discuss here.
Halfway through your task and decide you need to search again? No need to pick up that mouse: Sando is designed to allow you to work mouse-free. Hit Alt+Shift+S and the Sando view will appear with the cursor in the text box, ready to search. Using this shortcut I find Sando to be a great way to open any file or method I'm thinking of, and thus I almost never open the Solution Explorer.
One of the things we noticed when using Sando is that sometimes you just don't want to see certain results. For instance, we have huge text files of words in our test directories, which are indexed by Sando, but sometimes we don't want to see these results. In the advanced section of Sando you can deselect any program type, such as a method or a comment, or any other type, such as a text line. Doing so means that deselected types won't appear in your results. This provides a quick way to eliminate unwanted items from your current search.
You may have noticed that Sando often changes the text just under its main input box. This is the status line and it is used to give you feedback on your query. If your query has hits it will tell you how many, and if it doesn't Sando may give you tips on how to improve your query. You may also notice the message that "Sando is still performing its initial index...". This occurs when you open a new project that Sando has not analyzed before. This only happens once and, when analyzing Sando itself, takes only ~1 minute to complete on a 1.7GHz machine with no SSD. For a machine with SSD I've seen this index happen almost instantly for sizable projects. Note that this indexing is necessary for Sando to be able to return search results virtually instantly on even *very* large Solutions.
Finally, while we've done our best to choose good defaults for most users, you may want to customize a few things. To do this, visit the Sando options page (Tools > Options) where you can configure the directory where Sando stores its index as well as the maximum number of results returned, which is by default set to 20. I recommend resetting this to 50, but I'm crazy about search so I didn't force this default on the masses...
Search With Style
Hopefully this post gives you a good idea of the functionality that Sando provides. My team and I have spent many a late night improving Sando based on early adopters' feedback and we truly think this release represents a big step forward in usability and robustness. If you have been using Sando now's the time to start recommending it to your friends. If you have been considering using Sando, you should now enjoy a much improved experience. Next time you search, I hope it's Sando Style!
VS Gallery Download: http://bit.ly/SandoDownload
Video Introduction (dated): http://bit.ly/SandoVideo
Just a quick "Thank You" to those that have filed feedback either on CodePlex or on Visual Studio Gallery. We're listening to your feedback and have just released a (small) update on both sites. Feel free to download the latest and let us know what else we can improve!
As you can see from the list of fixed bugs above we have tried to respond relatively quickly to the issues that are causing our users the most pain (e.g., #165). If you have filed a bug and don't see it on this list get your friends to vote for it on CodePlex, as we prioritize according to user demand and severity.
I continue to personally use Sando on a daily basis to speed up my C# programming work. If you haven't downloaded Sando yet, now may be a good time to check it out. Sando's brave early adopters have ironed out many of the big issues. Hats off to them!
Sando on Visual Studio Gallery
Sando on CodePlex
David Shepherd leverages software engineering research to create useful additions to the IDE.