Musings of a Tech Transfer Enthusiast
This spring I'll be presenting at several code camps and as well as online. Please see "Upcoming Talks" below for details. While I've blogged about code search many times in the past, this post contains new developer data, links to non-Sando search tools, and my latest thoughts on code search.
How many times do you Google per day? I'd estimate 20+ times for myself, but across all Internet users the average is much lower, at 2.46 Googles per day (5,922,000,000 Googles per day/2,405,518,375 Internet users). It's amazing how low this average is!
Now, for you developers out there, how many times do you search your code per day, perhaps looking for a feature's implementation while fixing a bug... once or twice? Nope. We collected anonymous activity from 60 developers for a total of 1750 developer days and saw that developers interact with known search tools in the Visual Studio IDE 34+ times per day! On average! Extrapolating from the Google example, I wonder if some developers are searching a hundred times per day on occasion...
Unfortunately, while developers are searching, empirical studies show they aren't finding. One study of developers performing maintenance tasks showed that 88% of developers' searches fail! To me, this is no surprise. Current search tooling inside modern IDEs is regex-based, and bound to fail for developers with less-than-perfect memories. Why don't developers upgrade search tools that fail so often, especially one that they interact with 34+ times per day?
The answer is no secret: until recently, there were no better search options. But now there are, and it is time for developers to remove constant failure from their day. In an upcoming set of talks, I'll detail why Lucene-based search tools work better for most developers. I'll talk about the different types of searches (e.g., symbol lookups vs exploratory searches), when to use built-in tooling (e.g., NavigateTo), and when to leverage advanced techniques (e.g., Sando or Entrian Source Search). Finally, I'll even provide advice on when uber-scalable server-based solutions (e.g., OpenGrok) are more appropriate.
In the past three weeks we have collected anonymous usage data from developers (with their knowledge and permission!!) using our search tool in Visual Studio. Learning from the failed Usage Data Collection effort in Eclipse, which perhaps had too broad a scope, our data collection is tightly focused on improving search, and nothing else. Thus, after reviewing even three weeks of data we have several useful conclusions. Primarily, that developers who utilize recommendations (shown above) when searching code are more satisfied with the results and execute 20% less failed queries than those not using recommendations.
As you can see in pane (b) above we collected data for 25 days. We collected 204 log files from 44 unique users (excluding all Sando developers). Interestingly, as a comparison, Eclipse's Usage Data Collection collected only 453 new users' data over a 14 day period in 2008, even though they were potentially collecting from all Eclipse users.
During this 25 day period we collected anonymous metrics about 363 user queries executed on local projects via Sando. As seen in pane (a) 90 utilized the recommendations provided by Sando. Of those 90, 41 were utilized pro-actively (e.g., by selecting an entry from a dropdown) and 49 were triggered automatically (e.g., when the user searched for a word that did not exist in the code base). Of the 41 active queries 25 were "lookup"-style searches, such as using the autocomplete to help remember the name of a method, and 16 were exploratory-style searches, such as searching for phrase like "open file".
Using the data from these 363 queries we tried to gauge user satisfaction level using a few different metrics. First, we measured the query failure rate. We considered a query to fail if a user did not review any of the results, as indicated via UI events. As you can see in pane (e) queries that utilized recommendations had a 20% lower failure rate, which certainly leads to higher user satisfaction.
We also used both the short-click and long-click satisfaction measures, as are often used in web search. The intuition is that if a user clicks and views a result and then immediately returns to the result list that result was not helpful (a short-click), where as if a user clicks and views a result for a long period before (if ever) returning that result was helpful. As you can see in pane (d) queries with and without recommendations provided about the same rate of long-clicks, which is a positive metric, whereas queries with recommendations showed a significantly lower short-click rate, which is good as short-clicks are a negative satisfaction indicator.
As I mentioned at the beginning of this post, we only collect information aimed at helping us improve code search. So what's useful about the information I've presented today? Well, in the short term, Sando users should start using recommendations, immediately. They will be more satisfied with the results and fail 20% less. While we intend to do our part to improve the recommendation UI, possibly the most useful thing about this data is that it gives Sando users field evidence to support feature adoption. And when you think about it...how often have your other software tools given you evidence that a new feature actually makes you better? ;)
When using code search tools, such as Find in Files, developers don't want to waste mental energy crafting the perfect search query, they just want to find relevant code. Previous versions of the Sando Code Search Extension helped by providing users with conceptual autocompletion. Even with this help we found users creating ill-advised queries (including very smart, experienced developers). For instance, users would perform literal searches (e.g., document.Add) without adding the required quotes (e.g., "document.Add"). Or they'd enter a (previously) unsupported query format, like a wildcard query (e.g., custom*Document). To make it more natural for developers to interact with Sando we expanded our query interpretation code in today's release. Developers can enter a keyword query (e.g., open file), a literal query (e.g., File.Open), or even a wildcard query (e.g., Document*Create) and expect Sando to take care of the details of returning relevant results.
Please give Sando's new query interpretation code a spin and let us know what you think! If you like this new direction we'll look into even better query interpretation in the future.
Here's an example search using FindInFiles (left) and Sando (right):
Last year we released an early version of Sando, a free, open source code search extension for Visual Studio, that was based on the latest advances in code search research. Because of the modest success of this relatively unpolished tool (1300+ downloads) we have taken a number of measures to refine and improve it, making it more appropriate for wider usage. Among other things, we have refactored the code into two independent projects, updated the core search algorithm based on user study feedback, improved the quality of the test suite, shortened indexing time by 2-3x, and added autocompletion. In this post we will highlight several of the new features of Sando 0.4 which searches C, C++, and C# code.*
* The ReallyReallyReallySimpleRoguelike project will be used to illustrate several usage scenarios of Sando. This game "...has a really simple goal - pickup a sword and kill the monster(s)."
When searching an unfamiliar codebase or even an unfamiliar part of a familiar codebase creating an effective query can be tricky because it is difficult to guess what terms are actually used in the code. Let's consider searching for the concept of picking up a weapon in the Roguelike C# game. Using the default FindInFiles search engine searches for "pick*weapon" or alternatives such as "grab*weapon" fail to return any results, and a search for "weapon" returns a large number of results that would be tedious to sort through.
Sando can help you complete your query when you are only partially sure of what you should be searching for. In the example to the left the user has typed in "weapon" and Sando has proposed several autocomplete suggestions. A quick scan suggests that the #3 result, "Add Weapon", is likely a good candidate.
Executing the search for "Add Weapon" finds several relevant methods, including the AddWeapon method in the Player class, which is most directly responsible for picking up a weapon. This search also returns several related fields and methods such as field Player.WeaponSlot.
Exclude Terms (e.g., Don't Search Test Files)
When executing a search on a code base it is easy to miss the relevant items in a search result because they are overwhelmed by unrelated results. When trying to search for the concept of reading input from the keyboard in RRRSimpleRoguelike my results were flooded with XML-reading code. To eliminate these results from my search I added '-xml' to my query. Another popular use case for this feature is to exclude tests from search results.
Limit by Filetype (e.g., Only Search .h Files)
Many projects are a combination of languages (e.g., C++ backend with a C# GUI) and searches meant to explore the C# code can include results from C++ code. In the example to the left we have searched Sando's code base itself for the term "theme". The relevant C# results are overwhelmed by the results from the C++ code. In fact only a single C# result is shown (as the second result).
To eliminate unwanted filetypes, and to enable developers to search only header files, we have implemented a filetype search. In the example on the left we have further scoped the search to only include C# files, thus eliminating the irrelevant C++ results from our search.
Exact Matching (e.g., Find This String Literal)
As we saw during our user study, developers often want to search for exact strings. By adding quotes to any search string in Sando developers can search for specific literals (or any line snippet) that exists in code. Because Sando is an indexed searcher the results are near-instant, whereas using FindInFiles for this type of search will cause a delay while each file in the search scope is scanned.
Got All That? If Not, Consult the Tooltips
As you use Sando you'll notice that, as part of polishing it, we've added a few tooltips to guide your usage. Both the search box and the [?] icon include helpful tips on how to use Sando effectively.
Try Our Free, Open Source Search Extension
We think that the combination of new features, dramatically improved indexing performance, and polish lead to a much better search experience, but we'd love to hear what you think! Please don't hesitate to drop us a line on codeplex or, better yet, rate Sando on Visual Studio Gallery.
Download Sando from Visual Studio Gallery
Make it work with 2012! :) -naspinski, from reddit
Well, the people have spoken, and we've (finally) delivered. The Sando code search extension is now available in VS2012 (in addition to VS2010), thanks in large part to Kosta Damevski. As we release this version I realize that not all readers are familiar with Sando's Raison d'être. For those of you new to Sando here's the top few reasons why we've spent the last year creating it.
Above I've quickly described a few reasons that we expect Sando to perform better than available regex-based tools and I've used a few scenarios to explain why. However, its important to know that Sando is not primarily based on my personal insights. Sando is built upon the huge body of code search research, started by Andrian Marcus's thesis work and so ably continued by researchers like Denys Poshyvanyk, Dawn Lawrie and David Binkley, Lori Pollock, Emily Hill, and many others. Thus, you can download and use Sando, assured that it's providing you with high-quality search results influenced by cutting edge advances in software engineering research.
Sando is available as a Visual Studio extension for VS2010 and VS 2012
When fixing bugs most software developers start with a search. I know. I've watched them. They read the bug report, reproduce the problem (sometimes), and then search for relevant code. This is when it gets ugly. Developers either wander semi-randomly around the Solution Explorer (Roomba Style), struggle to use Find in Files with increasing levels of anger (William Foster Style), or I've even seen one user step through almost the entire program in the debugger (Slowskys Style). Please, for goodness' sake, next time you search try Sando Style. Our newly released version of Sando (0.3), is a free, open source extension for VS2010 that makes searching your C#, C, and C++ code as easy as using Google.
VS Gallery Download: http://bit.ly/SandoDownload
Getting Started With Sando
After installing Sando the main view will appear in your Visual Studio Studio, as shown to the left. The UI consists of the main input box (upper left), the search button (upper right), the results view (bottom center) and the normally folded advanced options (middle center). To use Sando simply open a solution, type in a search term, and press search (or hit return).
Once you search, you'll see results like those shown to the left. Notice how the method OpenFile is shown first, because it matches both search terms, whereas the lower ranked terms only match one of the two terms. Each result is shown with a representative icon that communicates both the element type (e.g., method, class, or plaintext) and the accessibility level if relevant (e.g., public or private).
See a result you like but not ready to commit to opening it? No problem, after the search is executed just press the down arrow to scroll through results, expanding each result to view a partial snippet when it is selected. In the picture to the left OpenFile has been selected and you can view the first few lines of this method. To open the method in the VS editor simply hit return or double click on the result.
Getting Fancy with Sando
So far we've discussed the very basics of how to use Sando. However, the entire Sando team dog-foods Sando (i.e., we use it everyday ourselves) and so it's got a few advanced features baked in, which I'll discuss here.
Halfway through your task and decide you need to search again? No need to pickup that mouse, Sando is designed to allow you to work mouse-free. Hit Alt+Shift+S and the Sando view will appear with the cursor in the text box, ready to search. Using this shortcut I find Sando to be a great way to open any file or method I'm thinking of, and thus I almost never open the Solution Explorer.
One of the things we noticed when using Sando is that sometimes you just don't want to see certain results. For instance, we have huge text files of words in our test directories, which are indexed by Sando, but sometimes we don't want to see these results. In the advanced section of Sando you can deselect any program type, such as a method or a comment, or any other type, such as a text line. Doing so means that deselected types won't appear in your results. This provides a quick way to eliminate unwanted items from your current search.
You may have noticed that Sando often changes the text just under its main input box. This is the status line and it is used to give you feedback on your query. If your query has hits it will tell you how many, and if it doesn't Sando may give you tips on how to improve your query. You may also notice the message that "Sando is still performing its initial index...". This occurs when you open a new project that Sando has not analyzed before. This only happens once and, when analyzing Sando itself, takes only ~1 minute to complete on a 1.7GHz machine with no SSD. For a machine with SSD I've seen this index happen almost instantly for sizable projects. Note that this indexing is necessary for Sando to be able to return search results virtually instantly on even *very* large Solutions.
Finally, while we've done our best to choose good defaults for most users, you may want to customize a few things. To do this, visit the Sando options page (Tools > Options) where you can configure the directory where Sando stores its index as well as the maximum number of results returned, which is by default set to 20. I recommend resetting this to 50, but I'm crazy about search so I didn't force this default on the masses...
Search With Style
Hopefully this post gives you a good idea of the functionality that Sando provides. My team and I have spent many a late night improving Sando based on early adopters' feedback and we truly think this release represents a big step forward in usability and robustness. If you have been using Sando now's the time to start recommending it to your friends. If you have been considering using Sando, you should now enjoy a much improved experience. Next time you search, I hope it's Sando Style!
VS Gallery Download: http://bit.ly/SandoDownload
Video Introduction (dated): http://bit.ly/SandoVideo
Just a quick "Thank You" to those that have filed feedback either on CodePlex or on Visual Studio Gallery. We're listening to your feedback and have just released a (small) update on both sites. Feel free to download the latest and let us know what else we can improve!
As you can see from the list of fixed bugs above we have tried to respond relatively quickly to the issues that are causing our users the most pain (e.g., #165). If you have filed a bug and don't see it on this list get your friends to vote for it on CodePlex, as we prioritize according to user demand and severity.
I continue to personally use Sando on a daily basis to speed up my C# programming work. If you haven't downloaded Sando yet, now may be a good time to check it out. Sando's brave early adopters have ironed out many of the big issues. Hats off to them!
Sando on Visual Studio Gallery
Sando on CodePlex
Can you remember the last time you tried to search your code base for a relevant method or code snippet? I can, and it was a disaster. I used Visual Studio’s “Find in Files”, which returned 100s of results. After reviewing about twenty of them I gave up and started surfing the Solution Explorer, opening files one-by-one. After about 20 minutes of reading source files I began to wonder if there was a better way...
Unfortunately this pattern of failure is all too common, as recent studies show that close to 90% of code searches during software maintenance tasks fail. While fruitless searches themselves can be annoying it’s the time spent reading and exploring irrelevant code that really compounds this cost. If you’re tired of failed searches and costly sidetracks read on to discover Sando, a new Visual Studio extension that leverages recent research advances to significantly improve code search.
Why Use Sando?
Sando, our new Visual Studio Extension for searching source code, is built upon solid research findings, and in our opinion much better than the regular expression technology it replaces. We could argue that leveraging information retrieval instead of regex technology naturally leads to faster search execution. We could claim that multi-term search is difficult and annoying with regular expressions. We might even point out that regex-based searches return unranked results instead of highlighting the most relevant results. However, having been burnt by savvy marketing in the past we prefer to simply show a few concrete results of running Sando against Visual Studio's "Find in Files" and let you decide for yourself (note: these example searches were run on Sando's own source code).
One of the trickiest types of searches involves searching for a term that is common in the given domain. Imagine searching for “text” in a text editor project or “paint” in a drawing project; you’re likely to be overwhelmed with 100s of results. This is exactly the case when searching Sando’s source code for the term “search” using "Find in Files" (shown above, left). However, using Sando with the same search term users can find relevant source code elements faster. Sando leverages information retrieval to find elements where the term “search” is more central. Thus, as you can see above (right), methods like “Search” are ranked highly in the search results.
While unranked results can be annoying to sift through a more pressing problem may be missing results. Imagine using "Find in Files" with the search string “file open”. The result, as shown above on the left, is an empty set, yet I know that there exists relevant code in this solution. Fortunately, since Sando is not sensitive to word ordering it returns the method OpenFile as the first result, shown above right. Manually re-ordering search terms becomes a thing of the past when using Sando.
While creating this post I noticed another difference between these two tools, the wait times. While the times are not drastically different, with "Find in Files" taking 5 – 10 seconds and Sando returning almost instantly, the lag time enforced by "Find in Files" discourages follow up queries and often leads to abandoned searches. As you might imagine, abandoned searches often lead to sub-optimal behavior (e.g., Solution Explorer surfing) and thus the instant results offered by Sando can be a significant advantage.
Try Sando's Beta Release!
Today we have only scratched the surface of ways in which Sando’s technology outperforms regular-expression-based searchers. However, we don’t expect you to take our word for it. Please try Sando's Beta Release today and don't hesitate to let us know what we should improve!
David Shepherd leverages software engineering research to create useful additions to the IDE.