As developers, most of us use an IDE because it takes care of the messy details of development, allowing us to focus our full attention on higher-level tasks. Most common IDE features work well, yet unfortunately there is one tool embedded within modern IDEs that is failing us: the search tool.*
Regex-based searches are slow
IDEs currently offer search tools that are painfully slow to execute.
Recent studies show that about 88% of developers' searches fail, but because search tools are bundled within an otherwise well-performing IDE, most developers do not recognize the problem, and some even deny it exists. Today's post serves to dispel this myth. Three authors of search tools, including myself, present the case for upgrading your current search tooling.

*In this article we focus on local search tools. These tools are used to search your on-disk project. Examples include 'Find-in-Files' in Visual Studio, 'File Search' in Eclipse, or even grep. We are not discussing web-based searches of code repositories, such as GrepCode or GitHub's search.

Richie Hindle 
Entrian Source Search (Visual Studio)

No matter how good we are as developers, we can't hold the whole source tree in our heads at once - we need tools to help navigate it. IDEs are a big help - the F12 key on my keyboard (Visual Studio's Go to Definition command) is well worn. But the tools that IDEs provide don't have the power, coverage or speed that a full-text indexed search engine gives you. As one Source Search user puts it: "Our products are complex enough that not everything lives in C#: there's XML, PowerShell scripts, WiX config, custom build actions in the .csproj files... only a full-text search engine will do what I want."

Anyone remember when the Yahoo Directory was how you found things on the internet? Life without Google, yow... but that's what developing without a good search tool is. Imagine the leap from Yahoo Directory to Google, but for your coding experience. Obviously speed is a big part of that - anything that breaks your train of thought, interrupts your flow, is a bad thing, so instantaneous search is a big help. But it's also about being able to express what you're looking for, both more exactly ("it's in the Renderer directory, in a .cpp file that I've modified today") and more vaguely ("it's something like InterpolateSomething(), and there's a comment that talks about quaternion rotation").

I believe that a good source code search engine will one day be one of the tools of the trade that we all take for granted, like syntax coloring or networked source control - yes, you could develop without those things (and I'm just about old enough to remember when that was normal!) but you'd feel like you were working with one hand tied behind your back. Better tools make us better developers, and powerful code search is one of those tools that you quickly wonder how you ever did without.


Andrejs Jermakovics 
InstaSearch (Eclipse)

From my experience, good code search is essential in an IDE and can be a massive productivity boost. This is especially true when working with large codebases, and I wrote InstaSearch out of my own need to find code in million-LOC projects. In a way this is similar to desktop search, but there are aspects specific to source code. You have to be able to search for the words inside variable names independent of the naming convention (camel case, underscore delimited) and to search inside specific code projects. And, of course, it helps a lot if the search is fast, since you can tweak your search and see the results change immediately.
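As a rough illustration of convention-independent matching, here is a minimal Python sketch. It is not InstaSearch's actual implementation; the function names `split_identifier` and `matches`, and the particular regular expression, are assumptions made for this example.

```python
import re

def split_identifier(name):
    """Split an identifier (or a free-text query) into lowercase words,
    regardless of whether it uses camelCase, PascalCase or under_scores."""
    words = []
    # First break on anything that is not a letter or digit (underscores, spaces, dots).
    for chunk in re.split(r"[^A-Za-z0-9]+", name):
        # Then break each chunk at case transitions. The alternatives are, in order:
        # an acronym followed by a capitalized word, a normal word,
        # a trailing run of capitals, and a run of digits.
        words.extend(
            re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+", chunk)
        )
    return [w.lower() for w in words]

def matches(query, identifier):
    """Return True if every word of the query occurs as a word of the identifier."""
    return set(split_identifier(query)) <= set(split_identifier(identifier))

print(split_identifier("SrcMLCppParser"))      # ['src', 'ml', 'cpp', 'parser']
print(split_identifier("parse_method"))        # ['parse', 'method']
print(matches("cpp parser", "SrcMLCppParser")) # True
```

With identifiers normalized this way, a query such as "parse method" finds `parseMethod`, `ParseMethod` and `parse_method` alike, which a plain substring search would not.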

The two main use cases for search in IDEs that I'm noticing are: 1) I am looking for an exact string, 2) I don't know exactly what I'm looking for but I want to find code related to a few words. The first one is for looking up all occurrences of some text such as a constant or a method name. The second one is for discovering new code and finding where a particular functionality is implemented. I think code search tools need to support both these use cases to be effective. One way to enhance code search even further is to take advantage of static code information such as classes and methods.


David C. Shepherd
Sando Code Search Tool (Visual Studio)

In the past I have presented arguments as to why searches fail, why an information retrieval approach is better than a regex-based approach, and even pointed out the obvious superiority of ranked results over a flat list. Today, however, I want to guide you through a few searches in your own IDE, as many developers (myself included!) do not truly understand the depth of the issue until they experience it in their own code base. So please humor me by opening your IDE and trying out the following three searches:

  1. Search for the most popular term in your code base. For the Sando code base that would mean searching for the term 'search'. Using standard search tools, this search takes 10 seconds to execute on a relatively small project of about 300 files. Using a next-gen search tool, results are instant, regardless of the number of project files or hits.
  2. Search for a feature that someone else implemented. For the Sando code base that would mean searching for the method parsing code. Using standard search tools, I search for 'parse' and receive 3300+ hits. Using a next-gen tool, autocompletion guides me to expand my query to 'parse method', which finds the relevant methods as the top three hits.
  3. Search for 5 - 10 "known" classes, methods, or fields. For the Sando code base, that would mean searching for classes I'm familiar with, like CppParser. Using a standard symbol lookup dialog (e.g., 'Navigate To' in Visual Studio or 'Open Type' in Eclipse) to search for 'cpp', SrcMLCppParser is hidden in an alphabetized list of 47 matching symbols (at slot 32). Even after expanding my query to 'cpp parser', SrcMLCppParser is still one of 18 matching symbols. A search for 'cpp' using a next-gen tool recommends 'cpp parse' as the first autocompletion, and the resulting search returns SrcMLCppParser as the first result.

The quick execution time, autocompletion help, and information retrieval-based search engine of next-gen tools lead to a much better search experience.
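
To make the contrast with a regex scan concrete, here is a toy Python sketch of an index-based, ranked search. It is only an illustration under stated assumptions, not how Sando or any other tool mentioned here is implemented: the class name `TinyIndex`, the TF-IDF style scoring, and the sample file names and contents are all invented for this example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TinyIndex:
    """A toy inverted index: each term maps to the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        # Indexing happens once, up front (assumes each doc_id is added only once).
        self.doc_count += 1
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq

    def search(self, query):
        # A query only touches the postings of its own terms, never the raw files.
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Terms that occur in fewer documents get a higher weight.
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, freq in docs.items():
                scores[doc_id] += freq * idf
        # Best-scoring documents first; ties are broken by name.
        return sorted(scores, key=lambda d: (-scores[d], d))

index = TinyIndex()
index.add("MethodParser.cs", "parse method parse method body")
index.add("Logger.cs", "log message parse format")
index.add("Readme.txt", "search tool readme")

print(index.search("parse method"))  # ['MethodParser.cs', 'Logger.cs']
```

The cost of a query depends on the number of matching postings rather than on the size of the project, and the file that matches both query terms ranks above the file that matches only one, instead of both appearing in a flat, unordered list.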


Ready for an Upgrade?

Today we have presented our case for why you should upgrade your search tooling, and I hope we have convinced you. If our arguments have hit home, here is a list of known search tools, available for popular IDEs, that are based on information retrieval technology. Enjoy!
Eclipse:
InstaSearch
Visual Studio:
Source Search
Sando
Standalone:
OpenGrok
 


Comments

Symbiatch
05/28/2013 11:02pm

I understand what you're going after with this "Think about having Google for your code", but at the moment I would never change a simple search for Google in my code. Why? Because every time I try to search for something on the net I get my search terms mangled and I get lots of results that have nothing to do with the search terms. It would be such a waste of time to use such a search for code. I wish there was a net search engine that would allow me to say "hey, I know what I want to find. You don't. So don't mess with my search."

David C Shepherd
05/29/2013 4:24am

Hi Symbiatch,

I completely understand! One of the things we noticed when running user studies (prior to release) is that developers prefer exact matches of unprocessed queries whenever they exist, and thus we always try to return these results first. So you don't have to trade an exact search for a search-term-mangling search; a good next-gen approach should return exact matches first and less exact matches afterwards. That being said, I understand there is some frustration when using an information retrieval search instead of a regex-based search, because I personally can't simulate its execution in my mind like I could with grep, and it is harder to trust a tool that is not as well understood. Trusting an IR-based tool is often a matter of using the tool over time rather than understanding exactly how it works.

Richie Hindle
05/29/2013 7:10am

@Symbiatch: I couldn't agree more about the way Google works these days. Maybe I should have said "the Google of ten years ago". :-) Entrian Source Search doesn't make guesses - it has features for fuzzy searching, but they only apply if you ask for them.

(By the way, putting "all" "your" "terms" "in" "quotes" makes Google behave more like the way you want.)

