David C. Shepherd
  • Blog
  • Colleagues
  • Contact
  • About

Transferring Tech

Code Search for VS 2012

11/30/2012

 
Make it work with 2012! :)   -naspinski, from reddit
Well, the people have spoken, and we've (finally) delivered. The Sando code search extension is now available for VS2012 (in addition to VS2010), thanks in large part to Kosta Damevski. As we release this version, I realize that not all readers are familiar with Sando's raison d'être. For those of you new to Sando, here are the top few reasons why we've spent the last year creating it.

  1. Regex Searches Fail - Many programmers claim that they can write a regex to find any code they might need. This is absolutely true, but they often don't mention that most of their regex searches fail. In Ko et al.'s 2006 study of developers performing maintenance tasks, 88% of developers' regex searches failed, and these developers were searching over a program of only 500 lines of code. In our study of programmers working on medium-sized code bases, we observed a similar failure rate. I'd like to challenge you: record your search success rate for a single day, and if it's under 50%, consider installing Sando.
  2. Information Retrieval Technology Avoids Regex Failures - The tragedy of this high failure rate is that the types of failures caused by regex technology are easily avoided. Because Sando is backed by Lucene.NET, which uses the Vector Space Model with TF-IDF scoring, it can handle common regex failure cases such as word re-ordering. Users who would have had to search for both "open*file" and "file*open" using regex technology can now simply search for "open file". Similarly, since Sando uses Lucene's SnowballAnalyzer, a search for "open file" will automatically return matches for different word forms, such as "opened", thus finding the relevant method "OpenedFile". While each individual shortcoming of regex searching seems trivial, the combination of issues creates real problems. Imagine searching for the concept of "open file". The JavaScript regex ([oO]pen[ing]*(\s)*[fF]ile[s]*)*((\s)*[fF]ile[s]*[oO]pen[ing]*)* could be used to find the likely relevant strings {openfile, fileopen, OpenFile, FileOpen, fileOpening, OpeningFiles}. Yet even more regex-fu would be needed to match the equally probable {FileOpened}. In contrast, Sando would only require the search terms "open file".
  3. Ranked Results Reduce Human Processing Time - One of the main drivers of code search tools is that they save developers time. In that sense, regex-based searches (e.g., grep) are a huge time-saver when compared with manual scanning. Sando aims to build upon this time saving by not only automatically identifying matches, but also ranking them. In practice, the time saved is significant. Consider this common scenario: a developer searches for the string "save*failed" using a regex-based search. This very specific search returns no results, so the developer tries the more general query "fail". This query returns about one hundred unranked results, which must be scanned slowly before the relevant match turns up at result #50. In contrast, when using Sando, which ranks matches according to their similarity score, the most relevant result appears as result #2.
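
To make the contrast in points 2 and 3 concrete, here is a minimal Python sketch. It is not Sando's code and does not use Lucene.NET. It is a toy stand-in that assumes identifiers are split on camelCase boundaries, that a crude suffix stripper (the hypothetical crude_stem below) can play the role of a real Snowball stemmer, and that plain TF-IDF with cosine similarity is a fair approximation of vector-space ranking. All function names and the METHOD_NAMES list are made up for illustration.

```python
import math
import re
from collections import Counter

# The regex quoted in the post, kept verbatim for comparison.
POST_REGEX = re.compile(
    r"([oO]pen[ing]*(\s)*[fF]ile[s]*)*((\s)*[fF]ile[s]*[oO]pen[ing]*)*"
)

# Hypothetical method names standing in for a code base.
METHOD_NAMES = ["FileOpened", "OpeningFiles", "SaveFailedDialog",
                "CloseFile", "ParseHeader"]


def regex_matches(name):
    """True if the post's regex matches the whole identifier."""
    return POST_REGEX.fullmatch(name) is not None


def crude_stem(word):
    """Toy suffix stripper; an assumption, not the Snowball algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Split on camelCase boundaries, lowercase, and stem each word."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)
    return [crude_stem(w.lower()) for w in words]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, names):
    """Rank names against the query by TF-IDF cosine similarity."""
    tokens = {name: tokenize(name) for name in names}
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    idf = {t: math.log(len(names) / d) + 1.0 for t, d in df.items()}

    def weigh(toks):
        return {t: c * idf.get(t, 0.0) for t, c in Counter(toks).items()}

    q = weigh(tokenize(query))
    scored = [(cosine(q, weigh(toks)), name) for name, toks in tokens.items()]
    # Best score first; rounding keeps float noise from reordering ties.
    scored.sort(key=lambda s: (-round(s[0], 9), s[1]))
    return [name for score, name in scored if score > 0]


if __name__ == "__main__":
    for name in METHOD_NAMES:
        print(name, "regex match:", regex_matches(name))
    print(search("open file", METHOD_NAMES))
    print(search("save failed", METHOD_NAMES))
```

In this sketch the query is treated as a bag of stemmed words, so "open file" and "file open" produce the same ranking, and "FileOpened" is found without any extra pattern work. Names that share only one query word still appear, but lower in the list, which is the point of ranking.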

Research-Driven Advances

Above, I've quickly described a few reasons that we expect Sando to perform better than available regex-based tools, and I've used a few scenarios to explain why. However, it's important to know that Sando is not primarily based on my personal insights. Sando is built upon the huge body of code search research, started by Andrian Marcus's thesis work and so ably continued by researchers like Denys Poshyvanyk, Dawn Lawrie and David Binkley, Lori Pollock, Emily Hill, and many others. Thus, you can download and use Sando, assured that it's providing you with high-quality search results influenced by cutting-edge advances in software engineering research.

Sando is available as a Visual Studio extension for VS2010 and VS2012.

Sando as a Research Framework - Demo at FSE

11/15/2012

 
While you may know Sando as a software search tool for Visual Studio, many are unaware that Sando is also a research-enabling framework. Sando was built to be extensible, both for open source enthusiasts who want to support new languages and for researchers who need to quickly prototype new search ideas.

You may wonder, why do researchers need to prototype their code search ideas? Because code search is a software engineering problem, involving aspects of program analysis, information retrieval, and even natural language processing, it's necessary for researchers to ground their new approaches in the reality of the engineering issues.  They need to test their new search algorithm(s) on realistic source code bases, because it's difficult to simulate the complexity of the system through thought experiments alone.     

So, if you're a researcher interested in code search or a developer looking for an open framework to experiment with, have a look at my demo (above) on Sando from the 2012 Foundations of Software Engineering Demo Track. I'll cover not only how developers use Sando in their day-to-day work but also how Sando can be used to quickly realize your kooky research ideas. Happy searching!

Meet ABB's Software Engineering Researchers at FSE

11/9/2012

 
The ABB team is headed out to FSE next week!  While we regularly work with academics (e.g., Jonathan Maletic, Michael Collard, Lori Pollock, Thomas Fritz, Kostadin Damevski, Emerson Murphy-Hill, Gregg Rothermel, Myra Cohen, Mary Jean Harrold...) it's often over Skype, and so we are looking forward to meeting the academic community in person. Please stop by our table, where we'll have ABB tumblers to give away, or grab us during a coffee break. We are always looking for next year's interns, possibilities for collaborations, sabbatical opportunities, or students graduating on the next cycle... so don't be shy!          
One of the best ways you can get to know what our research group is up to is to visit us at one of our three tool demo sessions.  They are:  

10:30-11:45 Research Tool Demos, Session Chair: Brian Robinson, ABB Corporate Research
    Automating Adaptive Maintenance Changes with SrcML and LINQ
    Vinay Augustine, ABB Corporate Research

13:00-14:15 Research Tool Demos
    Practical Change Impact Analysis Based on Static Program Slicing for Industrial Software Systems
    Mithun Acharya and Brian Robinson, ABB Corporate Research
    
    Sando: An Extensible Local Code Search Framework
    David Shepherd, ABB, Inc.; Kostadin Damevski, Virginia State University;
    Bartosz Ropski, Autodesk, Inc., Krakow, Poland; Thomas Fritz, University of Zurich

If you'd like to read about some of these projects ahead of time you can check out this video on Sando, visit the srcML.Net code repository, or review one of Mithun's papers on slicing industrial scale programs. See you soon!

    Author

    David Shepherd leverages software engineering research to create useful additions to the IDE. 

