Musings of a Tech Transfer Enthusiast
The above video provides some qualitative insight into how frustrating it can be to search in today's IDEs.
During the last year or so my collaborators and I have been focused on impacting both the state-of-the-practice and the state-of-the-art in code search. To impact the state-of-the-practice we have created a code search extension for Visual Studio, called Sando. To impact the state-of-the-art we have used this extension as the basis for case studies, where we collected usage data from developers performing normal maintenance tasks. For us, this data has directly led to improvements in the upcoming release of Sando. However, to maximize our impact on the state-of-the-art we are releasing this humble data set to other researchers and developers in hopes that it helps them create an even better code search tool.
Context and Data Format
This data was collected during a comparative analysis of two search engines, namely the default Visual Studio search engine (i.e., Find in Files) and the Sando search engine, as developers worked in situ. It was collected in two separate phases. The first phase compared the current version of Sando at the time against FiF and contained 325 user queries. The second phase compared an improved version of Sando against FiF and contains 637 user queries.
The primary reason we are posting this data is to share the user queries we have collected. However, as the results were gathered during our case study they contain some additional information. The results for phase one are in the following format: <date>; Sando=<wins>, Lex = <wins> ; query=<query string>. An example data entry is shown below.
11-8-2012; Sando=6, Lex=3 ; query='reader'
The date represents the day on which the data was collected and the number behind Sando and Lex represents the number of 'wins' for each approach for this query. In the above example the developer clicked on 9 search resuls; 6 of those clicked items had a higher ranking in Sando's result set and 3 had a higher ranking in Lex's result set. In phase two the data format was slightly updated, adding: <number of Sando results>, <number of Lex results when grouped by program element>(<number of Lex results when grouped by line number>). An example data entry from this phase is shown below:
11-29-2012; EnhSando=1, Lex=0 ; query='MembershipUser' ; 20, 90(105)
In favor of brevity, details of our comparative study have been omitted from this post. For further information on this study please contact Kostadin Damevski, Lori Pollock, or myself for a preprint.
Here we provide the data in two separate files, the first from phase one and the second from phase two. We hope that this data helps further your research or improve your search tool!
David Shepherd leverages software engineering research to create useful additions to the IDE.