Musings of a Tech Transfer Enthusiast
The above video provides some qualitative insight into how frustrating it can be to search in today's IDEs.
During the last year or so my collaborators and I have been focused on impacting both the state-of-the-practice and the state-of-the-art in code search. To impact the state-of-the-practice we have created a code search extension for Visual Studio, called Sando. To impact the state-of-the-art we have used this extension as the basis for case studies, in which we collected usage data from developers performing normal maintenance tasks. For us, this data has directly led to improvements in the upcoming release of Sando. However, to maximize our impact on the state-of-the-art, we are releasing this humble data set to other researchers and developers in the hope that it helps them create an even better code search tool.
Context and Data Format
This data was collected during a comparative analysis of two search engines, the default Visual Studio search engine (Find in Files, or FiF) and the Sando search engine, as developers worked in situ. It was collected in two separate phases. The first phase compared the then-current version of Sando against FiF and contains 325 user queries. The second phase compared an improved version of Sando against FiF and contains 637 user queries.
The primary reason we are posting this data is to share the user queries we have collected. However, because the results were gathered during our case study, they contain some additional information. In the data, FiF is labeled Lex (for lexical search). The results for phase one are in the following format: <date>; Sando=<wins>, Lex=<wins> ; query=<query string>. An example data entry is shown below.
11-8-2012; Sando=6, Lex=3 ; query='reader'
The date is the day on which the data was collected, and the numbers following Sando and Lex are the number of 'wins' for each approach on this query. In the example above the developer clicked on 9 search results; 6 of those clicked items were ranked higher in Sando's result set and 3 were ranked higher in Lex's result set. In phase two the format was slightly extended (the improved Sando is labeled EnhSando), appending: <number of Sando results>, <number of Lex results when grouped by program element>(<number of Lex results when grouped by line number>). An example data entry from this phase is shown below:
11-29-2012; EnhSando=1, Lex=0 ; query='MembershipUser' ; 20, 90(105)
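To make the two entry formats concrete, here is a minimal Python sketch of a parser for a single line from either phase. It assumes each file holds one entry per line exactly as in the examples, with dates written like 11-8-2012 and queries wrapped in single quotes; the names `QueryEntry`, `parse_entry`, and `ENTRY_RE` are hypothetical and not part of any released Sando tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern covering both phases. The engine label is captured
# because phase one uses "Sando" and phase two uses "EnhSando". The trailing
# result counts are optional, since only phase-two entries have them.
ENTRY_RE = re.compile(
    r"^(?P<date>\d{1,2}-\d{1,2}-\d{4});\s*"
    r"(?P<sando_label>\w+)\s*=\s*(?P<sando_wins>\d+),\s*"
    r"Lex\s*=\s*(?P<lex_wins>\d+)\s*;\s*"
    r"query='(?P<query>.*)'"
    r"(?:\s*;\s*(?P<sando_results>\d+),\s*(?P<lex_elem>\d+)\((?P<lex_lines>\d+)\))?\s*$"
)


@dataclass
class QueryEntry:
    date: str
    sando_label: str
    sando_wins: int
    lex_wins: int
    query: str
    # Phase-two-only fields; None for phase-one entries.
    sando_results: Optional[int] = None
    lex_results_by_element: Optional[int] = None
    lex_results_by_line: Optional[int] = None

    @property
    def clicks(self) -> int:
        # Assumes every clicked result is credited to exactly one engine.
        return self.sando_wins + self.lex_wins


def parse_entry(line: str) -> QueryEntry:
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {line!r}")

    def opt_int(name: str) -> Optional[int]:
        value = m.group(name)
        return int(value) if value is not None else None

    return QueryEntry(
        date=m.group("date"),
        sando_label=m.group("sando_label"),
        sando_wins=int(m.group("sando_wins")),
        lex_wins=int(m.group("lex_wins")),
        query=m.group("query"),
        sando_results=opt_int("sando_results"),
        lex_results_by_element=opt_int("lex_elem"),
        lex_results_by_line=opt_int("lex_lines"),
    )


if __name__ == "__main__":
    p1 = parse_entry("11-8-2012; Sando=6, Lex=3 ; query='reader'")
    p2 = parse_entry("11-29-2012; EnhSando=1, Lex=0 ; query='MembershipUser' ; 20, 90(105)")
    print(p1.clicks, p1.query)  # 9 reader
    print(p2.sando_label, p2.sando_results, p2.lex_results_by_element, p2.lex_results_by_line)
```

The `clicks` property simply adds the two win counts, matching the phase-one example where 6 + 3 = 9 clicks; if some clicks were scored as ties, that sum would undercount them.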
For brevity, details of our comparative study have been omitted from this post. For further information on this study, please contact Kostadin Damevski, Lori Pollock, or me for a preprint.
Here we provide the data in two separate files, the first from phase one and the second from phase two. We hope that this data helps further your research or improve your search tool!
I read a few articles and a book about what companies usually look for in a full-time candidate and was wondering whether I could fit the bill with the research project in place of the usual internship.
Recently I gave a guest lecture at NCSU for their software engineering course. One of the students I chatted with afterwards will be spending this summer working on his research thesis, which involves significant software development. Having just heard my talk on the importance of gaining software development experience before graduation, he wondered how to get the most out of his summer work. Here is my advice for him and for any other student taking on summer research.
Open Source Your Work
Academia affords students great opportunities to open up their work. While some specific DoD grants may have restrictions, most typical grants (e.g., NSF or industrial grants) allow or even encourage open sourcing your work. As a student, open sourcing your research code has two major benefits. First, your code will become much cleaner. Many developers have bad habits they relapse into when committing to a closed repository, but when they write code they know will be public, the code is simply better. Cleaner code will not only make for a more pleasant summer but will also help you begin building a programmer's portfolio: a set of publicly available projects that you have written. That portfolio is the second benefit of open sourcing your code; while not a new idea, it is a surprisingly simple way of gaining credibility with potential employers. For instance, two of the three intern candidates who made it past the first round at ABB this year provided us with links to their publicly available code (the third provided a live demo of their work).
Collaborate if at all Possible
While generating a sizable project alone can certainly be impressive, working with others is an important part of experience that many students lack. If at all possible, collaborate with other researchers and students on your code. This can be as simple as building upon an existing framework or a set of libraries that another student has created. For instance, during my time at UDel, summer undergraduate students would often start from an existing tool (e.g., a search tool) and improve a single component of that tool (e.g., an abbreviation expander), investigating the effect of that component's improvement on the overall performance. This forced them not only to understand that existing component but also to learn how it interacted with the larger system. In most research labs there is ample opportunity to build upon others' work.
I have planned my schedule for the summer, and I notice that I could spare time each day for projects or work outside my thesis.
One thing I want to warn against is losing focus. While a summer may seem like a long time, it is short. So short, in fact, that at ABB we communicate with upcoming interns for months in advance to hammer out a well-scoped idea, so that when they join us for the summer they have a chance at finishing something significant. Thus, I advise students to focus on a single project. That project should have a significant coding component, as this student's does, so that any extra time can go into polishing, refactoring, and improving the code base ad infinitum. A student who finishes the summer with a well-polished research prototype, with clean code and a working demo, in addition to an (eventually) completed thesis, will certainly have good job prospects.
Enjoy the Summer...
I hope this post helps you as you organize your summer. To summarize, I recommend open sourcing as much as you can, working with others whenever possible, and focusing on a single project. Following these guidelines, I'm sure that you can have a polished, well-narrated demo video posted on YouTube by the end of the summer (add a link in the comments!). For those of you pursuing PhDs in software engineering who enjoy both research AND developing software tools, consider applying to our internship program next year.
David Shepherd leverages software engineering research to create useful additions to the IDE.