Musings of a Tech Transfer Enthusiast
I read a few articles and a book about what companies usually look for in a full-time candidate and was wondering whether I could fit the bill with the research project in place of the usual internship.
Recently I gave a guest lecture at NCSU for their software engineering course. One of the students I chatted with afterwards will be using this summer to work on his research thesis. This particular thesis work involves significant software development. Having just heard my talk on the importance of gaining software development experience prior to graduation, he was wondering how to get the most out of his summer work. Here is my advice for both this student and any other student taking on summer research.
Open Source Your Work
Academia affords students great opportunities for opening up their work. While some specific DoD grants may have restrictions, most typical grants (e.g., NSF or industrial grants) allow or even encourage open sourcing your work. As a student, open sourcing your research code will have two major benefits. First, your code will become much cleaner. Many developers relapse into bad habits when committing to a closed repository, but when they write code they know will be public, the code is simply better. This will make for a more pleasant summer. Second, it will help you begin to build a programmer's portfolio, or a set of publicly available projects that you have written. While not a new idea, this is a surprisingly simple way of gaining credibility with potential employers. For instance, two of the three interns that made it past the first round at ABB this year provided us with links to their publicly available code (note: the third provided a live demo of their work).
Collaborate if at all Possible
While generating a sizable project alone can certainly be impressive, working with others is an important part of experience that many students are lacking. If at all possible, collaborate with other researchers and students on your code. This can be as simple as building upon an existing framework or a set of libraries that another student has generated. For instance, during my time at UDel, summer undergrad students would often start from an existing tool (e.g., a search tool) and improve a single component of that tool (e.g., an abbreviation expander), investigating the effect of that component's improvement on the overall performance. This forced them not only to understand that existing component, but also to learn how that component interacted with the larger system. In most research labs there is ample opportunity to build upon others' work.
I have planned my schedule for the summer and I notice that I could spare time in a day for projects/work outside my thesis.
One point I want to warn against is losing focus. While a summer may seem like a long time, it is short. So short, in fact, that at ABB we communicate with upcoming interns for months in advance to hammer out a well-scoped idea so that when they join us for the summer they have a chance at finishing something significant. Thus, I advise students to focus on a single project. It is important that this project have a significant coding component, as this student's does, so that, should they have any extra time, they can continue to polish, refactor, and improve the code base ad infinitum. If a student can finish the summer with a well-polished research prototype, with clean code and a working demo, in addition to an (eventually) completed thesis, that student will certainly have good job prospects.
Enjoy the Summer...
I hope this post helps you as you organize your summer. To summarize, I recommend open sourcing as much as you can, working with others whenever possible, and focusing on a single project. Following these guidelines, I'm sure that you can have a polished, well-narrated demo video posted on YouTube by the end of the summer (add a link in the comments!). For those of you pursuing PhDs in software engineering who enjoy both research AND developing software tools, consider applying to our internship program next year.
Many software engineers (i.e., those that actually program on a daily basis) are unaware of the dedicated sub-field of software engineering researchers (like these from Microsoft) whose mission is to help make the daily grind of writing software better. While software engineering researchers have historically had limited practical impact, there are some notable companies and tools that were born out of software engineering research, and many brilliant, driven individual researchers who want to have impact. I hope that raising awareness of this research among software engineers will encourage more feedback to the software engineering research community, ultimately leading to more useful output.
To that end, I'm posting a short three-minute video that provides a quick overview of a typical software engineering research project at ABB. I hope this video gives you a better sense of what we're working on, our balance between theoretical and practical impact, and what types of technologies we're investing in. If you're a software engineer, we'd love to hear what you think about our research directions, how we could improve, or even just what problems are currently slowing you down. Feel free to leave comments or even contact me directly.
Can you remember the last time you tried to search your code base for a relevant method or code snippet? I can, and it was a disaster. I used Visual Studio’s “Find in Files”, which returned 100s of results. After reviewing about twenty of them I gave up and started surfing the Solution Explorer, opening files one-by-one. After about 20 minutes of reading source files I began to wonder if there was a better way...
Unfortunately, this pattern of failure is all too common, as recent studies show that close to 90% of code searches during software maintenance tasks fail. While fruitless searches themselves can be annoying, it’s the time spent reading and exploring irrelevant code that really compounds this cost. If you’re tired of failed searches and costly sidetracks, read on to discover Sando, a new Visual Studio extension that leverages recent research advances to significantly improve code search.
Why Use Sando?
Sando, our new Visual Studio Extension for searching source code, is built upon solid research findings, and in our opinion much better than the regular expression technology it replaces. We could argue that leveraging information retrieval instead of regex technology naturally leads to faster search execution. We could claim that multi-term search is difficult and annoying with regular expressions. We might even point out that regex-based searches return unranked results instead of highlighting the most relevant results. However, having been burnt by savvy marketing in the past we prefer to simply show a few concrete results of running Sando against Visual Studio's "Find in Files" and let you decide for yourself (note: these example searches were run on Sando's own source code).
One of the trickiest types of searches involves searching for a term that is common in the given domain. Imagine searching for “text” in a text editor project or “paint” in a drawing project; you’re likely to be overwhelmed with 100s of results. This is exactly the case when searching Sando’s source code for the term “search” using "Find in Files" (shown above, left). With the same search term in Sando, however, users can find relevant source code elements faster. Sando leverages information retrieval to find elements where the term “search” is more central. Thus, as you can see above (right), methods like “Search” are ranked highly in the search results.
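To make the idea of a term being "more central" concrete, here is a minimal sketch of ranked retrieval using a simple TF-IDF score. This is an illustration only, not Sando's actual implementation: the `tokenize` and `rank` functions, the scoring formula, and the sample elements are all assumptions made up for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words, breaking camelCase identifiers apart."""
    return [w.lower() for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)]

def rank(query, elements):
    """Return element names ordered by a simple TF-IDF score, best first.

    `elements` maps a (hypothetical) code element name to its source text.
    Elements that contain none of the query terms are left out.
    """
    tokens = {name: tokenize(name + " " + body) for name, body in elements.items()}
    n = len(tokens)
    scores = {}
    for name, words in tokens.items():
        counts = Counter(words)
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many elements mention the term at all.
            df = sum(1 for ws in tokens.values() if term in ws)
            if df == 0 or not words:
                continue
            tf = counts[term] / len(words)   # how central the term is here
            idf = math.log(1 + n / df)       # how rare the term is overall
            score += tf * idf
        scores[name] = score
    hits = [name for name in scores if scores[name] > 0]
    return sorted(hits, key=lambda name: -scores[name])

# Made-up elements for illustration; not taken from Sando's code base.
elements = {
    "Search": "public List Search(string searchQuery) { return searcher.Search(searchQuery); }",
    "LogMessage": "void LogMessage(string text) { log.Write(text); } // used by search and index",
    "ParseFile": "void ParseFile(string path) { parser.Parse(path); }",
}

print(rank("search", elements))
```

Here the `Search` method comes first because "search" makes up a large share of its words, while `LogMessage` only mentions the term in passing. An unranked tool would report both hits with equal weight.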
While unranked results can be annoying to sift through, a more pressing problem may be missing results. Imagine using "Find in Files" with the search string “file open”. The result, as shown above on the left, is an empty set, yet I know that there exists relevant code in this solution. Fortunately, since Sando is not sensitive to word ordering, it returns the method OpenFile as the first result, shown above right. Manually re-ordering search terms becomes a thing of the past when using Sando.
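The difference between the two behaviors can be sketched in a few lines. The following is a simplified illustration, assuming identifiers are split on camelCase boundaries and compared as unordered sets of words; the function names are hypothetical and this is not how either tool is actually implemented.

```python
import re

def words(text):
    """Return the set of lowercase words in text, splitting camelCase identifiers."""
    return {w.lower() for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)}

def literal_match(query, identifier):
    """Substring-style matching: the query must appear verbatim."""
    return query.lower() in identifier.lower()

def order_insensitive_match(query, identifier):
    """Match when every query word occurs in the identifier, in any order."""
    return words(query) <= words(identifier)

print(literal_match("file open", "OpenFile"))            # False
print(order_insensitive_match("file open", "OpenFile"))  # True
```

Treating both the query and the identifier as bags of words is what removes the need to guess the order in which the original author combined the terms.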
While creating this post I noticed another difference between these two tools: the wait times. While the times are not drastically different, with "Find in Files" taking 5 – 10 seconds and Sando returning almost instantly, the lag time imposed by "Find in Files" discourages follow-up queries and often leads to abandoned searches. As you might imagine, abandoned searches often lead to sub-optimal behavior (e.g., Solution Explorer surfing), and thus the instant results offered by Sando can be a significant advantage.
Try Sando's Beta Release!
Today we have only scratched the surface of ways in which Sando’s technology outperforms regular-expression-based searchers. However, we don’t expect you to take our word for it. Please try Sando's Beta Release today and don't hesitate to let us know what we should improve!
The IDE is a software developer's toolbench. For the most part it's an ordered chaos, with all of a developer's favorite windows and extensions at the ready. Unfortunately, the IDE's complexity helps to hide some of its glaring shortcomings -- it's easy to focus on the shiny wrenches across the front while ignoring your Grandfather's 1970s disc sander that's about to melt down from misuse. Like Grandpa's sander, code search* is a tool that's in dire need of an overhaul.
* In this article code search refers to the act of searching over the code base that one is currently working on, not searching over repositories of code.
Code Search Sucks.... Really?
Developers, having grown up on awful search tools, have become numb to the problem. Many may not even believe it is a problem! In order to illustrate the sorry state of code search let's explore an analogy based on web search...
I've got a new web search service that I'd like for you to try, called Gaagle. It's incorporated all of the key features of modern code search. Let's try searching for "Visual Studio" using Gaagle:
Oops! I misspelled "Visual Studio", but Gaagle doesn't fix my mistake (or even point it out). Gaagle simply and obediently reports that there are, in fact, no matches for the search terms "Vizual Sudio". Thanks Gaagle!
Having corrected my search terms to "Visual Studio" I search using Gaagle again. This time I get lots of hits! However, the hits are visualized as a list of paths, leading to an exciting game of "what's behind that link!" After clicking on each link in the list I have a pretty good idea (and a sore mouse-finger).
But wait, there's been an upgrade to Gaagle! Instead of returning results in an ordered list, Gaagle now returns unordered results. That means that each and every web page in the world that contains the terms "Visual Studio" is returned to me for my search. No worries! I start wading through the thousands of (mainly irrelevant) hits until I find what I'm looking for.
Not ready to switch to Gaagle? Well surprise... you're already using it! If you're using Visual Studio 2010 or earlier your code search (i.e., Edit > Find and Replace > Find in Files) operates almost exactly like Gaagle.
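As a rough idea of what fixing the first Gaagle failure might involve, here is a small sketch that suggests corrections for misspelled query terms using Python's standard `difflib` module. The `suggest` function and the vocabulary are assumptions for illustration; in a code search tool the vocabulary would presumably be the words that occur in the code base.

```python
import difflib

def suggest(query, vocabulary):
    """Replace each unknown query word with its closest vocabulary word, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best candidates above a similarity cutoff.
        close = difflib.get_close_matches(word, vocabulary, n=1)
        corrected.append(close[0] if close else word)
    return " ".join(corrected)

# Hypothetical vocabulary for illustration.
vocabulary = ["visual", "studio", "search", "solution"]
print(suggest("Vizual Sudio", vocabulary))
```

Even a suggestion this simple would turn a dead-end "no matches" into a "did you mean" prompt.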
Join Us in Fixing Code Search!
Visual Studio, so powerful in many other ways, has neglected code search. This January we're starting a project to modernize code search, called Sando. If you long for a modern code search experience please join us at http://sando.codeplex.com/. We're looking for great developers (with C# or Java experience) but also for any suggestions or requests that can help shape our search tool.
David Shepherd leverages software engineering research to create useful additions to the IDE.