Welcome to the download page for FRAN

Recommending Random Walks

Zachary M. Saul, Vladimir Filkov, Premkumar Devanbu, Christian Bird

Abstract

Recommender systems can help programmers by finding functions that are closely related to a function of interest. We improve on previous recommenders by taking advantage of the layered structure of software, discovering the functions that share a caller or a callee with the query function. Further, we use a random-walk approach, crudely mimicking the far more focused behavior of a developer, who intelligently browses the caller-callee links in the callgraph of a large program, seeking routines that are most likely to be related to a function of interest. Inspired by Kleinberg's work[1], we roughly approximate the steady-state of a particular type of infinite random walk on a subset of a callgraph so that we can rank the functions by their steady-state probabilities. Surprisingly, this purely structural approach works quite well. Our approach, like that of Robillard's Suade algorithm[3], and earlier data mining approaches [2] relies solely on the always available and truthful current state of the code, rather than other sources such as comments, documentation or revision information. Using the Apache API documentation as an oracle, we perform a quantitative evaluation of our method, finding that our algorithm dramatically improves upon Suade for this task. We also find that the performance of traditional data mining approaches is complementary to ours; this leads naturally to an evidence-based combination of the two, which shows excellent performance on this task.

Copy of a draft paper is available. Please send us comments. The paper has been accepted to ACM SIGSOFT ESEC/FSE 2007; final version will shortly be posted.

Algorithm Description

The Finding with Random Walk (FRAN) algorithm consists of two phases.
  1. First, Select a focused subgraph of the callgraph to return as our set of results. We call this set the base set. The base set consists of the union of the parent set, the sibling set and the spouse set. These sets are labled in the figure below.
  1. Next, rank the nodes in the base set using the authority scores returned by the web search algorithm Hits [1]. This ranked base set is returned as the set of related results.

Download

Please download our source and data.

References

[1] J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proceedings, 9th SIAM Symposium on Discrete Algorithms, New York, NY, 1998. ACM, ACM.
[2] A. Michail. Data mining library reuse patterns using generalized association rules. International Conference on Software Engineering, pages 167-176, 2000.
[3] M. Robillard. Automatic generation of suggestions for program investigation. ACM SIGSOFT Software Engineering Notes, 30(5):11-20, 2005.

For comments, suggestions and requests, please contact me.

This material is based upon work supported by the National Science Foundation under Grant No. 0613949 (NSF SOD-TEAM) Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation