In this position paper, we briefly present multi-level approaches to finding source code matches. Matches can be retrieved from a factorized call graph of tokenized functions, which makes it possible to infer similarity metrics at the function level. If syntax trees are considered, similarity can be assessed at different structural levels. We discuss the ability to find approximate matches in two steps: first, we locate exact matches (germs) according to an abstraction profile for syntax tree hashing; next, we gather the germs according to their proximity in their respective host syntax trees.
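The two-step idea can be illustrated with a minimal sketch. It is not the implementation described in the paper: it uses Python's standard `ast` module as the syntax tree, restricts germs to statement-level subtrees, and approximates proximity in the host trees by line distance. The function names (`find_germs`, `gather_germs`), the `profile` set, and the `max_gap` parameter are all hypothetical choices made for this illustration.

```python
import ast
import hashlib
from collections import defaultdict


def node_label(node, profile):
    """Label a node; details listed in `profile` are abstracted away.

    Assumption: only variable names and constants are configurable here.
    Function and argument names are always abstracted in this sketch.
    """
    label = type(node).__name__
    if isinstance(node, ast.Name) and "names" not in profile:
        label += ":" + node.id
    elif isinstance(node, ast.Constant) and "constants" not in profile:
        label += ":" + repr(node.value)
    return label


def index_statements(source, profile):
    """Map each abstracted subtree hash to the lines of statements rooted there."""
    index = defaultdict(list)

    def visit(node):
        # Hash bottom-up: a node's hash depends on its label and its children.
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        payload = node_label(node, profile) + "(" + ",".join(children) + ")"
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if isinstance(node, ast.stmt):
            index[digest].append(node.lineno)
        return digest

    visit(ast.parse(source))
    return index


def find_germs(src_a, src_b, profile):
    """Step 1: exact matches (germs) are statements with equal hashes."""
    index_a = index_statements(src_a, profile)
    index_b = index_statements(src_b, profile)
    return sorted(
        (line_a, line_b)
        for digest in index_a.keys() & index_b.keys()
        for line_a in index_a[digest]
        for line_b in index_b[digest]
    )


def gather_germs(germs, max_gap=2):
    """Step 2: greedily chain germs that are close in both host sources."""
    clusters = []
    for germ in sorted(germs):
        if clusters:
            last_a, last_b = clusters[-1][-1]
            if 0 < germ[0] - last_a <= max_gap and 0 < germ[1] - last_b <= max_gap:
                clusters[-1].append(germ)
                continue
        clusters.append([germ])
    return clusters


SRC_A = (
    "def f(a, b):\n"
    "    total = a + b\n"
    "    print(total)\n"
    "    total = total * 2\n"
    "    return total\n"
)
SRC_B = (
    "def g(x, y):\n"
    "    s = x + y\n"
    "    log(s, 'debug')\n"
    "    s = s * 2\n"
    "    return s\n"
)

germs = find_germs(SRC_A, SRC_B, profile={"names"})
clusters = gather_germs(germs)
print(germs)
print(clusters)
```

In this example the two functions differ in their third statement, so neither function is an exact match as a whole. With variable names abstracted, the surrounding statements still hash identically and are reported as germs, and the gathering step joins them into a single approximate match that spans the differing line.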
This position paper was presented at the 4th International Workshop on Software Clones (IWSC'2010), a satellite event of the International Conference on Software Engineering (ICSE'2010) in Cape Town (South Africa).