Android app repackaging threatens the health of application markets, as repackaged apps, besides stealing revenue for honest developers, are also a source of malware distribution. Techniques that rely on visual similarity of Android apps recently emerged as a way to tackle the repackaging detection problem, as code-based detection techniques often fail in terms of efficiency, and effectiveness when obfuscation is applied. Among such techniques, the resource-based repackaging detection approach that compares sets of files included in apks has arguably the best performance. Yet, this approach has not been previously validated on a dataset of repackaged apps.
In this paper we report on our evaluation of the approach, and present substantial improvements to it. Our experiments show that the state-of-art tools applying this technique rely on too restrictive thresholds. Indeed, we demonstrate that a very low proportion of identical resource files in two apps is a reliable evidence for repackaging. Furthermore, we have shown that the Overlap similarity score performs better than the Jaccard similarity coefficient used in previous works. By applying machine learning techniques, we give evidence that considering separately the included resource file types significantly improves the detection accuracy of the method. Experimenting with a balanced dataset of more than 2700 app pairs, we show that with our enhancements it is possible to achieve the F-measure of 0.9919.