Harvesting phishing sites for filtering has always been something of an ongoing, uphill battle. Many phishing sites are designed to look as close as possible to the legitimate webpage they’re imitating. The more genuine the page looks, the greater the chance that someone will willingly hand over sensitive personal information, and the tougher it is to determine whether the site is legitimate or a phish. Phishing pages also tend to have a high turnover rate: more often than not, a site will only be live for a day or two, sometimes just hours, before it’s discovered and taken down or moved to a different URL. It’s reasons like these that have made tackling phishing sites a tedious chase.
After searching for a solution to this obstacle, OpenDNS Labs came up with the idea of using its phishing detection model, NLPRank, as a recommender system on PhishTank, the OpenDNS community-based phishing verification system, to combat phishing verification woes.
The phishing detection process takes each submission (a domain or URL) and checks it against any existing OpenDNS whitelists and ASN filters. This initial step filters out false positives and spammy submissions that are sent to PhishTank. If the URL makes it past these first few checkpoints, the system retrieves the content and source code of the proposed phishing page for review. That source code is then put through a machine learning algorithm that, in a nutshell, compares the submission content to a corpus of content from commonly spoofed brands and returns a similarity score. If the similarity score is above a predetermined threshold, the submission is deemed a phish. Figure 1 shows a diagram of how the system works.
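To make the flow above a little more concrete, here is a minimal Python sketch of the same steps: whitelist check, then a similarity score against a brand corpus, then a threshold. It is not NLPRank itself. The whitelist entries, the tiny brand corpus, the 0.6 threshold, and the bag-of-words cosine similarity are all assumptions chosen for illustration, the ASN filter is left out, and the page text is passed in directly rather than fetched.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Everything below is a made-up placeholder, not OpenDNS data.
WHITELIST = {"paypal.com", "google.com"}
BRAND_CORPUS = {
    "paypal": "log in to your paypal account email password",
    "examplebank": "examplebank online banking sign in card number pin",
}
THRESHOLD = 0.6  # assumed cutoff; the real threshold is not published here


def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(url, page_text):
    """Return a verdict for one submission, given its already-fetched text."""
    host = urlparse(url).hostname or ""
    # Step 1: drop submissions that are on (or under) a whitelisted domain.
    if host in WHITELIST or any(host.endswith("." + d) for d in WHITELIST):
        return {"verdict": "whitelisted", "brand": None, "score": 0.0}
    # Step 2: score the page against each commonly spoofed brand.
    page = tokenize(page_text)
    brand, score = max(
        ((name, cosine(page, tokenize(text))) for name, text in BRAND_CORPUS.items()),
        key=lambda pair: pair[1],
    )
    # Step 3: apply the threshold.
    verdict = "phish" if score > THRESHOLD else "unverified"
    return {"verdict": verdict, "brand": brand, "score": score}


if __name__ == "__main__":
    print(classify("http://paypa1-secure.example/login",
                   "Log in to your PayPal account. Email. Password."))
    print(classify("https://www.paypal.com/signin", "Log in to your PayPal account"))
```

A production system would presumably use a much richer language model and corpus than raw token counts; the point here is only the shape of the pipeline.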
The eventual plan is to integrate the results of our recommender system back into PhishTank and share them with the community. PhishTank’s current approach of “submit a domain and wait for the community to verify” has so far been formidable, and it remains “best in class” as one of the largest sources of human-curated data on phishing sites. The drawback is that this approach shows its age more with each passing year: as the Time to Verify measurement grows larger, the efficacy of the feed suffers. The new recommender system increases the effectiveness of PhishTank and improves the overall experience for the thousands of users who rely on PhishTank’s verified phishes feed.
While this method is outstanding for real-time blocking of active phishes, it isn’t necessarily predictive, just highly reactive. Speed of classification aside, it’s still a “wait and see” approach. So how do we transition this into predictive verification? Consider the notion that phishes can sometimes be a bit like cockroaches. If you see one marching around your house, chances are there are a bunch more hanging out somewhere close by, out of sight. By taking the verified results of the NLPRank process and pivoting through their host IPs, we are able to uncover handfuls of other registered URLs acting as targets for the very phishing campaign that was initially discovered. Oftentimes, when we dig deeper through attacker infrastructure and WHOIS records, we uncover even more of the same. By adding these IPs and registrants to our blacklists, we are able to stay ahead of the curve and greatly increase the chances of our users being protected from widespread phishing campaigns.

Figure 2 shows a diagram of the Bulletproof Infrastructure Classification system. In this process, we first take dedicated phishing domains that we have caught with NLPRank. We then query OpenDNS Investigate for domains associated with the hosting IP and with the registrant email address from their WHOIS records. Finally, we built a mini-classifier that uses different features from the domains on these infrastructures to classify registrants and IP addresses, and in turn pushes them to our block list. In this sense, we can predict the infrastructure phishers will use while it is still being set up, and we can block phishing sites even before they go live with spoofed content.
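The pivot described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual classifier: the `RECORDS` table is invented data standing in for what OpenDNS Investigate and WHOIS lookups would return, and the "mini-classifier" is replaced by a simple hypothetical rule (flag an IP or registrant when at least half of its domains, and at least two domains overall, are tied to confirmed phishes).

```python
from collections import defaultdict

# Invented records standing in for Investigate/WHOIS lookups:
# (domain, hosting IP, registrant email).
RECORDS = [
    ("paypa1-login.example", "203.0.113.7", "bad@mail.example"),
    ("secure-appleid.example", "203.0.113.7", "bad@mail.example"),
    ("bank-verify.example", "203.0.113.7", "other@mail.example"),
    ("update-billing.example", "198.51.100.9", "bad@mail.example"),
    ("cat-photos.example", "198.51.100.20", "alice@mail.example"),
    ("phish-on-shared.example", "192.0.2.50", "eve@mail.example"),
    ("bakery.example", "192.0.2.50", "bob@mail.example"),
    ("florist.example", "192.0.2.50", "carol@mail.example"),
    ("plumber.example", "192.0.2.50", "dan@mail.example"),
]


def pivot(confirmed, records, min_ratio=0.5, min_domains=2):
    """Flag IPs and registrants dominated by confirmed phishing domains.

    Returns two dicts (by IP, by registrant email) that map each flagged
    key to the not-yet-confirmed domains sharing that infrastructure.
    The ratio rule is a stand-in heuristic, not the real mini-classifier.
    """
    by_ip, by_email = defaultdict(set), defaultdict(set)
    for domain, ip, email in records:
        by_ip[ip].add(domain)
        by_email[email].add(domain)

    def flagged(index):
        out = {}
        for key, domains in index.items():
            hits = domains & confirmed
            if len(domains) >= min_domains and len(hits) / len(domains) >= min_ratio:
                out[key] = domains - confirmed  # candidates to block early
        return out

    return flagged(by_ip), flagged(by_email)


if __name__ == "__main__":
    confirmed = {
        "paypa1-login.example",
        "secure-appleid.example",
        "phish-on-shared.example",
    }
    ips, registrants = pivot(confirmed, RECORDS)
    print("Flagged IPs:", ips)
    print("Flagged registrants:", registrants)
```

Note how the shared-hosting IP `192.0.2.50` is not flagged even though one confirmed phish lives there: only one of its four domains is known bad, which is the kind of false positive a ratio (or a richer feature set) is meant to avoid.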
Our phishing recommender system is still in the very early stages of production; however, the results and accuracy of its output thus far have been exceptional. As we continue to push forward and improve its integration with PhishTank and OpenDNS, we only stand to increase the level of security that’s delivered to the thousands of OpenDNS users worldwide.
from OpenDNS Blog http://ift.tt/2eV3dh8