Good bots vs. bad bots

Why deep learning provides unique value and opportunity.

Most of us are familiar with the concept of bots – those small chunks of software designed to perform simple, automated tasks. It's commonly cited in the technology world that more than half of a website's traffic comes from bots.

The many faces of bots

Bots can be categorized as good bots or bad bots. Good bots enable more accurate web searches, chat with you through an online order, or perform many other helpful tasks.

The not-so-good bots are designed to slow down a website, negatively impacting customer satisfaction. These bad bots might scrape data, especially pricing data from a competitor's website, or simply steal personal and financial data such as credit card information.

However, how we label a bot as good or bad can be highly subjective. What's "good" for one business may not be good for another.

Just like the battle between computer viruses and anti-virus software, bad bots are becoming increasingly sophisticated as our ability to detect and differentiate good bots from bad bots improves.

For example, one of the defining characteristics of bad bots used to be quantity: we detected bots by looking for large numbers of requests and a repetitive visiting pattern. Today's bad bots have learned to avoid large quantities; instead, they focus on quality.

Another telltale sign of bad bots was a large number of requests originating from a single IP address. In response, many bad bots no longer attack from the same IP address; in fact, the majority of bad bots today attack from a pool of IP addresses.

Furthermore, bad bots increasingly mimic human behavior, in the hope that even if they are detected, they will be classified as "human." There is now even a new term for these bots: APBs, or Advanced Persistent Bots.

Analytics for bot detection

The ever-increasing complexity of bad bots presents an ongoing challenge for companies whose digital presence is essential to their business. This is where analytics can provide unique value and opportunity.

To understand this, we first need to understand the traditional approach of detecting bots based on IP counts.

For example, as a shopper I might visit a website for 15 minutes, sending about 20 page requests.  However, if there are 200 page requests in a 10-minute window from the same computer, that’s likely a bot since a human can’t browse that fast.
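A sliding-window request count like the one described above can be sketched in a few lines of Python. This is purely illustrative: the function name, the 200-requests-per-10-minutes threshold, and the log format are all assumptions, not any vendor's actual detection logic.

```python
from collections import defaultdict

# Illustrative threshold from the example above: more than 200 page
# requests in a 10-minute window from one source is likely a bot.
MAX_REQUESTS_PER_WINDOW = 200
WINDOW_SECONDS = 600

def flag_high_rate_ips(request_log):
    """request_log: iterable of (ip, timestamp_in_seconds) tuples.
    Returns the set of IPs that exceed the threshold in any window."""
    by_ip = defaultdict(list)
    for ip, ts in request_log:
        by_ip[ip].append(ts)
    flagged = set()
    for ip, times in by_ip.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most 10 minutes.
            while times[end] - times[start] > WINDOW_SECONDS:
                start += 1
            if end - start + 1 > MAX_REQUESTS_PER_WINDOW:
                flagged.add(ip)
                break
    return flagged
```

A normal shopper sending ~20 requests in 15 minutes never trips the threshold, while a scraper firing several requests per second does. Note that this is exactly the kind of rule modern bots evade by throttling themselves or rotating IPs.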

Bots were also detected based on the geographical location of the IP addresses.  Each company tends to have a targeted segment of customers.  For example, a company may have 95% of their customers based in the U.S., so if one of these customers travels to Australia and browses from there, it’s not suspicious.

However, if all of a sudden there are hundreds of web requests from customers in Australia, typically a relatively inactive location, then this may be a bot.  Traditional bot-detecting mechanisms tend to focus on volume, initiating IP address, and some foundational statistical methods such as sum or average.
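The geographic check described above amounts to comparing each country's current share of traffic against its historical baseline. The sketch below is a hypothetical illustration; the spike factor, the minimum-request floor, and the function name are assumptions chosen for the example.

```python
def geo_anomalies(baseline_counts, current_counts,
                  spike_factor=10, min_requests=100):
    """Flag countries whose share of current traffic is far above
    their historical baseline share.

    baseline_counts / current_counts: dicts mapping country -> request count.
    spike_factor and min_requests are illustrative tuning knobs."""
    base_total = sum(baseline_counts.values())
    cur_total = sum(current_counts.values())
    flagged = []
    for country, count in current_counts.items():
        if count < min_requests:
            continue  # too little traffic to judge
        base_share = baseline_counts.get(country, 0) / base_total
        cur_share = count / cur_total
        # A tiny historical share combined with a large current share
        # (e.g., a sudden burst from Australia) is suspicious.
        if cur_share > spike_factor * max(base_share, 1e-6):
            flagged.append(country)
    return flagged
```

With a baseline of 95% U.S. traffic, a sudden burst where Australia accounts for nearly half of all requests gets flagged, while the U.S. share does not.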

Unfortunately, these traditional techniques are losing ground in bot detection as the bad ones become more advanced.  But analytics, especially deep learning, can introduce an entirely new approach to make our bot detection and mitigation effective again.

Why deep learning

Deep learning, according to Wikipedia, is “a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers, with complex structures or otherwise, composed of multiple non-linear transformations.”

In other words, with enormous speed, deep learning can recognize complex, human-like patterns, perform an adaptive style of learning of these patterns, and then sniff out suspicious behavior.

Deep learning can be especially effective in recognizing complex bots because of recent advancements in neural networks. The more complex a bot, the more it resembles a human. Neural networks provide the ability to correlate a significantly larger number of variables, across multiple layers, creating a completely new style of behavioral learning that is more dynamic and ongoing – more like a human's.

Neural networks also make predictions in near real time. As a new complex bot emerges, a deep learning system will not only quickly "learn" the bot's new behavior patterns and how they differ from a real human visit, but will also continue learning as the bot changes behavior, so that previous insights can be leveraged.
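To make the idea concrete, here is a minimal sketch of a neural network that classifies sessions as human or bot from behavioral features. This is not CA's model: the features (request rate, time between clicks, pages per session, asset-request fraction), the synthetic data, and the network size are all invented for illustration, and a production system would train on far richer, real session data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sessions(n):
    """Synthetic sessions: humans click slowly and irregularly, bots click
    fast and uniformly. Feature means/spreads are purely illustrative."""
    human = rng.normal([0.3, 0.7, 0.4, 0.6], 0.1, size=(n, 4))
    bot = rng.normal([0.9, 0.1, 0.8, 0.2], 0.1, size=(n, 4))
    X = np.vstack([human, bot])
    y = np.concatenate([np.zeros(n), np.ones(n)]).reshape(-1, 1)
    return X, y  # y = 1 means "bot"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X, y = make_sessions(200)

# One hidden layer of 8 tanh units, trained with plain gradient descent
# on the cross-entropy loss.
W1 = rng.normal(0, 0.5, (4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

lr = 0.5
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)          # hidden-layer activations
    p = sigmoid(h @ W2 + b2)          # P(session is a bot)
    grad_out = (p - y) / len(X)       # cross-entropy output gradient
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)  # backprop through tanh
    W2 -= lr * (h.T @ grad_out); b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * (X.T @ grad_h);  b1 -= lr * grad_h.sum(axis=0)

accuracy = float(((p > 0.5) == y).mean())
```

Because the network learns a decision boundary from behavioral features rather than a fixed rule, retraining it on new traffic is how it keeps up as a bot changes its behavior.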

At CA Technologies, we are exploring better and more effective ways to differentiate good bots and bad bots by leveraging deep learning.  To see it in action, check out Forty2.io, a CA Accelerator project.


Jin Zhang is a CA Technologies intrapreneur and head of the forty2.io team within the…
