Moving beyond signature-based and behavior-based approaches to defending against cyberthreats in the cloud.

In the early days of the internet, cybersecurity was a manageable risk that could be contained with relative ease. In recent years, the explosion of data has made cyber-attacks much more difficult to monitor and detect. Identifying risks and vulnerabilities has become a complex exercise, akin to finding a tiny needle in an exponentially growing haystack.

As cyber-attacks have become more sophisticated, organizations can no longer fight the villains without a disruptive change in the ways they handle cyberthreats.

In 2018, an estimated two million cyber-attacks resulted in more than US$45 billion in losses worldwide, according to the Internet Society’s Online Trust Alliance (OTA). The damage goes much further than financial losses: cyber-attacks cripple business operations and inflict harm on the victims, whose personal information is often exploited by cybercriminals.

As a leader in the cloud space, Alibaba Cloud treats cybersecurity as a top priority. To this end, we have turned to cutting-edge technologies such as deep learning and artificial intelligence to secure our operations and protect our customers.

Traditional approaches to malware detection

Malware detection is an important component of cybersecurity. A diverse range of anti-malware offerings exists in today’s cybersecurity market, and they generally follow one of two approaches: signature-based or behavior-based.

In the signature-based approach, anti-malware software scans files against a pool of known malware signatures. When the system sees a match, it deems the file malicious. This approach tends to be effective and accurate when dealing with known malware. However, it falls short against the malware we encounter these days, because new variants appear faster than their signatures can be added to the pool.
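To make the limitation concrete, here is a minimal sketch of signature matching. It assumes the simplest possible kind of signature, a SHA-256 digest of the whole file; real engines also use byte patterns and other signature types. The sample bytes and the names SIGNATURES and is_known_malware are hypothetical and only serve the illustration.

```python
import hashlib

# Hypothetical signature pool: SHA-256 digests of known-malicious files.
KNOWN_BAD_SAMPLE = b"MZ\x90\x00 pretend this is a known malicious payload"
SIGNATURES = {hashlib.sha256(KNOWN_BAD_SAMPLE).hexdigest()}


def is_known_malware(file_bytes: bytes) -> bool:
    """Return True if the file's digest matches a stored signature."""
    return hashlib.sha256(file_bytes).hexdigest() in SIGNATURES


# An exact copy of the known sample is caught.
print(is_known_malware(KNOWN_BAD_SAMPLE))         # True
# A variant that differs by one byte has a new digest and slips through.
print(is_known_malware(KNOWN_BAD_SAMPLE + b"!"))  # False
```

The second call shows why an exact-match signature stops working as soon as an attacker changes even a single byte of the file.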

The behavior-based approach looks not at the files themselves but at their behaviors. When a file shows anomalous patterns of behavior, the system deems it malicious. This approach is effective when dealing with new malware, as the decision is not based on precedent. However, it is also less accurate, as anomalous behavior is not always a sign of a malicious program.
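The accuracy trade-off can be illustrated with a small sketch. It assumes a single behavioral signal (files written per minute) and a simple z-score threshold; the baseline numbers, the threshold, and the function name are all hypothetical, and production systems track far more signals than this.

```python
from statistics import mean, pstdev

# Hypothetical baseline: files written per minute by ordinary processes.
BASELINE_WRITES = [2, 3, 1, 4, 2, 3, 2, 5, 3, 2]
THRESHOLD = 3.0  # assumed cut-off, in standard deviations above the mean


def is_anomalous(writes_per_minute: float) -> bool:
    """Flag a process whose write rate is far above the baseline."""
    mu = mean(BASELINE_WRITES)
    sigma = pstdev(BASELINE_WRITES)
    return (writes_per_minute - mu) / sigma > THRESHOLD


print(is_anomalous(400))  # True: e.g. ransomware encrypting files in bulk
print(is_anomalous(250))  # True: but this could be a legitimate backup tool
print(is_anomalous(3))    # False: ordinary activity
```

The first call catches a never-before-seen threat with no signature at all, while the second shows the cost: a harmless program with unusual behavior is flagged as well.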

A new way is needed to bridge the gaps between the two approaches. How do we create a solution that works for new malware while ensuring maximum accuracy?

A new deep learning-based approach

In the past year, Alibaba Cloud has started applying a third approach to malware detection, which we call the “deep learning-based approach”.

Despite the increasing sophistication of new malware, we came to realize a simple truth about malware developers: they like to reuse code. A new piece of malware, or a new variant of an existing one, will include lines of code that were used before.

If our system can compare new files with clusters of known malware families that share a similar “code base”, it can reach a decision with high certainty.
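The idea of comparing a new file with known families can be sketched without any neural network at all. The example below swaps in a much simpler technique, Jaccard similarity over byte 4-grams, purely to illustrate code reuse; it is not the deep learning method described in this article. The family names and sample contents are made up.

```python
from typing import Set, Tuple


def ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Return the set of all n-byte substrings of the input."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: Set[bytes], b: Set[bytes]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical clusters of known samples, grouped by family.
FAMILIES = {
    "family_a": [b"enumerate_files;generate_key;encrypt_file;"
                 b"delete_shadow_copies;write_ransom_note"],
    "family_b": [b"hook_keyboard;capture_keystrokes;upload_log;hide_process"],
}


def closest_family(sample: bytes) -> Tuple[str, float]:
    """Return the best-matching family and its similarity score."""
    grams = ngrams(sample)
    best = ("", 0.0)
    for name, members in FAMILIES.items():
        score = max(jaccard(grams, ngrams(m)) for m in members)
        if score > best[1]:
            best = (name, score)
    return best


# A new variant that reuses family_a's code and adds one extra routine.
variant = FAMILIES["family_a"][0] + b";beacon_home"
print(closest_family(variant))
```

Because the variant reuses most of an existing code base, it lands close to that family even though its exact bytes, and therefore its signature, are new.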

We know that deep learning, which is built on deep neural networks, can be used to classify malicious code, and that it operates without explicit rules or human supervision. Once a deep neural network learns how to identify malicious code, it can start screening unknown programs in real time and with extremely high accuracy rates.

The process of training a deep neural network involves the analysis of millions of legitimate and malicious files. While this is not easy, it is much easier than gathering a group of cybersecurity experts to manually pore over all the data we have on malware.

Through deep learning, Alibaba Cloud has built a robust foundation for malware detection. Multiple file features and API calls form the input vectors for this neural network. The network stacks multiple layers to learn the abstract representation of malware.
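As a rough sketch of what “file features and API calls as input vectors” could look like, the example below turns one file into a numeric vector and passes it through two small layers. Everything specific here is an assumption for illustration: the five-entry API vocabulary, the two file features (size and entropy), the layer sizes, and the random, untrained weights. It shows the data flow only, not a working detector.

```python
import random

# Hypothetical vocabulary of API calls to count.
API_VOCAB = ["CreateFileW", "WriteFile", "CryptEncrypt",
             "RegSetValueExW", "connect"]


def to_input_vector(api_calls, file_size_bytes, entropy):
    """Concatenate API-call counts with two scaled file features."""
    counts = [float(api_calls.count(name)) for name in API_VOCAB]
    return counts + [file_size_bytes / 1e6, entropy / 8.0]


def dense_relu(x, weights, biases):
    """One fully connected layer followed by a ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Untrained placeholder weights; a real network would learn these."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
x = to_input_vector(
    ["CreateFileW", "CryptEncrypt", "CryptEncrypt", "WriteFile"],
    file_size_bytes=250_000,
    entropy=7.2,
)
h1 = dense_relu(x, *random_layer(len(x), 4, rng))  # first hidden layer
h2 = dense_relu(h1, *random_layer(4, 3, rng))      # second hidden layer
print(len(x), len(h1), len(h2))  # 7 4 3
```

After training, the values in the later layers would play the role of the abstract representation: a compact description of the file that no longer depends on its exact bytes.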

Abstract representation: is this a bear or a dog?

Abstract representation can be explained with an example of image recognition.

If we are using deep learning to detect an image of a dog, we could start by training the intermediate layers of the network to independently learn how best to identify a dog. These layers pick up common identifiers such as the dog’s ears, legs, and tail. This is what we call the “abstract representation” of a dog.

When an image is presented, deep learning can compare the image with the abstract representation of a dog. Based on similarity scores on multiple layers, it will decide if the image represents a dog.

The figure below shows the image recognition of a dog, where multiple neurons can identify parts of the dog’s common features.

When the abstract representation of a potentially malicious file is learned at an intermediate layer, the system uses a softmax function to instantaneously calculate the probability of the file being malicious. It can also calculate whether the malware is likely to fall under the category of trojan, ransomware, or virus.

When a neural network learns to predict the abstract representation of malware, it can also visualize the relationships between different malware samples and create a map of distinct malware families.

Because Alibaba Cloud has had great traffic visibility on cloud-native malware for several years, we have been uniquely positioned to build a highly trained deep neural network. This allows for quick identification of malware by family and code similarity.
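The softmax step can be written in a few lines. In this sketch the raw scores (logits) are made-up numbers standing in for the output of the final layer, and the extra “benign” class is an assumption added so that the probability of the file being malicious can be read off as one minus the benign probability.

```python
import math

# "benign" is an assumed extra class; the other three come from the text.
CLASSES = ["benign", "trojan", "ransomware", "virus"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.5, 1.0, 3.0, 0.2]  # hypothetical scores from the last layer
probs = softmax(logits)

label = CLASSES[probs.index(max(probs))]
p_malicious = 1.0 - probs[0]

print(label)  # ransomware
for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")
```

The highest score wins the label, and the probabilities give the system a confidence level it can use to decide whether to block the file outright or send it for further analysis.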

Deep learning in action to detect ransomware

Alibaba Cloud has used its deep neural network to successfully detect ransomware, a type of malware designed to deny access to a computer system or data until a ransom is paid. Ransomware is often spread through phishing emails and social media, and it can cause significant monetary loss and business disruption.

As new ransomware does not come with known signatures, a signature-based approach does not work. However, we do know a couple of patterns about ransomware. For example, it needs to call certain APIs to perform its encryption function.

With this knowledge, the Alibaba Cloud malware detection system learns the behavior of different ransomware families and can block new ransomware that shows similar characteristics.

The silver bullet to ending cybercrime?

Cybercrime is on the rise and is becoming more costly for organizations, according to the Ninth Annual Cost of Cybercrime Study released this year. The average cost of cybercrime for an organization increased by US$1.4 million over the past year, to US$13.0 million.

Although AI is not the silver bullet to end cybercrime, it significantly shortens the time needed to detect malicious software, thereby reducing the damage to organizations.