Introduction
Anyone who has run a server on the Internet will have seen hundreds of access attempts: probes for strange paths, testing whether any of them fails or opens an unexpected hole to sneak through.
Years ago, one could set up a server and have it exist more or less anonymously for as long as desired, but that has long ceased to be the case.
Now, you turn on a machine on the Internet, and within minutes, as if it were a zombie movie, the enemy is already trying to break down the door.
Let's explore what options we have and introduce a way to detect some of these attacks with our own open-source Terraform module.
What is a scanning attack?
In the case of the web, when we talk about a scanning attack, we are referring to an attack in which multiple requests are sent to our website to try to detect what services we have available, what ports are open, or what software we are using on our server.
In the server logs, we will see many accesses returning 404 (Not Found) errors. These also help the attacker see how we handle such cases: whether we return the error directly or route it through another system of ours that leaks intermediate information useful to the attacker. In short, they are probing how well protected we are.
Some of these probes that return 404 are attempts to access paths that are common in other systems. For example, even if you are not using WordPress, there will be many attempts to access /wp-admin/, WordPress's default administration path. (This is one of the reasons why, if you do have WordPress, one of the first things you can do to improve security is to change the admin access URL.)
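As an illustration, such a scan in an access log (shown here in nginx's default "combined" format, with a made-up IP and paths) might look like this:

```
203.0.113.7 - - [12/Mar/2025:04:21:07 +0000] "GET /wp-admin/ HTTP/1.1" 404 153 "-" "Mozilla/5.0"
203.0.113.7 - - [12/Mar/2025:04:21:08 +0000] "GET /phpmyadmin/ HTTP/1.1" 404 153 "-" "Mozilla/5.0"
203.0.113.7 - - [12/Mar/2025:04:21:09 +0000] "GET /.env HTTP/1.1" 404 153 "-" "Mozilla/5.0"
```

The pattern to notice: one IP, many different "guessed" paths, all returning 404, in a very short span of time.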
What options do we have to protect ourselves?
A little earlier, we mentioned that we would show a way to detect these attacks using our own Terraform module, but before jumping into Marketing (bad Marketing, since it's completely free, but...), let me tell you about other options and why we created our module.
Web Application Firewall (WAF)
This is software that we place between the Internet and our website to block everything we don't want reaching our machine (including scans and bots). Some are more sophisticated than others, more or less "intelligent" and dynamic. Typical options include Cloudflare WAF, Google Cloud Armor, and AWS WAF.
If you don't have your infrastructure on Google Cloud or Amazon AWS, I would go straight for Cloudflare WAF. It has a free tier, the paid version starts at a reasonable price, and it’s relatively easy to use. It also bundles other tools, such as a CDN. On the downside, all traffic to your website must go through Cloudflare.
Blocking IPs with Fail2ban
Fail2ban is open-source software that we can install on our server to monitor logs for signs of attacks and scans by bots or other types of users. If you're using a server with a Linux distribution, there's a high chance you already have it installed. You can check out more on its GitHub repository:
https://github.com/fail2ban/fail2ban
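To give an idea of what its configuration looks like, here is a minimal jail.local sketch with illustrative thresholds (Fail2ban also ships filters for web servers such as Apache and nginx):

```ini
# /etc/fail2ban/jail.local — illustrative values
[sshd]
enabled  = true
maxretry = 5      # ban after 5 failed attempts...
findtime = 600    # ...within a 10-minute window
bantime  = 3600   # keep the offending IP banned for an hour
```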
One issue with Fail2ban is that it must be installed directly on your machine, so it doesn't work for deployments in container management systems like Kubernetes or Google Cloud Run, or in serverless environments like Google App Engine.
Alternatives to protect ourselves in Google Cloud
We've already seen that there are different existing software options, such as Fail2ban and products from Google, Amazon, or Cloudflare, to protect our machines. So why should we try doing it differently with our Terraform module, which sets up a log sink and event queues using Pub/Sub? It sounds complicated.
The main reason is cost. The official and straightforward alternative from Google Cloud is Google Cloud Armor, which has two pricing models: Standard, where you pay based on usage, and Enterprise, where you pay a subscription based on the number of resources you want to protect.
The Standard model does not include "adaptive" protection, which generates dynamic protection rules based on the attacks you are receiving—exactly what interests us in this article. To get that feature, you must subscribe to the Enterprise plan.
Until recently, the Enterprise plan started at $3,000 per month, though now there is a more affordable "pay as you go" option ($200 per month for up to 2 resources and an additional $200 per resource per month). Then, there are additional costs for traffic.
“Alright, you’ve convinced me: I want to try other ways to achieve the same result.”
Google offers what are called log sinks. A sink is literally a drain, but technically it is a way to filter logs in Google Cloud and send the filtered data elsewhere. It’s like making a small hole in the stream of information flowing through our pipeline and directing whatever falls through that hole into another bucket. This has many applications; for example, if you're integrating Datadog with Google Cloud for analytics, system monitoring, application profiling, and tracing, Datadog's recommended way to receive logs from Google Cloud is through a sink.

[Photo of a kitchen sink] I added a picture of a sink to break up the text a bit and because, as I mentioned, "sink" literally means that.
But I’m rambling. Basically, what we can do with our sink is filter logs that return a 404 (Not Found) error and send them to an event queue. Then, this event queue is processed by a function that measures how many of these errors occur within a given time window and whether they come from the same user.
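For reference, this is roughly what the sink looks like in Terraform terms. This is a minimal sketch with illustrative names, not the module's exact resources; the filter shown is the module's default sink_filter:

```hcl
# Route 404 log entries into a Pub/Sub topic via a log sink.
resource "google_pubsub_topic" "not_found_events" {
  name = "not-found-events"
}

resource "google_logging_project_sink" "not_found_sink" {
  name        = "not-found-sink"
  destination = "pubsub.googleapis.com/${google_pubsub_topic.not_found_events.id}"

  # Keep only 404 responses (the module's default sink_filter).
  filter = "protoPayload.status=404 OR httpRequest.status=404"

  # Give the sink its own service account; it must then be granted
  # roles/pubsub.publisher on the topic.
  unique_writer_identity = true
}
```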
Of course, if we send logs to an event queue and process them, each log entry is independent of the rest from the receiving function's point of view. We need a way to make it remember what has been happening. In our case, we do this with Redis: every time a specific IP requests a resource and gets a 404 error, we store that request. To detect whether we've been hit more than X times in the last hour, for example, we store those calls with a one-hour expiration; if we wanted a 30-minute window, we'd set a 30-minute expiration. This is all configurable.
When the function processing the events detects an attack, it triggers another event to a separate Google Pub/Sub queue.
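A minimal sketch of that logic in Python follows, assuming the official redis and google-cloud-pubsub client libraries. The function name, environment variable names, and alert topic name are placeholders for illustration, not the module's actual code:

```python
import os

import redis
from google.cloud import pubsub_v1

# These mirror the module's not_found_request_window / not_found_request_limit
# variables; the environment variable names are assumptions for this sketch.
WINDOW_SECONDS = int(os.environ.get("NOT_FOUND_REQUEST_WINDOW", "60"))
REQUEST_LIMIT = int(os.environ.get("NOT_FOUND_REQUEST_LIMIT", "10"))

r = redis.Redis(host=os.environ.get("REDIS_HOST", "localhost"), port=6379, db=0)

publisher = pubsub_v1.PublisherClient()
# "scan-attack-alerts" is an illustrative topic name.
alert_topic = publisher.topic_path(os.environ.get("GCP_PROJECT", "my-project"),
                                   "scan-attack-alerts")


def handle_not_found(ip: str) -> None:
    """Count one 404 for this IP; publish an alert when the limit is exceeded."""
    key = f"404:{ip}"
    count = r.incr(key)  # atomic per-IP counter
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # counter disappears when the window closes
    if count == REQUEST_LIMIT + 1:
        # Crossing the threshold triggers a single alert event per window.
        publisher.publish(alert_topic, data=ip.encode("utf-8"))
```

Using one expiring counter per IP is a slight simplification of storing each request individually, but it implements the same windowed limit.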
We could configure all of this manually, but making changes later could become overly complex. If, at some point, we need to modify something or deploy this across multiple environments, manually repeating every step could lead to a huge potential for errors. And let's be honest—there are already plenty of existing solutions for this...
Our module in Terraform
Terraform is an Infrastructure as Code (IaC) tool that allows you to define, deploy, and manage cloud resources in an automated way. Instead of manually creating everything we've discussed so far (Pub/Sub queues, Redis, the Google Cloud Function that receives the events, the log sink) from your provider's console, with Terraform, you can write a configuration file, run a couple of commands, and have your entire infrastructure ready in seconds.
It works with multiple platforms like Google Cloud, AWS, and Azure, and can be used to automate deployments and keep everything under control with well-documented versions and changes.
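Those couple of commands are, in the simplest case:

```sh
terraform init   # download the required providers and modules
terraform plan   # preview what will be created or changed
terraform apply  # create or update the infrastructure
```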
That’s why we created a Terraform module—to set up everything we've covered in this article—and we've made it publicly available on GitHub.
In our Terraform module, you’ll find variables like not_found_request_limit to define how many requests we allow and not_found_request_window for the time window in seconds.
In other words, if more requests than not_found_request_limit arrive from the same IP, all returning a 404 error, within not_found_request_window seconds, an alert is triggered.
This alert is then published as an event to a Google Pub/Sub topic, where you can process it however you need (a usage sketch follows the variable list below).
The available variables are:
- project: The GCP project where it will be deployed.
- region: The GCP region where it will be deployed. Default is europe-west1.
- resource_prefix: The prefix used for all resources. Default is scan-attack-detector.
- not_found_request_window: The time window, in seconds, for checking 404 requests. Default is 60.
- not_found_request_limit: The limit of 404 requests within the time window before triggering an attack alert. Default is 10.
- sink_filter: The filter used for the sink. Default is protoPayload.status=404 OR httpRequest.status=404.
- temporary_artifact_bucket_name: The name of the bucket used for temporary artifacts.
- redis_host: The host of the Redis instance.
- redis_port: The port of the Redis instance. Default is 6379.
- redis_database: The database used in the Redis instance. Default is 0.
- redis_vpc_connector_id: The ID of the VPC Access Connector used for the Redis instance. Default is null.
- redis_vpc_connector_egress_settings: The egress settings used for the VPC Access Connector. Default is null.
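Putting it all together, a minimal configuration of the module could look like this. The values are placeholders; check the repository's documentation for the exact set of required variables:

```hcl
module "scan_attack_detector" {
  # Terraform's GitHub source shorthand for the public module.
  source = "github.com/softspring/tf-google-cloud-scan-attack-detector"

  project    = "my-gcp-project"  # placeholder project ID
  redis_host = "10.0.0.3"        # placeholder Redis host

  temporary_artifact_bucket_name = "my-temp-artifacts"  # placeholder bucket

  not_found_request_limit  = 10  # alert after more than 10 404s...
  not_found_request_window = 60  # ...from one IP within 60 seconds
}
```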
That concludes our explanation of what scan attacks are and how we can receive alerts in Google Cloud (easily applicable to other environments) by using Terraform to configure an event filtering system based on a log sink.
You can find the module and more documentation at: https://github.com/softspring/tf-google-cloud-scan-attack-detector