【Table of contents】
- Introduction
- How to Configure Web ACLs
- List of Rules
- How to Allow or Deny Permissions
- Using with WafCharm
- Conclusion
1. Introduction
On April 1, 2021, Amazon introduced a bot control feature for AWS WAF.
Bot Control is intended to reduce unnecessary resource consumption, prevent downtime caused by excessive access, and block access that site administrators do not want. It can also be configured so that access you want to allow, such as search engine crawlers, is not blocked.
Note that it does not currently include a bot mitigation feature, so it cannot prevent short bursts of high-load access from unknown malicious bots.
We’re going to walk you through how to set up and incorporate Bot Control with WafCharm.
2. How to Configure Web ACLs
The bot control function is provided as a managed rule.
Please note that unlike the other managed rules released by AWS, bot control incurs a provisioning fee ($10.00 USD / month).
For more information about the cost, please check the official information:
https://aws.amazon.com/waf/pricing/
On the AWS WAF dashboard, navigate to the Web ACL you intend to set up bot control for and select "Add rules", then select "Add managed rule groups".
Select "AWS managed rule groups" and change the Action for "Bot Control".
Unlike the other AWS managed rule groups, Bot Control is listed under "Paid rule groups", which makes it clear that it incurs a charge.
Click "Add rules" at the bottom of the screen to apply them.
Next, change the order of the rules as you wish and click "Save" to complete.
Bot Control provides a dedicated monitoring screen so that you can check the detection status of bots only.
3. List of Rules
Rule Name | Overview |
---|---|
CategoryAdvertising | Advertising Bot |
CategoryArchiver | Archive bot |
CategoryContentFetcher | Content acquisition bot |
CategoryHttpLibrary | HTTP library |
CategoryLinkChecker | Link checker bot |
CategoryMiscellaneous | Others |
CategoryMonitoring | Monitoring tool |
CategoryScrapingFramework | Scraping framework |
CategorySearchEngine | Search engine bot |
CategorySecurity | Security Scanner |
CategorySeo | SEO bot |
CategorySocialMedia | Social media bot |
SignalAutomatedBrowser | Automated browser |
SignalKnownBotDataCenter | Source IP belongs to a data center typically used as bot infrastructure |
SignalNonBrowserUserAgent | User agent that does not appear to be a web browser |
4. How to Allow or Deny Permissions
Changing the action of each rule
Because some bots are useful depending on the nature of your website and business, it is better to deploy the rules with the COUNT action first and monitor the logs before deciding on the action for each rule.
*Official rule information
https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rule-groups-list.html
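As a minimal sketch of the COUNT-first approach, the rule below adds the Bot Control rule group to a Web ACL with the whole group overridden to COUNT, so matches are logged but nothing is blocked. The rule name, priority, and metric name are hypothetical values chosen for illustration.

```json
{
  "Name": "AWS-AWSBotControl-CountOnly",
  "Priority": 1,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesBotControlRuleSet"
    }
  },
  "OverrideAction": { "Count": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AWS-AWSBotControl-CountOnly"
  }
}
```

Once the logs show which categories you actually want to stop, changing OverrideAction to "None" lets the rule group's own actions take effect.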
Control by Labels
Labels are another newly released feature that adds a descriptive label to a web request when a WAF rule matches the request, regardless of the action associated with the rule. Additionally, it is possible to create rules that use this label.
Bot Control adds labels to the requests it matches, and you can use those labels in your own rules.
The following is only a partial list of available labels:
With Bot Control, labels let you exclude specific patterns from specific rules. This is easier to understand in JSON, so please see the example below.
As a prerequisite, the target rule (the rule that adds the label) must be set to COUNT. Then create a label match rule that is evaluated after the managed rule group.
{
  "Rule": {
    "Name": "match_rule",
    "Statement": {
      "AndStatement": {
        "Statements": [
          {
            "LabelMatchStatement": {
              "Scope": "LABEL",
              "Key": "awswaf:managed:aws:bot-control:bot:category:monitoring"
            }
          },
          {
            "NotStatement": {
              "Statement": {
                "LabelMatchStatement": {
                  "Scope": "LABEL",
                  "Key": "awswaf:managed:aws:bot-control:bot:name:pingdom"
                }
              }
            }
          }
        ]
      }
    },
    "RuleLabels": [],
    "Action": { "Block": {} }
  }
}
This rule blocks every request that matched CategoryMonitoring except those labeled pingdom, so Pingdom is excluded from blocking.
Because the NotStatement can contain any existing statement type, you can also exclude requests by using string match conditions or IP addresses.
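The IP address variant might look like the sketch below, which blocks monitoring bots unless the request comes from an address in an IP set. The rule name, priority, metric name, and IP set ARN are hypothetical placeholders, and the label key uses the `awswaf:managed:aws:bot-control:` namespace that AWS documents for Bot Control.

```json
{
  "Name": "block_monitoring_except_trusted_ips",
  "Priority": 10,
  "Statement": {
    "AndStatement": {
      "Statements": [
        {
          "LabelMatchStatement": {
            "Scope": "LABEL",
            "Key": "awswaf:managed:aws:bot-control:bot:category:monitoring"
          }
        },
        {
          "NotStatement": {
            "Statement": {
              "IPSetReferenceStatement": {
                "ARN": "arn:aws:wafv2:us-east-1:123456789012:regional/ipset/trusted-monitors/EXAMPLE-ID"
              }
            }
          }
        }
      ]
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "block_monitoring_except_trusted_ips"
  }
}
```

As with the label-only rule, this rule must have a higher priority number (be evaluated later) than the Bot Control rule group, otherwise the label does not exist yet when it runs.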
Filtering using ScopeDownStatement in Managed Rules
This is another newly released feature that lets you add conditions to the managed rules themselves, such as "only login pages" or "only users outside of a specific country".
Let's check this out in JSON format.
{
  "Name": "AWS-AWSBotControl-Example",
  "Priority": 5,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesBotControlRuleSet",
      "ExcludedRules": [
        { "Name": "CategoryVerifiedSearchEngine" },
        { "Name": "CategoryVerifiedSocialMedia" }
      ],
      "ScopeDownStatement": {
        "ByteMatchStatement": {
          "SearchString": "login",
          "FieldToMatch": { "UriPath": {} },
          "TextTransformations": [ { "Priority": 0, "Type": "NONE" } ],
          "PositionalConstraint": "CONTAINS"
        }
      }
    }
  },
  "OverrideAction": { "None": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AWS-AWSBotControl-Example"
  }
}
The ScopeDownStatement narrows the scope of the managed rule group. Here, the string match condition requires that the URI contain "login". In other words, requests whose URI does not contain "login" are not evaluated by Bot Control. You can also invert the condition with a NotStatement.
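The "only users outside of a specific country" case can be sketched by wrapping a geo match in a NotStatement. The rule name, priority, metric name, and the country code "JP" are assumptions for illustration; with this scope-down, Bot Control evaluates only requests that do not originate from the listed country.

```json
{
  "Name": "AWS-AWSBotControl-OutsideCountry",
  "Priority": 5,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesBotControlRuleSet",
      "ScopeDownStatement": {
        "NotStatement": {
          "Statement": {
            "GeoMatchStatement": {
              "CountryCodes": [ "JP" ]
            }
          }
        }
      }
    }
  },
  "OverrideAction": { "None": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AWS-AWSBotControl-OutsideCountry"
  }
}
```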
Exclusion using custom headers
Custom headers are another newly released feature that lets you add a custom header to a request when a rule matches. You can create a rule that matches a specific condition and adds a custom header, and place it before the bot control rule. The bot control rule's ScopeDownStatement can then exclude those requests by testing for the header inside a NotStatement.
Example:
1. If "googlebot" is included in the user agent, add a header with the name x-bypass-secret and the value exclude. Set the action to COUNT.
2. Give the bot control rule a ScopeDownStatement containing a NotStatement whose string match condition is that the header x-bypass-secret has the value exclude.
Example JSON for the rule in step 2:
{
  "Name": "AWS-AWSBotControl-Example",
  "Priority": 5,
  "Statement": {
    "ManagedRuleGroupStatement": {
      "VendorName": "AWS",
      "Name": "AWSManagedRulesBotControlRuleSet",
      "ExcludedRules": [
        { "Name": "CategoryVerifiedSearchEngine" },
        { "Name": "CategoryVerifiedSocialMedia" }
      ],
      "ScopeDownStatement": {
        "NotStatement": {
          "Statement": {
            "ByteMatchStatement": {
              "SearchString": "exclude",
              "FieldToMatch": { "SingleHeader": { "Name": "x-bypass-secret" } },
              "TextTransformations": [ { "Priority": 0, "Type": "NONE" } ],
              "PositionalConstraint": "EXACTLY"
            }
          }
        }
      }
    }
  },
  "OverrideAction": { "None": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AWS-AWSBotControl-Example"
  }
}
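For completeness, the first rule in the example (the one that marks googlebot requests) might be sketched as follows. The rule name, priority, and metric name are hypothetical. AWS documents that it prefixes the names of inserted headers with x-amzn-waf-, so the header name a later rule needs to match may differ from the one configured here; please verify the exact behavior against the official documentation before relying on it.

```json
{
  "Name": "mark-googlebot",
  "Priority": 4,
  "Statement": {
    "ByteMatchStatement": {
      "SearchString": "googlebot",
      "FieldToMatch": { "SingleHeader": { "Name": "user-agent" } },
      "TextTransformations": [ { "Priority": 0, "Type": "LOWERCASE" } ],
      "PositionalConstraint": "CONTAINS"
    }
  },
  "Action": {
    "Count": {
      "CustomRequestHandling": {
        "InsertHeaders": [
          { "Name": "x-bypass-secret", "Value": "exclude" }
        ]
      }
    }
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "mark-googlebot"
  }
}
```

The LOWERCASE transformation is a design choice so that "Googlebot" and "googlebot" both match. Keep in mind that a user agent string is easy to spoof, so this condition alone does not prove the request is from Google.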
Several new features have been released, and the range of patterns that exclusion settings can handle has grown, especially for managed rules.
Please refer to the official AWS information for configuration examples.
https://docs.aws.amazon.com/waf/latest/developerguide/waf-bot-control-examples.html
5. Using with WafCharm
We have tested Bot Control and confirmed that it works without problems alongside WafCharm rules. If you are having trouble with bot access, please try using the two together.
6. Conclusion
This is a very cost-effective rule group, considering that it consumes only 50 WCU. Building equivalent conditions from your own string match rules would consume far more WCU. Although the additional cost is a concern, I recommend adopting it.
First, apply it with the COUNT action and see how much bot access your site receives.
If your site targets a specific audience, you may not need to allow bots at all, so using BLOCK may not cause any problems.
For example, you can allow only search engine crawlers to access your site so that resources are not wasted on other bots.
AWS WAF offers several ways to exclude specific access patterns, such as narrowing down by URI, but they can be somewhat difficult to configure. If you are familiar with AWS WAF and have spare WAF Capacity Units (WCU) in your Web ACL, you can create rules to exclude specific access patterns. If not, using robots.txt for this purpose may be another option.