Defender Geo IP banning and Bots

There is a setting in ‘Defender > Firewall > IP Banning > Locations’ to “ban countries you don’t expect/want traffic from to protect your site from unwanted hackers and bots in a specific location.”

I am interested in this but I wonder if this will block bots found in the allow list at Defender > Firewall > User Agent Banning > User Agents > Allow List.

My concern is that by banning countries, Google bots coming from those countries may not be able to access the site.

Can you confirm whether this is the case?

  • Adam
    • Support Gorilla

    Hi Matt

    I hope you’re well today and thank you for your question!

    I understand your doubts and I’m afraid they are fully justified. Currently, the country block in Defender can be overridden by adding IPs to the allow list (so specific IPs are allowed to access the site even if they come from blocked countries), but the user agent allow list will not override it.

    I have asked our Defender team to look into this and check whether it is a bug or intended behavior that could be improved in the future, but that will take some time and I’m not able to give any ETA on changes.

    For some services IP lists are available but, as far as I’m aware, not for Google – so that is a bummer.

    An option could be to manually block IP ranges (identified from logs/stats as possibly unwanted) using the IP blocklist in Defender, and/or to block unwanted user agents. Though this is actually quite a “fragile” protection, as a user agent string can very easily be changed and most “malicious” bots do change them. On the other hand, allowing Google user agents without verification (and the user agent blocking tool in Defender cannot currently verify those bots) is also a bit “fragile” for the very same reason.
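
    To illustrate how fragile user agent matching is, here is a minimal sketch (plain Python; the target URL is just a placeholder) showing that any client can claim to be Googlebot:

    ```python
    # Minimal illustration: the User-Agent header is fully controlled
    # by the sender, so any client can claim to be Googlebot.
    import urllib.request

    req = urllib.request.Request(
        "https://example.com/",  # placeholder URL
        headers={
            # A genuine Googlebot UA string, sent from a non-Google machine:
            "User-Agent": (
                "Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)"
            )
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)  # the server saw "Googlebot", whoever we are
    ```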

    So a more secure workaround could be to use Cloudflare and its WAF to block traffic based on country at the Cloudflare level, as it allows making exceptions for search engines (see here) – but if I’m not mistaken, that may only be available on Pro (paid) plans.
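
    For example, a custom Cloudflare WAF rule along these lines (Cloudflare’s expression syntax; the country code is just an example, and exact field availability depends on the plan) would block a country while exempting the bots Cloudflare has verified as legitimate:

    ```
    # Hypothetical custom rule with action "Block": blocks traffic from
    # the given country unless the request comes from a bot on
    # Cloudflare's verified-bots list (e.g. Googlebot).
    (ip.geoip.country eq "CN") and not cf.client.bot
    ```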

    Kind regards,
    Adam

  • Adam
    • Support Gorilla

    Hi again Matt

    After posting the above response, I had a hunch to check again and realized that I was a bit wrong about Google. In fact, they do provide their IPs:

    https://developers.google.com/search/docs/crawling-indexing/verifying-googlebot#use-automatic-solutions

    The format is a bit “inconvenient” to use in Defender, but nonetheless the IPs are available, so if you block countries and then add those IPs to the IP allow list, Google will be able to access the site despite the country block.
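
    The list Google publishes there is JSON (a “prefixes” array with “ipv4Prefix”/“ipv6Prefix” entries). A small sketch like the following, based on the format documented on that page, could flatten it into one CIDR range per line for pasting into Defender’s allow list:

    ```python
    # Sketch: fetch Google's published Googlebot ranges and print one
    # CIDR block per line, ready to paste into Defender's IP allow list.
    # URL and field names are as documented on Google's page above.
    import json
    import urllib.request

    GOOGLEBOT_RANGES = (
        "https://developers.google.com/static/search/apis/ipranges/googlebot.json"
    )

    with urllib.request.urlopen(GOOGLEBOT_RANGES) as resp:
        data = json.load(resp)

    for prefix in data["prefixes"]:
        # Each entry holds either an "ipv4Prefix" or an "ipv6Prefix" key.
        print(prefix.get("ipv4Prefix") or prefix.get("ipv6Prefix"))
    ```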

    Best regards,
    Adam

  • Matt
    • Flash Drive

    Hi Adam,

    Thanks for the detailed response! That’s very informative.

    I’ll give the workaround a try by manually blocking IP ranges and utilizing the Google IPs you provided. Your suggestion about using CloudFlare for country-level blocking and making exceptions for search engines is also a great idea, and I’ll explore that option too.

    I understand that these changes might take some time and I appreciate your assistance in the meantime.

  • Adam
    • Support Gorilla

    Hi Matt

    I’m glad I could help at least that much.

    In the meantime, I got further feedback from our Defender team. The developers confirmed that the current user agent allow list behavior is a bug and they’ll work on a fix. At this point, though, we are considering a slightly different approach – making it optional (via a checkbox in settings) so the site admin can decide whether the User Agent allow list should override country blocking. I think that would be the best of both worlds and give some “flexibility”.

    We’ve also added an “allow verified Google bots” feature to the “to do” list for future releases. When enabled (it would be optional as well), the plugin would attempt to verify whether a request that appears, by its user agent string, to come from a Google bot really is a legitimate Google bot, and if so, allow it despite any other blocks/lockouts in place.
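
    For context, the verification Google documents for this (which such a feature would presumably automate) is a two-step DNS check: a reverse (PTR) lookup on the requesting IP, then a forward lookup to confirm the name maps back to the same address. A minimal Python sketch, IPv4-only for brevity:

    ```python
    # Sketch of the two-step DNS check Google documents for verifying
    # Googlebot: reverse (PTR) lookup, domain check, forward-confirm.
    import socket

    def is_verified_googlebot(ip: str) -> bool:
        try:
            host, _, _ = socket.gethostbyaddr(ip)        # reverse (PTR) lookup
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            _, _, addrs = socket.gethostbyname_ex(host)  # forward-confirm
        except socket.gaierror:
            return False
        return ip in addrs  # the name must resolve back to the same IP

    print(is_verified_googlebot("66.249.66.1"))  # an address in Googlebot's range
    ```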

    I can’t really tell when this will be added, but I hope that’s good news anyway. Thank you as well for pointing this out – we were able to identify the bug and come up with this new feature thanks to your feedback!

    Best regards,
    Adam

  • André
    • André Bothma

    Hi there, WPMUDev.
    So I’m also having issues with Google bots today: they’re reporting 403s on pages because I’ve blocked a site to be accessible only from one country (South Africa). After an online chat with FLS there was no real resolution to my issue, so I started doing some research and came across this topic/ticket/article.

    Is there any way yet to resolve the bot-allowing issue from a Defender settings perspective, given that this thread is now about two years old?

  • Luigi Di Benedetto
    • Staff

    Hey there André

    I hope you’re having a great day.

    To clarify, the information previously discussed is still valid: the user-agent allowlist in Defender does not currently override geoblocking. This improvement remains on our internal roadmap. In the meantime, as my colleague recommended two years ago, adding the necessary IPs to the allowlist will override geoblocking. This method remains effective, so you can continue to use it as an alternative until we release an update that prioritizes user-agent allowlists over geoblocking.

    I hope this helps. If you need further assistance, please let me know.

    Best regards,
    Luigi.

  • André
    • André Bothma

    Hi there, Luigi.

    Thanks for the reply!
    The process does become a bit tedious, though, as Google has a list of about 1975 IPs used by its crawlers and bots, and it’s impossible to add Meta’s bot IPs to an allow list. :smirk:

    So please add my upvote to this development, for what it may be worth.

    Meta (formerly Facebook) uses a very large, dynamic, and frequently changing set of IP addresses for its various crawlers and fetchers (e.g., facebookexternalhit, meta-externalagent, meta-externalfetcher). While the exact total number of active IP addresses changes, they operate from large network blocks (AS32934) that, if fully utilized, could encompass tens of thousands to over 100,000 individual IP addresses, including both IPv4 and IPv6.

    Key details on Meta crawler IP ranges:

    • Active blocks: Meta uses numerous large CIDR blocks, including ranges like 31.13.64.0/18, 69.171.224.0/20, and 173.252.64.0/18.
    • IPv6: They utilize wide IPv6 ranges, such as 2a03:2880::/32, which can cover an astronomical number of possible addresses.
    • Rotation: These IPs are not static and are frequently updated or rotated, making fixed IP allowlists difficult to maintain.
    • Detection: The most reliable way to identify Meta crawlers is by their user-agent string (facebookexternalhit or meta-externalagent) or by verifying their IP addresses against Meta’s officially published network lists (see the sketch after this list).
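
    Since those ranges all originate from AS32934, one maintainable option is to pull the current route list from the RADb whois service (the approach Meta’s crawler documentation describes, if I recall correctly) rather than hand-maintaining IPs. A minimal sketch using plain sockets and the whois protocol (TCP port 43):

    ```python
    # Sketch: query RADb's whois service for the routes registered to
    # Meta's AS32934 and print the announced CIDR blocks.
    import socket

    def radb_routes(asn: str = "AS32934") -> list[str]:
        with socket.create_connection(("whois.radb.net", 43)) as sock:
            # whois protocol: send the query plus CRLF, read until close.
            sock.sendall(f"-i origin {asn}\r\n".encode())
            chunks = []
            while chunk := sock.recv(4096):
                chunks.append(chunk)
        text = b"".join(chunks).decode(errors="replace")
        # "route:" / "route6:" lines carry the IPv4 / IPv6 prefixes.
        return [
            line.split(":", 1)[1].strip()
            for line in text.splitlines()
            if line.startswith(("route:", "route6:"))
        ]

    for cidr in radb_routes():
        print(cidr)
    ```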