Analytics

Is your website traffic real, or bots?

One overseas datacenter was sending us almost as many visitors as the entire United States. None of them were real. Here is how we caught it, and the two-minute check you can run on your own site.

CitrusWeb Team

5 min read

Some of the traffic in your analytics is not people. It is bots: scripts, scrapers, and scanners running out of datacenters. On many sites they are a large share of recorded visitors, and they inflate your numbers until the reports stop matching reality. The good news: you can prove it in about two minutes, and you can block the bad bots without losing the good ones.

We know because it happened to a site we run.

What we saw

We were reviewing Google Analytics for one of our own sites and a single country jumped out. Singapore was the number two country by users, almost tied with the United States. This is a US business with no Singapore audience. That is a red flag, not a growth signal.

The two-minute test you can run yourself

Real visitors engage with a page. Bots load it, fire your tracking script, then leave in under a second. That gap is the tell. In Google Analytics, put two reports side by side:

Top countries by users. Note which countries send the most visitors.

Average engagement time by country. Note how long each country actually stays.

Then compare. A country that ranks high by user count but sits near the bottom of the engagement-time report, or is missing from it entirely, is almost certainly sending bots. In our case Singapore was second by users and did not make the top ten by engagement time. Its sessions averaged close to zero seconds.

A country that sends thousands of visitors who each stay zero seconds is not an audience. It is noise.
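If you export both reports, the comparison is easy to script. The sketch below is a minimal illustration, not a Google Analytics feature. It assumes you have copied each country's user count and average engagement time (in seconds) into a list. The one-second cutoff and the top-ten windows are assumptions to tune for your own site, and the country names and figures are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryRow:
    country: str
    users: int
    avg_engagement_s: float  # average engagement time, in seconds


def flag_suspect_countries(rows, top_n_users=10, top_n_engagement=10,
                           max_engagement_s=1.0):
    """Return countries in the top N by users that either miss the top N by
    engagement time or average at or below max_engagement_s seconds."""
    by_users = sorted(rows, key=lambda r: r.users, reverse=True)[:top_n_users]
    by_engagement = sorted(rows, key=lambda r: r.avg_engagement_s,
                           reverse=True)[:top_n_engagement]
    engaged = {r.country for r in by_engagement}
    return [
        r.country
        for r in by_users
        if r.country not in engaged or r.avg_engagement_s <= max_engagement_s
    ]


if __name__ == "__main__":
    # Placeholder figures for illustration only, not data from our site.
    sample = [
        CountryRow("Country A", 1200, 36.0),
        CountryRow("Country B", 1100, 0.2),
        CountryRow("Country C", 300, 41.0),
    ]
    print(flag_suspect_countries(sample))  # ['Country B']
```

Both conditions mirror the manual test: a high-volume country that drops out of the engagement ranking, or one whose sessions average about zero seconds.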

What the bot traffic actually was

Reading the raw requests at the edge made it obvious:

A single overseas server sending around 3,000 requests in one day using curl, a command-line tool. No person browses with curl.

Scrapers harvesting content from cheap cloud providers.

Vulnerability scanners probing for WordPress login pages and config files, on a site that does not run WordPress.

None of it was a customer. All of it counted as traffic.
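Most edge hosts let you export or stream raw request logs. As a minimal sketch, assuming the logs use the common Apache/Nginx "combined" format (your host's format may differ), this tallies requests per address and records which addresses used scripted clients like curl or requested WordPress files. The user-agent prefixes and probe paths are illustrative assumptions, not a complete signature set.

```python
import re
import sys
from collections import Counter
from dataclasses import dataclass, field

# Assumes the Apache/Nginx "combined" log format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Illustrative lists only; extend them from what your own logs show.
SCRIPTED_UA_PREFIXES = ("curl/", "wget/", "python-requests/")
PROBE_PATHS = ("/wp-login.php", "/wp-admin", "/wp-config.php", "/xmlrpc.php", "/.env")


@dataclass
class EdgeSummary:
    requests_by_ip: Counter = field(default_factory=Counter)
    scripted_ips: set = field(default_factory=set)
    probing_ips: set = field(default_factory=set)


def summarize(lines):
    summary = EdgeSummary()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        ip, path, ua = m["ip"], m["path"], m["ua"]
        summary.requests_by_ip[ip] += 1
        if ua.lower().startswith(SCRIPTED_UA_PREFIXES):
            summary.scripted_ips.add(ip)
        # Ignore any query string when matching probe paths.
        if path.split("?", 1)[0].lower().startswith(PROBE_PATHS):
            summary.probing_ips.add(ip)
    return summary


if __name__ == "__main__":
    s = summarize(sys.stdin)
    for ip, count in s.requests_by_ip.most_common(10):
        flags = [name for name, ips in (("scripted", s.scripted_ips),
                                        ("probing", s.probing_ips)) if ip in ips]
        print(f"{count:>6}  {ip}  {' '.join(flags)}")
```

Piping an exported log into it on standard input prints the ten busiest addresses, each tagged if it used a scripted client or probed for WordPress files.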

Do not block the good bots

There is one category of bot you want: the AI crawlers behind ChatGPT, Perplexity, Google's AI answers, and Claude. Getting cited by them matters.

Those crawlers identify themselves and reach sites from known US datacenters, and they were not part of the junk. So the fix is not a blunt block. It is a rule that lets the good crawlers through first, then filters the rest.

How the fix works

On a modern edge host you can act in minutes, in this order:

Allow the AI crawlers first, at the top of the rule stack, so nothing below can block them.

Block the noise: challenge scripted tools like curl, block regions with no legitimate audience and clear bot dominance, and block the worst repeat offenders by address.

Lean on the platform's built-in bot and DDoS protection instead of reinventing it.
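The ordering is what makes this safe, and it can be modeled in a few lines. This is a hypothetical Python model of first-match-wins evaluation, not any edge host's rule syntax; real platforms express the same idea in their own rule languages. The crawler tokens are the ones this article names. The blocked country code and address are placeholders, and matching on the user-agent string alone is spoofable, so treat the allow rule here as a simplification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


@dataclass(frozen=True)
class Request:
    ip: str
    country: str  # two-letter country code from the edge's geo lookup
    user_agent: str


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Request], bool]
    action: Action


GOOD_CRAWLER_TOKENS = ("Googlebot", "GPTBot", "PerplexityBot", "ClaudeBot")
SCRIPTED_UA_PREFIXES = ("curl/", "wget/", "python-requests/")
BLOCKED_COUNTRIES = frozenset({"XX"})      # placeholder region code
BLOCKED_IPS = frozenset({"203.0.113.7"})   # placeholder (documentation) address

# Order matters: the first rule that matches decides the outcome.
RULES = (
    Rule("allow-ai-crawlers",
         lambda r: any(t in r.user_agent for t in GOOD_CRAWLER_TOKENS), Action.ALLOW),
    Rule("challenge-scripted-tools",
         lambda r: r.user_agent.lower().startswith(SCRIPTED_UA_PREFIXES), Action.CHALLENGE),
    Rule("block-regions", lambda r: r.country in BLOCKED_COUNTRIES, Action.BLOCK),
    Rule("block-repeat-offenders", lambda r: r.ip in BLOCKED_IPS, Action.BLOCK),
)


def evaluate(request, rules=RULES, default=Action.ALLOW):
    """Return (action, rule_name) for the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(request):
            return rule.action, rule.name
    return default, "default"
```

A crawler request from a blocked region still returns (Action.ALLOW, "allow-ai-crawlers"), because the allow rule is checked before any blocking rule can see it.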

Clean analytics is the foundation of every good decision. When four in ten recorded users are noise, you cannot see your real audience. This is the kind of signal Pulse, our SEO command center, is built to help agencies catch, and the check above is one anyone can run today.

Part two · Case study

Cleaning up our own analytics, start to finish

Here is the same story as a case study. It is a site we run, and we treat our own properties as the first test of anything we recommend to a client.

The challenge

Reporting is only useful if the numbers are real. On one of our own sites, Google Analytics showed Singapore almost tied with the United States for the top country. For a US business with no Singapore audience, that is noise dressed up as demand, and it quietly corrupts every report and every decision built on top of it.

The investigation

We proved it with two reads, not a hunch.

The engagement test. We put top countries by users next to average engagement time by country. Singapore was second by users and did not make the top ten by engagement time, averaging close to zero seconds. Real people do not behave that way.

The edge logs. We read the raw requests at the firewall. The heaviest source was a single overseas server sending around 3,000 requests in a day using curl, alongside scrapers and scanners probing for WordPress files on a site that does not run WordPress. None of it was a customer.

The fix

We built a layered firewall rule set, ordered so the first match wins: allow the verified AI crawlers first so they are never caught, then challenge scripted tools like curl, block the regions with no real audience, and block the worst repeat offenders by address. It went live the same afternoon, with no effect on real visitors and no effect on the AI crawlers we want.
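A user-agent string alone can be faked, which is why the allow rule should match verified crawlers. One common check is to confirm that the request comes from an address range the crawler's operator publishes; Google, for example, documents how to verify Googlebot. The sketch below assumes you have already downloaded those lists. The ranges in it are documentation placeholders, not real crawler addresses, and the function name is hypothetical.

```python
import ipaddress

# Placeholder ranges from the IETF documentation blocks. Replace them with the
# lists each crawler operator publishes, and refresh them regularly.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/25"],
}

NETWORKS = {
    token: [ipaddress.ip_network(cidr) for cidr in cidrs]
    for token, cidrs in PUBLISHED_RANGES.items()
}


def is_verified_crawler(ip, user_agent, networks=NETWORKS):
    """True only if the user agent names a known crawler AND the client address
    sits inside that crawler's published ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IP address
    for token, nets in networks.items():
        if token in user_agent:
            # IPv4/IPv6 mismatches simply fail the membership test.
            return any(addr in net for net in nets)
    return False
```

A request claiming to be GPTBot from an address outside those ranges returns False, so it falls through to the challenge and block rules like any other visitor.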

The results

~38% of one week's recorded users were zero-engagement bots from a single datacenter.

~0s average engagement from the bot country, next to 36s from real US visitors.

100% of the AI crawlers that earn citations still allowed, on purpose.

What it means for your site

Most owners never look at this. A traffic number climbs and it feels like progress, even when a large share of it is bots inflating the count and probing the site. On CitrusWeb-managed sites we run this check every month, keep the AI crawlers that earn you citations, and block the scripts and scanners that do not. Clean numbers, real visitors, protected where it counts.

FAQ

How much of website traffic is bots? It varies by site, but automated traffic commonly makes up a large share, especially traffic from datacenters. On the site above, close to 40% of one week's users were zero-engagement bot sessions from a single overseas datacenter.

How can I tell if my website traffic is bots? Compare top countries by users against average engagement time by country in Google Analytics. A country high by user count but near zero on engagement time is almost certainly bots, not real visitors.

Will blocking bots hurt my SEO or AI search visibility? Not if you do it correctly. The crawlers that drive Google rankings and AI citations, such as Googlebot, GPTBot, PerplexityBot and ClaudeBot, identify themselves and should be allowed first, before any blocking rules.

Why is bot traffic a problem if it is just numbers? It corrupts reporting so decisions are based on fiction, it wastes hosting resources, and scanners probe for vulnerabilities. Removing it gives you accurate data and a smaller attack surface.

What is the difference between good bots and bad bots? Good bots are search and AI crawlers that index your content so people can find you. Bad bots are scrapers, spam bots and scanners that add no value. Welcome the first, block the second.

The takeaway

Before you trust a traffic number, check whether it engages. If a country sits near the top by users but near zero on engagement time, that is bots inflating your reports, not demand. Clean the data first, then decide.

Explore CitrusWeb Pulse