Most beginner scrapers send a request, get blocked, and blame the website. The real culprit is almost always the same: a missing or default User Agent. That single HTTP header tells servers what kind of client you are — and if you announce yourself as python-requests/2.x, many sites will refuse to serve real content.
This post explains what a User Agent is, why it has such a strong effect on scraping, and shows three working Python examples comparing desktop, mobile, and feature-phone responses from a real site.
What is a User Agent?
The User Agent is a plain-text string sent in the HTTP request headers that identifies the device, operating system, and browser making the request. Servers use it to decide which version of a page to serve: a mobile-optimized layout for phones, a heavier JavaScript SPA for modern desktop browsers, or stripped-down HTML for older devices.
Because the header is plain text, it’s trivial to manipulate: the server treats whatever value you send as the truth. That’s exactly what makes it so useful for scraping.
Why your scraper needs one
If you don’t set a User Agent, libraries fall back to defaults like python-requests/2.31.0 or curl/8.x — both obvious bot signatures. Many sites block them outright (HTTP 403, captcha walls, or empty bodies).
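You can check what your own installation sends by default; `requests` exposes it via `requests.utils.default_user_agent()`:

```python
import requests

# The UA that requests sends when you don't set one -- an obvious bot signature.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # e.g. "python-requests/2.31.0", depending on your installed version
```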
When to use this: always. Even if a site doesn’t block you today, sending a realistic User Agent is the lowest-effort improvement you can make to a scraper. Pair it with rotation across a small pool to mimic organic traffic patterns.
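Rotation can be as simple as picking from a list per request. A minimal sketch, where the pool strings and the `get_with_random_ua` helper are illustrative (refresh the strings periodically, as noted below):

```python
import random
import requests

# A small pool of realistic desktop UAs; example strings, keep them current.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

def get_with_random_ua(url: str) -> requests.Response:
    # Each call sends a different UA from the pool, mimicking organic traffic.
    headers = {"User-Agent": random.choice(UA_POOL)}
    return requests.get(url, headers=headers, timeout=10)
```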
Setup: dependencies
```bash
pip install requests beautifulsoup4 lxml
```

Example 1 — Desktop User Agent
A modern desktop User Agent string. Many sites respond with the heavy JavaScript SPA version — which means requests alone won’t render the actual content:
```python
import requests
from bs4 import BeautifulSoup

desktop_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)
headers = {"User-Agent": desktop_ua}

resp = requests.get("https://twitter.com/billgates", headers=headers, timeout=10)
if resp.status_code == 200:
    soup = BeautifulSoup(resp.text, "lxml")
    print(soup.prettify()[:2000])
else:
    print(f"Error: {resp.status_code}")
```

You’ll see a `<noscript>` form telling you to enable JavaScript: Twitter (now X) loads its timeline via JS, and `requests` doesn’t execute scripts.
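It’s worth detecting that wall programmatically before you try to parse. A small sketch, using made-up sample HTML rather than Twitter’s actual markup:

```python
from bs4 import BeautifulSoup

# Sample HTML standing in for a JS-wall response; illustrative, not real markup.
sample_html = """
<html><body>
  <noscript>JavaScript is not available. Please enable JavaScript.</noscript>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
js_wall = soup.find("noscript")
if js_wall is not None:
    print("Got the JS wall:", js_wall.get_text(strip=True))
else:
    print("Server returned real content")
```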
Example 2 — Smartphone User Agent
```python
import requests

smartphone_ua = (
    "Mozilla/5.0 (Linux; Android 14; Pixel 8) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Mobile Safari/537.36"
)
headers = {"User-Agent": smartphone_ua}
resp = requests.get("https://twitter.com/billgates", headers=headers, timeout=10)
```

Modern mobile sites also rely on JS, so the response looks similar to desktop. Mobile UAs help mostly on older sites or on those that maintain a separate `m.` subdomain.
Example 3 — Feature-phone User Agent
This is where it gets interesting. Older feature-phone UAs trigger the lightweight, JS-free HTML version of many social sites — perfect for raw HTML scraping with requests:
```python
import requests

old_phone_ua = (
    "Nokia5310XpressMusic_CMCC/2.0 (10.10) Profile/MIDP-2.1 "
    "Configuration/CLDC-1.1 UCWEB/2.0 (Java; U; MIDP-2.0; en-US; "
    "Nokia5310XpressMusic) U2/1.0.0 UCBrowser/9.5.0.449 U2/1.0.0 Mobile"
)
headers = {"User-Agent": old_phone_ua}
resp = requests.get("https://twitter.com/billgates", headers=headers, timeout=10)
```

The response includes actual tweet content embedded in plain HTML, no JavaScript required. This is the kind of “User Agent leverage” that turns a scraping problem from “impossible without a headless browser” into three lines with `requests`.
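Once you have the lightweight HTML, extraction is ordinary BeautifulSoup work. The markup and selectors below are invented for illustration; inspect the real response to find the actual class names:

```python
from bs4 import BeautifulSoup

# Hypothetical legacy markup: post text sits directly in the HTML.
legacy_html = """
<html><body>
  <div class="tweet"><div class="tweet-text">First post text</div></div>
  <div class="tweet"><div class="tweet-text">Second post text</div></div>
</body></html>
"""

soup = BeautifulSoup(legacy_html, "html.parser")
posts = [div.get_text(strip=True) for div in soup.select("div.tweet-text")]
print(posts)  # ['First post text', 'Second post text']
```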
Comparison: which User Agent for which job?
| Use case | Recommended UA type | Why |
|---|---|---|
| General scraping | Modern desktop Chrome/Firefox | Most realistic; lowest block rate |
| Mobile-only sites | Recent Android/iOS | Triggers mobile layout |
| JS-heavy sites without a headless browser | Feature-phone or old smartphone | Forces lightweight HTML version |
| Search engine respect | Mozilla/5.0 (compatible; MyBot/1.0; +https://your-site.example/bot) | Honest, identifies your scraper |
Best practices
- Keep User Agents fresh. Browser versions roll out monthly. A UA from 2019 is itself a bot signal.
- Rotate across a small pool (5–10 UAs) instead of using one fixed string.
- Match other headers: `Accept`, `Accept-Language`, `Accept-Encoding`. A request with a Chrome UA but missing those headers looks suspicious.
- Respect `robots.txt`. The legality and ethics of scraping vary by site and jurisdiction.
- For aggressive anti-bot sites (Cloudflare, DataDome, PerimeterX), a User Agent alone won’t be enough; you’ll need a real headless browser like Playwright.
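The header-matching advice can be sketched with a `requests.Session`, so every request carries a consistent set. The companion header values here are plausible examples, not an exact Chrome fingerprint:

```python
import requests

# A session whose default headers all agree with the Chrome UA story.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
})
# Every session.get(...) call now sends this full, consistent header set.
```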
Final thoughts
The User Agent header is one of the cheapest and highest-leverage things to get right when scraping. Most beginner blocks vanish after switching from the default to a realistic desktop User Agent — and a few clever choices (like feature-phone UAs) can unlock content that would otherwise require a full headless browser.
For a complete scraping example, see Python Facebook Posts Scraper with Requests and BeautifulSoup4. Once your data is collected, you may want to expose it as an API — How to get more exposure for your API.