Scraping Facebook posts with pure requests and BeautifulSoup4 sounds impossible — Facebook’s main site is one of the most JavaScript-heavy SPAs on the web. The trick is the same one that worked for scraping Twitter with the right User Agent: pretend to be an old feature phone, and Facebook will serve you a clean, lightweight HTML version that’s fully scrapeable.
This post walks through the open-source FacebookPostsScraper project — what fields it extracts, how the login flow works, how to export results, and important caveats about Facebook’s Terms of Service in 2026.
⚠️ Important: legal and ethical caveats
Before running any scraper against Facebook (or Meta properties), be aware that:
- Scraping Facebook violates their Terms of Service. Accounts have been suspended for it.
- Court decisions such as hiQ Labs v. LinkedIn have shifted the legal landscape around scraping; consult a lawyer before any commercial use.
- For most legitimate use cases, the official Meta Graph API is the right path. Use scraping only for personal research, archiving your own content, or learning purposes.
- Facebook’s HTML structure changes frequently — selectors that work today may break tomorrow.
With that out of the way, here’s how the scraper works.
What the scraper extracts
| Field | Description |
|---|---|
| published | Formatted publish datetime |
| description | Post text content |
| images | List of image URLs in the post |
| post_url | Unique post URL |
| external_links | External links found in the description |
| like_url | The Like-button URL |
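To make the table concrete, here is a rough sketch of what one returned post might look like as a Python dictionary. The field names come from the table above; the value types (for example, published as a string) and every value shown are assumptions for illustration, not real scraper output.

```python
from typing import List, TypedDict


class Post(TypedDict):
    """Assumed shape of one scraped post; field names follow the table above."""

    published: str             # formatted publish datetime (string type is an assumption)
    description: str           # post text content
    images: List[str]          # image URLs found in the post
    post_url: str              # unique post URL
    external_links: List[str]  # external links found in the description
    like_url: str              # the Like-button URL


# Placeholder values for illustration only, not real scraper output.
example_post: Post = {
    "published": "2026-01-01 12:00:00",
    "description": "Example post text with a link: https://example.com",
    "images": ["https://example.com/image.jpg"],
    "post_url": "https://example.com/post/1",
    "external_links": ["https://example.com"],
    "like_url": "https://example.com/like/1",
}
```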
How it works under the hood
- Creates a persistent requests.Session().
- Sets the User Agent to an old Nokia C3, because Facebook’s feature-phone HTML response is the easiest version to parse.
- Checks for a saved session cookie locally; if missing, performs the email/password login flow and persists the cookie for next time.
- Fetches a profile/page/group URL and parses the timeline with BeautifulSoup4 CSS selectors.
- Returns structured Python dictionaries.
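The first three steps above can be sketched roughly as follows. This is not the project's actual code: the helper names build_session and save_cookies, the JSON cookie file, and the exact User Agent string are all assumptions made for illustration, and the login request itself is left out.

```python
import json
from pathlib import Path

import requests

# Illustrative feature-phone User Agent; the exact string the project sends is an assumption.
NOKIA_C3_UA = (
    "NokiaC3-00/5.0 (07.20) Profile/MIDP-2.1 Configuration/CLDC-1.1 "
    "Mozilla/5.0 AppleWebKit/420+ (KHTML, like Gecko) Safari/420+"
)


def build_session(cookie_file: Path) -> requests.Session:
    """Create a session with the feature-phone User Agent and any saved cookies."""
    session = requests.Session()
    session.headers.update({"User-Agent": NOKIA_C3_UA})
    if cookie_file.exists():
        # Reuse cookies from an earlier login instead of logging in again.
        saved = json.loads(cookie_file.read_text())
        session.cookies.update(requests.utils.cookiejar_from_dict(saved))
    return session


def save_cookies(session: requests.Session, cookie_file: Path) -> None:
    """Persist the session cookies as JSON so the next run can skip the login."""
    cookies = requests.utils.dict_from_cookiejar(session.cookies)
    cookie_file.write_text(json.dumps(cookies))
```

With helpers like these, a run would call build_session() first, log in only if no cookie file was found, and then call save_cookies() once the login succeeds.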
Login is required because most user profiles aren’t visible to logged-out clients. The scraper supports localized Facebook UIs via the post_url_text argument:
| Locale | post_url_text |
|---|---|
| English | 'Full Story' |
| Spanish | 'Historia completa' |
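One plausible reading of post_url_text is that the scraper finds each post's permalink by matching the visible text of a link, which is why the value has to match the language of the Facebook UI. The sketch below shows that idea with BeautifulSoup4 on made-up markup. The HTML sample and the find_post_links helper are hypothetical; real Facebook HTML differs and changes often.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup; not real Facebook HTML.
SAMPLE_HTML = """
<div class="post">
  <p>First post</p>
  <a href="/story.php?id=1">Full Story</a>
</div>
<div class="post">
  <p>Second post</p>
  <a href="/story.php?id=2">Full Story</a>
  <a href="/like.php?id=2">Like</a>
</div>
"""


def find_post_links(html: str, post_url_text: str) -> list[str]:
    """Return the href of every link whose visible text equals post_url_text."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", string=post_url_text)
    return [a["href"] for a in links if a.has_attr("href")]


print(find_post_links(SAMPLE_HTML, "Full Story"))
print(find_post_links(SAMPLE_HTML, "Historia completa"))  # wrong locale: nothing matches
```

If matching does work this way, a wrong post_url_text would not raise an error. It would simply return no post URLs, which is worth checking for.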
1. Setup
Use Python 3.10+ (the project was originally written for 3.7, but newer Python is fine).
git clone https://github.com/adeoy/FacebookPostsScraper.git
cd FacebookPostsScraper
pip install -r requirements.txt
2. Scrape a single profile
from FacebookPostsScraper import FacebookPostsScraper as Fps
from pprint import pprint as pp
email = "YOUR_EMAIL"
password = "YOUR_PASSWORD"
fps = Fps(email, password, post_url_text="Full Story")
posts = fps.get_posts_from_profile("https://www.facebook.com/BillGates")
pp(posts)
fps.posts_to_csv("my_posts")
# fps.posts_to_excel("my_posts")
# fps.posts_to_json("my_posts")
3. Scrape multiple sources at once
from FacebookPostsScraper import FacebookPostsScraper as Fps
from pprint import pprint as pp
fps = Fps("YOUR_EMAIL", "YOUR_PASSWORD", post_url_text="Full Story")
profiles = [
    "https://www.facebook.com/zuck",  # User profile
    "https://www.facebook.com/thepracticaldev",  # Page
    "https://www.facebook.com/groups/python",  # Group
]
posts = fps.get_posts_from_list(profiles)
pp(posts)
fps.posts_to_csv("my_posts")
When to use this: exporting your own profile’s history, archiving public pages you administer, or as a learning exercise. For production analytics or commercial use, prefer the Meta Graph API.
Tips for keeping the scraper alive
- Throttle requests. Sleep for 2 to 5 seconds between calls, for example with time.sleep(). Bursts are the fastest way to get rate-limited.
- Use a dedicated test account, never your main Facebook account.
- Persist cookies: log in once, reuse the session.
- Detect breakage: assert that key fields (description, post_url) are non-empty in your output. When Facebook changes selectors, you want to know fast.
- Consider Playwright if the feature-phone trick stops working. It is heavier, but it drives a real browser, so it does not depend on Facebook serving the lightweight HTML.
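The throttling and breakage-detection tips can be combined in a small wrapper around whatever function fetches posts. This is a minimal sketch using only the standard library. The names fetch_politely and find_broken_posts are made up for this example, and the fetch argument is assumed to return a list of post dictionaries for one URL.

```python
import random
import time
from typing import Callable, Dict, Iterable, List

REQUIRED_FIELDS = ("description", "post_url")


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], List[Dict]],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Call fetch(url) for each URL, pausing a random 2 to 5 seconds between calls."""
    results: List[Dict] = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, only between requests.
            sleep(random.uniform(min_delay, max_delay))
        results.extend(fetch(url))
    return results


def find_broken_posts(posts: List[Dict], required=REQUIRED_FIELDS) -> List[Dict]:
    """Return posts where any required field is missing or empty."""
    return [post for post in posts if any(not post.get(field) for field in required)]
```

Something like posts = fetch_politely(profiles, fps.get_posts_from_profile) followed by assert not find_broken_posts(posts) would then fail loudly as soon as the selectors stop matching. A stricter check might also treat an empty result list as a failure.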
Final thoughts
This project is a great example of how a clever User Agent choice can turn a “requires Selenium/Playwright” problem into something solvable with requests and BeautifulSoup4 alone. Just remember: be respectful of platforms, follow ToS where it matters, and prefer official APIs for anything beyond personal use.
The full source is on GitHub. For the User Agent technique that powers this scraper, see Why User Agents matter in web scraping. Once you have data, you might want to expose it as an API.