A few weeks ago I wrote about the importance of using User Agents when we scrape data, and my examples showed the response from Twitter when we used the correct User Agent. This time I want to do the same, but with Facebook. We're going to scrape the posts from user profiles, Facebook pages, and groups.
What are we going to get?
A list of items with the following values:
| param | description |
|---|---|
| published | Formatted DateTime of publication |
| description | Post text content |
| images | List of images in the post |
| post_url | The unique post URL |
| external_links | External links found in the description |
| like_url | The Like URL |
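For illustration, a single scraped item could look like the dictionary below. Only the field names come from the table above; every value here is a made-up placeholder, not real scraped data:

```python
# Hypothetical example of one scraped post item.
# The keys match the table above; all values are placeholders.
post = {
    'published': '2020-06-27 10:30',                        # formatted DateTime
    'description': 'Hello world, this is a sample post!',   # post text content
    'images': ['https://example.com/photo1.jpg'],           # images in the post
    'post_url': 'https://m.facebook.com/story.php?id=123',  # unique post URL
    'external_links': ['https://example.com/article'],      # links found in the text
    'like_url': 'https://m.facebook.com/like.php?id=123',   # the Like URL
}

print(sorted(post.keys()))
```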
Let’s start
1. Get Python (Python 3.7+ recommended)
2. Clone or download this repository:

```shell
git clone https://github.com/adeoy/FacebookPostsScraper.git
```

3. Install the project requirements:

```shell
pip install -r requirements.txt
```
Let’s explain
First of all, all the code is in my GitHub repository: https://github.com/adeoy/FacebookPostsScraper
- We create a requests session.
- We set the User Agent of an old Nokia C3 phone on the session (the Nokia C3 gave me better results during scraping than other phones).
- We check if we have a session cookie saved on our computer; if not, we log in to Facebook with email and password and save the session cookie on our computer (we need to log in because our friends’ private profiles can’t be scraped without authentication).
- We request a profile and scrape the posts using BeautifulSoup and CSS selectors.
- We return the results.
- Have fun 🙂
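The session and cookie steps above can be sketched roughly like this. Note that the User-Agent string, the cookie-file name, and the login URL in the comment are illustrative assumptions, not the exact values the class uses:

```python
import os
import pickle

import requests

COOKIES_FILE = 'session_cookies.pkl'  # hypothetical cookie-file name

# Create a requests session and set an old Nokia C3 User Agent on it
session = requests.Session()
session.headers.update({
    # Illustrative Nokia C3 User-Agent string; the class may use a different one
    'User-Agent': 'NokiaC3-00/5.0 (07.20) Profile/MIDP-2.1 Configuration/CLDC-1.1'
})

if os.path.exists(COOKIES_FILE):
    # Reuse the session cookie saved on a previous run
    with open(COOKIES_FILE, 'rb') as f:
        session.cookies.update(pickle.load(f))
else:
    # Here the real class would POST the email/password to the mobile
    # login form, e.g. session.post('https://mbasic.facebook.com/login/...'),
    # and then persist the resulting session cookie:
    with open(COOKIES_FILE, 'wb') as f:
        pickle.dump(session.cookies, f)

print(session.headers['User-Agent'])
```

Persisting the cookie means you only log in once; later runs reuse the saved session, which also lowers the chance of Facebook flagging repeated logins.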
I already made a class that manages the whole process. First, we need to instantiate a FacebookPostsScraper object, passing our email and password. Optionally, if your Facebook account isn’t in English, you also need to set the text of the link that opens a post, which only appears in the Facebook mobile version. Don’t worry if you don’t understand: ask me in the comments which language you need, and I will reply with the right text. BTW, these are the values for English and Spanish:
- English: ‘Full Story’
- Spanish: ‘Historia completa’
Once you instantiate an object, the class automatically logs in to Facebook and prepares the session for the requests. Now you can call the get_posts_from_profile method and pass it a Facebook profile URL to get the posts.
Edit June 27th, 2020: now you can export the scraped posts to CSV, Excel, and JSON. See the end of the examples to check it out.
Examples
Example with a single URL

```python
from FacebookPostsScraper import FacebookPostsScraper as Fps
from pprint import pprint as pp

# Enter your Facebook email and password
email = 'YOUR_EMAIL'
password = 'YOUR_PASSWORD'

# Instantiate an object
fps = Fps(email, password, post_url_text='Full Story')

# Example with a single profile
single_profile = 'https://www.facebook.com/BillGates'
data = fps.get_posts_from_profile(single_profile)
pp(data)

fps.posts_to_csv('my_posts')      # Export the posts as a CSV document
# fps.posts_to_excel('my_posts')  # Export the posts as an Excel document
# fps.posts_to_json('my_posts')   # Export the posts as a JSON document
```
Example with multiple URLs

```python
from FacebookPostsScraper import FacebookPostsScraper as Fps
from pprint import pprint as pp

# Enter your Facebook email and password
email = 'YOUR_EMAIL'
password = 'YOUR_PASSWORD'

# Instantiate an object
fps = Fps(email, password, post_url_text='Full Story')

# Example with multiple profiles
profiles = [
    'https://www.facebook.com/zuck',            # User profile
    'https://www.facebook.com/thepracticaldev', # Facebook page
    'https://www.facebook.com/groups/python'    # Facebook group
]
data = fps.get_posts_from_list(profiles)
pp(data)

fps.posts_to_csv('my_posts')      # Export the posts as a CSV document
# fps.posts_to_excel('my_posts')  # Export the posts as an Excel document
# fps.posts_to_json('my_posts')   # Export the posts as a JSON document
```
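Since the scraper returns a plain list of dictionaries, you can also post-process the results yourself before exporting. A minimal sketch, using made-up items with the field names from the table at the top of this post:

```python
# Hypothetical scraped items; the keys match the fields described earlier.
posts = [
    {'description': 'Python tips', 'external_links': ['https://example.com']},
    {'description': 'Just a photo', 'external_links': []},
]

# Keep only the posts that link somewhere outside Facebook
with_links = [p for p in posts if p['external_links']]
print(len(with_links))
```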
Final thoughts
I also recommend this book, where I learned some cool web scraping tricks: https://amzn.to/3umlGuc.
Please feel free to ask anything you want in the comments section.