Recently I came across someone who had built a bot that sends updates about rental apartments. It sends Telegram messages based on your criteria, such as the number of rooms, the city, whether there’s an elevator, and so on.

I thought this was interesting and wanted to learn a bit about how one would do such a thing.
I quickly learned that this is unethical, probably illegal, and outside the terms of service. But I was still interested in the technical side.

You might think you can simply use wget, curl, Python’s requests library, or something similar, but these won’t give you the expected results against modern websites like Facebook that rely heavily on JavaScript.

Facebook and similar sites are basically mini-applications running in your browser. When you visit Facebook, the initial HTML is pretty much empty: everything you see gets loaded by JavaScript after the page loads. So your simple HTTP request just gets you a skeleton page, with none of the actual content you’re trying to scrape.
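To see why, here is a minimal sketch that parses a hypothetical skeleton page with Python’s standard `html.parser`. The markup is invented for illustration (it is not Facebook’s actual HTML), and `VisibleTextCounter` and `inspect` are hypothetical names. Like curl or requests, the parser never executes JavaScript, so all it finds is an empty mount point and some script tags.

```python
from html.parser import HTMLParser

# Hypothetical skeleton page, similar in spirit to what a JavaScript-heavy
# site returns before any script runs. This is NOT real Facebook markup.
SKELETON_HTML = """
<!DOCTYPE html>
<html>
  <head>
    <title>Marketplace</title>
    <script src="/static/bundle.js"></script>
  </head>
  <body>
    <div id="root"></div>
    <script>window.__INITIAL_STATE__ = null;</script>
  </body>
</html>
"""


class VisibleTextCounter(HTMLParser):
    """Collects body text a user would see and counts <script> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.visible_text: list[str] = []
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script>, <style> or <title>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style", "title"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "title") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace and anything inside script/style/title.
        if text and self._skip_depth == 0:
            self.visible_text.append(text)


def inspect(html: str) -> tuple[list[str], int]:
    """Return (visible text fragments, number of <script> tags)."""
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()
    return parser.visible_text, parser.script_tags


if __name__ == "__main__":
    text, scripts = inspect(SKELETON_HTML)
    print("visible text:", text)
    print("script tags:", scripts)
```

Fed a server-rendered page, the same function would return the listing text; fed the skeleton, it returns an empty list and a count of two script tags.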

So maybe you’re thinking “okay, I’ll just use browser automation tools like Playwright, Puppeteer, or Selenium to solve this JavaScript problem.” But even if you overcome the dynamic loading, you’ll quickly run into the next wall.

Websites really don’t want you scraping them. They’ll check if you’re making requests too fast, if your browser fingerprint looks suspicious, or if you’re missing those subtle behavioral patterns that make you look human.
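The “requests too fast” check is the easiest of these to picture. Below is a minimal sketch from the site’s side, assuming a sliding-window limiter keyed by client IP. Real sites combine many signals and don’t publish their exact rules, so the window size and threshold are arbitrary illustrative values, and `SlidingWindowLimiter` is a hypothetical name.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Flags a client making more than `max_requests` within `window_s` seconds.

    Illustrative only: the thresholds are arbitrary, not any real site's policy.
    """

    def __init__(self, max_requests: int = 5, window_s: float = 10.0) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        # client id (e.g. an IP address) -> timestamps of its accepted requests
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too fast: block, throttle, or serve a CAPTCHA
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_s=1.0)
    # A bot firing 5 requests within 0.4 s: the first 3 pass, the rest are flagged.
    print([limiter.allow("198.51.100.7", t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)])
    # prints [True, True, True, False, False]
```

Rejected requests aren’t recorded in this sketch, so a client that backs off is let through again once its earlier requests age out of the window.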

Some sites will even serve you completely different content if they suspect you’re a bot, or just throw a CAPTCHA at you to stop you dead in your tracks.

You might get sneaky and think: let’s use AWS Lambda or another cloud service so my IP keeps changing. Again, they have probably already faced this, and from many cloud providers you won’t even be able to send a GET request to Facebook, because Facebook simply blocks those providers’ known IP ranges.
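Blocking by range is cheap on the defending side, since large cloud providers publish their address ranges (AWS, for example, publishes its list as JSON). Here is a minimal sketch using Python’s standard `ipaddress` module. The CIDR blocks are documentation-only placeholders (RFC 5737) standing in for a real provider’s published list, and `BLOCKED_RANGES` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Placeholder blocklist: documentation-only ranges (RFC 5737), standing in
# for a real cloud provider's published CIDR blocks.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    # `in` on a network object tests membership; an IPv6 address is never
    # a member of an IPv4 network, so mixed versions simply return False.
    return any(addr in net for net in BLOCKED_RANGES)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "198.51.100.7"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Because the check covers the whole range, getting a fresh IP from the same provider doesn’t help.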

OK, so this is clearly a difficult task. But if you google around, you can find businesses that appear to sell the ability to scrape data from Facebook, LinkedIn, and other popular websites that are gold mines of personal information.

I stumbled across the “REDACTED” company, which does exactly this. Their website advertises 150 million different residential IP addresses. Wow, how does one get 150 million residential IPs? That would certainly make scraping much easier.

TL;DR: After some digging around, the answer is that some free VPN services, on top of giving you “free” VPN access, are quietly turning your computer into their slave. They rent out your internet connection to companies that need residential IPs, effectively turning your home computer into a Facebook stalker or LinkedIn data harvester.

You might have seen something similar when installing Adobe Reader in the past: the installer would pre-select an option to install McAfee on your computer as well. This is basically the same trick, but even dirtier.

Sadly, if something seems too good to be true, it probably is.