I would like to know how to scrape Google SERPs in a large project.

What I have:
  • A website application written in PHP with simple user management
  • A cURL script that scrapes the SERP for 9 phrases every hour and fetches the top 100 domains for each phrase, saving them to the database. That is 9 (phrases) * 10 (pages) * 24 (hours) = 2160 requests per 24 hours. More precisely: cron starts the script at 10:00, it sends 1 request at a time with a pause of 3-6 seconds between requests, then the process finishes and waits until 11:00, and this repeats every hour. It has worked well for the last month without getting banned, and I believe this is the very maximum I can get away with before Google sends me to hell.
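To make the numbers concrete, here is a small Python sketch of my current request budget. The constant names are just mine for illustration, and the run-time estimate counts one pause per request and ignores network time, so treat it as rough arithmetic rather than a measurement.

```python
# Rough budget of the current single-IP setup (illustrative names only).
PHRASES = 9
PAGES_PER_PHRASE = 10      # 10 results per page -> top 100 domains
RUNS_PER_DAY = 24          # cron starts one run every hour
MIN_PAUSE_S = 3            # shortest pause between requests
MAX_PAUSE_S = 6            # longest pause between requests

requests_per_run = PHRASES * PAGES_PER_PHRASE
requests_per_day = requests_per_run * RUNS_PER_DAY

# Time spent only on pauses, assuming one pause per request.
best_case_run_s = requests_per_run * MIN_PAUSE_S
worst_case_run_s = requests_per_run * MAX_PAUSE_S

print(requests_per_run, requests_per_day)
print(best_case_run_s / 60, worst_case_run_s / 60)
```

So each hourly run sends 90 requests and spends roughly 4.5 to 9 minutes on pauses, which leaves most of the hour idle on that IP.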

What I want:

Expand the concept of my cURL script to multiple users.

Scenario: each user could get the top 100 domains for each phrase he wants, once per hour. For example, user A sets 3 phrases to analyze, which gives 30 requests per hour, and user B sets 8 phrases, which gives 80 requests per hour. That is 110 requests per hour overall, which is more than I think 1 IP can handle without being punished by Google. I probably need to set up 1 proxy server for each user, so that each user gets 1 unique IP on which I can run at most 9 phrases (90 requests) per hour. BUT EVEN THEN it's bad to limit a user to analyzing only 9 phrases every hour.
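The scenario can be sketched as a quick calculation of how many IPs a given set of users would need. This assumes that 90 requests per IP per hour (9 phrases * 10 pages, what my current script does) is a safe ceiling. That figure is only my own observation over one month, not a documented Google limit, and `ips_needed` is a hypothetical helper name.

```python
import math

PAGES_PER_PHRASE = 10
# Assumption: the load my current script runs without a ban.
SAFE_REQUESTS_PER_IP_PER_HOUR = 90


def requests_per_hour(phrases: int) -> int:
    """Requests needed to fetch the top 100 results for each phrase."""
    return phrases * PAGES_PER_PHRASE


def ips_needed(phrases_per_user: dict) -> int:
    """Minimum number of IPs if the load is spread evenly across them."""
    total = sum(requests_per_hour(p) for p in phrases_per_user.values())
    return math.ceil(total / SAFE_REQUESTS_PER_IP_PER_HOUR)


users = {"A": 3, "B": 8}
total_requests = sum(requests_per_hour(p) for p in users.values())
print(total_requests, ips_needed(users))
```

With the same assumption, a single user with 80 phrases would need 9 IPs and one with 800 phrases would need 89, so the number of IPs grows with phrases rather than with users.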

The only desperate idea I see for now is to buy a lot of hosting accounts with cron and somehow "assign", for example, 10 proxy servers (each with a unique IP) to each hosting account. Let's say I have 1000 users: I would need 100 hosting accounts with cron and 10 unique IPs for each, which gives 1000 unique IPs overall.
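A minimal sketch of the user <=> IP mapping I have in mind, assuming one unique IP per user and 10 IPs per hosting account. The function name and the (host, slot) numbering are hypothetical. A real setup would map each slot to an actual proxy address.

```python
def assign_users(user_count: int, ips_per_host: int = 10) -> dict:
    """Map each user id to a (host index, IP slot on that host) pair."""
    assignment = {}
    for user_id in range(user_count):
        # Fill one host's IP slots completely before moving to the next host.
        host, slot = divmod(user_id, ips_per_host)
        assignment[user_id] = (host, slot)
    return assignment


plan = assign_users(1000)
hosts_used = {host for host, _ in plan.values()}
unique_ips = set(plan.values())
print(len(hosts_used), len(unique_ips))
```

For 1000 users this uses 100 hosting accounts and 1000 distinct (host, slot) pairs, matching the numbers above.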

It would be nice to let every user expand from 9 analyses per hour to, for example, 80 or even 800, regardless of cost. I want to know how to implement my suggested user <=> IP idea, or any solution that would be better. I have no experience with proxies or managing IPs.