I’ve been doing quite a bit of scraping, not using Morph I’ll admit.
I just wanted to throw this topic open, on the ethics of scraping.
I generally think it’s good to space out your requests to a single server if you’re scraping several pages.
Otherwise you risk slowing the site down for everyone else.
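To make that concrete, here’s a minimal sketch of what I mean by spacing out requests. The `fetch_politely` helper and the one-second default delay are my own placeholders, not any library’s API — you’d swap in your real request function and tune the delay to the site.

```python
import time

def fetch_politely(urls, delay=1.0, fetch=lambda u: u):
    """Fetch each URL with a fixed pause between requests.

    `fetch` is a placeholder for your real request function;
    the one-second default delay is an assumption -- tune it.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # pause so we don't hammer the server
        results.append(fetch(url))
    return results
```

A token-bucket or exponential backoff would be more robust, but even a fixed pause like this goes a long way.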
I just wondered if people were identifying themselves in their scrapers?
This article presents a compelling argument: if we want governments to be open and transparent, then our scraping should be transparent too.
Specifically, the article suggests:
- Identifying yourself in the request headers (using Python Requests in my case).
- Checking robots.txt for permission.
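For what it’s worth, both suggestions are only a few lines in Python. This is a sketch with a made-up User-Agent string — the project name and contact URL are placeholders you’d replace with your own; with Requests you’d then pass the same string via `headers={"User-Agent": USER_AGENT}`. The robots.txt check uses the standard library’s `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

# A User-Agent that identifies the scraper and gives a contact point.
# The project name and URL here are placeholders -- use your own.
USER_AGENT = "my-scraper/1.0 (+https://example.org/contact)"

def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Check a robots.txt body for permission to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Example robots.txt that closes /private/ to everyone:
robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "https://example.org/data"))       # True
print(allowed_by_robots(robots, "https://example.org/private/x"))  # False
```

In practice you’d fetch the site’s actual robots.txt once (e.g. with `RobotFileParser.set_url(...)` and `.read()`) rather than pasting it in as a string.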
I’m of two minds about this, since arguably it should be up to me how I choose to consume publicly served HTML.
Do others have thoughts on these issues?