RSS Scraping
RSS scraping is where a website owner takes an RSS feed and places it on their website without attribution or a link back to the original author. Some people feel any RSS placed on a website is scraping.
Have you ever had an RSS feed scraped? What did you do about it? How much content were you delivering via the feed? Did you change how you sent your posts after you found the scraper?
If you've never had a feed scraped, what would you do if you found your site scraped like that? Is RSS scraping something that should be allowed? What do you think of RSS scraping?
RSS Feed Scraping Tools
Here are three tools you can use, depending on the circumstances, to scrape an RSS feed from a page that doesn’t publish one.
FeedFire
FeedFire was the first of these tools I discovered, several years ago, and I still use it on occasion today. You enter a web page URL and it delivers any new link text appearing there over time in the form of an RSS feed.
There’s not a whole lot you can do with it and it’s a little messy, but it’s really easy and very fast to use. If you’ve got a simple page with headline links, say a company’s old-fashioned “news” page with links to PDF press releases for SEO or something – FeedFire is perfect. You’ll get every link on the page in your feed, but once you mark the extra links (like to the home page) as read you’ll forget it ever happened. Still, I’d keep this for personal use only if possible. It’s quick and dirty.
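To make the "every link on the page" idea concrete, here is a minimal Python sketch of that general approach. It is not FeedFire's actual code, which I haven't seen. The names LinkCollector and links_to_rss and the example.com URLs are made up for illustration, and it only uses the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (url, link text) pairs for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())
            # Fall back to the URL when the link has no visible text.
            self.links.append((self._href, title or self._href))
            self._href = None


def links_to_rss(html, page_url, feed_title="Scraped links"):
    """Build a bare-bones RSS 2.0 document with one item per link."""
    collector = LinkCollector(page_url)
    collector.feed(html)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = "Links found on " + page_url
    for url, title in collector.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")
```

Notice that the home page link ends up in the feed along with the press releases, which is exactly the messiness described above. To report only new links over time, you would also need to remember which URLs you had already seen between runs; this sketch leaves that part out.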
FeedYes
I used FeedYes in assembling an OPML file for a client recently and it worked great. It’s a joy to use, in fact. Like FeedFire, FeedYes picks all the links out of a page – but then it asks you to click on the first one on the list that’s useful and on the last one. In this way it determines which link fields on a page to track, instead of tracking them all. Very nice. It’s a touch harder to use, but not really.
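The first-link/last-link trick can be sketched in a few lines of Python. I don't know how FeedYes really decides which links to keep, so this assumes the simplest version: keep everything between the two links you clicked, by position on the page. The function name links_between and the sample URLs are my own inventions.

```python
def links_between(links, first_url, last_url):
    """Return the links from first_url through last_url, inclusive.

    `links` is a list of (url, title) pairs in the order they appear
    on the page. Raises ValueError if either URL is not in the list.
    """
    urls = [url for url, _ in links]
    start = urls.index(first_url)
    # Look for the closing link at or after the starting position.
    end = urls.index(last_url, start)
    return links[start:end + 1]


page_links = [
    ("http://example.com/", "Home"),
    ("http://example.com/a", "Story A"),
    ("http://example.com/b", "Story B"),
    ("http://example.com/contact", "Contact"),
]
stories = links_between(page_links, "http://example.com/a", "http://example.com/b")
```

Here stories holds only the two story links, and the navigation links on either side are dropped.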
Feed43
This is what I used for that and so far it looks like it worked great.
Feed43 is awesome. It displays the source code for any page you tell it to look at, then lets you identify any part of that code as the beginning and end of an RSS item. For example, I wanted to scrape a page that had no links, but each author’s name was in bold, so I told Feed43 that items for a feed start when the bold tag closes and end when the next opening bold tag appears. That worked great. The other fields are a touch confusing, but I got them figured out. The help pop-ups are only marginally useful. Once you get it down, though, Feed43 is no problem.
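If it helps to see the start-and-end idea in code, here is a rough Python sketch of delimiter-based extraction using the bold tag example. This is only my approximation of the concept, not Feed43's own pattern syntax or implementation; extract_items and the sample page are hypothetical.

```python
import re


def extract_items(source, start_marker, end_marker):
    """Return the text found between each start_marker/end_marker pair."""
    # Escape the markers so characters like "<" and "/" are matched literally.
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    # DOTALL lets an item span several lines of page source.
    return [match.strip() for match in re.findall(pattern, source, flags=re.DOTALL)]


page = "<b>Ann</b> First post text. <b>Bob</b> Second post text. <b>Cy</b> Last one."
items = extract_items(page, "</b>", "<b>")
```

One thing to watch for with this approach: the last entry on the page is skipped because no opening bold tag follows it, so you may need a different end marker if you want every item.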
The advanced features are great. You can export all your feeds in OPML and you can password protect your feeds.
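For anyone who hasn't looked inside one, an OPML file is just a small XML list of feeds. Here is a minimal Python sketch of building one by hand, assuming a plain list of feed names and feed URLs. The function name feeds_to_opml and the example URL are placeholders, and this says nothing about what Feed43's own export looks like.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(feeds, title="My scraped feeds"):
    """Build an OPML 2.0 document from (name, feed_url) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, feed_url in feeds:
        # Each feed is one <outline> element; xmlUrl holds the feed address.
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=feed_url)
    return ET.tostring(opml, encoding="unicode")


opml_text = feeds_to_opml([("Press page", "http://example.com/feed.xml")])
```

Most feed readers can import a file like this, which is handy when you are handing a bundle of feeds to someone else.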
Let me know if you have any questions.