Last week, Feedly rolled out a controversial new "feature" -- hijacking feed links to steal traffic from millions of bloggers.

Feedly's redirection of shared links to its own view of an article, rather than to the article itself on the original site, is a serious concern for content creators. It not only results in lost traffic, but is also deceptive to readers who follow a particular blog.

Here's the full story of why people are angry, and how one blogger helped to right the situation. I also delve into their source code to show you just how dirty their little tricks are.

Credit due: The Digital Reader was the original source for this news -- I just decided to investigate a little further and see exactly what they were up to.

First, The Good News

At the time of writing, the behaviour has been somewhat corrected: shortened Feedly links are indeed being sent to the originator's site. But a quick examination of the HTTP status code revealed that the redirect wasn't being done in the typical server-level way with a 301 or 302. Feedly sends a 200, which means "yep, we've got that page, hold on"; 404 means "not found"; 301 means "permanently redirected to another URL"; and 302 means "temporarily redirected".
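The distinction matters, because crawlers and clients treat these codes very differently. As a rough illustration (my own sketch, not anything from Feedly's source), a client deciding what a response means might branch on the status code like this:

```javascript
// Classify an HTTP status code the way a feed client might.
// Illustrative sketch only -- not code from Feedly's pages.
function classifyStatus(code) {
  if (code === 200) return "served directly";    // the body IS the content
  if (code === 301) return "permanent redirect"; // follow Location; update the stored URL
  if (code === 302) return "temporary redirect"; // follow Location; keep the stored URL
  if (code === 404) return "not found";
  return "other";
}

console.log(classifyStatus(200)); // "served directly"
console.log(classifyStatus(302)); // "temporary redirect"
```

Because Feedly answers 200, search engines and tools see a real page hosted at feedly.com, not a redirect to the original article.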

This meant the redirection was being performed in JavaScript, so I wanted to know more. Using a command-line web page fetching tool called curl, I was able to grab the source code of a sample Feedly link to Techmeme.com before the redirection occurred (since curl won't execute JavaScript) -- and it revealed some surprising tidbits. Here's what I found.

(I've uploaded the full source here if you'd like to take a look -- I'm only featuring some interesting snippets below.)

Some people were worried about the SEO implications of basically having their content stolen and re-published elsewhere; the good news is that Feedly correctly set the rel="canonical" link tag to instruct Google that all link value should be passed on to the original site. However, it's impossible to ascertain whether this was added after complaints began or was present from the start.

<link rel="canonical" href="http://www.techmeme.com/131202/p30#a131202p30" />

They're Stripping Ads

In what was probably a misguided attempt at duplicating Readability-type functionality, which strips a page down to its core essentials, Feedly was stripping all advertising, tracking, and social share buttons that may have been embedded in the original feed item. Here's the full list of patterns being stripped out:

var visualExcludePatterns = [
    "feedproxy", "feedburner", "/~", "feeds.wordpress.com", "stats.wordpress.com",
    "googleadservices.com", "feedads", "tweet-this", "fmpub", "-ads", "_ads",
    "pheedo", "zemanta", "u.npr.org/iserver", "openx.org", "slashdot-it",
    "smilies", "/ico-", "commindo-media.de", "creatives.commindo-media",
    "doubleclick.net", "i.techcrunch", "adview", "/feed.gif", ".ads.",
    "/avw.php", "wp-digg-this", "feed-injector", "/plugins/", "tweetmeme.com",
    "_icon_", "/ad-", "share-buttons", "feedsportal.com", "buysellads",
    "holstee", "musictapp", "/ad_", "/button/", "donate.png", "/sponsors/",
    "googlesyndication.com", "/pagead", "/adx", "assets/feed-fb",
    "assets/feed-tw", "feedburner.com/~ff", "gstatic.com", "feedsportal.com"
];
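To see how a blocklist like this bites, here's a minimal sketch (mine, not Feedly's actual implementation) of the kind of substring check that decides whether an embedded image, script, or button gets stripped from the scraped page:

```javascript
// Illustrative substring-based exclusion filter, in the style of
// the visualExcludePatterns list above. Not Feedly's actual code.
var patterns = ["doubleclick.net", "donate.png", "/sponsors/", "share-buttons"];

function shouldStrip(src) {
  // Strip the element if its URL contains any blocked pattern.
  return patterns.some(function (p) {
    return src.indexOf(p) !== -1;
  });
}

console.log(shouldStrip("http://ad.doubleclick.net/adj/123"));    // true
console.log(shouldStrip("http://example.com/images/photo.jpg"));  // false
```

Note how blunt the matching is: anything whose URL merely *contains* "donate.png" or "-ads" disappears, whether or not it was actually an advert.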

Taking out a "donate" button seems particularly galling, for some reason.

Here we come to the most serious point, for not only were Feedly scraping the content from your site, they were then stripping any original social buttons and rewriting the meta-data. This means that when someone subsequently shared the item, they would in fact be sharing the Feedly link and not the original post. Anyone clicking on that link would go straight to Feedly.
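Share widgets typically read the page's og:url metadata (falling back to the page address) to decide which link gets shared. The following is an illustrative sketch of that mechanism -- the tag contents are hypothetical, and I'm not reproducing Feedly's exact markup:

```javascript
// Illustrative: extract the og:url a share button would pick up.
// The example markup below is hypothetical, not scraped from Feedly.
function extractOgUrl(html) {
  var match = html.match(/<meta\s+property="og:url"\s+content="([^"]+)"/i);
  return match ? match[1] : null;
}

// If the scraper rewrites og:url to point at its own copy, every
// subsequent share propagates that copy instead of the original post.
var scrapedPage = '<meta property="og:url" content="http://feedly.com/e/abc123"/>';
console.log(extractOgUrl(scrapedPage)); // "http://feedly.com/e/abc123"
```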

Screenshot of scraped content from TheDigitalReader

So what, you might ask? When a post goes viral, it can be of huge benefit to the site in question -- raising page views and ad revenue, and expanding its audience. Feedly was outright stealing that specific benefit away from the site to expand its own user base. The Feedly code even included checks for mobile devices that would direct users to the relevant app store page.

function action( where )
{
    var actionName = "follow";
    var url = "http://feedly.com/#" + encodeURIComponent( "subscription/" + feedInfo.id );
    if( /iPhone|iPad/i.test( navigator.userAgent ) )
    {
        actionName = "install";
        url = "http://itunes.apple.com/us/app/feedly/id396069556";
    }
    else if( /android/i.test( navigator.userAgent ) )
    {
        actionName = "install";
        url = "market://details?id=com.devhd.feedly";
    }
    _gaq.push( [ '_trackEvent', bucket(), actionName + "." + where, feedInfo.id ] );
    window.setTimeout( function() { document.location.href = url; }, 20 );
    window.event.cancelBubble = true;
    window.event.stopPropagation();
    window.event.preventDefault();
}

It wasn't "just making the article easier to view" -- it was stealing traffic, plain and simple. That’s really not cool.

Their First Fix: A Hardcoded Exclusion List

When The Digital Reader first complained to Feedly, their response was to re-code the JavaScript to include an exclusion list. They literally added a check to every Feedly link to see if the item came from The Digital Reader, and if so, bypassed the page hijacking.

var siteExcludePatterns = [ "/TheDigitalReader/" ];
function shouldExcludeSite( url )

This is, of course, an absolutely ludicrous approach -- were they planning on adding to that list as time went by and more bloggers complained?
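The quoted snippet only shows the function's signature, so we can't see exactly how the check worked; a plausible implementation (my assumption, not their actual code) would simply substring-match the URL against the opt-out list:

```javascript
// Hypothetical body for shouldExcludeSite -- Feedly's snippet only
// shows the signature, so this is an assumption about its behaviour.
var siteExcludePatterns = [ "/TheDigitalReader/" ];

function shouldExcludeSite(url) {
  return siteExcludePatterns.some(function (pattern) {
    return url.indexOf(pattern) !== -1;
  });
}

console.log(shouldExcludeSite("http://feedly.com/e/TheDigitalReader/some-post")); // true
console.log(shouldExcludeSite("http://feedly.com/e/SomeOtherBlog/some-post"));    // false
```

Which makes the absurdity concrete: every blogger on the planet would need their own entry in that array.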

Nate, from The Digital Reader, responded:

where do you get off demanding that I opt out of your hijacking? It's like saying that I should have to ask someone to stop hitting me in the face. And yet you think that is reasonable?

Their Second Fix: A Quick Hack to Bypass All The Code

After what I can only assume was an overwhelming number of complaints, they adjusted the hijacking filter as follows:

if( kind == "partial" || shouldExcludeSite( "http://www.techmeme.com/131202/p30#a131202p30" ) || true )
{
    document.body.innerHTML = "";
    document.location.href = "http://www.techmeme.com/131202/p30#a131202p30";
}

"Partial" refers to whether the scraped content is a full or a partial feed -- there's no point in hijacking feeds that only publish an excerpt, after all. Presumably, this condition began as the only check when choosing whether to send the user to the original site. After it you can see the first fix, which calls the function that checks whether the site has opted out; and then we see their final fix in place:

|| true

If you have any programming experience, you'll recognise the quick hack that says "the following code will always run" -- it's usually only used in debugging. Since one of those three conditions is now always true (the first two no longer matter), Feedly redirects users instantly to the original site.
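For the non-programmers: `||` evaluates left to right and the whole expression is true if any operand is true, so tacking `|| true` onto the end makes the condition pass no matter what precedes it. A tiny illustration (mine, not Feedly's):

```javascript
// "|| true" forces a condition to always pass, whatever came before it.
// Hypothetical stand-in for Feedly's redirect check.
function shouldRedirect(kind, siteOptedOut) {
  return kind === "partial" || siteOptedOut || true;
}

console.log(shouldRedirect("full", false));   // true -- always
console.log(shouldRedirect("partial", true)); // true -- always
```

In other words, rather than removing the hijacking code, they left it all in place and simply short-circuited it.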

And that's where it stands now. So what have we learnt?

Basically, Feedly set out to create a kind of slimmed-down reading experience, but the way they went about it -- rewriting links so that subsequent social shares propagated their own service -- was pretty damned disgusting. This isn't the only bad move Feedly has made recently, either: last month, they began requiring users to log in with Google+ accounts (having seen how well Google+ login is working for YouTube, I guess), but that too was quickly reverted. The lesson is -- you might want to start finding an alternative feed reader, unless you were already suckered into paying $99 for a Pro account.