Defeating BuzzMyFx Content Scrapers

My next post was going to be one about WordPress issues, but then something else came up.  That post will still go live on Wednesday.  Right now, I want to talk to you about content thieves and scrapers.

We had a run-in with some content scrapers two years ago.  That scraper took the content, but left the image links intact.  At the time, I showed how to defeat that particular variety of scraper.  This scraper, however, was trickier.

I’m not sure what the purpose of “BuzzMyFx” is beyond content hijacking.  If you “check” to see if your site is scraped by them (by going to YourSiteName.buzzmyfx.com), you might see that your site isn’t being scraped.  However, your mere act of checking will CAUSE them to start scraping your site.  Scraped sites have all content redirected through their servers.  Images, stylesheets, JavaScript files, and more all seem to pour through BuzzMyFx’s servers instead of yours.  What’s worse is that, since all links go to BuzzMyFx now, clicking on a link to another site causes that site’s content to be scraped as well.

It didn’t take long to deduce what was going on.  BuzzMyFx is a server side scraper.  Imagine someone coming to your site under normal circumstances.  They tell their browser to load “www.MyWebSite.com”.  The browser then contacts the server hosting your site asking for that page.  The server gives the page to the browser which shows it to you.  Simple, right?

BuzzMyFx adds an extra layer.  If you go to MyWebSite.BuzzMyFx.com, your browser goes to BuzzMyFx’s server first.  BuzzMyFx’s server then contacts your server (as if it were a browser) for the page.  Your server gives the page to the “BuzzMyFx browser” just as it does to every other browser requesting pages.  BuzzMyFx then alters the page’s code to direct all links back to them.  They also add in their own StatCounter script and change ad code to give them the revenue instead of the site owner.  Finally, they give the changed version of the page to you.
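To make the “direct all links back to them” step concrete, here is a minimal Python sketch of what such a rewrite might look like.  I have no access to BuzzMyFx’s actual code, so the function name, the host names, and the assumption that only href/src attributes get touched are all illustrative guesses.

```python
import re

def rewrite_links(html, original_host, proxy_host):
    """Point href/src attributes at the proxy host instead of the original host.

    Hypothetical illustration only; the real scraper's logic is unknown.
    """
    pattern = re.compile(
        r'((?:href|src)=["\'])https?://' + re.escape(original_host),
        flags=re.IGNORECASE,
    )
    # Keep the attribute prefix (group 1) and swap in the proxy's host name.
    return pattern.sub(lambda m: m.group(1) + "http://" + proxy_host, html)

# Made-up page: one real link plus the site name spelled out as plain text.
page = '<a href="http://www.mywebsite.com/about">About</a> Visit www.mywebsite.com'
print(rewrite_links(page, "www.mywebsite.com", "mywebsite.buzzmyfx.com"))
```

In this sketch the link is redirected while the spelled-out site name is left alone, which is one plausible reason a plain-text site name can survive a rewrite when a clickable link does not.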

Pretty scummy, right?  Of course, by doing this, they are committing massive copyright infringement at the very least.  With US statutory damages of $750 – $150,000 per work infringed (the top figure is for willful infringement), dozens of infringements per site scraped, and possibly hundreds of thousands of sites affected, this could land them on the hook for millions of dollars.  Then there are the problems encountered if they are using a trademarked logo/name without authorization.

So how do you stop them?

Thankfully, servers keep logs of every visit.  As you loaded this up to read this post, my server dutifully recorded information such as your IP address, where you were referred from, the current date and time, and what page you were loading up.  This happens at all websites you visit, but not all people know how to read the logs.  As a webmaster, I am well versed in reading server logs.

I loaded up their scraped version of my site while checking my server logs and there it was: 192.151.156.170.  That was the IP address doing the scraping.
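If you aren’t used to reading raw logs, a few lines of Python can do the tallying for you.  This is only a sketch: it assumes your server writes Apache’s “combined” log format, and the sample lines (including the timestamps and the second IP address) are made up for illustration.  In practice you would feed it the lines of your own access log.

```python
import re
from collections import Counter

# Start of an Apache "combined" log line:
# client IP, identd, user, [timestamp], "METHOD path PROTOCOL"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*"')

def count_requests_by_ip(lines):
    """Return a Counter mapping client IP -> number of requests seen."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines; replace with the contents of your access log.
sample = [
    '192.151.156.170 - - [20/Jan/2014:09:15:02 -0500] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.151.156.170 - - [20/Jan/2014:09:15:03 -0500] "GET /style.css HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [20/Jan/2014:09:16:40 -0500] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# Busiest IPs first; an address that shows up the moment you load the
# scraped copy of your site is a strong candidate for the scraper.
for ip, hits in count_requests_by_ip(sample).most_common():
    print(ip, hits)
```

Load the scraped version of your site, then look for the address whose requests line up with that moment.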

Next, I opened up my “.htaccess” file.  This is a special file on your web site that controls who can access your site and what they can and can’t see.  I added the following lines at the beginning:

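# Enable mod_rewrite (harmless if it is already switched on elsewhere in this file)
RewriteEngine On
# Match the scraper's IP, but skip the warning page itself to avoid a redirect loop
RewriteCond %{REMOTE_ADDR} ^192\.151\.156\.170$
RewriteCond %{REQUEST_URI} !/content-thief\.html
RewriteRule ^(.*)$ /content-thief.html [R,L]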

Finally, I created a simple HTML page called “content-thief.html” with big, bold, red letters warning people that this was a scraped site and they should go to my real site.  (I didn’t link to my real site since the link would be altered, so I just spelled it out.)  You can go ahead and copy my “content-thief.html” page for your own usage.  Just be sure to change the site name to your own.

Unfortunately, BuzzMyFx has already cached some of my content, so the main page of my “BuzzMyFx-ed” site doesn’t show this warning.  Still, as their cached content expires and their server tries to grab the new content, it will be replaced by my warning.  (I went easy on them.  My initial reaction was to redirect them to some hardcore pornography.  I didn’t want my name linked with that though.)

The other problem is that they can change their IP address, which will let them bypass this rule.  I can add their new IP address in, but it will be a constant effort to keep up with them.  Perhaps the best remedy would be for all affected site owners to contact the people who run this “service.”  Unfortunately, they’ve hidden who they are from WHOIS, but they can’t hide two things:  1) Their domain name is registered through eNom and 2) Their site is hosted by DataShack.net (I originally wrote CloudFlare.com here; see the first update below).  If we can’t get them to stop, we can always get their hosting and domain name cut off.
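If the list of addresses does grow, regenerating the rules by hand gets tedious.  Here is a small Python sketch that builds the same kind of .htaccess rules for any number of IP addresses.  The function name is my own invention, and the second address is a made-up example from a documentation-only range, not a known scraper.

```python
import re

def build_block_rules(ips, warning_page="/content-thief.html"):
    """Return .htaccess lines that send every listed IP to the warning page."""
    # One alternation such as ^(1\.2\.3\.4|5\.6\.7\.8)$ covers all listed IPs.
    alternation = "|".join(re.escape(ip) for ip in ips)
    return "\n".join([
        f"RewriteCond %{{REMOTE_ADDR}} ^({alternation})$",
        # Skip the warning page itself so the redirect cannot loop.
        f"RewriteCond %{{REQUEST_URI}} !^{re.escape(warning_page)}$",
        f"RewriteRule ^(.*)$ {warning_page} [R,L]",
    ])

# 203.0.113.9 is a hypothetical second address used purely for illustration.
print(build_block_rules(["192.151.156.170", "203.0.113.9"]))
```

Paste the printed lines into your .htaccess file in place of the single-IP version whenever you add a new address.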

Here’s hoping this scraper menace ends soon so we can all get back to producing great content instead of trying to protect our content from being scraped.

UPDATE:  CloudFlare.com is denying being their host.  As Heather commented below, they say they are a “reverse proxy, pass-through security service.”  I’m guessing that BuzzMyFx is using CloudFlare to hide their server’s real IP address.  However, the IP address I obtained that was seizing my content (192.151.156.170) isn’t “hidden” at all.  That IP address comes from DataShack.net.  So focus communication on them, not CloudFlare.
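If you want to check for yourself whether an address sits inside a particular provider’s network, Python’s standard ipaddress module can do it.  This is a sketch under assumptions: the one range listed below is a block I believe CloudFlare has published, but providers change their ranges, so look up the current list rather than trusting mine.

```python
import ipaddress

def matching_networks(ip, networks):
    """Return the networks (in CIDR notation) that contain the given IP."""
    address = ipaddress.ip_address(ip)
    return [net for net in networks if address in ipaddress.ip_network(net)]

# Assumed example range; verify against the provider's current published list.
assumed_cloudflare_ranges = ["173.245.48.0/20"]

# An empty list means the address is not inside any of the listed ranges.
print(matching_networks("192.151.156.170", assumed_cloudflare_ranges))
```

The same check shows why banning a host’s entire range is a blunt tool: every other customer inside that range gets blocked along with the scraper.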

UPDATE #2:  If you aren’t technically inclined enough to know how to fiddle with htaccess and/or FTP files to your server, but you are using WordPress, you can also use the WP-Ban plugin to keep them off your site.  This plugin lets you list IP addresses and even leave a specific message for those IP addresses to see.

UPDATE #3: According to Lazy Budget Chef, even if you manage to contact BuzzMyFx, they will try to sell you a domain protection package to “steal the blogger’s legal right to their blog, their log in credentials, mailing list, and other personal information.”  So even if you manage to contact these scrapers, don’t sign anything they give you!  You shouldn’t need to sign some form of contract for them to cease scraping – they should just stop.  Be very wary of these people.

UPDATE #4: It looks like we’ve won this battle.  BuzzMyFx seems to be down.  They could still flee to another hosting provider (or even the same one signed up under a different account) and start their service back up.  Even if they don’t come back, I’m sure other scrapers will take BuzzMyFx’s place.  Still, you need to take each victory as it comes.  Congratulations and thanks for helping take down this scraper, everyone!

NOTE: The “burglar” image above is by tzunghaor and is available from OpenClipArt.org.

54 comments

  • I’d go to CloudFlare and send them this message: (Altered to fit your site, mind you.)

    My name is (FILL IN) and I am the President/Owner of (WEBSITE/BUSINESS). A website that your company hosts (according to WHOIS information) is infringing on at least one copyright owned by my company.

    An entire website was copied onto your servers without permission. The original website, to which we own the exclusive copyrights, can be found at:

    (YOUR WEBSITE URL)

    The unauthorized and infringing copy can be found at:

    (THE SCRAPED SITE/URL)

    This letter is official notification under Section 512(c) of the Digital Millennium Copyright Act (“DMCA”), and I seek the removal of the aforementioned infringing material from your servers. I request that you immediately notify the infringer of this notice and inform them of their duty to remove the infringing material immediately, and notify them to cease any further posting of infringing material to your server in the future.

    Please also be advised that law requires you, as a service provider, to remove or disable access to the infringing materials upon receiving this notice. Under US law a service provider, such as yourself, enjoys immunity from a copyright lawsuit provided that you act with deliberate speed to investigate and rectify ongoing copyright infringement. If service providers do not investigate and remove or disable the infringing material this immunity is lost. Therefore, in order for you to remain immune from a copyright infringement action you will need to investigate and ultimately remove or otherwise disable the infringing material from your servers with all due speed should the direct infringer, your client, not comply immediately.

    I am providing this notice in good faith and with the reasonable belief that rights my company owns are being infringed. Under penalty of perjury I certify that the information contained in the notification is both true and accurate, and I have the authority to act on behalf of the owner of the copyright(s) involved.

    Should you wish to discuss this with me please contact me directly.

    Thank you.

    (YOUR NAME)
    (YOUR ADDRESS)
    (YOUR PHONE NUMBER)
    (YOUR EMAIL ADDRESS)

    • TechyDad

      Thanks. Everyone who had their content scraped should definitely file this DMCA report.

        • Apparently this won’t help though. I did it and almost immediately got this reply in my email:

          CloudFlare received your abuse report dated January 20, 2014 regarding:
          buzzmyfx (dot) com

          Please be aware CloudFlare is a network provider offering a reverse proxy, pass-through security service. We are not a hosting provider. CloudFlare does not control the content of our customers.

          We are unable to process your report for the following reason(s):
          URL(s) do not resolve to CloudFlare IP addresses.

          Please reply to this message, keeping the report identification number in the subject line intact, with the required information.

          Regards,

          CloudFlare Abuse

      • This is what CloudFlare said in response to my DMCA

        CloudFlare received your abuse report dated January 20, 2014 regarding:
        buzzmyfx.com

        Please be aware CloudFlare is a network provider offering a reverse proxy, pass-through security service. We are not a hosting provider. CloudFlare does not control the content of our customers.

        We are unable to process your report for the following reason(s):
        URL(s) do not resolve to CloudFlare IP addresses.

        Please reply to this message, keeping the report identification number in the subject line intact, with the required information.

        Regards,

        CloudFlare Abuse

    • EdD

      CloudFlare is still serving the pages as cached snapshots as part of their service. These snapshots are from CloudFlare servers.

  • I have a question – how do you know if your site has been scraped without going to that site?

    • TechyDad

      Unfortunately, there doesn’t seem to be a way. You could Google “site:YourBlogName.BuzzMyFx.com” but that’s not a guarantee. Right now “site:TechyDad.BuzzMyFx.com” shows no results, but my site was scraped. It seems the only way to be sure is to try going to YourBlogName.BuzzMyFx.com, but doing this causes your site to be scraped. B tried this with my blog. It first said “Site not found!” but soon afterwards showed my site fully.

  • Thanks for sharing these details – I just found out that both of my blogs were scraped (and showed up before I checked) – I reported to the Google webmaster on each and will also work with my server to see what can be done.

  • off to email cloudflare- thanks so much for this. super helpful

  • UGH! I see my site has been scraped, too – I’m not quite sure where to go to see the IP address, or would it be the same one you found? And may I have permission to use the verbiage you used for your ‘warning page’ to create a similar one for my site? 🙂

    • TechyDad

      Definitely. You can copy my “Content Thief” page word for word. (Replacing my site’s name with your own site’s, of course.)

  • Ok I’m not a techie person– so where do I find the “htaccess” file? I’m so confused but I want to fix it as soon as possible

  • How do I create an html page? My site was scraped so I’m working to figure this out asap!

    • TechyDad

      Go to http://www.TechyDad.com/content-thief.html and save that page locally. Next, open that page in Notepad and replace all references to “TechyDad” with your own site’s name. Save it and upload it. Alternatively, if you don’t know how to upload files/edit htaccess, but are using WordPress, you can install the WP-Ban plugin ( http://wordpress.org/plugins/wp-ban/ ) to keep them off your site.

      • So I did this, but when you go to my scraped site, it doesn’t show this error. Just my scraped site

        • Well Hello Katie!!

          Anyhoo, I did the WP-Ban thing too, and it isn’t showing the error or anything, just the scraped site. So I don’t know if the WP-Ban plugin does this.

          • TechyDad

            Unfortunately, BuzzMyFx caches their pulled content. (If you go to the TechyDad dot buzzmyfx dot com page, you’ll see my site as it was last night.) I’m not sure how long they keep their cache for. The WP-Ban or HTAccess changes can’t stop them from using their cached content, but it can prevent them from grabbing new content.

      • I’ve been hit and have just downloaded the WP-Ban as I’m not very techie and this is way over my head. What IP address, if any, do I add in or do I just leave it blank?

        • TechyDad

          As far as BuzzMyFx goes, you don’t need to do anything anymore as they’ve been shut down. In general, though, go in your WordPress admin panel to Settings->Ban. Enter the IPs you want to ban in the Banned IPs box and your message in the Banned Message box. Click Save Changes to save this.

  • This is scary stuff! Doing the google search didn’t show anything for my site, thankfully, but of course, it could be scraped anyway and I seriously do not want to check! If and when this scraper gets the boot, is there anything that needs to be done to get sites working like normal, or will the code automatically revert when it’s no longer being redirected?

    • TechyDad

      They aren’t altering your actual site, but a copy that they are making of your site. Once they are shut down, the copies will go away. Your site won’t be affected at all (except that you won’t have to worry about them stealing your content).

  • So, in a nutshell, this is almost futile to try to stop if they already have our site logged? I mean, they can rotate IP addresses daily, right? So wouldn’t the best course of action be to simply file DMCA on them? This is frustrating, to say the least! 🙁

  • Ok..so our site has been taken as well. Can you explain the revenue issues a little more. That part is confusing to me. How are they stealing our revenue and how can we see that on our ads dashboards? Thanks so much!

    • TechyDad

      When they grab your content, they don’t just show it to the person who requested it. They change it first. One of those changes might be taking out your ad code and replacing it with their own. This way they get the ad revenue instead of you.

  • Deb

    Thank you so much for this very helpful information!

  • Cloudflare isn’t their host

  • My site’s been stolen as well. I found that their host is actually datashack. The best option I could find to report copyright infringement is to email the DMCA letter to: security@datashack.net, in addition to contacting enom. I hope you’ll update your post so everyone floods them with notices!

  • Thanks so much for the updates! This is so scary to a person who doesn’t know much about the tech side to blogging. Headed over to add the plugin and email datashack

  • Our site was stolen as well. I ended up sending a notice not only to Datashack, but their DNS provider as well – eNom (abuse@enom.com). Let’s hit this scraper with all we have everyone!

  • Leslie Harris

    Okay – so I found the range of IP addresses for Datashack – If I put them in the WP-BAN plugin won’t that keep out innocent people?

    Here’s the notice I posted with the plugin.

    Site Scrapers Beware: You Are Banned from Our Website. If you’re not a site scraper and want to have access – please go to your service provider http://www.datashack.net and tell them your IP address is in the range of site scraper sites that they are hosting and you are being penalized. Want to update us or reach us – please mail us at ….

    Is that overkill?

    • laura

      Do we have a complete list of the IP addresses to ban? Or is it just that one listed in the article? I’m going to do the WP plugin but want to make sure I know all of the addresses to block.

      • Leslie Harris

        I pulled the whole range of their IP addresses for the hosting site from domaintools.com and posted with the caveat that all sites with that range are being penalized because of the scraper site.

        • TechyDad

          The problem is that DataShack is a hosting provider. They can be hosting legitimate sites as well as sites like BuzzMyFx. Hopefully, BuzzMyFx-style sites are a minority that fly under the radar for as long as they can before getting shut down. Meanwhile, people hosting legitimate sites might have issues connecting with your site for legitimate purposes simply because someone (that they’ve never heard of) on another server with the same host did something bad. Personally, I’d stick with the one IP address and add to it as needed to keep the collateral damage to a minimum.

          • Leslie Harris

            Understood which is why i have provided our email address to contact if they need to. I can’t imagine a legitimate site needing to connect with my site without me knowing?

            Personally – I think this puts more pressure on the organization that is allowing this scraped content to exist to take action.

            🙂

  • Is it just a good idea to post what you posted anyway even if we haven’t been scraped, to let people know in case we are? Or is that just a waste of time? Thank you so much for helping us and being patient with our ignorance.

    • TechyDad

      If you’re asking if it’s a good idea to spread the word: Of course. The more people who know, the better. I’d love it if some of the big sites that got grabbed, like MSNBC or Babble, found out. Let them unleash their legal departments on BuzzMyFx and pass the popcorn!

      If you’re asking if it’s a good idea to file a DMCA report if you don’t know for sure that your site has been scraped, I’d be a little less enthusiastic. If you don’t know for sure that your site was scraped, I’d hesitate to make a potentially false DMCA report. However, you could write to security@datashack.net (thanks to Kelly for that e-mail) notifying them that you noticed that BuzzMyFx is committing massive copyright infringement. You could even give some examples of sites stolen. (Google search for “site:buzzmyfx.com” without the quotes.)

  • Also the heat has been put on eNom’s Facebook:

    But as an update looks like this site has been shut down…. for now HURRAY!

    Thank you so much for giving us all the info we need to help take care of this. While this is my first time on your site I WILL be back and bookmarking now. Thank you so much!

  • Pam

    As a totally non techy person with a stolen site, I am going to add the plug in. Thanks so much for sharing it.

  • Don’t wanna bust anyone’s bubble but here’s another Scraper site that’s stealing just about EVERYONE’S content.
    http://4sponsor.com/

    I believe they’re hosted at GoDaddy and their emails are service@4sponsor.com, teresa@business.4sponsor.com and report@4sponsor.com

    I’ve sent a DMCA and my content on their site. You can read my FB post for more information: https://www.facebook.com/photo.php?fbid=10152138084288901&set=a.333042083900.151701.315172733900&type=1&stream_ref=10

    Thanks,
    Leslie

  • Great post. I also did the same thing last night for our site. I blocked the source IP from datashack at our firewall and also via cloudflare, which we use for our content delivery network. We also filed a DMCA request to Datashack, which is located in North Kansas City, MO. Unfortunately, the buzzmyfx.com domain seems to be protected by whoisguard which is located in the country of Panama. By the way, if you’re using WordPress and using WP-Ban, remember to add the source IP address into your ban list. You can get the IP address by doing a ping or nslookup of the hostname yoursite.buzzmyfx.com or using a site such as http://www.kloth.net/services/nslookup.php

  • Thanks,

    I’ve sent an email to my host asking them if they can please do this since I have no idea how to. I made the image and sent it to them. GREAT idea. I am so darned fed up. Thanks TechyDad.

    Leslie

  • Mark

    If you are not on shared hosting, it is far easier to just use firewall rules to block them.

    I did this for my wife’s site: iptables -A INPUT -p tcp -s 192.151.156.170 -j REJECT

    -M

  • EdD

    As an FYI, CloudFlare servers are still hosting a cached version of buzzmyfx.com. They can’t hide behind the excuse that they are not the hosting provider for these cached sites.