SEO failure: AmazonS3 + AngularJS + Many pages not being crawled or indexed

by Mark M   Last Updated October 06, 2016 09:01 AM

I've looked over various threads here, but none seem to cover the same issue. We're facing two problems at the moment; if I can address the first, it would definitely help diagnose the second.

Also, I've posted this to Google's webmaster forum, but there have been no replies yet.

Our stack:

AngularJS, HTML + SCSS, and Amazon S3 as our "web server" (which, as you may know, is not really a web server), with CloudFront in front of the S3 bucket. We have a redirect rule on the bucket that prefixes any URL with a hash bang so the site functions properly.
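For context, that bucket-level rule uses S3's static website hosting routing rules; a minimal sketch of what such a rule looks like is below (the 404 condition, hostname, and redirect code are illustrative assumptions, not copied from our exact configuration):

<RoutingRules>
  <RoutingRule>
    <Condition>
      <!-- Assumed trigger: any key that doesn't exist as an object in the bucket -->
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <!-- Send the browser back to the app shell with the original path behind #! -->
      <HostName>offtherecord.com</HostName>
      <ReplaceKeyPrefixWith>#!/</ReplaceKeyPrefixWith>
      <HttpRedirectCode>301</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>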

First problem:

The "Fetch as Google" tool is truncating any url starting with a #! (hash bang) making it difficult to know if any of these pages can be crawled by Google. If this is working for other sites then the problem might be that we're using AmazonS3 as our "web server." I've checked other threads here and it seems to be working for other people.

Second problem:

Google is only indexing two pages of our site, offtherecord.com. Feel free to search "site:offtherecord.com" in Google:

  1. offtherecord.com
  2. offtherecord.com/how-it-works

Google is able to crawl the "how-it-works" page, which needs the hash bang (and therefore JavaScript execution) for its content to render in the browser, and it works! However, Google just doesn't seem to be able to crawl and/or index any of the other pages.

Putting https://offtherecord.com/how-it-works in the "Fetch as Google" tool causes a 301 redirect to /#!/how-it-works as expected; however, if I try to follow that redirect in the tool, it truncates everything after the #! in the URL.

I've checked the crawl stats page in Google Webmaster Tools and there are no crawl errors.

Similar threads:

  1. Google not crawling AJAX content: https://productforums.google.com/forum/#!topic/webmasters/_pdC55wUvfI;context-place=topicsearchin/webmasters/hash$20bang

  2. AmazonS3 + AJAX content: [stack exchange only allows 2 links for me]

We have HTML5 mode enabled in our AngularJS app via $locationProvider.html5Mode(true).hashPrefix('!');
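In the app that one-liner lives in a config block; here is a minimal sketch, assuming ngRoute and a placeholder module name (only the $locationProvider line is taken verbatim from above):

angular.module('app', ['ngRoute'])
  .config(['$locationProvider', function ($locationProvider) {
    // Use pretty URLs (/how-it-works) where the History API is available,
    // falling back to the #! prefix otherwise. html5Mode(true) also expects
    // a <base href="/"> tag in index.html (or requireBase: false).
    $locationProvider.html5Mode(true).hashPrefix('!');
  }]);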

Please advise on how we can address problems #1 and #2. We're looking into running a real web server if the current setup is hurting our visibility to search engine crawlers.

Thank you for your time



Answers 1


It seems you have a lot going on.

Your robots.txt needs to be looked at; I will leave it at that. Then:

  • Your sitemap should be an HTML sitemap for visitors and an XML sitemap for the search engines, not a plain-text file.
  • Submit that XML sitemap to the search engines (you can create the sitemap without the #! and it will work; see the example after this list).
  • Create a Google Webmaster Tools account > Add site > Verify site > Submit sitemap > Fetch as Google > Submit to Index.
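
A minimal sketch of such an XML sitemap with the clean (hash-bang-free) URLs; the two entries are illustrative, not the site's full list of pages:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Illustrative entries only; list every crawlable page without the #! prefix -->
  <url><loc>https://offtherecord.com/</loc></url>
  <url><loc>https://offtherecord.com/how-it-works</loc></url>
</urlset>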

You can also expose the sitemap via an .htaccess rewrite (Apache):

RewriteEngine On
# Rewrite requests for /sitemap.xml to wherever the sitemap file actually lives
RewriteRule ^sitemap\.xml$ /path_to_sitemap [L]

Make sure mod_rewrite is enabled

norcal johnny
October 05, 2016 11:39 PM
