Tag Archives: robots
How to Optimize Your WordPress Robots.txt
The post How to Optimize Your WordPress Robots.txt appeared first on HostGator Blog.

What is a Robots.txt File?

The robots.txt is a very small but important file located in the root directory of your website. It tells web crawlers (robots) which pages or directories can or cannot be crawled. The robots.txt file can be used to block search engine crawlers entirely, or just to restrict their access to certain areas of your website.

A very basic WordPress robots.txt file uses only a handful of directives. They can look a little confusing at first, so here is what each one means:

- User-agent: specifies which robot the directions apply to. Using "*" applies the rules to all robots.
- Disallow: tells robots which files and folders they should not crawl.
- Allow: tells a robot that it is okay to crawl a file inside a folder that has been disallowed.
- Sitemap: specifies the location of your sitemap.

There are other rules that can be used in the robots.txt file, such as Host: and Crawl-delay:, but these are uncommon and only used in specific situations.

What is the Robots.txt File Used For?

Every website that is crawled by Google has a crawl budget: a limited number of pages that Google will crawl in a given period. You don't want to waste your crawl budget on pages that are low quality, spammy, or unimportant. This is where the robots.txt file comes in. You can use it to specify which pages, files, and directories Google (and other search engines) should ignore, which lets search engine bots keep the priority on your important, high-quality content.

Here are some things you might want to consider blocking on your WordPress website:

- Faceted navigation and session identifiers
- On-site duplicate content
- Soft error pages
- Hacked pages
- Infinite spaces and proxies
- Low-quality and spam content

This list comes straight from the Google Webmaster Central Blog.
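Putting those directives together, a very basic WordPress robots.txt looks like this (a minimal sketch; the sitemap URL is a placeholder for your own domain):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap_index.xml
```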
Wasting your crawl budget on pages like the ones listed above reduces crawl activity on the pages that actually have value, which can significantly delay the indexing of the important content on your website.

What You Should Not Use the Robots.txt For

The robots.txt should not be used as a way to control which pages search engines index. If you're trying to stop certain pages from being included in search engine results, you should use noindex tags or directives (for example, <meta name="robots" content="noindex"> in the page's head), or password-protect the page. The reason is that the robots.txt file does not actually tell search engines not to index content; it just tells them not to crawl it. While Google will not crawl disallowed areas from within your own website, Google does state that if an external link points to a page you have excluded, that page may still get crawled and indexed.

Is a Robots.txt File Required in WordPress?

Having a robots.txt file for your WordPress website is certainly not required. Search engines will still crawl and index your website as they normally would. However, without one you will not be able to exclude any pages, files, or folders that are unnecessarily draining your crawl budget, and as explained above, that can greatly increase the amount of time it takes Google (and other search engines) to discover new and updated content on your website. So, all in all, a robots.txt file is not required for WordPress, but it is definitely recommended. The real question should be, "Why would you not want one?"

How to Create a WordPress Robots.txt File

Now that you know what a robots.txt is and what it is used for, let's look at how you can create one. There are three different methods:

1. Use a Plugin to Create the Robots.txt

SEO plugins like Yoast have an option to create and edit your robots.txt file from within your WordPress dashboard. This is probably the easiest option.

2. Upload the Robots.txt Using FTP

Another option is to create the .txt file on your computer using Notepad (or a similar text editor) and name it robots.txt. You can then upload the file to the root directory of your website using an FTP (File Transfer Protocol) client such as FileZilla.

3. Create the Robots.txt in cPanel

If neither of the above options works for you, you can always log in to your cPanel and create the file manually. Make sure you create the file inside your root directory.

How to Optimize Your Robots.txt For WordPress

So, what should be in your WordPress robots.txt? You might find this surprising, but not a whole lot, and below I will explain why. Google (and other search engines) are constantly evolving and improving, so what used to be best practice doesn't necessarily work anymore. Nowadays Google fetches not only your website's HTML but also your CSS and JS files. For this reason, Google does not like it when you block any files or folders needed to render a page. In the past it was okay to block things like the /wp-includes/ and /wp-content/ folders; this is no longer the case. An easy way to test this is by logging in to your Google Search Console account and testing the live URL. If any resources are blocked from Googlebot, it will complain about them in the Page Resources tab.

Below is an example robots.txt file that I think is a great starting point for anyone using WordPress:

User-agent: *

# Block the entire wp-admin folder.
Disallow: /wp-admin/

# Block referral links for affiliate programs.
Disallow: /refer/

# Block any pages you think might be spammy.
Disallow: /spammy-page/

# Block any pages that are duplicate content.
Disallow: /duplicate-content-page/

# Block any low-quality or unimportant pages.
Disallow: /low-quality-page/

# Prevent soft 404 errors by blocking search pages.
Disallow: /?s=

# Allow the admin-ajax.php inside wp-admin.
Allow: /wp-admin/admin-ajax.php

# A link to your WordPress sitemap.
Sitemap: https://example.com/sitemap_index.xml

Some of the things I included in this file are just examples. If you don't feel that any of your pages are duplicate, spammy, or low quality, you don't have to add those lines. This is just a guideline; everyone's situation will be different. Remember to be careful when making changes to your website's robots.txt. While these changes can improve your search traffic, they can also do more harm than good if you make a mistake.

Test Your WordPress robots.txt File

After you have created and customized your robots.txt, it's always a good idea to test it. Sign in to your Google Search Console account and use the Robots Testing Tool. The tool operates as Googlebot would, checks your robots.txt file, and verifies that your URLs have been blocked properly. You will see a preview of your robots.txt file as Google would see it. Verify that everything looks correct and that there are no warnings or errors listed. That's it! You should be set up and ready to go.

My Final Thoughts

As you can see, the robots.txt is an important part of your website's search engine optimization. Used properly, it can speed up your crawl rate and get your new and updated content indexed much faster. Misused, however, this file can do a lot of damage to your search engine rankings, so be careful when making any changes. Hopefully, this article has given you a better understanding of the robots.txt file and how to optimize it for your specific WordPress needs. Be sure to leave a comment if you have any further questions.

Find the post on the HostGator Blog Continue reading
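If you would rather sanity-check your rules locally before uploading, Python's standard-library urllib.robotparser can parse a robots.txt and answer allow/deny questions. This is a quick local check under illustrative assumptions (the example.com domain and the trimmed rule set are placeholders), not a substitute for Google's testing tool. One caveat: Python's parser applies the first matching rule, unlike Google's longest-match behavior, which is why the Allow line is placed before the broader Disallow here.

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the example rules from this article,
# parsed locally (no network request needed).
rules = [
    "User-agent: *",
    "Allow: /wp-admin/admin-ajax.php",
    "Disallow: /wp-admin/",
    "Disallow: /refer/",
]

parser = RobotFileParser()
parser.parse(rules)

# A normal post is crawlable; wp-admin is blocked except admin-ajax.php.
assert parser.can_fetch("*", "https://example.com/a-blog-post/")
assert not parser.can_fetch("*", "https://example.com/wp-admin/options.php")
assert parser.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php")
print("all rules behave as expected")
```

Because the asserts pass silently, any rule that doesn't behave the way you expect will raise an AssertionError, making typos in Disallow paths easy to catch before the file goes live.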
Posted in HostGator, Hosting, php, VodaHost
Tagged budget-on-pages, create-the-file, file, ftp, important, php, plugin, robots, search-engines, web hosting tips
Comments Off on How to Optimize Your WordPress Robots.txt
Will AI surpass humans?
Are we doing enough to bridge the skill gap between low-wage workers and the robots currently replacing those jobs? Will the evolution of ar… | Read the rest of http://www.webhostingtalk.com/showthread.php?t=1726746&goto=newpost Continue reading
Posted in HostGator, Hosting, php, VodaHost
Tagged bridge-the-skill, currently-replacing, evolution, hosting, read-the-rest, robots, skill-gap, the-evolution, the-rest, web hosting, web hosting lounge
Comments Off on Will AI surpass humans?
Robots guider
How can the website be made more easily accessible to SEO robots? (A special “robots page” or code perhaps?) Continue reading
Posted in BlueVoda, Hosting, php, VodaHost
Tagged auto-generator, facebook, hosting, network, querry, robots, search-engines, tutorials, vodahost, web hosting
Leave a comment