What Are Crawl Errors?
Introduction
Crawl errors are problems search engines encounter when they try to access your pages. Crawl errors are also reported in the legacy version of Google Search Console. The crawl error report has two main sections:
- Site errors: These errors stop Googlebot from accessing your entire website.
- URL errors: These errors occur when Googlebot cannot access a specific URL.
Tracking Crawl Errors in the Index Coverage Report
We will only discuss the Errors section, as these problems will prevent Google from crawling or indexing your pages.
Site Errors
Site errors are problems that occur at the site level: they mean that Google and users cannot access any of your pages, so don’t skip these errors. There are three site errors that Google counts as crawl errors.
DNS Error
DNS, the “Domain Name System,” translates a human-readable domain name into the string of numbers that makes up a website’s IP address. Essentially, it allows us to browse the Internet without knowing the IP address of every website we want to visit.
The DNS system works like this:
You type a domain name into the browser, and the browser first checks whether the data for that domain is already cached locally on your computer. If it isn’t, the browser sends a request to the local DNS server, which contacts a DNS root nameserver to find the server responsible for the domain’s extension, such as “.com”. That server is called the top-level domain (TLD) nameserver. The local DNS server then queries the TLD nameserver, which points it to the authoritative nameserver for the domain, and that nameserver finally returns the site’s IP address. A DNS error in Search Console means this lookup failed, so Googlebot could not resolve your domain name and never reached your server.
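If Search Console reports a DNS error, a quick first check is whether the domain resolves at all from your own machine. The minimal sketch below uses only Python’s standard library; the hostnames are placeholders, not anything referenced in this article.

```python
# Minimal sketch: check whether a domain name resolves via DNS.
# Uses only the standard library; hostnames below are placeholders.
import socket

def resolves(domain: str) -> bool:
    """Return True if the domain name can be resolved to an IP address."""
    try:
        # getaddrinfo triggers the same kind of lookup a browser performs:
        # local cache/resolver first, then the DNS hierarchy if needed.
        socket.getaddrinfo(domain, 443)
        return True
    except socket.gaierror:
        # gaierror means resolution failed -- the kind of failure
        # Search Console reports as a DNS error.
        return False

if __name__ == "__main__":
    for host in ("example.com", "no-such-subdomain.example.invalid"):
        print(host, "->", "resolves" if resolves(host) else "DNS error")
```

If the domain resolves locally but Search Console still reports DNS errors, the problem is more likely with your DNS provider’s reliability than with the records themselves.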
Server Errors
Server errors are different from DNS errors: they mean that Google could look up your URL through DNS but could not load the page because of a problem on your server.
Most often, this means your server is taking too long to respond and Google’s request has timed out. Googlebot will only wait a certain amount of time for a response from the server; if it takes too long, the bot gives up on the request.
Like a DNS error, a server error is a massive problem for your website. It means that something has gone wrong with your server, and it is preventing both users and bots from accessing your site.
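If you suspect a server error, you can reproduce roughly what Googlebot experiences by requesting the page with a timeout and watching for 5xx responses. This is a minimal sketch that assumes the third-party requests library; the URL and timeout value are placeholders.

```python
# Minimal sketch: detect server errors (timeouts and 5xx responses) for a URL.
# Assumes the third-party "requests" library; URL and timeout are examples.
import requests

URL = "https://example.com/"      # placeholder
TIMEOUT_SECONDS = 10              # how long to wait before giving up

try:
    response = requests.get(URL, timeout=TIMEOUT_SECONDS)
    if response.status_code >= 500:
        print(f"Server error: HTTP {response.status_code}")
    else:
        print(f"OK: HTTP {response.status_code} "
              f"in {response.elapsed.total_seconds():.2f}s")
except requests.exceptions.Timeout:
    # The server did not answer in time -- comparable to Googlebot giving up.
    print(f"Timed out after {TIMEOUT_SECONDS}s")
except requests.exceptions.ConnectionError as exc:
    print(f"Could not connect: {exc}")
```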
Robots.txt Failure
A robots.txt failure means that Googlebot could not find and read a website’s robots.txt file. If resolving DNS is step one and connecting to the site’s server is step two, then reading robots.txt is step three, the final check before Google crawls a website.
Google doesn’t want to crawl and index pages you don’t want it to, so if it can’t access a robots.txt file, it will postpone crawling until it can read the file. However, if you want Google to crawl every single page on your site, you can skip adding this file to your domain and ignore this error. If you see this error in Google Search Console, check how your robots.txt file is set up.
When you encounter a robots.txt error, it’s worth noting that having no robots.txt file at all is better than having a misconfigured one: a broken robots.txt file can prevent Google from crawling your site altogether.
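To rule out a robots.txt failure, confirm that the file is reachable and that it doesn’t accidentally block Googlebot. Below is a minimal sketch using Python’s standard library robots.txt parser; the URLs are placeholders.

```python
# Minimal sketch: confirm robots.txt is reachable and does not block Googlebot.
# Standard library only; the URLs below are placeholders.
from urllib.robotparser import RobotFileParser

robots_url = "https://example.com/robots.txt"   # placeholder
page_url = "https://example.com/some-page/"     # placeholder

parser = RobotFileParser(robots_url)
parser.read()  # fetches and parses robots.txt; a 404 here means "no rules"

if parser.can_fetch("Googlebot", page_url):
    print("Googlebot is allowed to crawl", page_url)
else:
    print("robots.txt blocks Googlebot from", page_url)
```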
URL Errors
URL errors differ from site errors because they only apply to a particular page, not your whole site. They mark instances where Google requested a specific page but could not read it.
Your best bet is to add content to these pages to make them valuable, or to noindex them so Google no longer tries to show them.
How you fix a 404 will depend on the error. It could be as simple as fixing a broken internal link. If external sites link to an older page, use a 301 redirect to send visitors to the new one. If it looks like a URL people assume exists on your site, consider adding the page or redirecting to relevant content elsewhere.
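When working through a list of URL errors, a simple script can tell you which URLs return a 404 and whether your 301 redirects are in place. This is a minimal sketch assuming the requests library; the URLs are placeholders.

```python
# Minimal sketch: check a list of URLs for 404s and confirm 301 redirects.
# Assumes the "requests" library; all URLs are placeholders.
import requests

urls_to_check = [
    "https://example.com/old-post/",
    "https://example.com/missing-page/",
]

for url in urls_to_check:
    # allow_redirects=False so the redirect status itself is visible
    response = requests.head(url, allow_redirects=False, timeout=10)
    if response.status_code == 404:
        print(f"404 Not Found: {url}")
    elif response.status_code == 301:
        print(f"301 redirect: {url} -> {response.headers.get('Location')}")
    else:
        print(f"HTTP {response.status_code}: {url}")
```

Note that a few servers answer HEAD requests differently from GET; if results look odd, swap `requests.head` for `requests.get`.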
Access Denied Crawl Errors
These errors occur when Google is not allowed to access a specific page. They are generally caused by the following:
- A password that protects the page
- Pages disallowed by robots.txt
- Your hosting provider blocking Googlebot
If you don’t need the URLs listed in this Crawl Errors section to appear in search results, you don’t need to do anything here; the error simply confirms that the block is working as intended. However, if you want these pages to appear in search results, you must fix whatever is blocking Google (a quick check for blocked responses is sketched after the list below):
- Remove the login requirement from the page
- Remove the URL from your robots.txt file
- Contact your hosting provider to allow Googlebot
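As a quick diagnostic for access-denied errors, you can request the page with a Googlebot-style User-Agent and look for a 401 or 403 response. This is only a rough approximation, since real Googlebot is verified by IP address, which a script cannot imitate; it assumes the requests library, and the URL is a placeholder.

```python
# Minimal sketch: check whether a page returns an "access denied" status
# (401/403) when requested with a Googlebot-style User-Agent.
# Assumes the "requests" library; the URL is a placeholder.
import requests

URL = "https://example.com/members-only/"   # placeholder
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    )
}

response = requests.get(URL, headers=HEADERS, timeout=10)
if response.status_code in (401, 403):
    print(f"Access denied (HTTP {response.status_code}): {URL}")
else:
    print(f"HTTP {response.status_code}: {URL}")
```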
Do not confuse this error with the nofollow link directive or the meta robots tag; these URLs have nothing to do with those. Likewise, the unfollowed URLs in the Crawl Errors report are simply URLs that Google could not follow all the way to their destination.
URL Inspection Tool
Google Search Console allows you to inspect individual pages on your website to detect indexing problems and crawl errors. You can access URL inspection for individual URLs in several ways:
- By clicking the Inspect URL link in the left navigation bar
- By entering the URL in the search bar at the top of the page
- By clicking the magnifying glass icon in the row for a URL in the Performance report
In each case, start by selecting a property from the Search Console welcome page.
Search Console URL inspection tool
- The tool tells you whether a page is in Google’s index, along with details Google found when it tried to crawl the page.
- The page where Google found the link to your page.
- The last time Google’s crawlers tried to access the page.
This is what the tool reports for a page that returns an HTTP 404 status.
URL inspection tool report
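If you need to check many pages, similar information is also available programmatically through the Search Console URL Inspection API. The sketch below assumes you already have an OAuth 2.0 access token for a verified property; the endpoint path and field names follow the v1 API documentation but should be treated as assumptions to verify against the current docs, and the token and URLs are placeholders.

```python
# Minimal sketch: query the Search Console URL Inspection API for one URL.
# Assumes an existing OAuth 2.0 access token with a Search Console scope;
# token and URLs are placeholders, field names per the v1 API docs.
import requests

ACCESS_TOKEN = "ya29.example-token"  # placeholder
ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"

payload = {
    "inspectionUrl": "https://example.com/some-page/",  # page to inspect
    "siteUrl": "https://example.com/",                  # verified property
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()

index_status = response.json()["inspectionResult"]["indexStatusResult"]
print("Coverage state:", index_status.get("coverageState"))
print("Last crawl time:", index_status.get("lastCrawlTime"))
print("Page fetch state:", index_status.get("pageFetchState"))
```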
What is the problem with crawl errors?
The most obvious problem with having crawl errors on your site is that they prevent Google from accessing your content, and Google can’t rank pages it can’t access. A high crawl error rate can also affect how Google views the overall health of your website: when Google’s crawlers repeatedly have trouble accessing a site’s content, they may decide those pages aren’t worth crawling as often. This can cause your new pages to take much longer to get into Google’s index than they otherwise would.