Crawling, Indexing & Ranking | SEO -3

How Does Google Find Webpages?

From finding new content on the web to ranking it, there are three steps involved. Let’s talk about those steps and learn how we can make our web pages accessible to Google and eventually rank them.

3 Steps For Ranking – Search Engine Code

1. Crawling

This is the first step. Google, like other search engines such as Bing and DuckDuckGo, runs programs called crawlers, bots, or spiders. Their job is to keep discovering new web pages by following links from pages they already know about.

How To Get Your Website Crawled

The crawler visits some web pages and finds all the external and internal links present on each page. It adds the newly found links to its list and crawls them in turn to find more links.

This process continues 24/7, making sure that whatever new web pages users create are found by Google.
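The link-following loop described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how Googlebot is actually implemented: the `LINKS` dictionary is a made-up stand-in for the web (each key is a page, each value the links found on it), and `crawl` is a hypothetical helper name.

```python
from collections import deque

# Hypothetical miniature "web": page URL -> links found on that page.
LINKS = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/about"],
    "https://example.com/blog": ["https://example.com/blog/post-1", "https://example.com/"],
    "https://example.com/about": [],
    "https://example.com/blog/post-1": ["https://example.org/"],
    "https://example.org/": [],
    "https://example.com/orphan": [],  # no page links here
}

def crawl(seed_urls, links=LINKS):
    """Return every page reachable from the seeds, in discovery order."""
    discovered = list(seed_urls)  # pages found so far, in order
    seen = set(seed_urls)         # avoids crawling the same page twice
    queue = deque(seed_urls)      # pages still waiting to be visited
    while queue:
        page = queue.popleft()
        for link in links.get(page, []):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                queue.append(link)
    return discovered

for url in crawl(["https://example.com/"]):
    print(url)
```

In this toy data the `/orphan` page is never discovered, because no known page links to it. Pages like that can only be found if they are submitted some other way.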

This is one way Google gets to know about web pages but there is one other way.

You can manually submit your website to Google, either via Search Console or by using ping. Search Console is a tool that Google provides to webmasters for monitoring and improving the performance of their websites. We will learn how to use Search Console in a separate guide later.

With the ping method, you can submit a list of all of your website URLs to Google directly. For that, you need a sitemap, so let’s talk about sitemaps first.

Sitemap

A sitemap is a file that lists the URLs of your website. Many websites publish theirs at domain.com/sitemap.xml, so you can often view a site’s sitemap by opening that URL.

You can either create a sitemap manually or automate the process with an online tool. Check this sitemap generator tool.
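If you want to see what creating a sitemap involves, here is a minimal sketch that builds one with Python’s standard library. It assumes the basic sitemaps.org XML format with only the `<loc>` field for each URL; the function name `build_sitemap` and the example.com URLs are placeholders, not something from a real tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped for us
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Placeholder URLs; replace them with the pages of your own site.
pages = [
    "https://example.com/",
    "https://example.com/blog",
    "https://example.com/contact",
]
print(build_sitemap(pages))
```

Saving the printed text as sitemap.xml in the root folder of the website would make it available at the usual /sitemap.xml address.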

If you are using a CMS like WordPress or Wix, your sitemap is usually created automatically. To check it, take your homepage URL and append /sitemap.xml, for example https://searchenginecode.com/sitemap.xml

Once your sitemap is ready, you can ask Google to crawl your website by opening https://www.google.com/ping?sitemap=FULL_URL_OF_SITEMAP in your browser. Note that Google announced in 2023 that it is retiring this ping endpoint, so the request may no longer work; if it does not, submit the sitemap through Search Console instead.

For example https://www.google.com/ping?sitemap=https://searchenginecode.com/sitemap.xml

As mentioned earlier, you can also submit your sitemap to Google via Search Console. We will talk more about Search Console and learn how to do this later.

After submitting the website, check whether it has been crawled and indexed by Google. To do that, type site:your_website into Google search. If your website appears in the results, it has been indexed. If it does not, you may need to wait, since Google takes time to crawl and index websites.

For example site:https://searchenginecode.com

Sometimes you may have pages on your website that you do not want search engines to crawl, for privacy or other reasons. To tell Google what to crawl and what not to, we use a file called robots.txt (read as “robots dot TXT”). Keep in mind that robots.txt is only a set of instructions for well-behaved bots, not a security mechanism.

How To Get Your Website Crawled Efficiently?

Make sure every page of your website is accessible from every other page of your website.

We call this internal linking. The importance of internal linking in SEO cannot be stressed enough.

The rule of thumb is to make every page reachable within 3 clicks.

Pages that cannot be reached within 3 clicks usually perform badly. To improve internal linking, put links to your most important pages in the footer, since the footer is present on every page. You can also use menus to make pages accessible.
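One way to check the 3-click rule is to count how many clicks each page is from a starting page. The sketch below does this with a breadth-first search. It assumes the clicks are counted from the homepage, and the `SITE` dictionary is an invented internal-link map (page path mapped to the pages it links to); the function names are hypothetical too.

```python
from collections import deque

# Hypothetical internal-link map: page path -> pages it links to.
SITE = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/page-2"],
    "/about": [],
    "/blog/page-2": ["/blog/page-3"],
    "/blog/page-3": ["/blog/old-post"],
    "/blog/old-post": [],
}

def click_depths(site, home="/"):
    """Return the minimum number of clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for link in site.get(page, []):
            if link not in depths:  # first visit is the shortest path in BFS
                depths[link] = depths[page] + 1
                queue.append(link)
    return depths

def too_deep(site, home="/", max_clicks=3):
    """List pages that need more than max_clicks clicks, or cannot be reached at all."""
    depths = click_depths(site, home)
    return sorted(p for p in site if depths.get(p, max_clicks + 1) > max_clicks)

print(click_depths(SITE))
print(too_deep(SITE))
```

In this made-up site, /blog/old-post sits 4 clicks from the homepage. Adding a link to it from the footer or a menu would bring it within reach.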

Robots.txt File

A robots.txt file is a text file that contains a set of directives to search engine bots. You can check any website’s robots file by going to domain.com/robots.txt, like https://searchenginecode.com/robots.txt

A simple robots.txt file would contain these lines:


User-agent: *
Disallow: /some-url

User-agent indicates the search bot that the directive is meant for. We usually put * to indicate that the directive applies to all bots, such as Googlebot, Bingbot, etc.

If we wanted to address a directive only to Googlebot, we would use something like this:

User-agent: Googlebot

Disallow indicates which URLs must not be crawled. In the example above, no search crawler would crawl URLs whose path begins with /some-url.

Again, if you are using a CMS like WordPress, your robots.txt file is created automatically.
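To see how a bot might read these directives, you can try Python’s built-in `urllib.robotparser` module on the simple file shown earlier. This is a rough sketch: the example.com URLs and the `allowed` helper are placeholders, and Python’s parser may differ from Google’s own in some details.

```python
from urllib.robotparser import RobotFileParser

# The same two-line robots.txt used as the example in this guide.
ROBOTS_TXT = """\
User-agent: *
Disallow: /some-url
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from text instead of downloading

def allowed(url, agent="Googlebot"):
    """Return True if the given bot may crawl the URL under ROBOTS_TXT."""
    return parser.can_fetch(agent, url)

# Placeholder URLs on a made-up domain.
for path in ["/some-url", "/some-url/page", "/some-url-2", "/other-page"]:
    url = "https://example.com" + path
    print(path, "allowed" if allowed(url) else "blocked")
```

Only /other-page is reported as allowed here. The other three paths all start with /some-url, so the rule covers them as well.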

2. Indexing

After crawling web pages, Google stores them in its own massive database.

This storing of web pages in Google’s database is called indexing. Remember, Google can only rank web pages that are in its index. It sometimes happens that Google crawls a page but indexes it only some time later.
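This guide does not go into how the index is organised internally. A common textbook way to picture a search index is an inverted index, which maps each word to the pages containing it. The toy sketch below uses that idea as an assumption for illustration only; the `PAGES` data and the function names are invented, and a real search engine’s index is far more elaborate.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
PAGES = {
    "https://example.com/seo": "how google crawls and ranks pages",
    "https://example.com/sitemap": "create a sitemap for google",
    "https://example.com/recipes": "easy pasta recipes",
}

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages with this word too
    return results

index = build_index(PAGES)
print(sorted(lookup(index, "google")))
print(sorted(lookup(index, "google sitemap")))
```

A page that was never crawled has no entry in `index`, so no query can return it. That is the toy version of the point above: only indexed pages can be ranked.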

How To Get Your Website Indexed By Google?

  1. Create a sitemap
  2. Submit the sitemap to Google.
  3. Whenever you create a new blog post, submit the URL to Google via Search Console.

3. Ranking

After forming the index, Google ranks the pages in it and shows the most appropriate ones whenever a user makes a query.

Search results depend on the user’s location, interests, intention, and lots of other factors. If you have done good SEO then you may be able to get listed at the top of ranked websites.

Remember, Google serves results from its index only. So you need to make sure your website is indexed and more importantly, every page of your website is indexed to grab every opportunity. 

Search Console is a robust tool from Google that shows you how your website is performing, where it ranks, what issues it has, and much more. We will learn about Search Console later.

This should give you a good idea of how Google and the overall search landscape work.

Other search engines such as Bing and Baidu work in pretty much the same way. Now that you understand how things work, let’s start actually doing SEO on the website.

Some Advanced Resources:

Advanced Robots.txt Guide

Advanced Sitemap Guide
