
How to analyse a website: the main elements

Giorgia Schiappadori

An SEO Audit is a complete and detailed analysis of a website that considers every element relevant to search engines, from technical aspects to content structure.

An SEO Audit is a fundamental activity because it provides a complete, 360° picture of the current status of any type of site: blog, e-commerce or other. Through this analysis it is possible to assess the site's general status and to detect the main structural errors and any critical issues that may affect its organic performance and, in turn, its business objectives.

Furthermore, starting from the SEO Audit it is possible to identify room for growth and improvement in terms of visibility and organic sessions, and to develop a strategy aimed at making the site competitive in search engines within its reference market.

The main elements to consider during the SEO analysis of a website

Let’s now look, step by step, at the main aspects to evaluate when carrying out an SEO Audit.

Evaluation of the current status

First of all, it is essential to check the status of the site in terms of traffic, positioning and organic performance. With the support of tools such as Google Search Console you can access data that provides valuable information about a site’s health, such as positioning, clicks, impressions and CTR. You can then dig deeper into performance for specific queries or pages.

These data are the first indicators of the current state of the site, and comparing organic performance over a given period, and/or against a previous period, already makes it possible to draw fundamental observations. For example, a strong discrepancy between clicks and impressions is a first signal that the reasons behind the difference need to be investigated. In a situation like this it is certainly useful to check the type of content that populates the SERP for the query of interest: the presence of competitors, ads on the Google Ads Search network, Google Shopping ads, images or the Google Knowledge Graph are just some of the reasons that could explain a substantial gap between total clicks and impressions.
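As an illustration, this kind of check can also be run programmatically on data exported from Search Console. The sketch below (query names and thresholds are hypothetical, not from any real export) flags queries that earn many impressions but few clicks:

```python
# Flag queries with many impressions but a low click-through rate.
# Rows mimic a Search Console export: (query, clicks, impressions).
rows = [
    ("buy running shoes", 40, 5000),        # hypothetical data
    ("running shoes size guide", 300, 4000),
]

def low_ctr_queries(rows, min_impressions=1000, max_ctr=0.02):
    """Return (query, CTR) pairs whose CTR falls below max_ctr
    despite a high number of impressions."""
    flagged = []
    for query, clicks, impressions in rows:
        ctr = clicks / impressions if impressions else 0.0
        if impressions >= min_impressions and ctr < max_ctr:
            flagged.append((query, round(ctr, 4)))
    return flagged

print(low_ctr_queries(rows))
# [('buy running shoes', 0.008)]
```

Queries flagged this way are the ones worth checking manually in the SERP, as described above.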

Evaluation of the indexing status of the site

After an initial evaluation of the current status of the site, it is necessary to check that all the desired pages are accessible to search engines and to evaluate the site’s indexing status.

You can quickly evaluate the indexing status using Google Search Console. The tool, in fact, reports the difference between the number of pages present in the Google index and the total number of pages on the site. A strong discrepancy between these two values indicates a problem with indexing and visibility. It is therefore necessary to examine the pages excluded from the index and assess why they are excluded.

At this stage, one of the main elements to analyse is robots.txt, a text file that tells crawlers which pages should be accessible and which should not. It is necessary to verify that the robots.txt is written correctly and that it does not deny access to resources that should be accessible (or allow access to those that should not be). It is also important to make sure that the correct sitemap.xml URL, including the full address, is indicated at the end of the file.
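As a minimal sketch (paths and domain are hypothetical), a robots.txt following these indications might look like this:

```
# Allow all crawlers, but keep internal search results out of reach
User-agent: *
Disallow: /search/

# Full sitemap URL, declared at the end of the file
Sitemap: https://www.example.com/sitemap.xml
```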

The sitemap.xml must also be analysed. The sitemap contains information about all the resources on the site (pages, images, videos and other types of files) and is used by search engines to crawl the site more efficiently. It is important to make sure that the sitemap is error-free and that all pages are correctly included in it.
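For reference, a minimal sitemap.xml entry follows the sitemaps.org protocol (URL and date below are hypothetical):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/blog/sample-post/</loc>
    <lastmod>2021-05-10</lastmod>
  </url>
</urlset>
```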

Finally, you must verify the correct implementation of canonical tags. The URL indicated as canonical tells search engines which version of a particular page should be indexed. Usually the canonical URL refers to the page itself, but setting it correctly is useful to solve possible duplication and/or cannibalisation problems. Conversely, if the canonical link is not correctly set and implemented on the site, it can cause various problems, in some cases very serious ones, involving both indexing and content duplication.
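In practice the canonical is a single link element in the page head; a minimal example (the URL is hypothetical) looks like this:

```
<!-- In the <head> of a duplicate or parameterised page,
     pointing to the version that should be indexed -->
<link rel="canonical" href="https://www.example.com/category/product/" />
```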

Analysis of the information architecture of a website

The information architecture is the way contents are structured and organised within a site: the navigation tree and the hierarchical importance assigned to the topics covered. Analysing the information architecture serves to assess how well the contents are linked to each other and how accessible they are to both users and search engines.

At this stage, attention must be paid to two aspects in particular:

  • The depth of navigation: the number of clicks that separate a piece of content from the home page, which signals the importance search engines should assign to it (the fewer clicks needed to reach it, the greater the importance associated with it);
  • The URL structure: which should be simple, short and meaningful, both to increase the probability that a user understands its meaning and clicks on the link, and to facilitate the indexing process. There are some good practices to follow when building SEO-friendly URLs, such as avoiding special characters, capital letters and spaces. If you have URLs with parameters (e.g. http://www.example.com/results?search_type=search_videos&search_query=tpb&search_sort=relevance&search_category=25) it is important to make sure that they are correctly managed through Google Search Console or with a properly set canonical link.
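The good practices above (no special characters, capitals or spaces) can be sketched as a simple slug-normalisation function. This is a minimal illustration, not an exhaustive rule set:

```python
import re

def seo_slug(title):
    """Turn a page title into a short, lowercase, hyphen-separated slug."""
    slug = title.lower()                      # no capital letters
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)  # drop special characters
    slug = re.sub(r"[\s-]+", "-", slug)       # spaces -> single hyphens
    return slug.strip("-")

print(seo_slug("How to do the Analysis of a Website!"))
# how-to-do-the-analysis-of-a-website
```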

HTTP status code

HTTP status codes are the codes with which a server responds to a browser’s request for a resource. There are several status codes and, among those worth identifying during an SEO Audit, the most common are:

  • Status code 404: indicates that the resource has not been found. There are several reasons why a page responds with a 404 status code, such as the incorrect removal of a page or a change to its permalink. It is important to detect and fix this type of error as quickly as possible, because users who land on a page returning a 404 are very likely to leave the site, increasing the bounce rate. Similarly, if spiders follow an internal link to a resource they cannot find, the crawling process stops there. Several solutions exist, but the most common is to set a 301 redirect from the resource that returns a 404 status code to a similar resource that responds with a 200.
  • Status code 301: indicates to search engines that the resource has been moved permanently. It is important to make sure that redirects are set correctly, that they point to pages whose content is related to the previous ones, and that no redirect chains are created, i.e. resources that pass through more than one redirect before responding with a correct status code (200). In addition, it is recommended to limit the number of redirects as much as possible, because a large number causes delays and obstacles during the indexing process.
  • Status code 5xx: indicates a server error that can have several causes, including errors in the implementation of the site code or in the implementation of redirect rules. In these cases it is advisable to intervene as soon as possible.
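The redirect-chain check described above can be sketched on crawl data. In this hypothetical example, each crawled URL maps to its status code and redirect target, the way a crawler export might provide them:

```python
# Map of crawled URLs to (status_code, redirect_target_or_None).
# All URLs are hypothetical.
crawl = {
    "/old-page":     (301, "/interim-page"),
    "/interim-page": (301, "/final-page"),
    "/final-page":   (200, None),
}

def redirect_chain(crawl, url, max_hops=10):
    """Follow redirects in the crawl map and return the full chain of URLs."""
    chain = [url]
    while url in crawl and crawl[url][0] in (301, 302) and len(chain) <= max_hops:
        url = crawl[url][1]
        chain.append(url)
    return chain

# A result longer than two URLs means more than one hop: a redirect chain.
print(redirect_chain(crawl, "/old-page"))
# ['/old-page', '/interim-page', '/final-page']
```

Chains found this way should be collapsed so that the original URL redirects directly to the final 200 page.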

HTML code analysis and on-page best practices

In addition to the aspects seen so far, there is a whole series of other elements to consider that relate to the analysis of HTML code.

The HTML code of the site is what Search Engine Spiders analyse.

One of the first things to do is to verify the absence of internal nofollow links. The nofollow attribute tells search engines not to follow a given link, so it is essential to limit its use on internal links in order not to hinder access to the pages.

Another aspect to consider is the use of breadcrumbs which indicate the position of the page with respect to the information architecture of the site and are important for two reasons:

  1. They help users understand the path taken to reach the page and orient themselves better during navigation;
  2. They are used by Search Engines because they facilitate the Crawling process.

There are also other factors to analyse in relation to the HTML code and it is necessary to verify that the OnPage SEO Best Practices are respected. Here are some elements that must be considered and evaluated at this stage:

  • Title tag and meta description: the title tag indicates the SEO title of a given page and is one of the main ranking factors, because it is one of the first elements search engines take into account when evaluating the content of a page. The meta description, on the contrary, is not a ranking factor, but it is shown in the SERP as a preview of the content and can significantly affect the CTR. In fact, the meta description is the only way we have to signal to users that the content really answers their search, leading them to decide whether or not to click on the result.
  • Text structure and heading tag management: during the analysis, particular attention must be paid to the way the texts on the site’s pages are structured. This matters because only a text written with SEO in mind can fully satisfy the user’s search intent. It is therefore essential to ensure that the texts contain all the necessary information, that they are divided into paragraphs and sub-paragraphs, and that heading tags (H1, H2, H3, etc.) are used properly.
  • Alt text and image optimisation: all images should have an appropriate filename representative of the image itself. The alternative text is even more important for accessibility, as it describes the image to users who, for various reasons, cannot see it. In addition, the alternative text allows search engines to understand the subject of the image and acts as anchor text when the image is used as a link.
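To illustrate the on-page elements above, a minimal page skeleton might look like this (titles, texts and filenames are hypothetical):

```
<head>
  <title>Running Shoes Guide | Example Shop</title>
  <meta name="description" content="How to choose running shoes: sizing, cushioning and terrain tips.">
</head>
<body>
  <h1>How to choose running shoes</h1>
  <h2>Sizing</h2>
  <p>...</p>
  <img src="running-shoes-sizing-chart.jpg" alt="Sizing chart for running shoes">
</body>
```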

Performance and speed of the site loading

The loading speed of a site is a very important factor to consider, both for the user’s browsing experience and because it has long been one of the main ranking factors. It is therefore essential to analyse the performance of the site and verify that the contents load quickly and that the user can start interacting with the page within a few seconds.

At this stage it is necessary to rely on tools such as Google PageSpeed Insights (Lighthouse) or GTmetrix. Using these tools you can test individual URLs, make a general evaluation of the site’s performance and identify any problems.

In addition, these tools offer suggestions for solving the problems detected by providing reference documentation.

Structured data testing

One of the last but fundamental stages of the SEO Audit is the verification of structured data.

Structured data are snippets of code added to the pages of a site (or to individual elements on a page) to describe the content to search engines in a standard (structured) way. They add information and details that help search engines better understand what the content represents and show the most relevant information possible in the search results, bringing advantages both in terms of ranking and CTR. To implement structured data on the site you can refer to Schema.org, a project created by the main search engines (Google, Microsoft, Yahoo and Yandex).

Using Google’s structured data testing tool, you can test individual URLs to see whether structured data is present on the site and which types have been implemented, and to identify any errors that need to be corrected.

Starting from the analysis of the structured data already present on the site it is possible to identify further types of structured data that could be added to enrich the description of the contents of the individual pages.

The most common Structured Data are:

  • Organization: structured markup that provides information on site ownership;
  • BreadcrumbList: structured markup that highlights the position of a particular page within the information architecture;
  • Article: structured markup that provides specific information about articles and posts (e.g. date of publication, author, etc.);
  • Product: structured markup that provides detailed information in relation to a product page (e.g. price, availability, reviews, etc.);
  • Event: structured markup that highlights details and information related to a specific event (e.g. location, date, duration, entry mode, etc.).
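For example, the Article markup above can be implemented with a JSON-LD block in the page head (all values below are hypothetical; see Schema.org for the full property list):

```
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to analyse a website",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "datePublished": "2021-05-10"
}
</script>
```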


An SEO Audit, in conclusion, is a technical document that explores the problems of a website in depth. All the analyses and insights covered in this article aim to improve the crawling of the site’s content by search engine spiders. It is essential to improve navigability and the correlation between contents, the HTML code and meta tags, the structured data and the response speed of the pages, but one of the most important things to take into account during an SEO Audit is the crawl budget, i.e. the time Google (and search engine spiders in general) spend crawling the pages of our site. One of the main objectives of an SEO optimisation activity is to make the best use of the time spiders spend on our site, serving them the content that is most relevant to our core business.