Crawl control in online shops: but how (and why)?
by Gaurav Gupta
Especially for large online shops with many thousands or even millions of URLs, one SEO topic becomes particularly exciting: crawl control. On large sites, crawling cannot be left to itself, and small technical details can have a big impact. So how should you control crawling? And why at all? We have put together the most important points.
Understand your crawl budget
For really big sites, which most shops are, it is important to understand that there is a so-called crawl budget. Search engines do not have unlimited resources to crawl all the URLs in the world on a regular basis. Your goal must therefore be to have search engines crawl only, or at least primarily, the URLs that are really important. You can see how often Google crawls your site in the crawl stats in your Search Console.
By the way, you get even more detailed information if you add individual directories to the Search Console separately, which lets you spot differences and irregularities. (Correction: this does not work for the crawl stats, but it does for all other crawl reports.) Irregularities can include, among others, spikes like the following:
In this case, an analysis of the log files showed that the Google image search bot had run into tons of unnecessary image URLs. That should of course not happen, so the image URLs had to be cleaned up and reduced.
When it comes to crawl control, this kind of log file analysis is enormously important. It is the only way to fully understand, monitor and control what the crawlers are doing.
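A log file analysis of this kind could be sketched roughly as follows. This is only a minimal example, assuming access logs in the common "combined" log format; the sample lines, IP addresses and the function name `count_bot_hits` are made up for illustration. It counts how many requests each top-level directory receives from user agents that contain a given bot token.

```python
import re
from collections import Counter

# Combined log format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

def count_bot_hits(lines, bot_token="Googlebot"):
    """Count requests per top-level directory for user agents containing bot_token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or bot_token not in match.group("agent"):
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        segments = [s for s in path.split("/") if s]
        # A path with at most one segment lives directly under the root.
        top = "/" + segments[0] + "/" if len(segments) > 1 else "/"
        hits[top] += 1
    return hits

# Hypothetical sample data, not taken from a real shop.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Mar/2017:10:00:00 +0100] "GET /images/a.jpg HTTP/1.1" 200 512 "-" "Googlebot-Image/1.0"',
    '66.249.66.1 - - [10/Mar/2017:10:00:01 +0100] "GET /images/b.jpg?size=large HTTP/1.1" 200 512 "-" "Googlebot-Image/1.0"',
    '66.249.66.1 - - [10/Mar/2017:10:00:02 +0100] "GET /men/shirts/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Mar/2017:10:00:03 +0100] "GET /images/c.jpg HTTP/1.1" 200 512 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/52.0"',
]

image_bot_hits = count_bot_hits(SAMPLE_LOG, "Googlebot-Image")
all_googlebot_hits = count_bot_hits(SAMPLE_LOG)
```

Matching on the user agent string alone is a simplification, since user agents can be faked. A real analysis would also verify that the requests actually come from the search engine.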
But how do you control crawling? Roll up your sleeves, here we go!
Plan your page architecture logically
The foundation for a well-optimized and crawlable shop is a clean and logical page architecture. This includes many factors. Of particular importance is a meaningful URL structure that reflects the directory structure of the site with descriptive directory names. You have to weigh up how deep this directory structure really needs to be; after all, the URLs should still be short and handy. In addition, a breadcrumb navigation is recommended that links all parent pages of a URL in a way that is comprehensible for crawlers and users.
A practical example makes this clearer:
If the breadcrumb navigation looks like this:
Men » Clothing » Shirts » Business shirt
Then it is not useful to give every level its own directory, even if that would be logical:
Instead, it is better to leave out the “clothing” directory, because it is clear from the context:
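To illustrate the two variants, here is a small sketch. The helper `build_path` and the resulting paths are hypothetical and only serve as an illustration; they are not the URLs of a real shop.

```python
def build_path(breadcrumb, skip=()):
    """Turn breadcrumb labels into a URL path, leaving out the levels listed in skip."""
    parts = [
        label.lower().replace(" ", "-")
        for label in breadcrumb
        if label not in skip
    ]
    return "/" + "/".join(parts) + "/"

breadcrumb = ["Men", "Clothing", "Shirts", "Business shirt"]

# Every breadcrumb level becomes its own directory: logical, but long.
full_path = build_path(breadcrumb)

# "Clothing" is left out because "Shirts" already implies it.
short_path = build_path(breadcrumb, skip={"Clothing"})
```

Here `full_path` is `/men/clothing/shirts/business-shirt/`, while `short_path` is `/men/shirts/business-shirt/`. The breadcrumb itself can still show all four levels.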
Site architecture is, of course, a topic in itself, and I am only touching on it superficially here. Nevertheless, you should set the course here before you start making wild corrections.
Set your internal links correctly
The larger your site is, the more urgent the errors become that tools like OnPage.org, DeepCrawl, Audisto & Co. show you. For example, your internal links should avoid redirects and, above all, the redirect chains that result from them. Links to error pages should not happen to you either, and your canonicals should point to the correct pages without detours. You should not give internal links the rel=nofollow attribute.
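How redirect chains can be detected in a crawl export can be sketched like this. It is a minimal example that assumes you already have a mapping of redirecting URLs to their targets and a list of internal link targets; the names `resolve_redirects` and `find_redirecting_links` and all paths are made up for illustration.

```python
def resolve_redirects(url, redirects, max_hops=10):
    """Follow url through the redirect map and return the list of URLs visited."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in chain[:-1]:  # redirect loop: stop instead of spinning
            break
    return chain

def find_redirecting_links(link_targets, redirects):
    """Return {link target: chain} for every linked URL that redirects at least once."""
    result = {}
    for target in link_targets:
        chain = resolve_redirects(target, redirects)
        if len(chain) > 1:
            result[target] = chain
    return result

# Hypothetical crawl data.
REDIRECTS = {"/shirts": "/shirts/", "/shirts/": "/men/shirts/"}
LINK_TARGETS = ["/shirts", "/shirts/", "/men/shirts/"]

problems = find_redirecting_links(LINK_TARGETS, REDIRECTS)
```

In this example, the link to `/shirts` runs through a chain of two redirects before it reaches `/men/shirts/`. Every entry in `problems` is an internal link that should point directly to the last URL of its chain.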
A few useful reports from OnPage.org are, for example, these:
Watch your click path length
If you have a big site, you should also make sure that your URLs are not too far from the home page. In OnPage.org, this report can be found under “Links” » “Click path” and looks like this:
Beyond a certain site size, it can be quite natural, and in keeping with the page hierarchy, that not all URLs are three clicks away from the home page. That is why OnPage.org also works with yellow bars. However, if the user has to click seven times or more to get from the home page to the final URL, that is far too much. In that case you should ask yourself whether these pages are important and whether they could be better integrated into the site (keyword, once again: site architecture). If they are not important, you should consider why they still exist at all.
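The click path length is simply the shortest path from the home page in the internal link graph, so it can be computed with a breadth-first search. The following is a minimal sketch, assuming the internal links are available as a dictionary from each page to the pages it links to; the graph and the function names are hypothetical.

```python
from collections import deque

def click_depths(links, start):
    """Breadth-first search: minimum number of clicks from start to every reachable URL."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def too_deep(links, start, max_clicks):
    """Return all URLs that need more than max_clicks clicks from start."""
    depths = click_depths(links, start)
    return sorted(url for url, depth in depths.items() if depth > max_clicks)

# Hypothetical link graph of a tiny shop.
LINKS = {
    "/": ["/men/", "/women/"],
    "/men/": ["/men/shirts/"],
    "/men/shirts/": ["/men/shirts/business-shirt/"],
}

depths = click_depths(LINKS, "/")
deep_pages = too_deep(LINKS, "/", max_clicks=2)
```

Pages that do not appear in the result at all are not reachable from the home page through internal links, which is worth checking separately.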
Use a sitemap
For large sites, a sitemap is a great help for search engines to find their way around. A single sitemap may contain up to 50,000 URLs and be up to 50 MB in size (uncompressed). If you have more URLs, you should use a sitemap index file. However, such a sitemap must also be maintained. If it contains outdated URLs, it is counterproductive.
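Splitting a long URL list across several sitemaps and tying them together with a sitemap index file could look roughly like this. It is a minimal sketch that only takes the limit of 50,000 URLs into account, not the file size; the domain, the file names and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split the URL list into chunks that each fit into one sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap(urls):
    """Build the XML for a single sitemap file."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

def build_sitemap_index(sitemap_urls):
    """Build the XML for the index file that lists the individual sitemaps."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")

# Hypothetical shop with 120,000 product URLs.
product_urls = [f"https://example.com/product/{i}" for i in range(120_000)]
chunks = chunk_urls(product_urls)
sitemaps = [build_sitemap(chunk) for chunk in chunks]
index_xml = build_sitemap_index(
    f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
)
```

If the sitemaps are generated from the shop database on a regular schedule, outdated URLs disappear automatically.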
For shop systems, the use of parameters is recommended (e.g. example.com/damen-sneaker-beispielmodell?color=blue&groesse=38). Parameters already allow crawlers to interpret which URLs belong together, what their content is, and how important they are.
For the most important pages, however, URLs without parameters are the better choice. In the Search Console (Crawl » URL parameters), you can tell Google what these parameters are for, how they affect the content, and whether they should be crawled:
However, the instruction not to crawl URLs is misleading. An analysis of log files shows that it cannot actually forbid crawling. As a recommendation to Google, it is nevertheless useful to configure the parameters.
Set your filters correctly
In shop systems, the main challenge is usually dealing with filters. Each new filter multiplies the number of shop URLs, so the total grows exponentially, which can overwhelm the crawler. Many filter options with shareable URLs are, however, very much in the interest of the user and thus also in the interest of SEO.
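A quick calculation shows how fast the number of URLs grows. The sketch below assumes that every filter is either not set or set to exactly one of its options; the filter names and option counts are invented.

```python
from math import prod

def count_filter_urls(options_per_filter):
    """Number of distinct filter combinations for one category page.

    Each filter contributes (number of options + 1) choices,
    because it can also be left unset.
    """
    return prod(n + 1 for n in options_per_filter)

# Hypothetical category with three filters: 10 colors, 8 sizes, 20 brands.
three_filters = count_filter_urls([10, 8, 20])

# Adding a fourth filter with only 5 materials multiplies the total by 6.
four_filters = count_filter_urls([10, 8, 20, 5])
```

Three filters already produce 2,079 possible URLs for a single category, and the fourth filter raises that to 12,474. If several options per filter can be selected at once, the number grows even faster.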
It is therefore recommended to work with the PRG (Post/Redirect/Get) pattern. Put simply, clicking on a filter initially changes only the content, but not the URL. After that, a URL change is forced, which the crawler does not follow. If you want to know more about how this works, I recommend the very good explanation by Mario Schwertfeger. The main advantage of this method is that the URLs still exist, but they are not internally linked in every variation.
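The basic flow of Post/Redirect/Get can be sketched without a web framework. The function below only simulates a server; the paths, the parameter names and `handle_request` are assumptions for illustration, and a real shop implementation (including the one in the explanation mentioned above) may differ in the details.

```python
from urllib.parse import parse_qsl, urlencode

def handle_request(method, path, form=None):
    """Return (status, headers, body) for a tiny filter endpoint using Post/Redirect/Get."""
    if method == "POST" and path == "/shirts/filter":
        # Post: the filter is a form button, not a crawlable <a href> link.
        query = urlencode(sorted((form or {}).items()))
        # Redirect: 303 See Other sends the browser to a shareable GET URL.
        return 303, {"Location": "/shirts/?" + query}, ""
    if method == "GET" and path.startswith("/shirts/"):
        # Get: the filtered listing is rendered at its own URL.
        filters = dict(parse_qsl(path.partition("?")[2]))
        return 200, {}, f"Shirts filtered by {filters}"
    return 404, {}, "Not found"

# A user clicks the filter button: the browser posts the form ...
status, headers, _ = handle_request(
    "POST", "/shirts/filter", {"size": "38", "color": "blue"}
)
# ... and then follows the redirect with a normal GET request.
final_status, _, body = handle_request("GET", headers["Location"])
```

Crawlers generally follow links rather than submitting forms, so they never trigger the POST step. The filtered URL still exists and can be shared or linked deliberately.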
Improve your loading times
Directly related to the crawl budget is also the size of the page. Fast-loading pages can also be processed faster by the crawler. Loading time is therefore an important SEO lever, especially for very large sites. And with sites like testmysite.thinkwithgoogle.com it is even fun! It is also extremely worthwhile to look at the file sizes and the waterfall charts on gtmetrix.com. For the latter, you can also use the developer tools in Chrome or Firefox.
Keep the whole picture in view
Ultimately, it is important that you keep your site as lean as possible. Most of the tasks are simply craftsmanship, where there is no “maybe”, only “right” and “wrong”. Of course, you have to weigh up how many resources you put into implementing them. But the beauty is: once these points are done, you have made your site better. Google is happy, you are happy, and if this article helped you, I am happy too!
Are any important measures missing? Then let me know in a comment!
Do you have burning questions? Then Seokratie will help! Is this all too technical for you, and do you need someone to take a close look at your site and guide you through the necessary actions? Then send us your SEO request.