[SmartCrawl Pro] Initial site crawl for sitemap

I have approximately 4,000 posts over the last 12 months and only yesterday realized (duh!) there was no sitemap.

Though I had the Yoast Premium SEO plugin installed, I had never used it.

Doing some homework online I came to the conclusion I should use your SiteCrawler.

With 4,000 +/- posts I tried the first crawl and only 1,000 URLs were found (976 of which needed to be added to the sitemap).

I tried upping the limit on the number of URLs to 5,000 (since I read Google will accept up to 50,000 URLs in a single sitemap file) but couldn’t do so.

How do I get all of the posts crawled and added to the sitemap without having to do it in many annoying passes (where I also get the message “Crawler is cooling down… try later”)?

Thank you,
Duane

  • Satwik Relwani
    • Staff

    Hey Duane,
    Hope you are having a good day :slight_smile:

    I checked your sitemap ( https://thecirculareconomy.com/sitemap.xml ) and it seems that the sitemap was generated successfully and all the posts were added to it.

    Since you have more than 1000 links, SmartCrawl automatically switches to Split Sitemaps: a separate sitemap is generated for Posts, Categories, Pages, etc., with up to 1000 URLs in each sitemap.
    You can learn more about SmartCrawl Sitemaps here:
    https://wpmudev.com/docs/wpmu-dev-plugins/smartcrawl/#sitemap
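
To make the split concrete, here is a minimal Python sketch of the arithmetic described above. It is not SmartCrawl's code; the function name, the example domain, and the 4,000-URL list are assumptions for illustration only.

```python
import math

def split_into_sitemaps(urls, per_file=1000):
    """Return consecutive chunks holding at most per_file URLs each."""
    return [urls[i:i + per_file] for i in range(0, len(urls), per_file)]

# Hypothetical data: 4,000 post URLs on an example domain.
posts = [f"https://example.com/post-{n}/" for n in range(4000)]
chunks = split_into_sitemaps(posts)

print(len(chunks))                   # number of post sitemap files
print(math.ceil(len(posts) / 1000))  # same count via ceiling division
```

With roughly 4,000 posts and 1000 URLs per file, the posts alone would be spread over about four sitemap files, which may be why a per-file figure of 1000 keeps showing up.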

    You can check your sitemap here :
    https://thecirculareconomy.com/sitemap.xml

    Feel free to get back to us for any further query regarding this :slight_smile:

    Kind Regards,
    Satwik

  • Duane
    • WPMU DEV Initiate

    Thank you for the reply.

    The fact that the crawler continually reports “Total URLs Discovered 1000” is still confusing.

    I looked at the multi-part XML files, and all the URLs do appear to be there (I didn’t verify each one, but the totals appear to add up).
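
For anyone who wants to verify the totals rather than eyeball them, the following Python sketch sums the entries across the parts of a split sitemap. It assumes the files follow the standard sitemaps.org format; the file names and sample XML are hypothetical, not taken from the site in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap and sitemap index files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def child_sitemap_urls(index_xml):
    """List the child sitemap locations named in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]

def count_urls(sitemap_xml):
    """Count the <url> entries in a single sitemap file."""
    return len(ET.fromstring(sitemap_xml).findall("sm:url", NS))

# Hypothetical sample data standing in for downloaded files.
INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/post-sitemap1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/post-sitemap2.xml</loc></sitemap>
</sitemapindex>"""

PARTS = {
    "https://example.com/post-sitemap1.xml": """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a/</loc></url>
  <url><loc>https://example.com/b/</loc></url>
</urlset>""",
    "https://example.com/post-sitemap2.xml": """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/c/</loc></url>
</urlset>""",
}

total = sum(count_urls(PARTS[u]) for u in child_sitemap_urls(INDEX))
print(total)
```

Against a live site, the sample strings would be replaced by the downloaded contents of the index and of each child sitemap (for example via `urllib.request.urlopen`).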

    Again, the question remains: with Google accepting up to 50,000 URLs in a single sitemap XML file, why are we forced to have multiple passes and multiple files with SmartCrawl?

    Thank you,

    Duane

  • Satwik Relwani
    • Staff

    Hey Duane,
    Hope you are having a good day :slight_smile:

    Search engines need to use your server resources to download any kind of file, including sitemap files.
    Even if you have a powerful server, large sitemaps will increase the chances of truncated responses or timeouts. And therefore, large sitemaps will increase the chances of crawling errors.

    A single sitemap with 10,000+ products is going to be a strain on your server whenever search engine bots try to download it. It’s much better to have 10 sitemaps with 1000 links than one massive one.
    You can give this article a read :
    https://www.joomlashack.com/blog/tutorials/use-small-sitemaps/
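
As a rough illustration of the size argument, the Python sketch below builds one sitemap with 10,000 URLs and ten sitemaps with 1000 URLs each, then compares how many bytes a crawler would download per file. The URLs and the `build_urlset` helper are hypothetical, and the sketch only measures file size, not actual server load or timeouts.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_urlset(urls):
    """Serialize a list of page URLs as one <urlset> sitemap document (bytes)."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="utf-8")

# Hypothetical catalogue of 10,000 URLs on an example domain.
urls = [f"https://example.com/product-{n}/" for n in range(10_000)]

single = build_urlset(urls)
parts = [build_urlset(urls[i:i + 1000]) for i in range(0, len(urls), 1000)]

print(len(parts))                                # number of split files
print(len(single), max(len(p) for p in parts))   # bytes: one big file vs largest part
```

Each split file is roughly a tenth of the size of the single file, so any one request transfers far less data, at the cost of more requests overall.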

    Also, if you wish to increase the limit, you can do that by going to –
    SmartCrawl > Sitemap > Settings > Generation Method > Split Tab > Number of links per sitemap.

    I hope this helps :slight_smile:

    Feel free to reach back to us for any queries.

    Kind Regards,
    Satwik

    • Duane
      • WPMU DEV Initiate

      Thank you for the reply.

      I tried setting the limit to 5,000 at the very start and couldn’t change the number to anything higher than 1,000.

      This morning after another regularly scheduled crawl was completed, 978 URLs were still listed as not being included in the sitemap (despite repeatedly clicking on include in sitemap at the bottom of the list.)

      I ran a manual crawl this morning and it still showed 975 URLs not included in the sitemap.

      Again, I clicked on the “include in sitemap” option (can’t recall the “official” terminology associated with the button at the bottom of the list.)

      DS

  • Satwik Relwani
    • Staff

    Hey Duane,
    Hope you are well today :slight_smile:

    Apologies for my previous answer; I would like to correct it.
    SmartCrawl has a predefined limit of 1000 entries per sitemap, which cannot be increased from within the plugin itself.

    If you wish to increase the limit, please add the following line to the “wp-config.php” file, above the “/* That’s all, stop editing */” line. You can find the wp-config.php file in your WordPress root directory.

    define( 'SMARTCRAWL_SITEMAP_POST_LIMIT', 5000 );

    Since you want 5000 entries per sitemap, I have used that value in the code above; you can adjust it as needed.
    Once that’s done, run the crawl again so that it can update the sitemap.
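
Before re-running the crawl, it can help to confirm that the line actually landed where intended. The Python sketch below is a hypothetical helper, not part of SmartCrawl or WordPress: it checks the text of a wp-config.php file for the constant named in this thread, confirms it sits above the “stop editing” comment, and only accepts plain ASCII quotes, since copying code from a web page can sometimes substitute typographic quotes. It inspects the file text only and says nothing about whether the plugin honors the constant.

```python
import re

# Matches the define only when the name is wrapped in plain ' or " quotes.
DEFINE_RE = re.compile(
    r"""define\(\s*['"]SMARTCRAWL_SITEMAP_POST_LIMIT['"]\s*,\s*(\d+)\s*\)\s*;"""
)
STOP_MARKER = "stop editing"

def check_wp_config(text):
    """Return (limit, above_marker); limit is None if no valid define is found."""
    match = DEFINE_RE.search(text)
    if match is None:
        return None, False
    marker_pos = text.find(STOP_MARKER)
    above_marker = marker_pos != -1 and match.start() < marker_pos
    return int(match.group(1)), above_marker

# Hypothetical file contents for illustration.
GOOD = """define( 'WP_DEBUG', false );
define( 'SMARTCRAWL_SITEMAP_POST_LIMIT', 5000 );

/* That's all, stop editing! Happy blogging. */
"""
# Same file, but with typographic quotes around the constant name.
CURLY = GOOD.replace(
    "'SMARTCRAWL_SITEMAP_POST_LIMIT'",
    "\u2018SMARTCRAWL_SITEMAP_POST_LIMIT\u2019",
)

print(check_wp_config(GOOD))
print(check_wp_config(CURLY))
```

In practice the text would come from reading the real file, for example `open("wp-config.php", encoding="utf-8").read()`.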

    Also as you mentioned :

    I ran a manual crawl this morning and it still showed 975 URLs not included in the sitemap.

    You could try clearing your website’s cache and check whether that fixes the problem.

    However, if this is still happening, please grant us Support Access so that we can check this more closely and look for a fix.

    You can do that by –
    WPMU Dev > Support > Support Access > Grant Support Access.

    You can read more about it here :
    https://wpmudev.com/docs/getting-started/getting-support/#enabling-support-access

    Let me know here once it’s done as I won’t be notified automatically.

    Kind regards,
    Satwik

  • Duane
    • WPMU DEV Initiate

    I ran the crawler just about 30 minutes ago, still 945 URLs not included.

    I granted support access.

    While you are in there feel free to do a complete backup of “everything” and try to restore “everything” except for wp-config.php and see if you can figure out why the restore hangs “forever”.

    DS

  • Duane
    • WPMU DEV Initiate

    I made the change to the wp-config.php file increasing the number of the crawl to 5000 and tried setting the number of links per sitemap to 4999.

    After hitting “Save Settings” an error message still appears at the top of the page indicating the maximum number allowed is 1000.

    Here’s the section from the WP-Config.php file:

    * @link https://codex.wordpress.org/Debugging_in_WordPress
    */
    define( 'WP_DEBUG', false );
    define( 'SMARTCRAWL_SITEMAP_POST_LIMIT', 5000 );

    /* That’s all, stop editing! Happy blogging. */

    Has anyone actually tried these changes that were recommended?

    At first I was told to simply set the number in the dashboard to a higher number and I had already tried that and got the same error message.

    Now I was told to change the wp-config.php file and that change also didn’t work.

    It seems as if these are simply guesses and no one has actually verified if this works.

    Beyond these on-going issues, nowhere do I see an actual number of how many links, total, are in the sitemap file.

    It really shouldn’t be anywhere near this burdensome or take this much time to get a decent working total sitemap.

    Between these problems with the sitemap and the on-going problems with Snapshots, I am getting ready to abandon WPMU DEV.

    Not complaining, just being honest that so far, neither of the two things which were the things that made me choose WPMU DEV’s plugins work.

    Duane

  • Satwik Relwani
    • Staff

    Hey Duane,

    I apologize for the inconvenience caused to you while using our products. We try our best to make them as efficient as possible, but due to hosting or plugin issues we do sometimes get cases like these. However, we make sure that we are available 24/7 to support our users.

    In this case, I changed the sitemap from split to default and ran a new crawl, and that fixed the problem. You can see the total number of sitemap URLs at the top of the Sitemap page, as in the image below.

    [attachments are only viewable by logged-in members]

    Also, while I was working, the Support Access seems to have stopped working. I would request you to enable it again so that we can complete our testing and provide you with a fix for Split Sitemaps too.

    As mentioned in one of my previous replies, split sitemaps take fewer server resources and reduce the load on the server. Even though the default one does the job currently, we need to test more on your website before drawing a conclusion, since we got some server errors during the crawl.

    Also, you mentioned the restore problem with the Snapshot plugin on your website. I would request you to please open a new ticket or contact us on Live chat through the link below:
    https://wpmudev.com/hub/support/#get-support

    Feel free to reach back to us for any queries.

    Kind Regards,
    Satwik