Set up sitemap generation with StreamX

In this tutorial, you will set up a StreamX Mesh that is capable of automatically creating and managing a sitemap.

Website sitemap files are a structured way of describing relevant pages, assets, and their relationships to search engine crawlers. Creating these files quickly becomes a challenge when multiple source systems are involved. StreamX’s inherent support for aggregating data from multiple, heterogeneous source systems can simplify sitemap generation considerably, even for complex setups.

This tutorial covers:

  • Setting up and running StreamX Mesh to generate sitemaps dynamically.

  • Publishing website content and ensuring it is included in the sitemap.

  • Verifying sitemap updates when content is added.

  • Unpublishing content and confirming sitemap adjustments.

Prerequisites

To complete this guide, you will need:

Verify that no other StreamX instance or any other application that uses ports 8080 and 8081 is running.
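
One way to check both ports before starting is sketched below. It assumes Bash, whose `/dev/tcp` pseudo-device opens a TCP connection. The helper name `port_in_use` is made up for this example, and the check only probes 127.0.0.1, so a listener bound to another address would not be detected.

```shell
#!/usr/bin/env bash
# Sketch only: relies on Bash's /dev/tcp redirection (not available in plain sh).

# Succeeds when something accepts TCP connections on the given local port.
port_in_use() {
  # The subshell keeps the temporary file descriptor from leaking;
  # a refused connection makes the redirection fail with a non-zero status.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Ports named in the prerequisites of this tutorial.
for port in 8080 8081; do
  if port_in_use "$port"; then
    echo "Port $port is in use"
  else
    echo "Port $port is free"
  fi
done
```

If either port is reported as in use, stop the application that occupies it before running the mesh.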

Step 1: Get the source files

Clone the Git repository containing source files for the example:

git clone -b 1.1.0 https://github.com/streamx-dev/streamx-docs-resources.git

Step 2: Run the StreamX Mesh

The StreamX Mesh for this tutorial consists of three services that take care of generating and serving sitemap files while HTML pages come and go.

  1. Open a terminal and go to the set-up-sitemap-generation-tutorial directory inside the cloned repository.

  2. Run the StreamX Mesh by using the following command:

    streamx run
  3. Wait for the following output:

    -------------------------------------------------------------------
    STREAMX IS READY!
    -------------------------------------------------------------------
    ...
    -------------------------------------------------------------------
    Network ID:
    ...
    Mesh configuration file: ./mesh.yaml
    -------------------------------------------------------------------

Step 3: Publish content

  1. Publish the index.html page for a hypothetical site by using the following command:

    streamx publish -s 'content.bytes=file://site/index.html' pages index.html
  2. Open your web browser and go to http://localhost:8081.

  3. Verify that the page index.html is accessible.

  4. Then go to http://localhost:8081/sitemap.xml in your web browser.

  5. Verify that the sitemap contains an entry for the index.html page.

    The sitemap might take a few seconds to update.
  6. Publish another example page article.html by running the following command:

    streamx publish -s 'content.bytes=file://site/article.html' pages article.html
  7. Publish sample pages for another sub-site for blogs by executing the following commands:

    streamx publish -s 'content.bytes=file://blog/blog.html' pages blog.html
    streamx publish -s 'content.bytes=file://blog/entry.html' pages blog/entry.html
  8. Visit http://localhost:8081/sitemap.xml again.

  9. Verify that the sitemap now contains all four entries.
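
The browser check in the steps above can also be done from the terminal. The sketch below assumes the generated file follows the standard sitemaps.org layout, with one `<loc>` element per page. The helper name `sitemap_locs` and the sample document are illustrative only and are not taken from the mesh output.

```shell
# Print the content of every <loc> element read from standard input, one per line.
sitemap_locs() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Illustrative sample; the file served by the mesh may contain more fields.
sample='<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://localhost:8081/index.html</loc></url>
  <url><loc>http://localhost:8081/blog/entry.html</loc></url>
</urlset>'

# Prints the two sample URLs, one per line.
printf '%s\n' "$sample" | sitemap_locs
```

Against the running mesh, the same helper could be fed with `curl -s http://localhost:8081/sitemap.xml | sitemap_locs`, which should list four URLs once all pages from this step are published.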

Step 4: Unpublish content

  1. Unpublish the page blog/entry.html by using the following command:

    streamx unpublish pages blog/entry.html
  2. Visit http://localhost:8081/sitemap.xml.

  3. Verify that the entry for http://localhost:8081/blog/entry.html has been removed from the sitemap.
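
Because the sitemap can lag behind an unpublish by a few seconds, a scripted check may need to retry. The following sketch polls until an entry disappears. The helper names `fetch_sitemap`, `entry_absent`, and `retry` are hypothetical, and the code assumes that `curl` is installed and that entries appear as `<loc>` elements.

```shell
SITEMAP_URL=${SITEMAP_URL:-http://localhost:8081/sitemap.xml}

# Fetch the sitemap; -f makes curl fail on HTTP errors instead of printing them.
fetch_sitemap() {
  curl -fsS "$SITEMAP_URL"
}

# Succeeds only when the sitemap was fetched and does not list the given URL.
# Returns 2 when the fetch itself fails, so an unreachable mesh is not
# mistaken for a removed entry.
entry_absent() {
  body=$(fetch_sitemap) || return 2
  ! printf '%s\n' "$body" | grep -qF "<loc>$1</loc>"
}

# Run a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    "$@" && return 0
    attempts=$((attempts - 1))
    [ "$attempts" -gt 0 ] && sleep "$delay"
  done
  return 1
}
```

With these helpers, `retry 10 1 entry_absent http://localhost:8081/blog/entry.html && echo "entry removed"` would wait up to roughly ten seconds for the entry to disappear.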

Summary

Congratulations! You have just confirmed that sitemaps are automatically regenerated whenever you add or remove pages from StreamX. This automated process keeps your website’s sitemap up to date, which simplifies SEO and improves search engine discoverability.