Build data aggregation with StreamX

Data aggregation is the process of collecting and combining data from various sources to create new entities that offer a unified view or summary. When the parts of a composite entity are spread across multiple systems, managing the data flows between them can become complex. StreamX addresses this problem by providing a central point where the parts are assembled and by delivering pre-computed data to the web server.

In this tutorial, you will implement a simple data aggregation scenario with StreamX.

This tutorial covers the following topics:

  • Data aggregation from multiple sources

  • Page generation by using the StreamX Rendering Engine, including:

    • Managing page templates

    • Managing template data

Prerequisites

To complete this guide, you will need:

Make sure that no other StreamX instance, and no other application that uses port 8081, is running.

Step 1: Get the sources

Clone the Git repository that contains the source files for this example:

git clone -b 1.1.0 https://github.com/streamx-dev/streamx-docs-resources.git

Step 2: Run the StreamX Mesh

Our example StreamX Mesh is configured to aggregate data from multiple independent sources and merge them into a unified entity. The system combines:

  • Product information from a PIM (Product Information Management) system

  • Pricing data from an internal pricing system

  • Customer reviews from an FMS

The computed data is then fed into the StreamX Rendering Engine, which generates the final target pages.
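The merge described above can be pictured with a minimal Python sketch. It is a conceptual model only, assuming that each source delivers a partial record for the same product; the `aggregate` function and the field names (`name`, `amount`, `text`) are hypothetical and do not reflect the real StreamX Mesh configuration or the contents of the tutorial's data files.

```python
from typing import Any, Optional


def aggregate(product: dict[str, Any],
              price: Optional[dict[str, Any]],
              reviews: list[dict[str, Any]]) -> dict[str, Any]:
    """Combine partial records from independent sources into one entity.

    Hypothetical model: the product record is required, the price is
    optional, and there can be any number of reviews.
    """
    entity = dict(product)             # start from a copy of the PIM data
    if price is not None:
        entity["price"] = price        # add pricing data only when available
    entity["reviews"] = list(reviews)  # attach all reviews (possibly none)
    return entity


# Hypothetical partial records for one product, coming from three systems.
pim_record = {"id": "1", "name": "Example product"}
pricing_record = {"amount": 19.99, "currency": "EUR"}
fms_records = [{"text": "Great!"}, {"text": "Works as described."}]

page_data = aggregate(pim_record, pricing_record, fms_records)
print(page_data["name"], page_data["price"]["amount"], len(page_data["reviews"]))
```

Keeping the price optional and the reviews as a list mirrors the later steps of this tutorial, where the page exists without a price and the number of reviews changes over time.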

  1. Open a terminal and go to the build-data-aggregation-tutorial directory inside the cloned repository.

  2. Run the StreamX Mesh by using the following command:

    streamx run

  3. Wait for the following output:

    -------------------------------------------------------------------
    STREAMX IS READY!
    -------------------------------------------------------------------
    ...
    -------------------------------------------------------------------
    Network ID:
    ...
    Mesh configuration file: ./mesh.yaml
    -------------------------------------------------------------------

Step 3: Publish template and data

Publish template

  1. Publish the site/template.html file to the renderers channel with the following command:

    streamx publish -s 'template.bytes=file://site/template.html' renderers template.html

    Where:

    • -s indicates that an external plain text file is the source for the published content.

    • renderers is the channel you are publishing the template to.

    • template.html is the publish key.

Publish rendering context

The StreamX Rendering Engine also requires a rendering context, which defines:

  • The data that triggers page generation

  • The type of the generated output

  • The names of the generated results

Once this context is defined, you can proceed with publishing the required data.

  1. Run the following command to provide the necessary page generation details:

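    streamx publish rendering-contexts pages-rendering-context rendering-contexts/pages-rendering-context.json

    Where:

    • rendering-contexts is the channel you are publishing the rendering context to.

    • pages-rendering-context is the publish key.

    • rendering-contexts/pages-rendering-context.json is the file that contains the rendering context definition.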
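The three pieces of information that a rendering context provides can be modeled roughly as follows. This is a hypothetical Python sketch: the `RenderingContext` class, its fields, and the `generated/{id}.html` naming pattern are assumptions inferred from the URLs used later in this tutorial, not the actual schema of rendering-contexts/pages-rendering-context.json.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RenderingContext:
    """Hypothetical model of a rendering context (not the real StreamX schema)."""
    trigger_type: str    # which data triggers page generation, e.g. "product"
    output_type: str     # the type of generated output, e.g. "text/html"
    output_pattern: str  # how the generated results are named

    def output_key(self, data_type: str, entity_id: str) -> Optional[str]:
        """Return the name of the page to generate, or None if not triggered."""
        if data_type != self.trigger_type:
            return None  # in this model, only product data triggers a page
        return self.output_pattern.format(id=entity_id)


context = RenderingContext(
    trigger_type="product",
    output_type="text/html",
    output_pattern="generated/{id}.html",
)
print(context.output_key("product", "1"))
print(context.output_key("price", "1"))
```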

Publish product data

  1. Publish the data/product.json file to the data channel with the following command:

    streamx publish -s 'content.bytes=file://data/product.json' data product:1

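    Where product:1 is the publish key: product identifies the type of data, and the number 1 after the colon is the entity ID. The ID links related pieces of data, such as the price and reviews published in later steps, so that they are combined into a single entity.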

  2. Open your web browser and go to http://localhost:8081/generated/1.html.

  3. Verify that the page is accessible, but has no price and no reviews.
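The key format can be illustrated with a small Python helper. It is a sketch assuming that keys have the form `<type>:<id>` or, for the multivalued review data used later in this tutorial, `<type>:<id>:<item>`; the `parse_key` function and the `DataKey` type are hypothetical and not part of StreamX.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataKey:
    data_type: str       # e.g. "product", "price", "review"
    entity_id: str       # the shared ID that links related data
    item: Optional[str]  # distinguishes the values of multivalued data


def parse_key(key: str) -> DataKey:
    """Split a publish key such as "product:1" or "review:1:firstReviewHash"."""
    parts = key.split(":")
    if len(parts) == 2:
        return DataKey(parts[0], parts[1], None)
    if len(parts) == 3:
        return DataKey(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected key format: {key!r}")


keys = ["product:1", "price:1", "review:1:firstReviewHash"]
# All three keys share the entity ID "1", so they describe the same product.
print({parse_key(k).entity_id for k in keys})
```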

Step 4: Update optional data

  1. Publish the data/price.json file to the data channel with the following command:

    streamx publish -s 'content.bytes=file://data/price.json' data price:1

  2. Open http://localhost:8081/generated/1.html.

  3. Verify that the page contains the price.

  4. Unpublish the price data by using the price:1 key with the following command:

    streamx unpublish data price:1

  5. Visit http://localhost:8081/generated/1.html.

  6. Confirm that the page generated from the product:1 data is still published, but no longer shows the price.
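The behavior of optional, single-valued data can be modeled as a small in-memory view. This hypothetical Python sketch, assuming one price per product ID, shows why unpublishing price:1 removes the price while the page built from product:1 stays available; the `ProductView` class is illustrative and does not reflect how StreamX stores data internally.

```python
from typing import Any, Optional


class ProductView:
    """Hypothetical aggregated view of one product with an optional price."""

    def __init__(self, product: dict[str, Any]) -> None:
        self.product = product
        self.price: Optional[dict[str, Any]] = None  # optional part, absent at first

    def publish_price(self, price: dict[str, Any]) -> None:
        self.price = price  # a new value replaces any previous one

    def unpublish_price(self) -> None:
        self.price = None   # the product stays, only the price disappears

    def render(self) -> str:
        price_text = f"{self.price['amount']}" if self.price else "no price"
        return f"{self.product['name']}: {price_text}"


view = ProductView({"name": "Example product"})
print(view.render())
view.publish_price({"amount": 19.99})
print(view.render())
view.unpublish_price()
print(view.render())
```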

Step 5: Update multivalued data

Publish reviews

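  1. Publish the data/review_1.json and data/review_2.json files to the data channel with the following commands:

    streamx publish -s 'content.bytes=file://data/review_1.json' data review:1:firstReviewHash
    streamx publish -s 'content.bytes=file://data/review_2.json' data review:1:secondReviewHash

    Where the number 1 is the ID of the product that the reviews belong to, and the last part of each key (firstReviewHash, secondReviewHash) distinguishes the individual reviews.
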
  2. Refresh http://localhost:8081/generated/1.html.

  3. Verify that the page now contains two reviews.

Unpublish part of the data

  1. Unpublish the first review by using its review:1:firstReviewHash key with the following command:

    streamx unpublish data review:1:firstReviewHash

  2. Visit http://localhost:8081/generated/1.html.

  3. Confirm that the review published under the review:1:firstReviewHash key has disappeared, but the second review is still visible.
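Multivalued data can be pictured as a collection keyed by the last segment of the publish key. In this hypothetical Python sketch, assuming each review key has the form `review:<id>:<hash>`, unpublishing one key removes only the matching review; the `ReviewCollection` class is illustrative and not part of StreamX.

```python
from typing import Any


class ReviewCollection:
    """Hypothetical store of the reviews for one product, keyed by review hash."""

    def __init__(self) -> None:
        self._reviews: dict[str, dict[str, Any]] = {}

    def publish(self, key: str, review: dict[str, Any]) -> None:
        _, _, review_hash = key.split(":")   # e.g. "review:1:firstReviewHash"
        self._reviews[review_hash] = review  # publishing the same key again replaces it

    def unpublish(self, key: str) -> None:
        _, _, review_hash = key.split(":")
        self._reviews.pop(review_hash, None)  # remove only this review

    def list_reviews(self) -> list[dict[str, Any]]:
        return list(self._reviews.values())


reviews = ReviewCollection()
reviews.publish("review:1:firstReviewHash", {"text": "Great!"})
reviews.publish("review:1:secondReviewHash", {"text": "Works as described."})
reviews.unpublish("review:1:firstReviewHash")
print(reviews.list_reviews())
```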

Summary

Congratulations! You have learned how to create pages from multiple external sources by using the StreamX Rendering Engine.