· 6 min read
Samuel Joset

If you're here, it's probably because you have an interest in big data. The field has many applications, and one of them, real-time data analysis, figures prominently among the current concerns of startups.

In this article, we're going to define what a Lambda architecture is and explain how to implement it. For the sake of simplicity, we will exclusively use AWS tools. Nevertheless, the concepts we're going to discuss can be applied with any other tool.

Are you ready? Let's go!

1. Understanding Lambda Architecture

Let's take a moment to understand what a Lambda architecture is:

Lambda architecture is a data processing structure designed to handle immense amounts of data. It is devised to meet the challenges posed by large-scale data management systems.

The specificity of this architecture lies in its three essential parts: batch processing (Batch Layer), real-time processing (Speed Layer), and result serving (Serving Layer).

The Batch Layer is the layer where all incoming data are indexed. It handles data processing and generates views from it. These views are pre-computed and stored. The advantage of pre-computing these views is that it makes read operations extremely fast.

The Speed Layer operates in parallel with the Batch Layer. Its primary role is to bridge the gap between the most recent data received and the latest pre-computed view from the Batch Layer. In other words, it takes charge of the data that have not yet been processed by the Batch Layer and processes them in real time.

The Serving Layer combines the results of the Batch and Speed layers to provide a unified view of the data. It ensures that the most recent data are always available for analysis, even if they have not yet been processed by the Batch Layer. Thus, when a request is made, the result is always obtained from the most recent data. This is real-time data analysis at its finest!

2. AWS DynamoDB, Lambda, and EventBridge to the Rescue

To implement our Lambda architecture, we will mainly use three AWS services: AWS DynamoDB, AWS Lambda, and AWS EventBridge.

AWS DynamoDB will be used for storage. Its scaling capacity and performance allow us to store and retrieve any amount of data, regardless of the traffic intensity. For this reason, DynamoDB will serve both the Batch Layer and the Speed Layer.

We will also use AWS Lambda to run our code. This service has the advantage of executing code only when necessary and automatically scaling resources based on demand. In addition, AWS Lambda allows code to be executed in response to an event, on a cron schedule, or even via an HTTP request.

Finally, AWS EventBridge will allow us to receive data asynchronously and trigger functions from specific events. It is via EventBridge events that we will trigger an AWS Lambda function that will execute the code responsible for storing data in DynamoDB.

3. Building the Batch Layer with AWS DynamoDB

The construction of the Batch layer involves processing and storing a significant volume of data in batches. Several points need to be addressed in this part:

  • When will the batches be created?
  • What will the format of the batches be?
  • How will the data enabling batch creation be stored?

3.1. Retrieving Data from DynamoDB

The first step in our process is to retrieve the data from DynamoDB. We will use the AWS SDK to perform this task.

const AWS = require('aws-sdk');
const documentClient = new AWS.DynamoDB.DocumentClient();

const fetchRecords = async (tableName) => {
  const items = [];
  let lastEvaluatedKey;

  // scan returns at most 1 MB of data per call, so we paginate
  // until LastEvaluatedKey is no longer returned
  do {
    const data = await documentClient
      .scan({ TableName: tableName, ExclusiveStartKey: lastEvaluatedKey })
      .promise();
    items.push(...data.Items);
    lastEvaluatedKey = data.LastEvaluatedKey;
  } while (lastEvaluatedKey);

  return items;
};

3.2. Batch Processing of Data

After retrieving the data, the next step is to process them in batches. Batch processing is not just a matter of grouping data into packets. It can involve several operations depending on the needs, including aggregation, enrichment, or data cleaning.

  • Data aggregation: Commonly used to summarize data. For example, sales data can be aggregated by region, by day, etc. We can also derive new values such as sums, averages, minimums, maximums, etc.

  • Data enrichment: We may also want to add additional information to our existing data from other sources. This can, for example, include adding demographic information to sales data, adding geographic information to location data, etc.

  • Data cleaning: This step should be systematic with each batch generation. It involves several operations, such as the removal of null values, the correction of data entry errors, the standardization of data formats, etc.
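To make the cleaning step concrete, here's a minimal sketch. The record shape (id, amount, region) is a hypothetical assumption for illustration, not part of our setup:

const cleanRecords = (records) =>
  records
    // remove entries with null or missing values
    .filter((record) => record && record.amount != null)
    // standardize the format of the region field
    .map((record) => ({
      ...record,
      region: String(record.region || 'unknown').trim().toLowerCase(),
    }));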

In our example, we will group the data into batches in the simplest possible way, without performing aggregation or enrichment. However, keep in mind that the logic of batch processing can and should vary according to your needs.

const createBatches = (data, batchSize) => {
  const batches = [];
  for (let i = 0; i < data.length; i += batchSize) {
    batches.push(data.slice(i, i + batchSize));
  }
  return batches;
};

3.3. Storing the Processed Data

Once our data has been grouped into batches, we need to store them again in DynamoDB.

const storeBatchedData = async (tableName, batches) => {
  for (const [index, batch] of batches.entries()) {
    const params = {
      TableName: tableName,
      // A DynamoDB item must be an object that includes the table's key;
      // here we assume a simple batchId partition key.
      Item: {
        batchId: `${Date.now()}-${index}`,
        records: batch,
      },
    };

    try {
      await documentClient.put(params).promise();
    } catch (err) {
      console.error(err);
    }
  }
};
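To tie these pieces together, here's a minimal sketch of the processBatchData handler that the next section will schedule. The SOURCE_TABLE and BATCH_TABLE environment variables and the batch size of 25 are assumptions for illustration:

module.exports.processBatchData = async () => {
  // 1. Retrieve the raw records accumulated so far
  const records = await fetchRecords(process.env.SOURCE_TABLE);

  // 2. Group them into batches (here, 25 records per batch)
  const batches = createBatches(records, 25);

  // 3. Store the pre-computed batches for the Serving Layer
  await storeBatchedData(process.env.BATCH_TABLE, batches);
};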

3.4. Configuring the Lambda Trigger with AWS EventBridge

Next, in serverless.yml, we will configure the processBatchData function to be triggered at regular intervals.

functions:
  processBatchData:
    handler: handler.processBatchData
    events:
      - schedule: rate(24 hours)

In this example, our processBatchData function is triggered once a day thanks to the schedule event.

With these different steps, we have everything necessary for the operation of our Batch layer.

4. Configuring the Speed Layer with AWS Lambda

We are going to set up an AWS Lambda function that will be triggered each time a new event is emitted by AWS EventBridge. This function will be tasked with receiving real-time data and storing it for later use.

First, let's add a new function to our serverless.yml file:

functions:
  RealTimeDataReceiver:
    handler: handler.realTimeDataReceiver
    events:
      - eventBridge:
          pattern:
            source:
              - "my.app"
            detail-type:
              - "DataEvent"

Next, we need to create the corresponding realTimeDataReceiver function in our handler.js file:

const AWS = require('aws-sdk');
const dynamoDb = new AWS.DynamoDB.DocumentClient();

module.exports.realTimeDataReceiver = async (event) => {
  const rawData = event.detail;

  // Storing the received data in DynamoDB
  await dynamoDb.put({
    TableName: process.env.DYNAMODB_TABLE,
    Item: rawData,
  }).promise();

  return {
    statusCode: 200,
    body: JSON.stringify(rawData),
  };
};

In this example, the realTimeDataReceiver function simply retrieves the event's data and stores it in DynamoDB for future use. It's important to note that the data processing does not occur in this function. Indeed, the Speed Layer in our setup is solely responsible for the receipt and storage of real-time data.
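For completeness, here's a hedged sketch of what a producer might look like on the other side of the bus: it publishes an event matching the pattern above using the EventBridge putEvents API (the payload shape is an assumption):

const AWS = require('aws-sdk');
const eventBridge = new AWS.EventBridge();

const publishDataEvent = async (payload) => {
  await eventBridge.putEvents({
    Entries: [
      {
        Source: 'my.app',        // must match the pattern's source
        DetailType: 'DataEvent', // must match the pattern's detail-type
        Detail: JSON.stringify(payload),
      },
    ],
  }).promise();
};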

5. Creation of the Serving Layer

Now, let's tackle the third and final component of the Lambda architecture. Its sole role is to respond to client requests by providing a consolidated view of the results of the calculations carried out by the Batch and Speed layers. It's this layer that allows end users to access the processed data and the derived information generated by the architecture.

To build it, we will use another AWS Lambda function. This function will be triggered by an HTTP request and will return the corresponding results from the database.

We will modify the serverless.yml file to add the new function:

functions:
  serveData:
    handler: handler.serveData
    events:
      - http:
          path: /data
          method: get
          cors: true

Here, we have configured our new serveData function to be triggered when a GET request is sent to the "/data" URL.

Here's a pseudo-implementation of this function in the handler.js file:

const AWS = require('aws-sdk');
const dynamoDb = new AWS.DynamoDB.DocumentClient();

module.exports.serveData = async (event) => {
  const params = {
    TableName: process.env.DYNAMODB_TABLE,
    // Add your filtering parameters here, such as the user ID or date,
    // according to the needs of your application
  };

  const data = await dynamoDb.scan(params).promise();

  return {
    statusCode: 200,
    body: JSON.stringify(data.Items),
  };
};

In this example, we have created a Lambda function that simply scans our DynamoDB table for all data and returns the results. Once again, this is a very basic implementation and you will need to customize the fetch options according to your needs. But the fundamental idea remains to provide a convenient interface for retrieving the data processed by the Lambda architecture.
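Since the Serving Layer is meant to merge the two layers, a slightly more faithful sketch would query both and combine the results. The split into BATCH_TABLE and SPEED_TABLE below is a hypothetical assumption about how the storage might be organized:

module.exports.serveData = async () => {
  // Pre-computed views produced by the Batch Layer
  const batchData = await dynamoDb
    .scan({ TableName: process.env.BATCH_TABLE })
    .promise();

  // Raw records not yet folded into a batch by the Batch Layer
  const speedData = await dynamoDb
    .scan({ TableName: process.env.SPEED_TABLE })
    .promise();

  return {
    statusCode: 200,
    body: JSON.stringify({
      batch: batchData.Items,
      realTime: speedData.Items,
    }),
  };
};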

6. Points of Attention

Implementing a Lambda architecture with AWS may seem straightforward on the surface, but it presents specific challenges that require particular attention:

  • Database Throttling: With AWS DynamoDB, it is crucial to understand and manage the provisioned read and write capacity. An incorrect allocation can lead to throttling, which can slow down your database operations and affect the performance of your system. Remember that DynamoDB allows you to dynamically adjust these capacities according to your needs.

  • Lambda Execution Time Limits: AWS Lambda has a maximum execution time of 15 minutes. For tasks that require a longer processing time, you may need to reorganize the logic of your application so that it can be executed in several parts, or consider other options, such as transferring these tasks to an EC2 instance.

  • Service Layer Response Time: The service layer aims to provide a unified and updated view of the data. However, the complexity of merging data from the Batch layer and the Speed layer can lead to delays. Make sure to test and optimize the response time of this layer.

  • Monitoring: A large volume of processing also results in a large volume of logs. Ensure you set up filters and alarms to not miss errors in the logs.

7. Conclusion

The Lambda architecture isn't a universal solution for all use cases. However, when it comes to managing a large volume of incoming data and providing real-time analysis results, it often emerges as the most suitable option.

· 2 min read
Samuel Joset

Hello everyone! Have you ever wondered, "How much time does this code actually take to execute?". If the answer is no... Well, you're going to start today! Measuring the execution time of an application is often necessary for considering improvements or for writing relevant logs.

Today, we are going to see how to obtain this information painlessly. And it's through a decorator that we are going to do it. Let's get started!

1. Our Tool: The Function Decorator

In a previous article, I already explained why decorators are a method of choice to enhance the capabilities of a function. We can add features to a function without modifying the initial code.

We are going to leverage this concept to create a simple decorator that will allow us to measure the execution time of our functions.

Let's take a look at our decorator:

const measureExecutionTime = (fn) => {
  return (...args) => {
    const start = Date.now();
    const result = fn(...args);
    const end = Date.now();
    return { result, executionTime: end - start }; // time in ms
  };
};

Now, let's see how to apply it to an existing function:

const add = (a, b) => a + b;

const addWithExecutionTime = measureExecutionTime(add);

console.log(addWithExecutionTime(5, 7)); // displays { result: 12, executionTime: <execution time> }

In this example, we have an add function that adds two numbers. With our measureExecutionTime decorator, we get a new function, addWithExecutionTime. This improved function still adds two numbers, but it also returns the time it took to execute.

2. Handling Different Function Types: Synchronous and Asynchronous

The function we created previously is simple and does the job. However, it has a problem: it does not handle asynchronous functions. Applied to one, it would measure the time needed to create the promise, not the time the underlying operation actually takes.

To remedy this issue, we will also create a second function that can be used to measure the execution time of an asynchronous function:

const measureExecutionTimeAsync = (fn) => {
  return async (...args) => {
    const start = Date.now();
    const result = await fn(...args);
    const end = Date.now();
    return { result, executionTime: end - start }; // time in ms
  };
};

This function is very similar to the first one. The only difference is that the new function awaits the wrapped call, and therefore returns a promise instead of a plain value.
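Here's a hedged usage sketch, assuming a hypothetical asynchronous fetchUser function:

const fetchUser = async (id) => {
  // Simulate a slow I/O operation
  await new Promise((resolve) => setTimeout(resolve, 100));
  return { id, name: 'Ada' };
};

const fetchUserTimed = measureExecutionTimeAsync(fetchUser);

fetchUserTimed(42).then(({ result, executionTime }) => {
  console.log(result);        // { id: 42, name: 'Ada' }
  console.log(executionTime); // ~100 (ms)
});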

3. Conclusion

The article ends here. The function we built is simple, but don't mistake simplicity for lack of utility: it will serve you in all your Node.js development.

· 3 min read
Samuel Joset

1. Introduction

Who hasn't encountered those insidious bugs caused by variables or objects mysteriously changing their values during code execution? I won't even ask if this has happened to you, as the answer is easy to guess.

I would not say that variable mutability is a bad thing. In fact, it can be very useful to save memory when operating in an environment with limited RAM, such as on a Raspberry Pi or an embedded device.

However, it can also lead to undesirable side effects and make our code harder to understand and debug. That's why immutability is a fundamental principle in many programming languages and paradigms, including functional programming!

In a previous article, we explored the many advantages offered by immutability. We also learned how to extend the functionalities of our functions using decorators.

Now, we're going to combine these two concepts and see how to make an initially mutable program immutable with the help of a decorator. Are you ready? Let's go!

2. The Immutability Decorator

This article gets straight to the point. We will create a decorator that turns mutable functions into immutable ones, explain how it works, its usefulness, and its limitations.

To remind you, a decorator is a function that takes another function as an argument and returns a new function that extends the functionality of the original one. In our case, our decorator will deeply copy all the parameters of the original function (thus ensuring their immutability) before executing it. For simplicity, we will use Ramda's clone function to make the copies:

const R = require('ramda');

const makeImmutable = (fn) => {
  return (...args) => {
    const deepCopiedArgs = args.map((arg) => R.clone(arg));
    return fn(...deepCopiedArgs);
  };
};

In this way, even if fn modifies its arguments, it will not affect the original objects passed in as parameters. This is a way to get immutable versions of our functions while keeping the original functions if needed. The code is elegant and reusable.

3. Application

Now, let's see how to apply our new decorator to an existing function. To illustrate this, we will focus on the classic case of bubble sort. This is a well-known algorithm that, in its simplest version, modifies the array it is sorting. However, thanks to our decorator, we are going to teach it good manners and ensure that it preserves the immutability of the parameters it receives.

Let's start by creating our bubble sort function:

const bubbleSortMutable = (array) => {
  let n = array.length;
  for (let i = 0; i < n - 1; i++) {
    for (let j = 0; j < n - i - 1; j++) {
      if (array[j] > array[j + 1]) {
        // Swap array[j] and array[j+1]
        let temp = array[j];
        array[j] = array[j + 1];
        array[j + 1] = temp;
      }
    }
  }
  return array;
};

This function will correctly sort our array, but by modifying it. To solve this problem, we will apply our makeImmutable decorator to our bubble sort function:

const bubbleSortImmutable = makeImmutable(bubbleSortMutable);

Our bubbleSortImmutable function will do exactly the same task as bubbleSortMutable, BUT without modifying the original array.

let numbers = [5, 2, 9, 1, 5, 6];
console.log(bubbleSortImmutable(numbers)); // displays [1, 2, 5, 5, 6, 9]
console.log(numbers); // displays [5, 2, 9, 1, 5, 6]

Mission accomplished! Even after sorting our array with bubbleSortImmutable, our original array remains unchanged.

4. Advanced Uses and Limits of makeImmutable

Now that we have covered the basics of our makeImmutable decorator, it's time to take a look at some of its more advanced uses and limitations.

4.1. Advanced Uses

Beyond protecting against undesirable mutability, our makeImmutable decorator can be used to create "snapshots" of an object's state before and after a function's execution. This can prove invaluable in testing environments, when we want to verify whether a function alters an object unexpectedly.

4.2. Limits of makeImmutable

While the makeImmutable decorator is a powerful tool for managing mutability, it has its limitations. For example, it cannot prevent a function from modifying global objects or object properties that are accessible through other means (such as through the use of this in JavaScript).
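To illustrate this limit, here's a minimal sketch: the decorator copies the arguments, but a function that reaches for state outside its parameters slips right past it (globalLog and logAndSum are hypothetical examples):

let globalLog = [];

const logAndSum = (a, b) => {
  globalLog.push(`${a} + ${b}`); // mutates state outside the arguments
  return a + b;
};

const safeLogAndSum = makeImmutable(logAndSum);
safeLogAndSum(1, 2);

console.log(globalLog); // ['1 + 2']: the global array was still mutated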

5. Conclusion

We have come to the end of our article. Rest assured, no arrays were harmed during this demonstration. The function we created is very useful for applying the concept of immutability in a simple way within our program. However, it can never replace a well-thought-out and organized code architecture. Therefore, it is essential to know when and where to use it properly.

· 3 min read
Samuel Joset

In this post, we'll explain what higher-order functions are and explore how to use the most common of them.

1. What are Higher-Order Functions?

A higher-order function (or HOF) is a function that can take another function as an argument or return a function as its result.

This is only possible in programming languages where functions can be treated like any other value. In such languages, we say that functions are first-class citizens. Functions are first-class citizens in JavaScript, for example, which is what allows us to create higher-order functions.

Some examples of higher-order functions in JavaScript include map, filter, and reduce. These functions take another function as an argument and use it to transform or filter an array.

2. Common Higher-Order Functions

JavaScript has several built-in higher-order functions that are commonly used. A good way to start understanding and using HOFs in our daily coding sessions is to get familiar with the basic ones. The most common of them are map, filter, and reduce. Let's see what they do and how to use them.

2.1. Map

The map function is used to transform an array by applying a function to each of its elements and returning a new array of the same length with the transformed values. The original array is not modified. Here's an example:

const numbers = [1, 2, 3, 4, 5];

const doubled = numbers.map((num) => num * 2);

console.log(doubled); // [2, 4, 6, 8, 10]

In this example, the map() function is used to double each element in the numbers array and return a new array containing the transformed values.

The powerful aspect of this function is that you can easily apply a function to a whole list of items instead of a single one. It is far more concise and readable than a forEach loop, and as you may know, less code means fewer places for bugs to hide.
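For comparison, here's the same doubling written with forEach, which takes more ceremony for the same result:

const numbers = [1, 2, 3, 4, 5];

const doubled = [];
numbers.forEach((num) => {
  doubled.push(num * 2); // manual accumulation into an external array
});

console.log(doubled); // [2, 4, 6, 8, 10]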

2.2. Filter

The filter() function is used to create a new array that contains only the elements of the original array that pass a given test. An element passes the test only if the provided function returns true for it:

const numbers = [1, 2, 3, 4, 5];

const evenNumbers = numbers.filter((num) => num % 2 === 0);

console.log(evenNumbers); // [2, 4]

In this example, the filter() function is used to create a new array containing only the even numbers from the numbers array.

2.3. Reduce

The reduce function is used to aggregate some or all of the elements of an array into a single value. It takes two arguments: a reducer function and an initial value. The reducer function is applied to each element of the array, accumulating a value that is returned at the end. Here's an example:

const numbers = [1, 2, 3, 4, 5];

const sum = numbers.reduce((acc, curr) => acc + curr, 0);

console.log(sum); // 15

In this example, the reducer function takes two arguments: acc (short for "accumulator") and curr (short for "current"). The acc value starts at 0 and is updated with the sum of each element in the array.

These three HOFs are among the most commonly used in functional programming. When used well, they can significantly simplify your code and improve its readability.
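They also combine naturally. As a small sketch, here's the sum of the squares of the even numbers, chaining all three:

const numbers = [1, 2, 3, 4, 5];

const total = numbers
  .filter((num) => num % 2 === 0)        // keep the even numbers: [2, 4]
  .map((num) => num * num)               // square them: [4, 16]
  .reduce((acc, curr) => acc + curr, 0); // sum them up

console.log(total); // 20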

3. Conclusion

Like pure functions, understanding and using higher-order functions is an essential aspect of functional programming, as they underpin most of the more advanced concepts.

We have covered some of the most commonly used higher-order functions. They are useful in so many cases that you can reach for them every day; I personally do.

Once you're comfortable with them, you will be able to tackle more advanced functional programming concepts.

· 3 min read
Samuel Joset

Pure functions are a fundamental concept in functional programming, and they play a crucial role in developing high-quality, maintainable code.

The purpose of this article is to get familiar with the concept of pure functions.

1. What are Pure Functions?

Pure functions are functions that have the following characteristics:

  • Pure functions always produce the same output given the same input. They have no side effects and do not depend on any external state or global variables.
  • Pure functions have no observable effects outside of their own scope. They do not modify any external state, like changing the value of a variable or updating the DOM.
  • Pure functions are referentially transparent, meaning that they can be replaced with their return value without affecting the behavior of the program.

1.1. Examples of Pure Functions:

// returns the sum of two numbers.
const add = (a, b) => a + b;

// returns the product of two numbers
const multiply = (a, b) => a * b;

// returns a greeting message
const greet = (name) => `Hello, ${name}!`;

As you can see, each of these functions takes input and returns output without modifying any external state.

1.2. Pure VS Impure Functions:

In contrast to pure functions, impure functions have side effects or depend on external state, which can make them less reliable and harder to reason about. Here is an example of an impure function:

let counter = 0;

/**
 * Impure function that increments a counter and returns its value.
 * @returns {number} The updated value of the counter.
 */
const increment = () => {
  counter++;
  return counter;
};

The function increment modifies the external state of the counter variable each time it is called, making it impure. It is also not referentially transparent: replacing a call with its return value would not increment the counter.
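For contrast, here's a minimal sketch of a pure alternative: the function takes the current state as input and returns the next value, leaving the caller in charge of storing it:

// Pure: same input always yields the same output, no external state touched
const incrementPure = (count) => count + 1;

let count = 0;
count = incrementPure(count); // the caller decides where the state lives
console.log(count); // 1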

2. Benefits of Pure Functions

Overall, the benefits of pure functions make them a valuable tool for writing maintainable, testable, and reusable code. By striving to write pure functions whenever possible, we can build more reliable and robust applications.

2.1. Predictability and Reliability

Since pure functions always produce the same output for the same input, they are predictable and reliable. This makes it easier to reason about the behavior of the code, reducing the risk of introducing bugs.

2.2. Testability

Pure functions are also easier to test than impure functions. Since they do not depend on external state, we can pass any input we want and expect the same output every time. This makes it easy to write automated tests to ensure that the function works as expected. Here's an example using the multiply function:

/**
 * Returns the result of multiplying two numbers
 */
const multiply = (a, b) => a * b;

// Tests using jest
describe('multiply', () => {
  it('should return the product of two numbers', () => {
    expect(multiply(2, 3)).toBe(6);
    expect(multiply(-1, 5)).toBe(-5);
    expect(multiply(0, 10)).toBe(0);
    expect(multiply(2.5, 4)).toBe(10);
  });
});

These tests ensure that the multiply function returns the correct result for a variety of inputs. By testing the function in isolation, we can be confident that any errors are the result of a problem with the function itself, rather than external factors.

2.3. Reusability and Composability

Pure functions are also highly reusable and composable. Since they do not depend on external state, they can be used in a wide variety of contexts without fear of unexpected side effects.
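As a small sketch of that composability, pure functions snap together without surprises:

const add = (a, b) => a + b;
const double = (x) => x * 2;

// Composing two pure functions yields another pure function
const addThenDouble = (a, b) => double(add(a, b));

console.log(addThenDouble(2, 3)); // 10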

3. Conclusion

Pure functions are an important concept in functional programming and can offer numerous benefits to the development process. By creating functions that are deterministic, reliable, and easy to reason about, developers can write high-quality, maintainable code that is less error-prone and easier to optimize.

By creating pure functions, developers can also take advantage of features like referential transparency and immutability, which can improve the readability and efficiency of their code. Additionally, the testability of pure functions allows developers to catch errors early in the development process, reducing the time and effort required for debugging.

Overall, understanding and utilizing pure functions is a valuable skill for any developer.