1. Welcome to our site! Electro Tech is an online community (with over 170,000 members) who enjoy talking about and building electronic circuits, projects and gadgets. To participate you need to register. Registration is free. Click here to register now.
    Dismiss Notice

Parsing message packets using Finite State Machine

Discussion in 'Code Repository' started by misterT, May 6, 2013.

  1. misterT

    misterT Well-Known Member Most Helpful Member

    Joined:
    Apr 19, 2010
    Messages:
    2,697
    Likes:
    368
    Location:
    Finland
    Recommended before this: http://www.electro-tech-online.com/threads/finite-state-machine-basics.134426/

    Many embedded systems, like hobby robot projects, communicate with a PC or a sensor via serial communication. Comercial sensors and devices send their data in a well defined package format. It is also good design practice to use well formatted data packets in custom embedded projects. The packet format can be very simple.

    Parsing packets in one function works in the simplest cases, but can be hard to maintain and expand if more comlicated that one or two bytes at a time. The solution becomes fast clumsy and ugly.

    One solution to this problem is to write a parser module that handles the parsing. There are many ways to parse data, this one uses a function pointer to implement simple Finite State Machine. It is non-blocking and runs when the microcontroller is "idle". It could also use uart interrupt to parse each received character when it arrives.

    The last CODE-box contains the complete code. It can be copy pasted to your favorite editor for easy reading. The code is not intended to be used "as is". It is intended to demonstrate the concept, but It should be easy to adapt for any project.

    The example scenario

    this parser is designed to read data packets in the following format:

    #[ID][LENGTH][DATA][CRC]$

    Where:
    # = Start of message char, length 1 byte
    [ID] = 3 character identifier, 3 bytes
    [LENGTH] = Length of data field in bytes, 32 bit unsigned int, Big Endian
    [DATA] = Data itself
    [CRC] = 16 bit CRC checsum
    $ = End character


    We assume that the data comes from UART one byte at a time. We also assume that the communication is properly initialized and we have following functions at our use:

    Code (C):

    /* Reads a single character from uart buffer.
       This is a blocking function, it waits until character can be read. */

    uint8_t uart_readchar();

    /* Returns the number of bytes that can be read from uart */
    uint8_t uart_canread();
     
    In this example we imagine that we are receiving data to control a mobile robot. We are expecting three different types of data:

    ID: "SPD", Speed
    ID: "STR", Steering angle
    ID: "DTA", Request to send some kind of data back (what kind of data is defined in the packet [DATA] field)

    When we receive the [ID] field of the packet, we need to compare its value in order to find out what kind of data we have received. We do not want to do inefficient and clumsy string compare, instead we read the ID characters into 32 bit unsigned integer and compare that to a constant. To help us create these constants easily we have a macro:

    Code (C):

    /* This macro combines four bytes into one 32 bit unsigned integer */
    #define IDENTIFIER(a,b,c,d)    ((((uint32_t)a)<<(3*8)) + \
                                    (((uint32_t)b)<<(2*8)) + \
                                    (((uint32_t)c)<<(1*8)) + \
                                    (((uint32_t)d)<<(0*8)))


    /* This is how we would compare if a received identifier is a "DTA" */
    if (data_id == IDENTIFIER('D','T','A',0)) { /* Do something */}
     
    Notice how the identifier is "left adjusted" so that the first character is in the most significant byte of the integer and the LSB is always zero. this is a little optimization. The MSB is compared first and if the test fails there is no need to thest the other bytes. if the MSB would be always zero, we would always do a useless comparison.

    We also need some storage space for our received data. this can be done in many ways, but for this examle we simply declare some (global) variables:

    Code (C):

    #define BUFFER_SIZE 20
    uint8_t  data_buffer[BUFFER_SIZE]; // This needs to be large enough to hold the largest data packet we expect.
    uint32_t data_id;        // Data ID
    uint32_t data_length;    // Data Length
    uint32_t data_crc;       // Data Length
     

    The Finite State Machine

    The heart of our FSM is a function pointer along with the state functions. We declare them like this:

    Code (C):

    /* State machine states */
    void wait_for_start(void); // Waits for the # character
    void parse_id(void);       // Reads the 3 char identifier, stores it in data_id
    void parse_length(void);   // Reads [LENGTH] field, stores it in data_length
    void parse_data(void);     // Reads the data into data_buffer[]
    void parse_crc(void);      // Reads the crc checksum (and checks the data)
    void parse_end(void);      // End of data packet handling.. start all over.

    /* Pointer to a function that takes no parameters and returns nothing.
       We initialize the pointer to point to first state. */

    void (*parse_next)(void) = wait_for_start;
     
    We could call our function pointer directly to reduce nested function calls, but because parsers are many times running only when the microcontroller is idle, it is more elegant to write a wrapper function that calls the pointer. this is also convenient place to do some error checking.

    Code (C):

    /* Function declaration.. should go into module .h file */
    void parse_uart_data(void);

    /* Function definition */
    void parse_uart_data(void) {
        if (parse_next != NULL){ // Check for NULL pointer
            parse_next(); // Call the state function
        }
    }
     
    The above code is practically all we need to keep our state machine running. The state functions update the parse_next function pointer to the next state. It all starts from the wait_for_start function. Before I list all the state functions, I will show how the parser is used in the main function.

    Code (C):

    /* Inlcude the parser module */
    #include "parser.h"

    /* The main entry point */
    void main(void) {
       
        /* Initialize uart communications etc. */
            // omitted..
       
        /* Infinite loop. Sometimes called the Idle loop */
        while(1) {
           
            /* Check if data is available in uart */
            if (uart_canread()) {
               
                /* Call the parser */
                parse_uart_data();
               
                /* If we use a flag to indicate that we have received
                   a complete data packet, we should handle it here. */

            }
        }
    }
     
    Next I list all the state functions. I hope the comments in the code explain what is going on. Notice that I do the "uart_canread()" -check in all the states. this is redundant and could be removed, but because this is designed as reusable module, I leave them in. First state waits for the start character:

    Code (C):

    /* Waits for the # character */
    void wait_for_start(void) {
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* If the data is start character, we update the state */
        if (uart_readchar() == '#'){
            /* Initialize data variables, just in case */
            data_id = 0;
            data_length = 0;
            data_crc = 0;
            parse_next = parse_id; // Next we parse the [ID] field
        }
    }
     
    The above function is called until a start character # is received. After that the state transitions to read the data ID. We use a static variable to count received bytes. this could be done using a clobal variable also, but I like keep the scope of my variables as small as possible. Notice how the single bytes are received and shifted so that the first id-character ends up in the most significant byte.

    Code (C):

    /* Reads the 3 char identifier, stores it in data_id */
    void parse_id(void) {
        static uint8_t count=0; // Counts the received data bytes
       
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* Add the character and move to the left
           At the end we end up with integer containing the ID from left to right. Last byte zero */

        data_id += uart_readchar();
        data_id <<= 8;
        count++;
       
        /* State transition rule */
        if (count == 3){
            count = 0; // reset counter
            parse_next = parse_length; // Next data is the LENGTH field
        }
    }
     
    We could check here if the ID we received is a proper one, and if not, we would set the next state to wait_for_start. We assume that this scenario rarely happens so it is more efficient not to do this check. However, if we do get corrupted data and the next four bytes (the data_lenght bytes) form a big number, we will waste time receiving buffer full of carbage. Next up is the LENGTH field:

    Code (C):

    /* Reads [LENGTH] field, stores it in data_length */
    void parse_length(void) {
        static uint8_t count=0; // Counts the received data bytes
       
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* Add received byte to the data_length variable.
           The bytes arrive in Big Endian order. */

        data_length += uart_readchar();
        count++;
       
        /* State transition rule */
        if (count == 4){
            count = 0; // reset counter
            parse_next = parse_data; // Next data is the DATA itself
            return;
        }
       
        /* Move the received so that it ends up in right place */
        data_length <<= 8;
    }
     
    Big Endian order means that the most significant byte arrives first. Finally we are ready to receive the [DATA] itself:

    Code (C):

    /* Reads the data into data_buffer[] */
    void parse_data(void) {
        static uint8_t count=0; // Counts the received data bytes
       
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* Add received byte to the data_length variable.
           The bytes arrive in Big Endian order. */

        data_buffer[count] = uart_readchar();
        count++;
       
        /* If we run out of buffer memory, we cant handle that packet.
           Only thing we can do is try again with the next packet */

        if (count > BUFFER_SIZE) {
            count = 0;
            parse_next = wait_for_start; // Wait for the next packet
        }
       
        /* State transition rule */
        if (count == data_length){
            count = 0; // reset counter
            parse_next = parse_crc; // Next data is the LENGTH field
            return;
        }
    }
     
    CRC checksum state is very similar to the others.

    Code (C):

    /* Reads the crc checksum (and checks the data) */
    void parse_crc(void) {
        static uint8_t count=0; // Counts the received data bytes
       
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* Add received byte to the data_length variable.
           The bytes arrive in Big Endian order. */

        data_crc += uart_readchar();
        count++;
       
        /* State transition rule */
        if (count == 2){
            count = 0; // reset counter
           
            /* At this point we do the CRC check,
               if it fails, we transition to wait for next package */

            if (!check_crc()) { parse_next = wait_for_start; return; }
           
            parse_next = parse_end; // End of parsing
            return;
        }
       
        /* Move the received so that it ends up in right place */
        data_crc <<= 8;
    }
     
    There are many option how the final state should handle incoming (received) data. It could raise a flag to indicate a proper package has arrived, or it can handle the data itself. Because our example case is so simple, we choose to handle the data right away:

    Code (C):

    /* End of data packet parsing, handle data and start all over. */
    void parse_end(void) {
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* If the data is not the end character, we bail out at the last minute,
           This can be kind of silly if we have gone this far, but if the last
           character is not correct.. something wrong is going on. */

        if (uart_readchar() != '$'){
            parse_next = wait_for_start; // Start all over
        }
       
        /* Handle data */
        if (data_id == IDENTIFIER('S','T','R',0)) {
            /* Call the function that handles this type of package */
        }
        else if (data_id == IDENTIFIER('S','P','D',0)) {
            /* Call the function that handles this type of package */
        }
        else if (data_id == IDENTIFIER('D','T','A',0)) {
            /* Call the function that handles this type of package */
        }
       
        parse_next = wait_for_start; // Start all over
    }
     
    One major optimization to this parser would be to use data buffer between uart and the parser. The uart would write incoming data to the buffer and the parser would read from the buffer. this way the parser could wait until there is multiple bytes available in the buffer, and instead of repeatedly calling state functions for every single bytes, it could call every state only once for every package.

    The complete code:

    Code (C):

    #include <stdlib.h>
    #include <stdint.h>

    /* Reads a single character from uart buffer.
       This is a blocking function, it waits until character can be read. */

    uint8_t uart_readchar();

    /* Returns the number of bytes that can be read from uart */
    uint8_t uart_canread();

    /* This macro combines four bytes into one 32 bit unsigned integer */
    #define IDENTIFIER(a,b,c,d)    ((((uint32_t)a)<<(3*8)) + \
                                    (((uint32_t)b)<<(2*8)) + \
                                    (((uint32_t)c)<<(1*8)) + \
                                    (((uint32_t)d)<<(0*8)))


    /* This is how we would compare if a received identifier is a "DTA" */
    //  if (data_id == IDENTIFIER('D','T','A',0)) { /* Do something */}

    /* Global variables */
    #define BUFFER_SIZE 20
    uint8_t  data_buffer[BUFFER_SIZE]; // This needs to be large enough to hold the largest data packet we expect.
    uint32_t data_id;        // Data ID
    uint32_t data_length;    // Data Length
    uint32_t data_crc;       // Data Length

    /* State machine states */
    static void wait_for_start(void); // Waits for the # character
    static void parse_id(void);       // Reads the 3 char identifier, stores it in data_id
    static void parse_length(void);   // Reads [LENGTH] field, stores it in data_length
    static void parse_data(void);     // Reads the data into data_buffer[]
    static void parse_crc(void);      // Reads the crc checksum (and checks the data)
    static void parse_end(void);      // End of data packet handling.. start all over.

    /* Pointer to a function that takes no parameters and returns nothing.
       We initialize the pointer to point to first state. */

    void (*parse_next)(void) = wait_for_start;

    /* Function declaration.. should go into module .h file */
    void parse_uart_data(void);


    /******************************************************************************/
    /* PUBLIC FUNCTIONS ***********************************************************/



    /* Entrypoint for the parser,
       this is only function that needs to be called by the user */

    void parse_uart_data(void) {
        if (parse_next != NULL){ // Check for NULL pointer
            parse_next(); // Call the state function
        }
    }
    /******************************************************************************/



    /******************************************************************************/
    /* PRIVATE FUNCTIONS **********************************************************/



    /* Reads the 3 char identifier, stores it in data_id */
    void parse_id(void) {
        static uint8_t count=0; // Counts the received data bytes
       
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* Add the character and move to the left
           At the end we end up with integer containing the ID from left to right. Last byte zero */

        data_id += uart_readchar();
        data_id <<= 8;
        count++;
       
        /* State transition rule */
        if (count == 3){
            count = 0; // reset counter
            parse_next = parse_length; // Next data is the LENGTH field
        }
    }
    /******************************************************************************/



    /* Reads [LENGTH] field, stores it in data_length */
    void parse_length(void) {
        static uint8_t count=0; // Counts the received data bytes
       
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* Add received byte to the data_length variable.
           The bytes arrive in Big Endian order. */

        data_length += uart_readchar();
        count++;
       
        /* State transition rule */
        if (count == 4){
            count = 0; // reset counter
            parse_next = parse_data; // Next data is the DATA itself
            return;
        }
       
        /* Move the received so that it ends up in right place */
        data_length <<= 8;
    }
    /******************************************************************************/



    /* Reads the data into data_buffer[] */
    void parse_data(void) {
        static uint8_t count=0; // Counts the received data bytes
       
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* Add received byte to the data_length variable.
           The bytes arrive in Big Endian order. */

        data_buffer[count] = uart_readchar();
        count++;
       
        /* If we run out of buffer memory, we cant handle that packet.
           Only thing we can do is try again with the next packet */

        if (count > BUFFER_SIZE) {
            count = 0;
            parse_next = wait_for_start; // Wait for the next packet
        }
       
        /* State transition rule */
        if (count == data_length){
            count = 0; // reset counter
            parse_next = parse_crc; // Next data is the LENGTH field
            return;
        }
    }
    /******************************************************************************/



    /* Reads the crc checksum (and checks the data) */
    void parse_crc(void) {
        static uint8_t count=0; // Counts the received data bytes
       
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* Add received byte to the data_length variable.
           The bytes arrive in Big Endian order. */

        data_crc += uart_readchar();
        count++;
       
        /* State transition rule */
        if (count == 2){
            count = 0; // reset counter
           
            /* At this point we do the CRC check,
               if it fails, we transition to wait for next package */

            if (!check_crc()) { parse_next = wait_for_start; return; }
           
            parse_next = parse_end; // End of parsing
            return;
        }
       
        /* Move the received so that it ends up in right place */
        data_crc <<= 8;
    }
    /******************************************************************************/



    /* End of data packet parsing, handle data and start all over. */
    void parse_end(void) {
        /* If we do not have data available, we return */
        if (!uart_canread()){ return; }
       
        /* If the data is not the end character, we bail out at the last minute,
           This can be kind of silly if we have gone this far, but if the last
           character is not correct.. something wrong is going on. */

        if (uart_readchar() != '$'){
            parse_next = wait_for_start; // Start all over
        }
       
        /* Handle data */
        if (data_id == IDENTIFIER('S','T','R',0)) {
            /* Call the function that handles this type of package */
        }
        else if (data_id == IDENTIFIER('S','P','D',0)) {
            /* Call the function that handles this type of package */
        }
        else if (data_id == IDENTIFIER('D','T','A',0)) {
            /* Call the function that handles this type of package */
        }
       
        parse_next = wait_for_start; // Start all over
    }
    /******************************************************************************/
     
     
    Last edited: May 6, 2013

Share This Page