An RTOS-based Architecture for Industrial Wireless Sensor Network Stacks with Multi-Processor Support

Zhibo Pang1,2, Kan Yu3, Johan Åkerberg1, Mikael Gidlund1
1 Corporate Research, ABB AB, Västerås, Sweden
2 ICT School, Royal Institute of Technology (KTH), Stockholm, Sweden
3 Mälardalen University, Västerås, Sweden
{pang.zhibo|johan.akerberg|[email protected]}, [email protected]

Abstract: The design of industrial wireless sensor network (IWSN) stacks requires the adoption of a real-time operating system (RTOS). Challenges exist especially in timing integrity and multi-processor support. As a solution, we propose an RTOS-based architecture for IWSN stacks with multi-processor support. It offers benefits in terms of platform independence, product life cycle, safety and security, system integration complexity, and performance scalability. An implemented WirelessHART stack has proven the feasibility of the proposed architecture in practical product design. Future challenges and suggestions for improving the standards are also discussed.

I. INTRODUCTION

The industrial wireless sensor network (IWSN) has attracted increasing attention in recent years. Superior reliability, determinism, timeliness, and security are emphasized. Since 2007, three standards, WirelessHART [1], ISA100.11a [2], and WIA-PA [3], have been released based on IEEE 802.15.4. One strong similarity among them is the TDMA (time division multiple access) based media access mechanism. In the TDMA mechanism, all communications among nodes are allocated to and confined within corresponding timeslots. For example, WirelessHART uses fixed 10 ms timeslots, which allow for maximum packet sizes while keeping the tolerable time drift technically feasible. ISA100.11a needs flexible timeslots to allow for duocast. WIA-PA uses the IEEE 802.15.4-2006 superframe, also with configurable timeslots. This is essential to reduce the possibility of collision (and thus increase communication reliability) and to meet the critical requirement of timing determinism in industrial applications. To do this, all nodes must be synchronized precisely, i.e. the synchronization jitter should be much smaller than the length of a timeslot. Also, the stack designer must guarantee that the node can finish everything within the timeslot.

This timing-critical requirement has become one of the primary challenges in designing the protocol stacks. Firstly, it is challenging to finish the complicated tasks (packet parsing, encryption, decryption, authentication, etc.) within such a short timeslot on a processor with limited resources (clock frequency, memory, energy supply, etc.). Secondly, the IWSN stack is often only one of the timing-critical tasks that the device should execute. It is much more difficult to ensure timing integrity in a complicated multi-task system.

At the same time, the rapidly increasing complexity and other specific requirements of industrial systems have made it necessary to adopt a real-time operating system (RTOS) in the IWSN stacks. These requirements include safety, security, availability, support for actuators, system integration, network size, product life cycle, etc. [4]. Furthermore, the use of multi-processor (and/or multi-core) architectures has become another trend in industrial system design because it is an effective way to improve system scalability and to manage complexity and cost. (In this paper, the term “multi-processor” includes both “multiple cores in single chip” and “multiple cores in multiple chips”.) Multi-processor support in IWSN stacks is therefore needed to follow this trend. Additionally, dedicated high-performance chips for IWSN are rare, but low-cost (and also low-performance) IEEE 802.15.4 system-on-chips (SoCs) are very common in commercial markets [5]. It is attractive to combine these low-cost commercial chips with high-performance industrial processors, the so-called “low-high combination”. So, multi-processor support in IWSN stacks is a realistic requirement for the best trade-off between performance and cost.

However, the adoption of an RTOS and the support of multiple processors make it more challenging to guarantee timing integrity. An optimized architecture is needed, but existing studies on this topic are insufficient. The benefits and challenges of modular and multi-processor design for wireless sensor nodes have been analyzed in [6] and [7], but without concrete practice on IWSN. In [8], the authors have given a guideline for the WirelessHART stack, but they focused more on the communication architecture than on implementation practice. In [9], Song et al. have presented an implementation of the stack but have not mentioned the adoption of an RTOS. In [5], multiple generations of hardware are presented in more detail, but the software architecture of the stack, especially RTOS and multi-processor support, is still not presented in detail. Moreover, their suggestions for chip vendors do not consider the so-called “low-high combination” hardware.

978-1-4673-4569-9/13/$31.00 ©2013 IEEE


Fig. 1. The proposed RTOS-based architecture for IWSN stacks (block labels recovered from the figure: OS Abstraction, Protocol Engine, Data Engine, Memory Abstraction, WSN Stack)

In this paper, we propose an RTOS-based architecture for IWSN stacks with multi-processor support. This architecture offers significant benefits in terms of platform independence, product life cycle, safety and security, system integration, and performance scalability. An implemented WirelessHART stack has proven the feasibility of the proposed architecture in practice, and the remaining challenges are discussed as well. The rest of the paper is organized as follows: the proposed architecture is introduced in section II; the case study of WirelessHART and experimental results are presented in section III; and conclusions and future work are given in section IV.

II. THE PROPOSED ARCHITECTURE

A. The proposed RTOS-based architecture

The basic view of the proposed IWSN stack architecture is illustrated in Fig. 1. The stack comprises two major parts: the stack core and the platform abstraction layer (PAL). The stack core is the body of the protocol and is platform independent. It is composed of a Data Engine for zero-copy data management, a Protocol Engine to handle inter-layer communication, and all the layers specified by the standard. The PAL is an abstraction of the implementation platform, including the memory, the radio frequency transceiver, and the RTOS, so an instance of a PAL is platform dependent. By separating the PAL from the stack core, the stack core is isolated from the implementation platform. In most industrial applications this is important because the communication stack is normally only a small part of the total device application, and the device cannot be designed around the communication stack. To port the stack from one platform to another (e.g. to use new hardware or a new RTOS, or to reuse the implementation in new products), we only need to modify the PAL instead of the stack core. This significantly simplifies the porting work, which is important for prolonging the product life cycle of industrial systems, as mentioned above.
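As a rough illustration of the separation between stack core and PAL, the sketch below shows a stack core that depends only on an abstract PAL interface. Python is used purely for brevity; a product stack would more likely be written in C or C++. All names here (`PlatformAbstractionLayer`, `HostPal`, `StackCore`, and their methods) are hypothetical and are not taken from the implemented stack.

```python
from abc import ABC, abstractmethod
import queue


class PlatformAbstractionLayer(ABC):
    """Hypothetical PAL interface: the only platform-facing API the stack core sees."""

    @abstractmethod
    def alloc_buffer(self, size: int) -> bytearray: ...

    @abstractmethod
    def radio_send(self, frame: bytes) -> None: ...

    @abstractmethod
    def create_mailbox(self): ...


class HostPal(PlatformAbstractionLayer):
    """PAL instance for a desktop host; an embedded port would wrap HAL and RTOS APIs."""

    def __init__(self) -> None:
        self.sent_frames = []

    def alloc_buffer(self, size: int) -> bytearray:
        return bytearray(size)

    def radio_send(self, frame: bytes) -> None:
        # Stand-in for a call into the transceiver driver.
        self.sent_frames.append(bytes(frame))

    def create_mailbox(self):
        # Stand-in for an RTOS mailbox.
        return queue.Queue()


class StackCore:
    """Platform-independent part: it never touches the HAL or RTOS directly."""

    def __init__(self, pal: PlatformAbstractionLayer) -> None:
        self.pal = pal
        self.mailbox = pal.create_mailbox()

    def send(self, payload: bytes) -> None:
        buf = self.pal.alloc_buffer(len(payload))
        buf[:] = payload
        self.pal.radio_send(buf)


core = StackCore(HostPal())
core.send(b"\x01\x02")
```

Under these assumptions, porting to another platform would mean writing another subclass of `PlatformAbstractionLayer` and passing it to `StackCore`, which itself stays unchanged.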
The first principle of the architecture is that the communication and information access interfaces between layers are normalized, giving the so-called normalized inter layer interface (NILI). The NILI is a common template for the implementation of all the layers. It specifies common attributes and methods that all the layers should implement. In an object oriented programming (OOP) environment like C++, this can be easily realized by inheritance: the NILI is implemented in the protocol engine as a base class; the other layers are implemented as derived classes; some member functions of the base class can be overridden to perform layer-specific functions; and the member functions for inter-layer communication and information access are consistent for all the layers.

It is worth mentioning that the hardware abstraction layer (HAL) is a part of the “platform” instead of a part of the “stack”. As both the hardware platform and the RTOS are getting more and more complicated, they are normally provided by third parties. As a result, the drivers in the HAL are designed by these parties instead of the stack designer. The stack designer’s task is, in addition to the stack core itself, to design the PAL based on the application programming interfaces (APIs) provided by the designers of the HAL and RTOS.

B. Multi-processor support

The multi-processor view of the proposed architecture is illustrated in Fig. 2. The layers in the stack core are partitioned into multiple parts allocated to different processors. Between every two adjacent parts, a pair of serialization layers is inserted. Correspondingly, the hardware, driver, and abstraction of the inter-processor communication (IPC) interface are added. Fig. 2 is an example for two processors, but the proposed architecture is suitable for any other number of cores. To do this in the OOP environment, we just need to create more objects of the serialization layer and insert them in the required places.

The second principle of the architecture is that the serialization layer works as a transparent channel between the two neighboring layers. Here, “transparent” means that if the serialization layer is removed and the two neighboring layers are allocated to the same processor, they can work properly without any modification.
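The two principles above can be sketched as follows. This is a minimal model under stated assumptions, not the implemented stack: the class names, the `process_down`/`send_down` methods, the placeholder headers, and the 2-byte length framing are all invented for illustration, and Python stands in for the C++ inheritance scheme described in the text. The IPC link itself is not modelled; the pair of serialization layers simply frames and unframes the packet.

```python
import struct


class NiliLayer:
    """Hypothetical base class standing in for the NILI template."""

    def __init__(self) -> None:
        self.upper = None  # plays the role of UpperLayerPointer
        self.lower = None  # plays the role of LowerLayerPointer

    def process_down(self, packet: bytes) -> bytes:
        """Layer-specific work; derived classes override this."""
        return packet

    def send_down(self, packet: bytes) -> bytes:
        """Common to all layers: process, then hand over to the lower layer."""
        out = self.process_down(packet)
        return out if self.lower is None else self.lower.send_down(out)


class NetworkLayer(NiliLayer):
    def process_down(self, packet: bytes) -> bytes:
        return b"NWK|" + packet  # placeholder header, not the real format


class DataLinkLayer(NiliLayer):
    def process_down(self, packet: bytes) -> bytes:
        return b"DLL|" + packet  # placeholder header, not the real format


class SerializationTx(NiliLayer):
    """Upper half of the pair: frames the packet for the IPC link."""

    def process_down(self, packet: bytes) -> bytes:
        return struct.pack(">H", len(packet)) + packet


class SerializationRx(NiliLayer):
    """Lower half of the pair: recovers the packet from the IPC frame."""

    def process_down(self, frame: bytes) -> bytes:
        (length,) = struct.unpack(">H", frame[:2])
        return frame[2:2 + length]


def build_stack(*layers: NiliLayer) -> NiliLayer:
    """Link neighbours from top to bottom and return the top layer."""
    for upper, lower in zip(layers, layers[1:]):
        upper.lower, lower.upper = lower, upper
    return layers[0]


single = build_stack(NetworkLayer(), DataLinkLayer())
split = build_stack(NetworkLayer(), SerializationTx(),
                    SerializationRx(), DataLinkLayer())
same = single.send_down(b"data") == split.send_down(b"data")
```

In this sketch neither `NetworkLayer` nor `DataLinkLayer` changes when the serialization pair is inserted or removed, which is the sense of “transparent” used in the text.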
The transparency of the serialization layer is realized by the NILI mechanism. In particular, the serialization layer is also based on the NILI, exactly the same as all other layers. A protocol layer interacts with the serialization layer in exactly the same way as it would interact with its neighboring layer if the serialization layer were absent.

In the case of multiple cores in a single chip, the IPC hardware is the on-chip interconnection network; in the case of multiple cores in multiple chips, it is the off-chip interconnection circuit. In the former case, the IPC interface can be expected to have lower power consumption, shorter latency, and higher throughput. The latter, however, is suitable for the aforementioned “low-high combination” hardware (as shown in the case study presented in the next sections).

Fig. 2. The multi-processor view of the proposed architecture

C. The normalized inter layer interface (NILI)

The NILI specifies all the attributes and operations for stack construction and inter layer interaction. In particular, each layer in the stack core has at least one dedicated thread and one dedicated mailbox. All inter layer interactions go through this mailbox and are managed by the RTOS. These interactions are classified into two types, described by the following message formats respectively.
1. To send or receive a packet: packet payload, packet type, priority, timeout, etc. are specified in the message.
2. To query or modify the status or configuration of one layer: management command number, target layer, parameter, value, return format, etc. are specified.

At the same time, a protocol stack is constructed by setting the UpperLayerPointer and LowerLayerPointer in each layer. The NILI specifies the operations on these pointers, including Add() and Remove(). These pointers tell the RTOS to put a message in (or to get a message from) the correct layer when the Put() (or Get()) operation on the message is called.

D. Cross layer information synchronization

To support flexible partitioning of layers, global variables (accessible by all the layers) are avoided. The only way for one layer to access information in another layer is through the NILI mailbox. Obviously, this can reduce efficiency compared with directly accessing global variables. Four mechanisms have been designed for this cross-layer information synchronization. To minimize the overhead, they are used flexibly according to the pattern of information updating.
1. Periodic polling: one layer periodically requests information from another layer.
2. Periodic reporting: one layer periodically reports its information to another layer. This is typically used by the lower layers to report their status to the top layer (e.g. the application layer). Compared with periodic polling, it produces less traffic.
3. Event-driven notification: one layer reports its information to another layer only when a particular event happens.
4. Instant request: one layer can request information from another layer whenever it needs it, and the target layer responds when it receives the request message.
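The mailbox-based interaction and two of the four synchronization mechanisms (instant request and event-driven notification) can be sketched as below. This is only an illustration under assumptions: `queue.Queue` stands in for an RTOS mailbox, each layer's thread is replaced by an explicit `run_once()` call, and the names `PacketMsg`, `MgmtMsg`, `Layer`, `subscribers`, and the string commands are hypothetical rather than part of the NILI specification.

```python
import queue
from dataclasses import dataclass


@dataclass
class PacketMsg:
    """Message type 1: carries a packet between layers."""
    payload: bytes
    priority: int = 0


@dataclass
class MgmtMsg:
    """Message type 2: queries or reports layer information."""
    command: str             # "get" or "report" in this sketch
    key: str
    value: object = None
    reply_to: object = None  # layer that should receive the answer


class Layer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.mailbox = queue.Queue()  # stand-in for the RTOS mailbox
        self.info = {}                # layer-private state, no globals
        self.subscribers = []         # layers notified on change
        self.received = []

    def put(self, msg) -> None:
        self.mailbox.put(msg)

    def run_once(self) -> None:
        """Handle one pending message, as the layer's thread would."""
        msg = self.mailbox.get_nowait()
        if isinstance(msg, MgmtMsg) and msg.command == "get":
            # Instant request: answer with the current value.
            reply = MgmtMsg("report", msg.key, self.info.get(msg.key))
            msg.reply_to.put(reply)
        else:
            self.received.append(msg)

    def update(self, key: str, value) -> None:
        """Event-driven notification: report only when the value changes."""
        if self.info.get(key) != value:
            self.info[key] = value
            for layer in self.subscribers:
                layer.put(MgmtMsg("report", key, value))


app, dll = Layer("application"), Layer("data link")
dll.info["rssi"] = -70
dll.put(MgmtMsg("get", "rssi", reply_to=app))  # app asks via the mailbox
dll.run_once()                                 # dll handles the request
app.run_once()                                 # app receives the report
answer = app.received[0]

dll.subscribers.append(app)
dll.update("rssi", -65)   # value changed: one notification is queued
dll.update("rssi", -65)   # unchanged: nothing is queued
pending = app.mailbox.qsize()
```

The sketch also makes the cost visible: reading one value from another layer takes a request message and a report message, where a global variable would take a single memory access.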

E. Advantages and challenges

The proposed RTOS-based architecture offers the following advantages for IWSN protocol stack design.
1. Increased platform independence: isolated by the PAL, the stack core is platform independent. The stack core can be ported to new hardware and/or a new RTOS by modifying only the PAL. Thus the product life cycle can be significantly prolonged.
2. Improved safety and security: if a virtualization platform such as PikeOS [10] is used, the implementation of the IWSN stack can be isolated from the safety/security-critical code. The effect of faults or attacks that occur in the stack can be confined within the stack itself, without spreading into the whole system.
3. Simplified system integration: by normalizing the interfaces of the protocol layers and adopting an RTOS, it becomes easier to integrate the IWSN stack into a bigger system.
4. Better performance scalability: by supporting multi-processor hardware, the proposed architecture offers a flexible choice of the number of cores and chips. Thus, better trade-offs between performance and cost can be reached.
5. Flexible stack mash-up: the “stack construction” feature of the proposed architecture enables more flexible stack “mash-up”. That is, we can combine layers from different standards into one stack if these layers are normalized based on the NILI. This is especially useful when designing a multi-standard system or gateway, which is already very common in the industrial area.

At the expense of the above advantages, new challenges are introduced by the proposed architecture. They are discussed further through a real design case in the next sections.
1. Timing integrity: the RTOS adds some timing overhead to the inter layer interaction and intra layer processing. Although this overhead has been minimized by state-of-the-art RTOSs, high-quality optimization is still necessary for timing-critical layers. Moreover, the serialization layer also affects the timing integrity to some extent.
2. Memory footprint: the RTOS can add overhead in memory footprint compared with a design without an RTOS. However, the memory needed for packet buffers is usually much bigger than this overhead in IWSN standards, so this is not a big issue for IWSN stacks.
3. CPU load: the RTOS and the cross layer information synchronization cause extra CPU load in addition to the IWSN protocol itself.

Fig. 3. Diagram of a WirelessHART stack based on the proposed architecture

III. CASE STUDY AND EXPERIMENTAL RESULTS

The proposed architecture has been applied in an IWSN platform. The platform is required to support multiple existing standards and future standards. Many safety and security requirements are considered to bring the platform into practice. As a case study, we have implemented the WirelessHART stack based on this architecture.

A. Implementation overview

Fig. 3 is the diagram of the implemented WirelessHART stack for the Field Device. The full stack is partitioned onto two processors. The radio processor is based on the low-cost STM32W108CC chip from STMicroelectronics. It integrates an IEEE 802.15.4-compatible transceiver, a 24 MHz ARM Cortex-M3 CPU, and 16 KByte SRAM. Limited by the hardware resources, only the lower layers of the stack are allocated to this processor, including the physical layer, the data link layer, and the serialization layer. The application processor is based on the STM32F217ZG from the same vendor. The remaining layers of the stack, including the application layer, the network layer, and the serialization layer, are allocated to this chip. The application processor has a 120 MHz ARM Cortex-M3 CPU, 132 KByte on-chip SRAM, and 1 MByte on-chip Flash. On the prototyping board, we add a 512 KByte SRAM and a 4 MByte Flash for the application processor. Many other application modules are integrated in the application processor too, including field buses and device applications.

Fig. 4. The prototype of the implemented WSN stack

The RTOS that we use on both the radio processor and the application processor is embOS from SEGGER. The inter-processor communication is realized by the high-speed UART interfaces that are available on both chips. The HAL library, including the drivers for Flash memory, timers, I/Os, UART, and the radio transceiver, is provided by the chip supplier. The embOS library for these chips is provided by the RTOS vendor. We implement the PAL by re-packaging the APIs of the HAL and RTOS libraries according to our specification of the NILI.

B. The prototype system

A prototype of the WirelessHART Field Device has been implemented. The demonstrator includes the prototyping board, the WiAnalys sniffer software and hardware, and WirelessHART Network Managers from multiple vendors (Fig. 4). The implemented Field Device can interoperate with all these Network Managers. A typical working procedure starts with joining the network, continues through key distribution and link allocation, and ends with periodically reporting process data in burst mode.

C. TDMA timing

Considerable effort has gone into improving the timing integrity of the data link layer and physical layer, which are the most timing-critical. As the timing of sending a packet is relatively easy compared with receiving a packet, we only explain the timing of receiving a packet (Fig. 5). The activities to receive or send a packet are triggered by a slot timer which is synchronized to the network time source. In WirelessHART, the timer interval Tslot is 10 ms. Once a slot timer interrupt occurs, the RTOS issues a signal to the TDMA controller to wake its thread from the waiting state. Then the TDMA controller looks up the link table to check whether the current slot is for receiving. If it is, the TDMA controller puts a message into the mailbox of its “lower layer”. The RTOS hands over this message to the mailbox of the physical layer because the LowerLayerPointer of the data link layer is set to the physical layer. Then, the physical layer is activated by the RTOS to handle the message. It starts the radio transceiver to listen on the selected channel for a specified time. If a packet is received before the time-out, the physical layer puts the received packet into the mailbox of its “upper layer”. The RTOS hands over this message to the mailbox of the data link layer because the UpperLayerPointer of the physical layer is set to the data link layer. Then the data link layer is activated to handle this packet. It first parses the packet and verifies the Message Integrity Code (MIC) using the AES-128 block cipher in CCM mode (Counter with CBC-MAC). If the MIC is valid, it creates an acknowledgement (ACK) packet by performing the CCM and AES-128 operations again. Finally, the ACK packet is sent through the radio transceiver.

Timing integrity requires the above activities to be finished within 10 ms. To guarantee reliability, some timing margin TRxMargin should be reserved as well. One measure that we take to optimize the timing is to use hardware acceleration for the CCM and AES-128 calculation. Another, temporary, measure is to call the radio transceiver directly from the data link layer to send the ACK packet, instead of going through the physical layer.

Fig. 5. Sequential diagram and timing definitions of the data link layer and physical layer to receive a packet

Fig. 6. Diagram of the equipment used to measure the timing integrity

D. Timing measurement

Because the debugging facilities provided by the chip are limited, evaluating the timing integrity is also tricky. A common method is to measure the packet delivery success rate with the sniffer, but this is only suitable for high-level evaluation. To see more timing details, other methods are needed. As shown in Fig. 6 and Fig. 7, the timing integrity is measured by two methods. Firstly, internal events (e.g. the edge of the slot timer) are indicated by toggling some output pins. Simultaneously, the envelope of the radio signal is captured by an RF detector.
We observe the waveforms at the output pins and the RF detector with an oscilloscope, which gives a direct, real-time representation of the timing integrity. Secondly, important timing parameters are calculated in the code and recorded in static variables. We track these variables with the IAR debugger via the “Live Watch” window. Because it captures variables only when the RTOS is in the idle state, this method has almost no visible impact on the timing. It is important to note that the time spent by the CPU toggling the output pins and calculating the timing parameters introduces errors into the results, so it should be minimized. Compared with these two methods, single-step tracing or printing through a serial port is not suitable because both cost too much time.

Fig. 7. Hardware setup to measure the timing integrity

E. Measured Results

From the captured waveforms (one example is shown in Fig. 8), we have seen that the timing integrity is acceptable. In particular, the jitter of the slot timer is mostly less than 200 µs, which is small enough for a WirelessHART Field Device. The packets between the manager and the field device are properly transmitted within the corresponding slots.

From the calculated timing parameters in Table I, we have seen that the timing margin is quite small. The timing overhead caused by the RTOS is nearly negligible in the total time. Moreover, the time spent on the calculation of CCM and AES-128 is minor after the adoption of hardware acceleration. The time spent listening (TRfRx) is the major part of the total time. In addition, the flying time of the packet (about 3.2 ms for a 100-byte packet) is much smaller than TRfRx (about 6.3 ms for a 100-byte packet). This implies that a large delay is caused by the driver of the radio transceiver. As we could not access the code of this driver, it has not been optimized so far, hence the temporary shortcut of transmitting the ACK from the data link layer instead of from the physical layer.

We have also observed the impact of the IPC on the timing integrity. When the data rate of the UART is set to 500 kbps, the maximum packet rate (for a packet size of 100 bytes) that the radio processor can handle (without any distortion of timing integrity or packet loss in the serialization layer) is 4 packets per second. A higher packet rate or a lower UART data rate will damage the timing integrity in the data link layer or cause packet loss in the serialization layer. In this implementation, we have applied direct memory access (DMA) to the UART interface, which has reduced the CPU load for handling the UART interface. We expect this interference could be further minimized by applying the hardware flow control of the UART in a future version. However, this has not been tested so far because of limitations of the prototype board.

Fig. 8. Waveforms of the radio signal envelope and slot timer during a typical round of interaction between the Network Manager and Field Device

TABLE I. DATA LINK LAYER TIMING FOR RECEIVING PACKETS (PACKET SIZE = 100 BYTE, WITH HARDWARE ACCELERATION FOR CCM AND AES-128)
Symbol | Description and MCU activities | Value (ms)
Tslot | Time slot of TDMA | 10.0
Tos1 | RTOS hands over the SlotTimer’s event to the TDMAController; if in an Rx slot, the TDMAController sends a request to receive a packet from the lower layer (the PhysicalLayer) | 0.233
Tos2 | RTOS hands over the request to the PhysicalLayer | 0.150
TRfRx | The PhysicalLayer receives a packet from the RF transceiver and submits it to the upper layer (the DataLinkLayer) | 6.343
Tos3 | The RTOS hands over the packet to the DataLinkLayer | 0.085
TDLLRx | The DataLinkLayer parses the packet and confirms the MIC, including the execution of CCM | 0.614
TCreateACK | The DataLinkLayer creates an ACK packet, including the execution of CCM | 0.551
TRfTxACK | The DataLinkLayer sends the ACK packet through the RF transceiver | 1.895
TRxMargin | Margin of the data link layer timing when receiving packets | 0.129

IV. CONCLUSION AND FUTURE WORK

We have proposed an RTOS-based architecture with multi-processor support for IWSN protocol stacks. The advantages have been demonstrated by a case study of a WirelessHART stack implemented on a low-cost two-processor platform.

Besides the issues related to timing integrity, the extra traffic caused by the inter layer interaction should be evaluated and optimized in the future. In the current prototype, we have noticed that a huge number of messages are transmitted between layers in addition to the effective packets. The primary reason is that, in the WirelessHART standard, all the information in the lower layers is managed through HART commands from the application layer. This is not an issue if global variables can be used to share all cross layer information. But in the proposed architecture, these global variables have to be avoided to support multiple processors. Hence, a series of internal messages has to be exchanged between the application layer and the lower layers to execute such a command. Besides optimizing our architecture, improving the standards to be more “RTOS and multi-processor friendly” is a feasible strategy. The hands-on experience gathered from practical implementations should be fed back into the design and update of the standards.

Power consumption is another potential issue. The RTOS and IPC of the proposed architecture may consume extra energy. But the fundamental mechanisms of the protocol itself (e.g. the TDMA and security algorithms) might be the primary obstacle to low-power design. So, in our future work, we expect it to be more effective to optimize power consumption in a broader context, i.e. not only considering the stack implementation but also jointly looking into the protocols and whole-system integration architectures such as field buses.

REFERENCES

[1] WirelessHART standard, HART Communication Foundation. www.hartcomm.org
[2] ISA100 standard, International Society of Automation. www.isa.org/isa100
[3] WIA-PA standard, Chinese Industrial Wireless Alliance. www.industrialwireless.cn/en/
[4] Akerberg, J.; Gidlund, M.; Bjorkman, M. Future research challenges in wireless sensor and actuator networks targeting industrial automation. IEEE Int. Con. on Industrial Informatics (INDIN), 2011, 410-415.
[5] Xiuming Zhu; Song Han; Mok, A.; Deji Chen; Nixon, M. Hardware challenges and their resolution in advancing WirelessHART. IEEE Int. Con. on Industrial Informatics (INDIN), 2011, 416-421.
[6] Edmonds, N.; Stark, D.; Davis, J. MASS: modular architecture for sensor systems. Int. Sym. Information Processing in Sensor Networks (IPSN), 2005, 393-397.
[7] Utz Roedig; Sarah Rutlidge; James Brown; Andrew Scott. Towards Multiprocessor Sensor Nodes. ACM Workshop on Hot Topics in Embedded Networked Sensors (HotEmNets '10), 2010.
[8] Chen, Deji; Nixon, Mark; Mok, Aloysius. WirelessHART™: Real-Time Mesh Network for Industrial Automation (1st ed.), Springer, 2010.
[9] Jianping Song; Song Han; Mok, A.K.; Deji Chen; Lucas, M.; Nixon, M. WirelessHART: Applying Wireless Technology in Real-Time Industrial Process Control. IEEE Real-Time and Embedded Technology and Applications Symposium, 2008, 377-386.
[10] PikeOS, SYSGO AG, http://www.sysgo.com
