Blockchain-Based Distributed Software-Defined Vehicular Networks

Session: Software-Defined Vehicular Networks

DIVANet’18, October 28-November 2, 2018, Montréal, QC, Canada

Blockchain-Based Distributed Software-Defined Vehicular Networks via Deep Q-Learning

Chao Qiu, Beijing Univ. of Posts and Telecomm., Beijing, China
F. Richard Yu, Carleton University, Ottawa, ON, Canada
Fangmin Xu, Beijing Univ. of Posts and Telecomm., Beijing, China
Haipeng Yao, Beijing Univ. of Posts and Telecomm., Beijing, China
Chenglin Zhao, Beijing Univ. of Posts and Telecomm., Beijing, China

ABSTRACT

To support flexibility, agility, and ubiquitous accessibility among vehicles, software defined networking has been proposed for integration with vehicular networks, known as software defined vehicular networks (SDVN). Due to the variety of data, flows, and vehicles in SDVN, a distributed SDVN is necessary. However, reaching consensus in distributed SDVN efficiently and safely is a challenging problem. In this paper, we use a permissioned blockchain approach to reach consensus in distributed SDVN. Existing permissioned blockchains have a number of drawbacks, such as low throughput. We virtualize the underlying resources (e.g., computing resources and networking resources), jointly considering the trust features of blockchain nodes, to improve the throughput. Accordingly, we formulate view change, computing resources allocation, and networking resources allocation as a joint optimization problem. To solve this joint problem, we use a novel deep Q-learning approach. Simulation results show the effectiveness of our proposed scheme.

CCS CONCEPTS
• Theory of computation → Reinforcement learning; Sequential decision making;

KEYWORDS
Software defined vehicular networks, blockchain, Byzantine fault tolerance, throughput, deep Q-learning.

ACM Reference Format:
Chao Qiu, F. Richard Yu, Fangmin Xu, Haipeng Yao, and Chenglin Zhao. 2018. Blockchain-Based Distributed Software-Defined Vehicular Networks via Deep Q-Learning. In 8th ACM International Symposium on Design and Analysis of Intelligent Vehicular Networks and Applications (DIVANet’18), October 28-November 2, 2018, Montreal, QC, Canada. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3272036.3272040

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5964-1/18/10…$15.00 https://doi.org/10.1145/3272036.3272040

1 INTRODUCTION

To provide flexibility, agility, and ubiquitous accessibility among vehicles, software defined networking has been proposed for integration with vehicular networks, which is referred to as software defined vehicular networks (SDVN) [8]. Due to the increasing number of vehicles [6], more than one controller is needed in SDVN [12], which is known as distributed SDVN. However, reaching consensus among multiple controllers safely and efficiently is challenging in distributed SDVN [15]. Although many traditional methods have been studied for reaching consensus among controllers [10, 13], several challenges remain when they are used in SDVN [11]:

1) In traditional methods, messages and signaling are exchanged and verified frequently, which significantly degrades the inherent functions of controllers, such as flow routing and vehicular messaging. Thus, a third-party consensus method is needed.
2) Safety and liveness do not receive enough attention in traditional methods. As the brain of SDVN, controllers may hold confidential and private information, and many security threats have emerged that aim to exploit the weaknesses of current distributed SDVN. This has fueled the need to explore safer and more dependable consensus methods.
3) Traditional methods are only intended for small to medium-sized networks. With the rapid increase in the number of vehicles, a consensus method that scales to large networks is necessary.

Recently, blockchain (BC) has emerged as a way to address the above challenges. BC is a distributed ledger for recording transactions, in which untrusted individuals can interact with each other in a verifiable manner. In distributed SDVN, we use a blockchain to interconnect multiple controllers. Specifically, it is a permissioned BC [2]. Here, the permissioned BC acts as a trusted third party that records network-wide views, e.g., network events and OpenFlow commands, and distributes them among multiple controllers safely and dependably.

In a permissioned BC, the practical Byzantine fault tolerance (PBFT) protocol is used to maintain system resilience against Byzantine failures by replication [4]. Replication relies on extensive all-to-all messaging, which makes PBFT robust but causes performance drawbacks. Its throughput is limited by many aspects, including 1) the reliability of blockchain nodes; 2) the computing capability of the PBFT system; and 3) the networking capability of the PBFT system. Thus, the underlying resources (e.g., compute, network) can be abstracted, virtualized, and managed globally [14], jointly considering all related aspects, so as to improve the throughput of the permissioned BC.

In this paper, a permissioned BC is used to reach consensus in distributed SDVN. Since a large amount of data needs to be synchronized in SDVN, and the throughput of the BC system is limited by many aspects, we virtualize and jointly consider these aspects to enhance the throughput, i.e., the trust features of nodes in the BC, the computing capability of the BC, and the networking capability of the BC. Accordingly, we formulate view change, computing resources allocation, and networking resources allocation as a joint optimization problem. This joint problem is hard to solve with traditional methods, due to its high dimension and complexity [7]. Thus, we use a novel deep Q-learning approach to solve it. Finally, simulations are conducted to show the effectiveness of our proposed scheme.

The rest of this paper is organized as follows. In Section 2, multiple controllers are interconnected in a blockchain manner, and the detailed consensus steps are presented. Section 3 gives the system model. In Section 4, the problem formulation is given, and deep Q-learning is used to solve the problem. Simulation results are presented and discussed in Section 5. Finally, we conclude this article and present some future work in Section 6.

2 BLOCKCHAIN-BASED DISTRIBUTED SDVN

In this section, we describe how multiple controllers are interconnected in a blockchain manner. We begin with the network model; the detailed consensus steps are then presented, along with theoretical analyses.

2.1 Network Model

We assume that there are $N$ nodes in the permissioned BC, denoted by $\mathcal{N} = \{1, \ldots, N\}$. The permissioned BC ensures safety as long as no more than a fraction (often 1/3) of the nodes is faulty, i.e., at most $f = \frac{N-1}{3}$ nodes are malicious. Thus, a state machine replication mechanism is used in the permissioned BC to achieve consensus in an environment where nodes are only partially trusted. There are $C$ controllers in the distributed SDVN, denoted by $\mathcal{C} = \{1, \ldots, C\}$. Strong adversaries may coordinate faulty nodes in the BC to compromise the replication mechanism. However, they cannot break the cryptographic techniques in use, namely signatures, message authentication codes (MACs), and collision-resistant hashing. The cryptographic messages are expressed as follows [4].

• $\langle m \rangle_{\sigma_i}$ means that message $m$ is signed by node $i$, so that it can be verified with the public key of node $i$.
• $\langle m \rangle_{\sigma_{i,j}}$ means that message $m$ is authenticated by node $i$ with a MAC for node $j$.
• $\langle m \rangle_{\vec{\sigma}_i}$ means that message $m$ is authenticated by node $i$ with an array of MACs, one for every replica.

Considerable computing and networking resources are needed to verify the above cryptographic messages and smart contracts, and to deliver verified messages in the BC. However, computing and networking resources are limited in a distributed SDVN environment. Thus, we virtualize the computing resources and networking resources, and manage both dynamically, so as to improve the throughput of the BC. For a comprehensive perspective, Fig. 1 presents the structure of the permissioned BC-based distributed SDVN with the virtualization of computing and networking.

Figure 1: The structure of BC-based distributed SDVN with the virtualization of computing and networking.

2.2 Consensus Steps and Theoretical Analyses

Each controller collects its local events and local OpenFlow commands as Transaction #1, Transaction #2, …, Transaction #n. The format of a transaction is shown in Table 1. All controllers then send their transactions to the BC at fixed periods. After verifying signatures, MACs, and smart contracts, the BC returns a validated block to all controllers, whose format is shown in Table 2, and appends this block to the blockchain. All controllers read the payload in the block, learn the network views of the other controllers, and update their local network views, so as to synchronize the global network view among multiple controllers.

Table 1: The format of a transaction.

Field       Description
Number      The position of this transaction in the transaction packet.
Signature   The signature of this transaction.
MAC         The MAC of this transaction.
Payloads    Some local events and local OpenFlow commands.

Table 2: The format of a block.

Field           Description
Version         Block version number
Timestamp       Creation time of this block
Block ID        The identifier of this block
Block payload   Transactions in this block (Transaction #1, …, Transaction #n)

Having described the procedure between the controllers and the BC system, we now introduce the detailed steps inside the BC, as shown in Fig. 2, along with theoretical analyses [3]. The numbering of each step in the figure is the same as the one used in the remainder of this subsection.

Figure 2: The detailed consensus steps inside BC.

1. All controllers send transactions to all nodes. All controllers send their transactions, as message $\langle\langle \mathrm{block} \rangle_{\sigma_c}, c \rangle_{\vec{\sigma}_c}$, to all nodes, where $c$ is the controller ID. The total number of transactions is $b$. According to a certain view change protocol, which will be introduced in the following section, one node is selected to be a distinguished node, called the primary, such as node 2 in Fig. 2. The other nodes are called replicas. In this step, only the primary operates on these transactions. First, the primary verifies the MACs of all transactions. If they are valid, it verifies the signatures of all transactions. If these are also valid, it enters the following steps. Here, based on the work in [9], the time required for a single peer to transmit $b$ transactions from the controllers to the BC system can be expressed as

$$ t_{hop} = \frac{b}{\mathrm{Band}}, \tag{1} $$

where $\mathrm{Band}$ is the median bandwidth. We assume that verifying one signature needs $\theta$ CPU cycles, and that verifying or generating one MAC needs $\alpha$ CPU cycles. In this phase, the primary verifies $b$ MACs and $b$ signatures. Each node in the BC has a computing speed of $\mu$ Hz. Thus, the time required to verify MACs and signatures can be expressed as

$$ \frac{\mu}{b(\theta + \alpha)}. \tag{2} $$

Therefore, by now, the required time is

$$ \frac{b}{\mathrm{Band}} + \frac{\mu}{b(\theta + \alpha)}. \tag{3} $$

2. The primary sends PRE-PREPARE messages to all others. The primary sends $\langle \text{PRE-PREPARE}, p, c, H(m) \rangle_{\vec{\sigma}_p}$ to all replicas, where $p$ and $H(m)$ denote the primary node ID and the hash of the issued block, respectively. After receiving the PRE-PREPARE message, each replica verifies MACs and signatures. Here, the primary generates $(N-1)$ MACs. Each replica verifies one MAC from the primary, and the $b$ MACs and $b$ signatures in the transactions. Similarly, one single-peer transmission time is needed to deliver the PRE-PREPARE messages. Therefore, by now, the required time is

$$ 2\frac{b}{\mathrm{Band}} + \frac{\mu}{b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{\alpha + b(\theta + \alpha)}. \tag{4} $$

3. Each replica sends PREPARE messages to others. The replica sends $\langle \text{PREPARE}, p, c, H(m), n \rangle_{\sigma_n}$ to all others, where $n$ is the replica ID. After receiving $2f$ PREPARE messages that match its local PRE-PREPARE message, a node enters the following steps. Here, each replica generates $(N-1)$ MACs. After sending these PREPARE messages, the primary and each replica need to verify $2f$ MACs. Therefore, by now, the required time is

$$ 3\frac{b}{\mathrm{Band}} + \frac{\mu}{b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{\alpha + b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{2f\alpha}. \tag{5} $$

4. All nodes send COMMIT messages to others. All nodes send $\langle \text{COMMIT}, p, c, H(m), n \rangle_{\vec{\sigma}_n}$ to all others. After receiving $2f$ matching messages, a node verifies the smart contracts. If they are valid, it appends this block to the blockchain and goes to the following steps. Here, all nodes need to generate $(N-1)$ MACs. After sending these COMMIT messages, all nodes verify $2f$ MACs and $b$ smart contracts. We assume that verifying one smart contract needs $\beta$ CPU cycles. Therefore, by now, the required time is

$$ 4\frac{b}{\mathrm{Band}} + \frac{\mu}{b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{\alpha + b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{2f\alpha + (N-1)\alpha} + \frac{\mu}{2f\alpha + b\beta}. \tag{6} $$

5. All nodes send the validated block to all controllers. All nodes reply $\langle \text{REPLY}, \mathrm{block}, n \rangle_{\sigma_{n,c}}$ to all controllers, where block means the validated block. First, each controller verifies $2f$ valid reply messages. Then, to make sure that the smart contracts are executed correctly in each BC node, each controller verifies the $b$ transactions in each reply message. If they are the same in $2f$ reply messages, the controller accepts this block and reads these transactions to update the corresponding network views. Here, all nodes generate $C$ MACs. After sending these reply messages, one controller verifies $2f$ MACs and $2fb$ transactions. We assume that verifying one transaction needs $\varsigma$ CPU cycles. Therefore, by now, the required time, which is the total time to reply to these $b$ transactions, is

$$ D = 5\frac{b}{\mathrm{Band}} + \frac{\mu}{b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{\alpha + b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{2f\alpha + (N-1)\alpha} + \frac{\mu}{2f\alpha + b\beta + C\alpha} + \frac{\mu_c}{2f\alpha + 2fb\varsigma}, \tag{7} $$

where $\mu_c$ is the computing speed of the controllers.

Additionally, we consider that the primary is only partially trusted: with probability $k$ it slows down the BC system. This probability represents the trust feature of BC nodes, and a more trusted node has a smaller $k$. Thus, the expected required time to reply to $b$ transactions can be expressed as

$$ D = (1-k)\left(5\frac{b}{\mathrm{Band}} + \frac{\mu}{b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{\alpha + b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{2f\alpha + (N-1)\alpha} + \frac{\mu}{2f\alpha + b\beta + C\alpha}\right) + \frac{\mu_c}{2f\alpha + 2fb\varsigma}. \tag{8} $$

Thus, the throughput of the BC is

$$ T = \frac{1}{b}\left[(1-k)\left(5\frac{b}{\mathrm{Band}} + \frac{\mu}{b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{\alpha + b(\theta + \alpha) + (N-1)\alpha} + \frac{\mu}{2f\alpha + (N-1)\alpha} + \frac{\mu}{2f\alpha + b\beta + C\alpha}\right) + \frac{\mu_c}{2f\alpha + 2fb\varsigma}\right] \text{trx/s}. \tag{9} $$

From (9), we can see that in order to improve the throughput of the BC system, we need to select a more trusted node as the primary (called the view change protocol), allocate more computing resources to verify cryptographic messages and smart contracts (called computing resources allocation), and allocate more networking resources to deliver messages (called networking resources allocation).

3 SYSTEM MODEL

In order to improve the throughput of the BC system, we give the trust feature model, computing model, and networking model in this section.

3.1 Trust Feature Model

All nodes in the BC have various trust features, such as safe and compromised. In the absence of centralized security management, it is difficult to know what the trust feature of a BC node will be in the next instant. Thus, we model the trust feature of node $n$ as a random variable $\delta^n$, which is divided into discrete levels $\xi = \{\xi_0, \xi_1, \ldots, \xi_{L-1}\}$. We denote the trust feature $\delta^n$ at time slot $t$ by $\delta^n(t)$. A Markov chain is used to model the transition of the trust feature of a BC node. We denote the transition probability of $\delta^n(t)$ from one state $X_s$ to another state $Y_s$ as $\kappa_{X_s Y_s}(t)$. Therefore, the transition probability matrix $K^n(t)$ of node $n$ can be expressed as

$$ K^n(t) = [\kappa_{X_s Y_s}(t)]_{L \times L}, \tag{10} $$

where $\kappa_{X_s Y_s}(t) = \Pr(\delta^n(t+1) = Y_s \mid \delta^n(t) = X_s)$, and $X_s, Y_s \in \xi$.

3.2 Computing Model

Some cryptographic operations, e.g., verifying signatures and smart contracts, as well as generating and verifying MACs, are computed in the BC system. Considering that computing resources are limited in the vehicular environment, we offload the computing tasks to edge computing servers to obtain more computing capability, so as to improve the throughput. We use $T_m = \{s_m, q_m\}$ to denote computing task $m$, where $s_m$ is the size of the task and $q_m$ is the number of CPU cycles required to compute it. We assume there are $E$ edge computing servers, denoted by $\mathcal{E} = \{1, \ldots, E\}$.

Since there are many edge computing servers, and other computing tasks also use them, we cannot know exactly how many computing resources will be available in edge computing server $e$ at the next time slot. Therefore, we model the computing resources of edge computing server $e$ as a random variable $\zeta^e$, which is divided into $Y$ discrete intervals $\mathcal{Y} = \{Y_0, Y_1, \ldots, Y_{Y-1}\}$. Let $\zeta^e(t)$ represent the computing resources of edge computing server $e$ at time slot $t$. According to a certain transition probability, $\zeta^e(t)$ changes from one state to another. The $Y \times Y$ computing state transition probability matrix can be expressed as

$$ \Pi^e(t) = [\vartheta_{a_s b_s}(t)]_{Y \times Y}, \tag{11} $$

where $\vartheta_{a_s b_s}(t) = \Pr(\zeta^e(t+1) = b_s \mid \zeta^e(t) = a_s)$, and $a_s, b_s \in \mathcal{Y}$. The time to execute computing task $T_m$ is

$$ t_m = \frac{q_m}{\zeta^e(t)}. \tag{12} $$

And the computing rate is

$$ \mathrm{CompR}^e(t) = \frac{s_m}{t_m} = \frac{\zeta^e(t)\, s_m}{q_m}. \tag{13} $$

3.3 Networking Model

The cryptographic messages must be delivered with sufficient bandwidth inside the BC system. Considering that vehicular networks are bandwidth-limited, we allocate networking resources, i.e., bandwidth resources, virtually to the BC system, so as to improve the throughput. Since there are many available bandwidth resources, we cannot know exactly how many bandwidth resources will be available at the next time slot. Therefore, we model the bandwidth resources as a random variable $\eta^b$, which is divided into $H$ discrete intervals $\mathcal{D} = \{D_0, D_1, \ldots, D_{H-1}\}$. Let $\eta^b(t)$ denote the bandwidth resources at time slot $t$. According to a certain transition probability, $\eta^b(t)$ changes from one state to another. The $H \times H$ bandwidth state transition probability matrix can be denoted as

$$ \Upsilon^b(t) = [\gamma_{\theta_s \varphi_s}(t)]_{H \times H}, \tag{14} $$

where $\gamma_{\theta_s \varphi_s}(t) = \Pr(\eta^b(t+1) = \varphi_s \mid \eta^b(t) = \theta_s)$, and $\theta_s, \varphi_s \in \mathcal{D}$.

In summary, in order to relieve the drawbacks and improve the throughput of the BC system, it is necessary to virtualize and jointly consider computing resources, networking resources, and the view change protocol. This joint optimization problem is high-dimensional, dynamic, and complex, and is hard to solve with traditional methods. Thus, we propose a deep Q-learning approach to solve it.

4 PROBLEM FORMULATION

In this section, we propose a deep Q-learning approach to address the joint optimization problem. We describe this problem as a Markov decision process by defining the state space, action space, and reward function. We then present the deep Q-learning approach.
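Since the Markov decision process is driven by the Markov chains of Section 3, a small simulation may help fix ideas. The following Python sketch samples one time-slot transition of a discrete state from a row-stochastic matrix, as in (10), (11), and (14), and evaluates the computing rate of (12) and (13). The three-level matrix `P`, the function names, and the numeric task parameters are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def step_markov(state, transition, rng):
    """Sample the next state index from row `state` of a row-stochastic matrix."""
    return int(rng.choice(transition.shape[0], p=transition[state]))


def computing_rate(zeta, task_size, task_cycles):
    """Eqs. (12)-(13): t_m = q_m / zeta, then CompR = s_m / t_m."""
    exec_time = task_cycles / zeta
    return task_size / exec_time


# Hypothetical 3-level chain; the paper's matrices are given in Section 5.1.
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])
rng = np.random.default_rng(0)

# One time slot: each BC node moves independently to its next trust level.
trust = [0, 1, 2]
trust_next = [step_markov(s, P, rng) for s in trust]

# Assumed task: 1e6 units of data needing 4e9 cycles on a 2e9 cycles/s server.
rate = computing_rate(zeta=2.0e9, task_size=1.0e6, task_cycles=4.0e9)
```

The same `step_markov` helper can be reused for the computing and bandwidth states, since all three models share the same structure and differ only in their transition matrices.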

4.1 State Space

The state space consists of the trust features of all BC nodes, the computing resources of all edge computing servers, and the networking resources of the available bandwidths. Therefore, the state space can be expressed as

$$ S(t) = \begin{bmatrix} \delta^1(t) & \delta^2(t) & \cdots & \delta^n(t) & \cdots & \delta^N(t) \\ \zeta^1(t) & \zeta^2(t) & \cdots & \zeta^e(t) & \cdots & \zeta^E(t) \\ \eta^1(t) & \eta^2(t) & \cdots & \eta^b(t) & \cdots & \eta^B(t) \end{bmatrix}. \tag{15} $$

4.2 Action Space

The learning agent should determine which node is the primary node (i.e., view change), which edge computing server the tasks are offloaded to (i.e., computing resources allocation), and which bandwidth is used (i.e., networking resources allocation). Thus, the action space is denoted as follows.

$$ A(t) = \{A^N(t), A^E(t), A^B(t)\}, \tag{16} $$

where

1) $A^N(t) = [a^1(t), a^2(t), \ldots, a^n(t), \ldots, a^N(t)]$ indicates which node is the primary node. Here, $a^n(t) \in \{0, 1\}$, where $a^n(t) = 1$ means node $n$ is the primary node; otherwise $a^n(t) = 0$. The consensus system has only one primary, thus $\sum_{n=1}^{N} a^n(t) = 1$.
2) $A^E(t) = [a^1(t), a^2(t), \ldots, a^e(t), \ldots, a^E(t)]$ denotes which edge computing server the tasks are offloaded to. Similarly, $a^e(t) \in \{0, 1\}$, where $a^e(t) = 1$ means edge computing server $e$ is selected; otherwise $a^e(t) = 0$. Only one edge computing server can be selected at one time slot, thus $\sum_{e=1}^{E} a^e(t) = 1$.
3) $A^B(t) = [a^1(t), a^2(t), \ldots, a^b(t), \ldots, a^B(t)]$ represents which bandwidth is used by the BC system. Here, $a^b(t) \in \{0, 1\}$, where $a^b(t) = 1$ means bandwidth $b$ is selected; otherwise $a^b(t) = 0$. Only one bandwidth can be used by the BC system at one time slot, thus $\sum_{b=1}^{B} a^b(t) = 1$.

4.3 Reward Function

According to (9), the state space, and the action space, we model the system throughput as the reward function, which can be defined as

$$ \begin{aligned} R(t) = \frac{1}{b}\Bigg[ & \left(1 - \sum_{n=1}^{N} a^n(t)\delta^n(t)\right)\Bigg(5\frac{b}{\sum_{b=1}^{B} a^b(t)\eta^b(t)} + \frac{\sum_{e=1}^{E} a^e(t)\mathrm{CompR}^e(t)}{b(\theta + \alpha) + (N-1)\alpha} + \frac{\sum_{e=1}^{E} a^e(t)\mathrm{CompR}^e(t)}{\alpha + b(\theta + \alpha) + (N-1)\alpha} \\ & + \frac{\sum_{e=1}^{E} a^e(t)\mathrm{CompR}^e(t)}{2f\alpha + (N-1)\alpha} + \frac{\sum_{e=1}^{E} a^e(t)\mathrm{CompR}^e(t)}{2f\alpha + b\beta + C\alpha}\Bigg) + \frac{\mu_c}{2f\alpha + 2fb\varsigma}\Bigg] \text{trx/s}. \end{aligned} \tag{17} $$

4.4 Deep Q-Learning

The learning agent makes decisions about view change, computing resources allocation, and networking resources allocation. This joint problem is high-dimensional, dynamic, and complex, so it can hardly be solved by traditional methods. Therefore, we use a novel deep Q-learning approach to solve the problem.

Q-learning, as a typical reinforcement learning approach, mimics human behavior by taking actions on the environment in order to obtain the maximum long-term reward. It uses the action-state value function $Q(s, a)$ to represent the feedback from each decision. Usually, a temporal difference method is used in Q-learning to evaluate $Q(s, a)$:

$$ Q(s, a) \leftarrow Q(s, a) + \alpha\left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right), \tag{18} $$

where $\alpha \in (0, 1]$ is the learning rate in Q-learning and $\gamma$ is the discount factor. The action with the maximum $Q(s, a)$ can be chosen by the learning agent at each step.

Traditionally, $Q(s, a)$ is stored in a Q-table. However, with the explosion of data dimension and complexity, it is barely possible to compute all values of $Q(s, a)$ and store them in a Q-table. Deep networks are a viable approach to this problem. Many studies have advocated using deep networks to approximate $Q(s, a)$ instead of a Q-table, i.e., $Q(s, a, \omega) \approx Q(s, a)$, where $\omega$ is the set of weights and biases of the deep networks. This is the heart of deep Q-learning (DQL). We present the DQL algorithm in Algorithm 1, where an $\epsilon$-greedy policy is used to balance exploitation and exploration.

Algorithm 1 DQL
 1: Initialization: Initialize evaluated deep networks with weights and biases set $\omega$. Initialize target deep networks with weights and biases set $\omega'$.
 2: for k = 1 : K do
 3:   Reset the environment with a randomly initial observation $S_{ini}$, and $S_t = S_{ini}$.
 4:   while $S_t$ != $S_{terminal}$ do
 5:     Select action $A_t$ based on $\epsilon$-greedy policy.
 6:     Obtain immediate reward $R_t$ and next observation $S_{t+1}$.
 7:     Store experience $(S_t, A_t, R_t, S_{t+1})$ into the experience replay memory.
 8:     Randomly sample some batches of $(S_i, A_i, R_i, S_{i+1})$ from the experience replay memory.
 9:     Calculate target Q-value $Q_{target}(S)$ in target deep networks:
          if $S'$ is $S_{terminal}$, $Q_{target}(S) = R_S$,
          else $Q_{target}(S) = R_S + \gamma \max_{A'} Q(S', A'; \omega')$.
10:     Train evaluated deep networks to minimize the loss function $L(\omega)$:
          $$ L(\omega) = E[(Q_{target}(S) - Q(S, A; \omega))^2]. \tag{19} $$
11:     Periodically, update target deep networks.
12:     $S_t \leftarrow S_{t+1}$
13:   end while
14: end for
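The action space of Section 4.2 and the temporal difference update (18) can be sketched in a few lines of Python. This is a minimal tabular illustration, not the deep network of Algorithm 1: the flat action index, the function names, the two-state table, and the values of the learning rate and discount factor are assumptions made for the example. The sizes 7, 5, and 6 follow the simulation settings of Section 5.1.

```python
import numpy as np

N_NODES, N_SERVERS, N_BANDS = 7, 5, 6  # sizes from the simulation settings


def decode_action(index):
    """Map a flat action index to the one-hot vectors (A^N, A^E, A^B) of Eq. (16)."""
    n, e, b = np.unravel_index(index, (N_NODES, N_SERVERS, N_BANDS))
    return np.eye(N_NODES)[n], np.eye(N_SERVERS)[e], np.eye(N_BANDS)[b]


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def q_update(q_table, s, a, r, s_next, lr=0.1, gamma=0.9):
    """Eq. (18): move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += lr * (td_target - q_table[s, a])
    return q_table


rng = np.random.default_rng(0)
q_table = np.zeros((2, N_NODES * N_SERVERS * N_BANDS))  # 2 hypothetical states
q_update(q_table, s=0, a=3, r=1.0, s_next=1)

a_n, a_e, a_b = decode_action(3)
greedy = epsilon_greedy(q_table[0], epsilon=0.0, rng=rng)
```

Encoding the three one-hot choices as a single index gives 7 × 5 × 6 = 210 joint actions, which is one simple way to present the action space to a single Q-function output layer.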

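Lines 9 and 10 of Algorithm 1 reduce to two array operations on a sampled mini-batch. The NumPy sketch below computes the target Q-values and the loss (19) from given network outputs. It does not train a network: the arrays standing in for the outputs of the evaluated and target networks, the function names, and the discount factor are illustrative assumptions.

```python
import numpy as np


def dql_targets(rewards, next_q_target, terminal, gamma=0.9):
    """Line 9: R for terminal transitions, else R + gamma * max_A' Q(S', A'; w')."""
    bootstrap = gamma * next_q_target.max(axis=1)
    return rewards + np.where(terminal, 0.0, bootstrap)


def dql_loss(q_eval, actions, targets):
    """Eq. (19): mean squared error against Q(S, A; w) of the actions taken."""
    q_taken = q_eval[np.arange(len(actions)), actions]
    return float(np.mean((targets - q_taken) ** 2))


# A mini-batch of two transitions with two actions each (made-up numbers).
rewards = np.array([1.0, 2.0])
next_q = np.array([[0.5, 1.5], [3.0, 0.0]])   # target-network outputs for S'
terminal = np.array([False, True])
targets = dql_targets(rewards, next_q, terminal, gamma=0.9)

q_eval = np.array([[2.0, 0.0], [0.0, 1.0]])   # evaluated-network outputs for S
actions = np.array([0, 1])
loss = dql_loss(q_eval, actions, targets)
```

Keeping the bootstrap term out of terminal transitions matters in practice, because the target network output for a terminal state carries no meaningful future value.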

5 SIMULATION RESULTS AND DISCUSSIONS

In this section, we first present the simulation settings. Then, the simulation results are discussed.

5.1 Simulation Settings

In this simulation, we use Python 2.7.10 with TensorFlow 1.4.0 [1]. There are seven BC nodes, i.e., $N = 7$ and $f = 2$, five edge computing servers, and six available bandwidths. The trust feature of a BC node can be very secure, secure, medium, compromised, or very compromised. We assume the transition probability matrix of the trust features of BC nodes to be

$$ K = \begin{bmatrix} 0.5 & 0.15 & 0.125 & 0.12 & 0.105 \\ 0.15 & 0.5 & 0.125 & 0.12 & 0.105 \\ 0.105 & 0.15 & 0.5 & 0.125 & 0.12 \\ 0.105 & 0.12 & 0.125 & 0.5 & 0.15 \\ 0.105 & 0.12 & 0.125 & 0.15 & 0.05 \end{bmatrix}. \tag{20} $$

The computing resources of an edge computing server can be high, medium, low, or very low. We assume the transition probability matrix to be

$$ \Pi = \begin{bmatrix} 0.5 & 0.3 & 0.15 & 0.05 \\ 0.3 & 0.5 & 0.15 & 0.05 \\ 0.15 & 0.3 & 0.5 & 0.05 \\ 0.15 & 0.3 & 0.5 & 0.05 \end{bmatrix}. \tag{21} $$

Similarly, the networking resources of each available bandwidth can be very high, high, medium, low, or very low. The transition probability matrix is

$$ \Upsilon = \begin{bmatrix} 0.45 & 0.16 & 0.14 & 0.13 & 0.12 \\ 0.16 & 0.45 & 0.14 & 0.13 & 0.12 \\ 0.12 & 0.16 & 0.45 & 0.14 & 0.13 \\ 0.12 & 0.13 & 0.16 & 0.45 & 0.14 \\ 0.12 & 0.13 & 0.14 & 0.16 & 0.45 \end{bmatrix}. \tag{22} $$

We set the number of CPU cycles required to verify one signature, $\theta$, to 8 Mcycles; the number of CPU cycles required to verify or generate one MAC, $\alpha$, to 0.05 Mcycles; the number of CPU cycles required to verify a smart contract, $\beta$, to 15 Mcycles; and the batch size of a block, $b$, to 1 Mb.

For the performance comparison, five schemes are simulated:

• Proposed DQL-based scheme with view change, computing resources allocation, and networking resources allocation. We call it the DQL-based scheme.
• Proposed DQL-based scheme with view change and computing resources allocation, but without networking resources allocation, which only uses local networking resources. We call it the DQL-based scheme without networking allocation.
• Proposed DQL-based scheme with computing resources allocation and networking resources allocation, but with the traditional view change protocol used in [4]. We call it the DQL-based scheme without node choice.
• Proposed DQL-based scheme with view change and networking resources allocation, but without edge computing servers, which only uses local computing resources. We call it the DQL-based scheme without computation offloading.
• Existing scheme with traditional view change, using local computing resources and local networking resources. We call it the existing scheme.

Figure 3: Training curves tracking the throughput of BC system under different schemes.
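Settings of this kind are easy to sanity-check before training. The sketch below, using the bandwidth matrix of (22), verifies that each row is a probability distribution and estimates the long-run share of time spent in each bandwidth level by power iteration. The function names are hypothetical, and the integer floor in `max_faulty` is the usual PBFT convention, assumed here because it agrees with $N = 7$, $f = 2$.

```python
import numpy as np


def max_faulty(n_nodes):
    """Largest f such that n_nodes >= 3f + 1 (floor of (N - 1) / 3)."""
    return (n_nodes - 1) // 3


def is_row_stochastic(matrix, tol=1e-9):
    """True if all entries are non-negative and every row sums to one."""
    rows_ok = np.allclose(matrix.sum(axis=1), 1.0, atol=tol)
    return bool(np.all(matrix >= 0) and rows_ok)


def stationary_distribution(matrix, iterations=1000):
    """Power iteration: repeatedly apply pi <- pi P starting from uniform."""
    pi = np.full(matrix.shape[0], 1.0 / matrix.shape[0])
    for _ in range(iterations):
        pi = pi @ matrix
    return pi


# Bandwidth transition matrix from Eq. (22).
UPSILON = np.array([
    [0.45, 0.16, 0.14, 0.13, 0.12],
    [0.16, 0.45, 0.14, 0.13, 0.12],
    [0.12, 0.16, 0.45, 0.14, 0.13],
    [0.12, 0.13, 0.16, 0.45, 0.14],
    [0.12, 0.13, 0.14, 0.16, 0.45],
])

f = max_faulty(7)
valid = is_row_stochastic(UPSILON)
pi = stationary_distribution(UPSILON)
```

Running the same `is_row_stochastic` check on every transition matrix before simulation catches transcription slips early, since a row that does not sum to one cannot be sampled from.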

5.2 Simulation Results

Fig. 3 shows training curves tracking the throughput of the BC system under different schemes. We use the AdamOptimizer [5] in TensorFlow with a learning rate of $10^{-5}$. As we can see, the DQL-based scheme has the best throughput. The reason is that, with our proposed scheme, a more trusted node can be selected as the primary node, more computing resources are allocated to the BC system, and more networking resources are used by the BC system. Virtualizing and managing the underlying resources is therefore necessary to relieve the drawbacks and improve the throughput of the BC system. This figure also shows the convergence performance of DQL. After training the deep networks, we use them in the following simulation.

Fig. 4 shows the BC throughput versus the number of BC nodes under different schemes. As the number of BC nodes increases, the system throughput decreases. The reason is that more BC nodes require more operations for verifying and generating signatures, MACs, and smart contracts. Nevertheless, our proposed scheme still achieves better throughput.
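The growth in verification work with the number of nodes can be made concrete by tallying the CPU cycle counts that appear in the denominators of (7). The Python sketch below only counts cycles per phase and makes no attempt to convert them into time or throughput. The values of $\theta$, $\alpha$, and $\beta$ follow Section 5.1 (in Mcycles), whereas the number of transactions `b = 100`, the per-transaction cost `varsigma = 1.0`, the number of controllers `n_controllers = 3`, the dictionary keys, and the floor in `max_faulty` are assumptions made for the example.

```python
def max_faulty(n_nodes):
    """Largest f such that n_nodes >= 3f + 1 (assumed floor convention)."""
    return (n_nodes - 1) // 3


def phase_cycles(n_nodes, b, theta, alpha, beta, varsigma, n_controllers):
    """CPU cycle counts taken from the denominators of Eq. (7), in Mcycles."""
    f = max_faulty(n_nodes)
    return {
        # Primary: verify b MACs and b signatures, generate N - 1 MACs.
        "primary": b * (theta + alpha) + (n_nodes - 1) * alpha,
        # Replica: verify 1 + b MACs and b signatures, generate N - 1 MACs.
        "replica": alpha + b * (theta + alpha) + (n_nodes - 1) * alpha,
        # Verify 2f PREPARE MACs, generate N - 1 COMMIT MACs.
        "prepare": 2 * f * alpha + (n_nodes - 1) * alpha,
        # Verify 2f COMMIT MACs and b contracts, generate C reply MACs.
        "commit": 2 * f * alpha + b * beta + n_controllers * alpha,
        # Controller: verify 2f reply MACs and 2fb transactions.
        "controller": 2 * f * alpha + 2 * f * b * varsigma,
    }


totals = {}
for n in (7, 13, 19):
    cycles = phase_cycles(n, b=100, theta=8.0, alpha=0.05, beta=15.0,
                          varsigma=1.0, n_controllers=3)
    totals[n] = sum(cycles.values())
```

Under these assumed parameters the total grows with $N$ mainly through the controller term $2fb\varsigma$, because each additional tolerated fault adds two more reply messages whose $b$ transactions must be checked.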

Figure 4: The simulation curves between the BC throughput and the number of BC nodes under different schemes. (Axes: throughput in trx/s, scaled by 10^4, versus the number of consensus nodes from 6 to 20. Curves: DQL-based scheme; DQL-based scheme without networking allocation; DQL-based scheme without node choice; DQL-based scheme without computation offloading; existing scheme.)

6 CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a permissioned blockchain-based consensus approach for distributed SDVN. Due to the variety of data and vehicles in SDVN, as well as the drawbacks and low throughput of permissioned BC, we virtualized computing resources and networking resources, jointly considering the trust features of blockchain nodes, to relieve these drawbacks and improve the throughput of the BC system. Accordingly, we formulated view change, computing resources allocation, and networking resources allocation as a joint optimization problem. To solve this joint problem, we used a novel deep Q-learning approach. Simulations were presented to show the effectiveness of our proposed scheme. Caching resources are also very important in permissioned BC; work is in progress to virtualize caching resources to further relieve the drawbacks of the BC system.

ACKNOWLEDGMENTS

The authors would also like to thank the anonymous referees for their valuable comments and helpful suggestions. The work is supported by the Key Program of the National Natural Science Foundation of China (Grant No 61431008), and BUPT Excellent Ph.D. Students Foundation (No 2015010100).

REFERENCES

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).
[2] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, et al. 2018. Hyperledger fabric: a distributed operating system for permissioned blockchains. In Proc. Conf. on Thirteenth EuroSys. 30.
[3] Pierre-Louis Aublin, Sonia Ben Mokhtar, and Vivien Quéma. 2013. RBFT: Redundant byzantine fault tolerance. In Proc. Conf. Distributed Comp. Sys.’13. 297–306.
[4] Miguel Castro and Barbara Liskov. 2002. Practical Byzantine fault tolerance and proactive recovery. ACM Trans. on Comp. Sys. 20, 4 (2002), 398–461.
[5] Trishul M Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. 2014. Project Adam: Building an Efficient and Scalable Deep Learning Training System. In OSDI, Vol. 14. 571–582.
[6] Joao M Duarte, Torsten Braun, and Leandro A Villas. 2017. Addressing the effects of low vehicle density in highly mobile vehicular named-data networks. In Proc. ACM DIVANet’17. 117–124.
[7] Ying He, F Richard Yu, Nan Zhao, Hongxi Yin, and Azzedine Boukerche. 2017. Deep Reinforcement Learning (DRL)-based Resource Management in Software-Defined and Virtualized Vehicular Ad Hoc Networks. In Proc. ACM DIVANet’17. 47–54.
[8] Xumin Huang, Rong Yu, Jiawen Kang, Yejun He, and Yan Zhang. 2017. Exploring mobile edge computing for 5G-enabled software defined vehicular networks. IEEE Wire. Comm. 24, 6 (2017), 55–63.
[9] Uri Klarman, Soumya Basu, Aleksandar Kuzmanovic, and Emin Gün Sirer. [n. d.]. bloXroute: A Scalable Trustless Blockchain Distribution Network WHITEPAPER. ([n. d.]).
[10] Teemu Koponen, Martin Casado, Natasha Gude, Jeremy Stribling, Leon Poutievski, Min Zhu, Rajiv Ramanathan, Yuichiro Iwata, Hiroaki Inoue, Takayuki Hama, et al. 2010. Onix: A distributed control platform for large-scale production networks. In OSDI, Vol. 10. 1–6.
[11] Chao Qiu, F Richard Yu, Fangmin Xu, Haipeng Yao, and Chenglin Zhao. 2018. Permissioned Blockchain-Based Distributed Software-Defined Industrial Internet of Things. In Globecom Workshops (GC Wkshps), 2018 IEEE. 1–7.
[12] Chao Qiu, Chenglin Zhao, Fangmin Xu, and Tianpu Yang. 2016. Sleeping mode of multi-controller in green software-defined networking. EURASIP Journal on Wireless Commu. and Net. 2016, 1 (2016), 282.
[13] Amin Tootoonchian and Yashar Ganjali. 2010. Hyperflow: A distributed control plane for openflow. In Pro. Conf. Internet Netw. Manag. 3–3.
[14] Fei Richard Yu, Jianmin Liu, Ying He, Pengbo Si, and Yanhua Zhang. 2018. Virtualization for Distributed Ledger Technology (vDLT). IEEE Access 6 (2018), 25019–25028.
[15] Yaomin Zhang, Haijun Zhang, Keping Long, Xiaoming Xie, and Victor Leung. 2017. Resource Allocation in Software Defined Fog Vehicular Networks. In Proc. ACM DIVANet’17. 71–76.