International Journal of Computer Networks & Communications (IJCNC)

AIRCC PUBLISHING CORPORATION

ENHANCING AVAILABILITY FOR DISTRIBUTED REPLICATED SERVICES CONSIDERING NETWORK EDGE AVAILABILITY

Manghui Tu1, Liangliang Xiao2 and Dianxiang Xu3

1Department of CITG, Purdue University Northwest, Hammond, Indiana

2Department of CSIT, Frostburg State University, Frostburg, Maryland

3Department of Computer Science, Boise State University, Boise, Idaho

Abstract


Mechanism to improve data or service availability is critical for an enterprise to ensure the quality of service in terms of availability. Replication has been used to improve system availability. The number and location of the replicas are two impact factors on availability. In this paper, we will consider the impact of the node and network edge failures on the availability of replicated data or services. The Effective availability modeling approach is designed and efficient availability computing algorithms are developed to model and compute availability of replicated services for systems with the tree topology. The availability enhancement problem (maximizing the objective function) is transformed to a p-median problem (minimizing the objective function) through re-define the availability enhancement problem. An efficient replica allocation algorithm is developed to improve data availability in tree networks, with a runtime complexity of O(K|V|2), where K is the number of replicas and |V| is the number of nodes in the tree network. Finally, experimental studies have been conducted to evaluate how efficient and effective the proposed availability computing algorithm and the availability enhancement algorithm on improving the availability of replicated data or services. The results show that the proposed solutions are efficient and effective on availability computing and availability enhancement. 

Keywords


Network Protocols, Wireless Network, Mobile Network, Virus, Worms & Trojon

1. Introduction


It is well known that the utility of the data and service is mainly limited by availability [1, 2, 3, 4], and the dynamic environment and unreliable internet connectivity raise the concerns on availability of distributed systems [5, 6, 7, 8, 9]. Replication and erasure coding techniques [7, 8, 9, 10, 11, 12, 30] have been used to enhance data availability. However, the data object that is replicated also impacts the effective availability. Replicating data at highly available sites will result in high data availability. Also, widely distributing data can result in higher availability than distributing data in local clusters. In [8, 9], it has shown that even 100% local availability does not necessarily provide high availability to end users. The network failures may, for 1.5-2 %of the time, prevent clients from successfully accessing a cluster [5, 6, 7, 13]. Thus, the location of data stored in the system plays a significant role in overall system availability.

Most of the existing research on system availability consider the availability of the storage nodes only [11, 12, 14]. Some other research also considers the impact of data consistency on service availability [8, 9]. All these research do not consider the reach availability of users’ requests to service/data sites. As discussed earlier, availability of the network links also impacts service data availability, especially the reach availability of users’ requests. In [6, 7], network link availability is considered, but the evaluation algorithm itself requires exponential computation time relative to the number of storage nodes even in a tree graph with a single replica. Thus, it is not feasible to compute the availability of distributed systems with replications. Also, the work in [6, 7] does not provide an availability model in a system with more than one replica, i.e, the system has multiple service/data sites available to serve a user’s request.

In this paper, we consider modeling the service/data reach availability (i.e., the availability of service/data to remote requests) in distributed systems in a tree network. Data or services are replicated in the system to improve both performance and availability. Since the system has multiple service/data sites available to serve a user’s request, a single site failure does not lead to the unavailability to user’s request. Therefore, our problem is fundamentally different from network reliability problem (reliable only ensures the access to a single site) [15]. To assess system availability, efficient algorithms are developed to compute system availability considering both node and link failures.

The remainder of the paper is organized as follows. System modeling is described in Section II. An algorithm on computing system availability in a tree network is presented in Section III.  Section IV, issues on how to reduce the runtime complexity and how to improve the service/data availability will be discussed. Efficient replica allocation algorithms to improve system availability will be introduced in Section V. Experimental studies conducted will be described in Section VI to give some numerical results of the proposed replica allocation algorithm applied in a system with tree network topology. Section VII briefly discusses some related works in availability modeling and enhancement and Section VIII concludes the paper.

2. System & Availability Modeling


The Study shows that user access requests usually follow a constant path routing to data sites and most routes are stable in days or even weeks [16]. So the data access routes in such a distributed system can be modeled as a tree graph, T = (V, E), as shown in Fig.1. Let e(u, v) denote the edge that directly links two nodes u and v, and d(u, v) denote a single path between u and v. Note that e(u, v) itself is a path between u and v with a single edge. The system hosts a set of data objects replicated to different nodes. For simplicity, we consider a single data object d. The set of nodes hosting replicas is defined as the residence set, R = {u| u Î V and u holds a replica}, and |R| is the number of replicas. A read request needs to access the closest node to the request. If it is not available, then, the request will be further forwarded to another node. Let r denote a read request and Ar(u) denote the number of read access requests from a node u over a time period.  Majority of the user accesses are reading accesses, update accesses are not considered in this research.


Figure 1. A sample tree network with replica.

In a distributed system, data or service availability may be affected by other factors such as node up/downtime, system load, data consistency, and network failure or congestion [5, 8, 9]. According to [8, 9], weaker consistency would result in higher availability. Here, we assume that users are allowed to read stale data and, hence, the effect of data inconsistency will have no impact on data availability. Also, due to replication, the overloading problem is not an issue, hence, the impact of system load on data availability is not considered.

2.1. Data Availability Analysis

If the preferred node in R is not available, then the read request will be forwarded and served at the next available node. This process repeats until it is finally served by one of the replicas, or finally none of the replica can serve the read request, by choosing different combinations of paths. The access path of such a read request can be modeled as a directed graph, and thus both the network topology and the location of the read requests originally issued will affect data availability. Therefore, to compute data availability, we need to consider requests originated from different nodes.



2.2. Data Availability Modeling in a Tree Network


Figure 2. A sample tree graph rooted at u.


Subsequently, SA(u, Tu, R) = 1 – SUA(u, Tu, R).  The data availability for a request from another node x, e.g., SA(x, Tx, R) (note that the tree should now be rooted at x) can be computed in a similar way. Based on the analysis above, the system availability of d is computed as the average availability for requests in the entire system. Let SA(G, R) denote the system availability of data d. Then, we have,


The algorithm Tree_SA(T, R), which is developed to compute SA(T, R) in a tree graph T = (V, E) is shown in Fig. 3. The algorithm works as what follows. It calls the recursive algorithm T_SUA(u, Tu, R) for each node u in T to compute the unavailability for request issued at node u. The algorithm T_SUA(u, Tu, R) recursively compute the data unavailability for the tree rooted at node u in T, by traversing each link and each vertex once. The computation uses the models we proposed earlier. Note that the children of node v are the nodes in the new tree rooted at node u (There are different implemetnations on building a new Tu in the original tree T, e.g., 1) physically re-build the tree rooted at Tu;  2) build a new virtual tree Tu by tracking the neighboring nodes for each node v in T, such that the unvisited neighboring nodes are child of v and the visited neighboring node is the parent node of v in new tree Tu.

Figure 3. The algorithm Tree_SA(T, R) in tree graph T.

In Theorem 1, we show that the availability d in tree T, SA(T, R), is monotonically increasing.

Theorem 1. Let R1 and R2 denote two resident sets of T and R1 Í R2, then SA(T, R1) £ SA(T, R2).

Proof:  Consider R1 = R2. According to the definition of SA(T, R1) and SA(T, R2), we know that SA(T, R1)  = SA(T, R2).

Consider R1 Ì R2. For each request in a node u, it has at least one more node that can provide a replica of data object d to access. Without loss of generality, assume that |R2| =|R1| +1, and R2 = R1 È {v}. According to the availability model, the unavailability of v, 1− NA(v) < 1. According to the availability computing model, for each request r issued at node u, which means SUA(u, Tv, R2) < SUA(u, Tv, R1), or SA(u, Tv, R2) > SA(u, Tv, R1). Thus, we have SA(T, R1) < SA(T, R2).  It follows that SA(T, R1)  £ SA(T, R2) if R1 Í R2.                 

We now show that the runtime complexity of our proposed availability computing algorithm in a tree T is in the order of the square of the number of nodes in T.

Theorem 2. The runtime complexity of algorithm Tree_SA(T, R) is  O(|V|2).

Proof. It suffices to prove the correctness of recursive relations.

Let vj be a leaf node. If vjÏ{rj1, …, rji} or q £ 0, the value of G(vj,q,rji) should not be counted since the situation is not included in the definition. Therefore we set G(vj,q,rji)=-∞. On the contrary, if vj=rjsÎ{rj1, …, rji} and q ³ 1, G(vj,q,rji) equals to the optimal value of the sub-problem defined on the sub-tree Tj, which is the node vj. Therefore G(vj,q,rji) = Ar(vjAjs.

If rjiÎVj or q < 0, the value of F(vj,q,rji) should not be counted since the situation is not included in the definition. Therefore we set F(vj,q,rji)=-∞. If rjiÏVj and q = 0, the leaf node vj has to fetch data from node rji. Therefore F(vj,q,rji)=Ar(vjAji. If rjiÏVj and q ³ 1, the leaf node vj can either fetch data from node rji or fetch data from itself. Therefore F(vj,q,rji)=max{F(vj,0,rji),G(vj,1,vj)}.

Let vj be an internal node. If Vj∩{rj1, …, rji}=Æ or q £ 0 or i = 0, the value of G(vj,q,rji) should not be counted since the situation is not included in the definition. Therefore we set G(vj,q,rji)=-∞. If rjiÏVj, at least one node must be selected in {rj1, …, rji} ∩ Vj is equivalent to at least one node must be selected in {rj1, …, rji1} ∩ Vj. Therefore G(vj,q,rji)=G(vj,q,rji1). The rest situations can be separated into 3 cases: rji=vj and q ³ 1, rjiÎVj1 and q ³ 1, and rjiÎVj2 and q ³ 1. We discuss them as follows.

In case 1 we consider that rji=vj and q ³ 1. Without loss of generality, we assume that rji corresponds to rj1i1 for vj1 (the left child node of vj) and rj2i2 for vj2 (the right child node of vj).We claim that G(vj,q,rji)=max{G(vj,q,rji1),Ar(vjAji+max{F(vj1,q1,rj1i1)+F(vj2,q2,rj2i2)| 0£q1,q2£q-1, q1+q2=q-1}}.

In order to prove that, we must show (1) the optimal solution appears either in G(vj,q,rji1) or Ar(vjAji+max{F(vj1,q1,rj1i1)+F(vj2,q2,rj2i2) | 0£q1,q2£q-1, q1+q2=q-1}, and (2) the optimal solution is the maximum value.

First, if a node in {rj1, …, rji1} ∩ Vj is selected in the optimal solution, then the optimal solution appears in G(vj,q,rji1). Otherwise we assume that no node in {rj1, …, rji1} ∩ Vj is selected but rji is selected in the optimal solution. Then it is better for vj to fetch data from node rji=vj than other nodes. Let q1 nodes will be selected from Tj1 and q2 nodes will be selected from Tj2 where q1+q2=q-1. Hence the optimal solution appears in Ar(vjAji+max{F(vj1,q1,rj1i1)+F(vj2,q2,rj2i2) | 0£q1,q2£q-1, q1+q2=q-1}.

Second, Ar(vjAji+max{F(vj1,q1,rj1i1)+F(vj2,q2,rj2i2) | 0£q1,q2£q-1, q1+q2=q-1} may contain some inconsistent values. For example, if F(vj1,q1,rj1i1) or F(vj2,q2,rj2i2) selects a node in {rj1, …, rji1} ∩ Vj, vj should fetch data from that node instead of rji. However since the term Ar(vjAji is used, it actually counts that vj fetches data from node rji. Or F(vj1,q1,rj1i1) selects a node such that vj2 should fetch data from that node outside Tj2 instead of rj2i2. However the term F(vj2,q2,rj2i2) indicates that actually it counts that vj2 fetches data from node rj2i2 outside Tj2. Fortunately, the inconsistent values are smaller than the corresponding consistent values, and hence are smaller than the optimal solution. Therefore the inconsistent values will not affect the optimal solution to be the maximum value.

In case 2 we consider rjiÎVj1 and q ³ 1. Without loss of generality, we assume that rji corresponds to rj1i1 for vj1 (the left child node of vj) and rj2i2 for vj2 (the right child node of vj). We claim that G(vj,q,rji)=max{G(vj,q,rji1),Ar(vjAji+max{G(vj1,q1,rj1i1)+F(vj2,q2,rj2i2) | 1£q1£q, 0£q2£q, q1+q2=q, the associated set of G(vj1,q1,rj1i1) does not contain rj1, …, rji1 but contains rji}}.

First, if a node in {rj1, …, rji-1} ∩ Vj is selected in the optimal solution, then the optimal solution appears in G(vj,q,rji-1). Otherwise we assume that no node in {rj1, …, rji-1} ∩ Vj is selected but rjis selected in the optimal solution. We claim that vj will not be selected. Because if vj is selected, then it cannot be in {rj1, …, rji-1} ∩ Vj. Thus it is better for vj to fetch data from node rji than vj. Consequently it is better for all nodes to fetch data from node rji than vj. It implies that the selection of vj will not gain any benefit. Therefore vj will not be selected. So q1 nodes will be selected from Tj1 and q2 nodes will be selected from Tj2 where q1+q2=q. Hence the optimal solution appears in Ar(vj)×Aji+max{G(vj1,q1,rj1i1)+F(vj2,q2,rj2i2) | 1£q1£q, 0£q2£q, q1+q2=q, the associated set of G(vj1,q1,rj1i1) does not contain rj1, …, rji-1 but contains rji}. 

Second, if the associated set of G(vj1,q1,rj1i1) does not contain rj1, …, rji, then that G(vj1,q1,rj1i1) should not be counted. If the associated set of G(vj1,q1,rj1i1) contains rj1, …, rji1, then the inconsistent value is smaller than the corresponding consistent value, and hence smaller than the optimal value. so that G(vj1,q1,rj1i1) can be omitted. Therefore it suffices to consider G(vj1,q1,rj1i1) whose associated set does not contain rj1, …, rji1 but contains rji.

In case 3 we consider rjiÎVj2 and q ³ 1. Without loss of generality, we assume that rji corresponds to rj1i1 for vj1 (the left child node of vj) and rj2i2 for vj2 (the right child node of vj). Analogous to the proof of case 2, it can be proved that G(vj,q,rji)=max{G(vj,q,rji1),Ar(vjAji+max{F(vj1,q1,rj1i1)+G(vj2,q2,rj2i2) | 0£q1£q, 1£q2£q, q1+q2=q, the associated set of G(vj2,q2,rj2i2) does not contain rj1, …, rji1 but contains rji}}.

If rjiÎVj or q < 0, the value of F(vj,q,rji) should not be counted since the situation is not included in the definition. Therefore we set F(vj,q,rji)=-∞. If rjiÏVj and q ³ 0, without loss of generality, we assume that rji corresponds to rj1i1 for vj1 (the left child node of vj) and rj2i2 for vj2 (the right child node of vj). We claim that F(vj,q,rji)=max{G(vj,q,rji),Ar(vjAji+max{F(vj1,q1,rj1i1)+F(vj2,q2,rj2i2) | 0£q1£q, 0£q2£q, q1+q2=q}} First, if the optimal solution selects node in {rj1, …, rji} ∩ Vj, then the optimal solution appears in G(vj,q,rji).

Otherwise we assume that no nodes in {rj1, …, rji} ∩ Vj is selected in the optimal solution. Then it is better for vj to fetch data from node rji than other nodes. Additionally vj will not be selected because of the same reason in the proof of case 2. So q1 nodes will be selected from Tj1 and q2 nodes will be selected from Tj2 where q1+q2=q. Hence the optimal solution appears in

Ar(vjAji + max{F(vj1,q1,rj1i1)+F(vj2,q2,rj2i2) | 0£q1£q, 0£q2£q, q1+q2=q}.

Finally, the inconsistent values in Ar(vjAji+max{F(vj1,q1,rj1i1)+F(vj2,q2,rj2i2) | 0£q1£q, 0£q2£q, q1+q2=q} are smaller than the corresponding consistent values, and hence are smaller than the optimal solution. Therefore the inconsistent values will not affect the optimal solution to be the maximum value.

3. System Availability Enhancement


In this research, we proposed to develop mechanisms to optimize data availability in a system with a fixed number of replicas, K, i.e., to find a resident set with a size of K such that the overall data availability in a tree graph is optimal. It has been shown in our previous work in [17, 18] that reducing average distance of the paths (number of hops) from user’s requests to service/data sites can result in high availability, as the shorter the distance (the smaller the number of hops), the higher the path availability from a requester to its closest replica. Therefore, a potential approach to improve the overall system availability is to allocate data replicas in the system such the average distance of the paths from user’s requests to service sites is minimal.

3.1 Availability Enhancement Problem Modeling

The replica allocation problem to optimize data availability is much more complex than the traditional optimal resident set problems [19, 20, 21]. For traditional resident set problem, each request will be served by a single site (usually the closest site) and the corresponding communication cost will be computed as the final communication cost for such a request. Therefore, each request will be associated with a single data replica site only.  However, with the availability model defined in this research, a request may need to find another available site if the current accessing site is not available. This process will repeat till such access is served by a replica site or is not served after all replica sites have been tried but none of them are available. Thus, each request should be associated with all sites within the resident set for data availability computing. To find the resident set with maximal data availability in a tree network, the computation complexity could be too high to be feasible. For example, an intuitive approach is to try every combination of the resident set in the tree and the runtime complexity would be |V||R|. Thus, an approximate solution is to apply the property observed in [17, 18] that the smaller the number of hops from a request to the resident set, the higher the data availability.

First, since a request will be forwarded to a secondary replica only the primary replica node cannot serve such request, and such an event has a very low probability (node availability is usually very high), we can assume that a request is only associated with the primary replica node to reduce the complexity (note that this assumption is only valid for replica allocation purpose, a request is still associated with all replicas when computing availability). Thus, our problem can be reduced to a resident set problem maximizing the total of the product of the number of reading requests and the path availability of each request to the resident set. i.e.,  Σνεv(Aγ(ν)*(Aδ(ν,R)*NA(ν))), where is the path availability from a request issued at node v to a replica site in the resident set R.

Second, we all understand that the heterogeneity of the node (replica site) availability can have a big impact on the overall system availability. Here, we will show that the heterogeneity of the edge availability can have big impact on the overall system availability. In a system with line topology shown Fig. 4, the allocation of two replicas to optimize the availability without considering the heterogeneity of edge availability is shown in allocation scheme A, and the allocation considering the heterogeneity of edge availability is shown in allocation scheme B (with homogeneous node availability of 0.999). In Scheme B, the service to requests at node v1 is available locally with availability of 0.999, and requests need to be forwarded to node v3 only if node v1 is not available (with a probability of 0.001). Requests at other nodes can be served by the replica at node v3 with very high probability, and only need to be forwarded to node v1 if node v3 is not available or edge e(v2, v3) is not available (for requests from node v2). The overall normalized data availability in system with replica allocation scheme B is around 0.96 by applying the availability computing algorithm described in Fig. 3. In Scheme A, the service to requests at node v1 is not available locally and thus need to be forwarded to node v2 or v3 where the availability of edge e(v1, v2) is 0.5. The overall normalized data availability in system with replica allocation scheme A will definitely be lower than 0.90.

Figure 4. The impact of the edge availability on system availability.


In our previous work in [18], solutions have been given to availability optimization problem, by considering the homogeneous of node availability or network link availability, or both.  However, these solutions are only partial solutions to Problem 1.

3.2 The Availability Enhancement Algorithm

Problem 1 is an optimization problem similar to the p-median problem [22, 15]. In a p-median problem, it is given a graph G(V, E) where V = {v1, v2, …,vn}. Each node vi is associated with a nonnegative weight ci, and each edge e(vi, vj) is associated with a nonnegative weight d(e(vi, vj)). Additionally, it is given a real monotonically increasing function fi for each node vi. The goal is to find a subset R Í V containing at most p nodes (p is an arbitrary but pre-determined number) to minimize the objective


Problem 1 and p-median problem have some common issues. They both consider weighted tree networks where each node/edge is assigned value(s). They both look for optimal solutions to the objective functions. Problem 1 looks for the maximum value of an objective function while p-median problem looks for the minimum value of another objective function. It is therefore possible for us to borrow some ideas of the solution to the p-median problem to develop the solution to Problem 1.

However, it is difficult to directly apply the solution to the p-median problem to Problem 1 since they are different. Consider the p-median problem. Suppose that a node vj is selected and the edges e(vi, vi1), e(vi1, vi2), …, e(vik, vj) form a path connecting nodes vi and vj. Then the computation of the cost for vi to access vj includes k+1 piece of information: fi(d(e(vi,vi1))), fi1(d(e(vi1,vi2))), …, fik(d(e(vik,vj))), and cj. While for Problem 1, the computation of the contribution for vi to fetch data from vj includes k+2 piece of information: Ar(vi), Ae(vi,vi1), Ae(vi1,vi2), …, Ae(vik,vj), and NA(vj). Therefore the solution to Problem 1 should include and handle more information than that to p-median problem. Thus we need to modify the solution to p-median problem to re-develop the solution to Problem 1.

Problem Transformation and Re-definition 

A three step solution is provided to Problem 1:

  1. Transformation: reduce Problem 1 from an arbitrary tree network to a binary tree network.
  2. Pre-processing: compute Aji and rji for each node vj, 1 £ i, j £ n
  3. Computing function values: compute the values of two functions G(vj,q,rji) and F(vj,q,rji) based on their recursive relations.

Step 1.             Transformation

Given an arbitrary tree network T(V, E), where V = {v1, v2, …,vn}, E is the set of all edges in T. It can be transformed to that of a binary tree network T’(V’,E’). We show that by considering the following two cases, Case 1 and Case 2.

Case1. Suppose that an internal node vj has exactly one child node (Fig. 5(a)). Without loss of generality, we assume that vj has a left child node vj1. Then we introduce a right child node vj2 and set the availabilities NA(vj2) = Ae(vj, vj2) = 0 and the number of read requests Ar(vj2) = 0 (Fig. 5(b)).

Figure 5. Node with a single child node (a) is transformed to (b)

Since NA(vj2) = Ae(vj, vj2) = Ar(vj2) = 0, there is no benefit to select vj2 and add vj2 to the resident set R. Therefore if R’ is an optimal solution to Problem 1 for T’(V’,E’), then R = R’ – {vj2 | j} is an optimal solution to Problem 1 for T(V,E).

Case 2. Suppose that an internal node vj has more than 3 childe nodes vj1, vj2­, …, vjt where t ³ 3 (Fig 6(a)). Then we introduct t-2 nodes ujs where 2 £ s £ t − 1, replace edge e(vj, vjs) by edge e(ujs, vjs) for 2 £ s £ t − 1, replace edge e(vj, vjt) by edge e(ujt−1, vjt), and add edges e(vj, uj2) and e(ujs’, ujs’+1) for 2 £ s’ £ t – 2, and set the availabilities NA(ujs) = 0 for 2 £ s £ t − 1, Ae(ujs, vjs) = Ae(vj, vjs) for 2 £ s £ t − 1, Ae(ujt−1, vjt) = Ae(vj, vjt), Ae(vj, uj2) = Ae(ujs’, ujs’+1) = 1 for 2 £ s’ £ t – 2, and the number of read requests Ar(ujs) = 0 for 2 £ s £ t – 1 (Fig 6 (b)).

Figure 6(a). Case 2, the node with over 3 child nodes.

Figure 6(b). Transformed graph for the graph shown in Fig. 6 (a).

First, it can be verified that the success availability for node vj (vjs) to fetch data from node vjs (vj) does not change, 2 £ s £ t – 1. Second, the new assigned availabilities and number of requests guarantee that there is no benefit to select nodes ujs, 2 £ s £ t − 1. Therefore if R’ is an optimal solution to Problem 1 for T’(V’,E’), then R = R’ – {ujs | j, s} is an optimal solution to Problem 1 for T(V,E).

Based on the previous discussion, Problem 1 can be efficiently transformed from an arbitrary tree network to a binary tree network. The solution to the transformed problem can be easily transformed back to the original problem.

Step 2. Pre-processing: 

In this step we compute Aji and the corresponding rji for each node vj, 1 £ i, j £ n. Aji and rji are defined in the following definition. 

Definition 1: Aj1 is the greatest success probability for node vj to fetch data from all nodes, …, Ajn is the smallest success probability for node vj to fetch data from all nodes. rj1 is the node corresponds to Aj1, …, rjn is the node corresponds to Ajn.

Given a binary tree network, we compute the success probability pij for node vi to fetch data from node vj. Specifically, if the edges e(vi, vi1), e(vi1, vi2), …, e(vik, vj) form the path connecting vi and vj, then the probability pij = Ae(vi,vi1Ae(vi1,vi2) … Ae(vik,vj NA(vj). Then we sort p1j, …, pnj from the smallest to the largest, and let the results be Aj1, …, Ajn. Additionally, let node rji corresponds to Aji, 1 £ i £ n. The equality of success probabilities can be resolved by arbitrary set the order of nodes.

Step 3. Computing recursive function values

Let vj be a node, q be an integer, and rji and Aji be the values computed in the pre-processing step. The functions G(vj,q,rji) and F(vj,q,rji) are defined in the following definition.

Definition 2: G(vj,q,rji) is defined to be the optimal value of the sub-problem defined on the sub-tree Tj given that total of at least 1 and at most q nodes can be selected in Tj, and at least one node must be selected in {rj1, …, rji} ∩ Vj. The set of nodes for G(vj,q,rji) to achieve the optimal value is called the associated set of G(vj,q,rji).

Definition 3: F(vj,q,rji) is defined to be the optimal value of the sub-problem defined on the sub-tree Tj under the constraints that a total of at most q nodes can be selected in Tj and there are already some selected nodes in VVj, and the best amongst them that vj can fetch data is rji. The set of nodes for F(vj,q,rji) to achieve the optimal value is called the associated set of F(vj,q,rji).

We establish the recursive relations between G(vj,q,rji) and F(vj,q,rji) as what follows.

Leaf node vi 


The Algorithm for the Transformed Problem 

We integrate the three steps described in previous subsections to develop the algorithm, Alg_Enhan, to solve the transformed Problem 1.

  1. transforms the problem to a binary tree network if necessary,
  2. pre-processes the binary tree network and compute Aji and rji for each node vj, 1 £ i, j £ n,
  3. computes the values of G(vj,q,rji) and F(vj,q,rji) based on the recursive relations following the post order of the nodes: first compute the function values for the leaf nodes, then the internal nodes, and finally the root node. The associated sets of G(vj,q,rji) and F(vj,q,rji) are recorded at the same time. If two sets of nodes both achieve the optimal value, then the set containing the node in Vj from which it is better for vj to fetch data will be recorded, and
  4. Returns G(v1,K,r1n) together with the associated set, where v1 is the root node.

3.3 A Numerical Example for the Availability Enhancement Algorithm (Alg_Enhan)

We now use a numerical example to demonstrate the algorithm we developed for Problem 1.

Consider the binary tree network T(V, E) as shown in fig. 7, where V = {v1, v2, v3, v4, v5} and E = {e(v1, v2), e(v1, v3), e(v2, v4), e(v2, v5)}. The nodes availabilities NA(v1) = 0.3, NA(v2) = 0.1, NA(v3) = 0.6, NA(v4) = 0.7, NA(v5) = 0.9. The edges availabilities Ae(v1, v2) = 0.5, Ae(v1, v3) = 0.4, Ae(v2, v4) = 0.2, Ae(v2, v5) = 0.8. The numbers of read requests Ar(v1) = 100, Ar(v2) = 200, Ar(v3) = 300, Ar(v4) = 400, Ar(v5) = 500. The tree network is illustrated in the following figure. We further assume that K = 2.

Figure 7. A sample network for algorithm numerical demonstration.

Pre-processing: 

Since it is a binary tree network, transformation is not needed. We compute Aji and rji in this subsection.

Root node v1 

The success probability to fetch data from v1 = 0.3.

The success probability to fetch data from v2 = 0.5×0.1 = 0.05.

The success probability to fetch data from v3 = 0.4×0.6 = 0.24.

The success probability to fetch data from v4 = 0.5×0.2×0.7 = 0.07.

The success probability to fetch data from v5 = 0.5×0.8×0.9 = 0.36.

We sort the success probabilities and record A1i and the corresponding r1i in the following table, 1£i£5.

Table 1. Pre-processing the data access for root node v1.


Internal node v2 

The success probability to fetch data from v1 = 0.5×0.3 = 0.15.

The success probability to fetch data from v2 = 0.1.

The success probability to fetch data from v3 = 0.5×0.4×0.6 = 0.12.

The success probability to fetch data from v4 = 0.2×0.7 = 0.14.

The success probability to fetch data from v5 = 0.8×0.9 = 0.72.

We sort the success probabilities and record A2i and the corresponding r2i in the following table, 1£i£5.

Table 2. Pre-processing the data access for internal node v2.


Leaf node v3 

The success probability to fetch data from v1 = 0.4×0.3 = 0.12.

The success probability to fetch data from v2 = 0.4×0.5×0.1 = 0.02.

The success probability to fetch data from v3 = 0.6.

The success probability to fetch data from v4 = 0.4×0.5×0.2×0.7 = 0.028.

The success probability to fetch data from v5 = 0.4×0.5×0.8×0.9 = 0.144.

We sort the success probabilities and record A3i and the corresponding r3i in the following table, 1£i£5.

Table 3. Pre-processing the data access for leaf node v3.


Leaf node v4 

The success probability to fetch data from v1 = 0.2×0.5×0.3 = 0.03.

The success probability to fetch data from v2 = 0.2×0.1 = 0.02.

The success probability to fetch data from v3 = 0.2×0.5×0.4×0.6 = 0.024.

The success probability to fetch data from v4 = 0.7.

The success probability to fetch data from v5 = 0.2×0.8×0.9 = 0.144.

We sort the success probabilities and record A4i and the corresponding r4i in the following table, 1£i£5.

Table 4. Pre-processing the data access for leaf node v4.

Leaf node v5 

The success probability to fetch data from v1 = 0.8×0.5×0.3 = 0.12.

The success probability to fetch data from v2 = 0.8×0.1 = 0.08.

The success probability to fetch data from v3 = 0.8×0.5×0.4×0.6 = 0.096.

The success probability to fetch data from v4 = 0.8×0.2×0.7 = 0.112.

The success probability to fetch data from v5 = 0.9.

We sort the success probabilities and record A5i and the corresponding r5i in the following table, 1£i£5.

Table 5. Pre-processing the data access for leaf node v5.

Recursive function computing: 

We now compute the function values based on the pre-processing results and recursive relations developed above. We follow the post order of the nodes, which is v4v5v2v3v1. 

Leaf node v4 

The G and F function values for node v4 is summarized in the Table 6. The associated set of nodes is also recorded.

Table 6. G and F function values for node v4


Leaf node v5 

The G and F function values for node v5 is summarized in the Table 7. The associated set of nodes is also recorded.

Table 7. G and F function values for node v5.


Internal node v2 

The G and F function values for node v2 is summarized in the Table 8. The associated set of nodes is also recorded.

Table 8. G and F function values for node v2


Leaf node v3 

The G and F function values for node v3 is summarized in the Table 9. The associated set of nodes is also recorded.

Table 9. G and F function values for leaf node v3.

Root node v1 

The G and F function values for node v1 is summarized in the Table 10. The associated set of nodes is also recorded.

Table 10. G and F function values for root node v1.


According to the entry (G(v1,2,r15) = 953.2; v4,v5) in Table 10 function values for root node v1, when selecting nodes v4 and v5, the objective function in Problem 2 achieves the maximum value 953.2.

3.4 Complexity and Correctness Analysis for the Availability Enhancement Algorithm (Alg_Enhan)

Complexity Analysis for the Algorithm (Alg_Enhan) 

We now analyze the runtime complexity for the availability enhancement algorithm (Alg_Enhan). For simplicity, we assume that the computational complexity for arithmetic operations (+, -, ×, /) and comparison is O(1).

Theorem 1: The computational complexity of the algorithm is O(Kn2).

Proof. In step 1 (transformation), O(n) nodes and edges may be added to the original tree network. The corresponding parameters (e.g. probabilities and number of read requests) may also be added. So the computational complexity of step 1 is O(n).

In step 2 (pre-processing), the success probabilities for each node to fetch data from all nodes are computed and sorted. It is possible that the computation of a success probability costs O(n) multiplications (consider unbalanced binary tree). So the computational complexity of step 2 is O(n)×(O(nO(n)+O(nlogn)) = O(n3). However, the computational complexity can be improved to O(n2) based on the similar technique used in [25, 15].

In step 3 (computing function values), the values of two functions are computed based on the recursive relations. O(n) tables are computed for O(n) nodes. Each table contains O(nO(K) = O(Kn) entries. In the worst case, the computation of an entry requires O(K) operations. So the computational complexity of step 3 is O(nO(KnO(K) = O(K2n2). However, the computational complexity can be improved to O(Kn2) based on the similar technique used in [26, 15].

Therefore the computational complexity of the algorithm is  O(n) + O(n2) + O(Kn2) = O(Kn2) 

Correctness Analysis for the Algorithm(Alg_Enhan) 

We now show the correctness in Theorem 2 for the availability enhancement algorithm (Alg_Enhan).

Theorem 2: The algorithm Alg_Enhan outputs the optimal solution of Problem 1.

Proof. It suffices to prove the correctness of recursive relations.




4. Experimental Studies


4.1 Evaluating the Effectiveness of Availability Modeling in Tree Networks

In section II, we have shown that the proposed availability modeling algorithm in a tree network has a complexity of |V|2, where |V| is the number of nodes in the tree network. In this section, we investigate the practical complexity of our proposed algorithm. In this simulation, a variety of random tree topologies are generated. We use four parameters to control the generation of a tree: the total number of nodes (|V|), resident size (|R|), the number of reads issued from each node (Ar(u)), and  the maximum node degree (degree). A random tree is generated in a breadth-first manner, i.e., starting from the root node, at each level, a random number of node are generated for the next level based on the randomly generated node degrees, until the number of nodes specified is reached. The root of the tree always hosts a replica and the rest of |R|-1 replicas are allocated to randomly selected |R|-1 nodes. The number read on each node is randomly generated following a uniform distribution, each  in the range of (0 10). Link availability (LA) of each edge and node availability (NA) of each node in the system is generated randomly following a uniform distribution, each is in the range of (0.9 1.0). 

Table 11. G and F function values for root node v1.


To study actual performance of the algorithm, the actual runtime and memory used are determined. The memory used is dynamic and the maximum memory used is measured in the middle of the recursive algorithm execution process. The runtime is determined as the time between the start of the execution of the algorithm and end of the algorithm. Note that the actual runtime and memory used here do not necessarily mean the lowest runtime and maximum memory used, and the algorithm is implemented in Java. The parameters are set as follows:  degree = 5, each node can have 0 to 5 children, |R| = 6, and |V| is ranged from 100 to 100000. As can be seen in Table 11, the memory used by the algorithm linearly increased along the increase of |V|, while the run time of the algorithm increases along with the increase of |V|, governed by a growth function of the square of |V|. The results confirmed our theoretic analysis in Section II such that the availability modeling algorithm in a tree network has a runtime complexity of |V|2.

We then test our modeling approach against the simulated distributed environment to see how good it is. The impacts of three factors on Tree_SA(T, R) are studied: (a) the graph size |V|; (b) the graph degree, which is the average number of neighbors of a node; and (c) the size of the resident set |R|. To avoid biased access patterns and topology structures, we repeat the experimental steps 100 times. The final result is the average of the 100 trials. The results are shown in Fig. 8 (a), Fig. 8 (b), and Fig. 8 (c).

Figure 8(a). The impact of resident set size.

Fig.8(a) shows the impact of resident set size on the effectiveness of the three replica allocation algorithms. The parameters are set as follows: |V| = 100, degree = 5, and |R| is increased from 1 to 20. The result shows that data availabilities of the three replica allocation schemes increase along with the increase of |R|, which confirms the conclusion of Theorem 1. From Fig. 7(a), the two resident sets computed by the two replica allocation algorithms, Alg_Hom and Alg_Het, can always achieve higher availabilities than the availability achieved by the resident set that is computed by the randomized replication allocation algorithm. However, the difference on the availabilities of the resident sets between the randomized and the two algorithms, Alg_Hom and Alg_Het, decreases along with increase of |R|. Also, the availability increase rate decreases along with increase of |R|. The reason is obvious. With a larger |R|, the average distance from a user to service/data site (number of edges in the path) will be reduced and thus the service/data reach availability should be increased. Thus, the space of service availability enhancement will also be reduced when availability reaches a high level.

Figure 8(b). The impact of node degree.

Fig.8(b) shows the impact of node degree on the effectiveness of the three replica allocation algorithms. The parameters are set as follows: |V| = 100, |R| = 6, and degree is increased from 2 to 10. The result shows that data availabilities of the three replica allocation schemes increase along with the increase of node degree. From Fig. 7(b), the two resident sets computed by the two replica allocation algorithms, Alg_Hom and Alg_Het, can always achieve higher availabilities than the availability achieved by the resident set that is computed by the randomized replication allocation algorithm. However, the difference on the availabilities of the resident sets between the randomized and the two algorithms, Alg_Hom and Alg_Het, decreases along with increase of node degree. The availabilities increase rate also slows down when node degree reach a certain size (e.g., degree = 5). The reason is obvious. With a larger node degree, the height of the tree will be reduced. Therefore, the average distance from a user to service/data site (number of edges in the path) will be reduced and thus the service/data reach availability should be increased. Thus, the space of service availability enhancement will also be reduced when availability reaches a high level.

Figure 8(c). The impact of graph size on data availability.

Fig.8(c) shows the impact of graph size on the data availability modeled and computed by Tree_SA(T, R). The parameters are set as follows: |R| = 6, degree = 5 and |V| is increased from 100 to 1000. The result shows that data availability decreases along with increase of graph size. The reason is obvious. With a larger |V|, the average distance from a user to service/data site (number of edges in the path) will be increased and thus the service/data reach availability should decrease.

4.2 Evaluating the Effectiveness of Availability Enhancement Algorithms in Tree Networks

In this section, we test two replica allocation algorithms we developed in this research in a simulated distributed environment to see how effective they are on enhancing service availabilities. Specifically, we compare the availability enhanced algorithm, Alg_Enhan, with the randomized replica allocation scheme, Alg_Ran. In this simulation, a variety of random tree networks are generated. We use to simulate a tree with 30 nodes which have a variety of resident size (|R. A random tree is generated in a breadth-first manner, i.e., starting from the root node, at each level, a random number of the node are generated for the next level based on the randomly generated node degrees, until the number of nodes specified is reached. The entire tree hosts |R| replicas. The number read on each node is randomly generated following a uniform distribution, each is in the range of (0, 10). Link availabilities of all the edges and node availabilities of all the nodes in the system are generated randomly following a uniform distribution in the range of (0.9, 1.0). For the randomized allocation scheme, replicas are allocated to the randomly selected |R| nodes in the system. For algorithms Alg_Enhan, we will first implement the algorithm based on a tree network, and then use this algorithm to compute the resident sets in the simulated tree network.  Finally, we will apply our availability computing algorithm shown in Fig. 3 to compute the service availabilities for the two resident sets in the simulated tree network.

Our goal is to test which replica allocation scheme is more effective in enhancing overall service availability in a tree network. The impacts of the resident set size |R| (from 1 to 6). The experiments are repeated 10 times and the results are shown in Fig. 9.


Figure 9. The impact of resident set size on improved availability.

Fig.9 shows the impact of resident set size on the effectiveness of the three replica allocation algorithms. The parameters are set as follows: |V| = 30, degree = 5, and |R| is increased from 1 to 6. The result shows that data availabilities of the two replica allocation schemes increase along with the increase of |R|, which confirms the conclusion of Theorem 1. From Fig. 8, the resident set computed by Alg_Enhan can always achieve higher availabilities than the availability achieved by the resident set that is computed by Alg_Ran. However, the difference on the availabilities of the resident sets between Alg_Ran and Alg_Enhan decreases along the increase of |R| (from 0.05 to 0.002). Also, the availability increase rate decreases along the increase of |R|. The reason is obvious. With a larger |R|, the average distance from a user to service/data site (number of edges in the path) will be reduced and thus the service/data reach availability should be increased. Thus, the space of service availability enhancement will also be reduced when availability reaches a high level. From the above experimental study, we can conclude that when the replica size is small, the replica location has the bigger impact on availability. Hence, it is more critical to employ a better replica allocation algorithm to improve service availability, such as Alg_Enhan.

5. Related Work


Network reliability has been well studied and a large amount of works have been conducted on this topic [15, 23, 24, 31]. The modeling and computation of network reliability is similar to (but not the same as) a specific case of network availability modeling for a distributed system, i.e., modeling the availability of a distributed system with no replication. For such a system, the data availability can be computed by computing the reach availability from each node to the destination node (the node with the data). The difference between availability and reliability modeling is that the unavailability of a node does not affect the reaching probability in our data availability modeling (in reliability, the node unavailability will affect the reliability).  However, the availability modeling in a distributed system with data replication is very different from reliability modeling. The data availability in the system to a node w (actually the clients associated with such node) is affected by multiple nodes (each with a copy of the data) at the same time, i.e., if a copy of data on node u is not available to node w, then another node v can provide availability to w. Therefore, computing and modeling data availability to node w not only needs to compute the reach probability to multiple destination nodes (for example u and v), but also needs to consider their impact on each other.

Availability has long been considered as a critical issue for systems providing replicated services. In [8, 9], the authors explored the benefits of dynamically trading consistency for availability using a continuous consistency model. The availability model is a function of consistency, workload, and fault load. If consistency requirement is greatly relaxed, availability can be very high. In data storage systems such as Ocean store [14, 27] and PASIS [28], both pure replication and data fragmentation approaches are considered. Their availability models only consider node availability. [11] and [12] both present an analytical model for determining the overall availability of data stored in a data grid or P2P system. The availability model captures the characteristics of peer-to-peer systems and grid systems. The availability of replica location service (metadata service) of data grids or P2P systems can be considered independent of data storage availability.

All the above works on data availability computation only consider the availability of the service or data sites and do not consider network link availability. In [6, 7], the authors propose a model for studying the availability of replicated systems. The system availability is computed in a stochastic graph, i.e. each node and each link are assigned a specific failure probability. The overall system availability is computed exhaustively using a state enumeration method. However, the runtime complexity of the state enumeration method is exponential in the number of nodes in the system in a tree network. Also, it does not give a solution to compute data availability with multiple replicas in the system. In our previous research in [17, 29], we proposed effective availability modeling and computing approaches that consider both node and network link failures, for systems with the tree, ring, and general graph topologies. In [18], we provided two availability enhancement algorithms for a tree network with homogeneous node availability and link availability, which is a special case for the problem we consider in this research. Hence, a solution is needed for the general case, i.e., each node availability and link (network edge) availability are heterogeneous.

6. Conclusions


In this paper, we address the system service or data availability modeling and enhancement approaches for distributed systems with data replication.  We first provide an efficient availability computing algorithm for tree networks. We then analyze the availability enhancement problem and formalized availability enhancement in a general tree network as Problem 1. Then, the problems are transformed into a special p-median problem and a recursive algorithm-based solution, Alg_Enhan is developed to maximize the service availabilities for distributed systems with a tree topology. The algorithm Alg_Enhan considers the impact of replica location on overall service availability and the impact of network node and edge heterogeneity.  Theorems have shown that the algorithm Alg_Enhan is efficient with a runtime complexity of O(K|V|2), where K is the number of replicas and |V| is the number of nodes in the network. Finally, experimental studies show that the algorithm Alg_Enhan can compute resident sets with higher availabilities than those computed by the randomized algorithm, but it is more effective when the number of replicas is small.

References

[1]     J. Hennessy, “The future of systems research”, Computer, vol. 32, no. 8, pp. 27-33, 1999.

[2]     J. Liu and H. Shen, “A Low-Cost Multi-failure Resilient Replication Scheme for High Data Availability in Cloud Storage”, In High Performance Computing (HiPC), 2016 IEEE 23rd International Conference on (pp. 242-251),2016.

[3]     K. Meenakshi and S.B. Singh, “Availability Assessment of Multi-State System by Hybrid Universal Generating Function and Probability Intervals”, International Journal of Performability Engineering, Vol.12, No. 4, pp. 321-339, 2016.

[4]     S. Ren, Y. Yang, Y. Chen and Y. Du, “Fluctuation analysis of instantaneous availability under specific distribution”, Neurocomputing, vol. 270, pp. 152-158, 2017.

[5]     M. Dahlin, B. Chandra, Lei Gao and A. Nayate, “End-to-end WAN service availability”, IEEE/ACM Transactions on Networking, vol. 11, no. 2, pp. 300-313, 2003.

[6]     G. On, J. Schmitt, and R. Steinmetz, “On availability Qos for replicated multimedia service and content”,  In Proceedings of International Workshop on Interactive Distributed Multimedia Systems, 2002.

[7]     G. On, J. Schmitt, and R. Steinmetz, “Quality of availability: replica placement for widely distributed systems”, In IWQos’03, 2003.

[8]     H. Yu and A. Vahdat, “The costs and limits of availability for replicated services”, ACM SIGOPS Operating Systems Review, vol. 35, no. 5, p. 29, 2001.

[9]     H. Yu and A. Vahdat, “The costs and limits of availability for replicated services”, ACM Transactions on Computer Systems, vol. 24, no. 1, pp. 70-113, 2006.

[10]   M. Aguilera, R. Janakiraman, and L. Xu, “Using Erasure Codes Efficiently for Storage in Distributed System,” 2005 International Conference on Dependable Systems and Networks (DSN05).

[11]   K. Ranganathan, A. Iamnitchi, and I. Foster, “Improve data availability through dynamic model-driven replication in large peer-to-peer communities”, In proceedings of 2nd IEEE/ACM International symposium on Cluster Computing and the Grid. 2002.

[12]   F. Schintke and A. Reinefeld, “Modeling Replica Availability in Large Data Grids”, Journal of Grid Computing, vol. 1, no. 2, pp. 219-227, 2003.

[13]   B. Tang, N. Jaggi, and M. Takahashi, “Achieving data K-Availability in intermittently connected sensor networks”,  In Proceedings of the 23rd International Conference on Computer Communication and Networks (ICCCN’14), Shanghai, China,  2014.

[14]   H. Weatherspoon and J. Kubiatowicz, “Erasure coding vs. replication: a quantitative comparison”, In proceedings of Peer-to-Peer Systems: First International Workshop (IPTPS), 2002.0

[15]   A. Tamir, “An O(pn2) algorithm for the p-median and related problems on tree graphs”, Operations Research Letters, vol. 19, no. 2, pp. 59-64, 1996.

[16]   V. Paxson, “End-to-end routing behavior in the Internet”, IEEE/ACM Transactions on Networking, vol. 5, no. 5, pp. 601-615, 1997.

[17]   M. Tu, D. Xu, Z. Xia, and J, Fu, “Reach availability of replicated services”, In Proceedings of the 35th IEEE International Computer Software and Application Conference, July 2011.

[18]   M. Tu, L. Xiao, and D. Xu, “Maximizing the availability of replicated services in widely distributed systems”, In Proceedings of IEEE International Conference on Software Security and Reliability, Washington DC, June, 2013.

[19]   K. Kalpakis, K. Dasgupta and O. Wolfson, “Optimal placement of replicas in trees with read, write, and storage costs”, IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 6, pp. 628-637, 2001.

[20]   O. Wolfson and A. Milo, “The multicast policy and its relationship to replicated data placement”, ACM Transactions on Database Systems, vol. 16, no. 1, pp. 181-205, 1991.

[21]   O. Wolfson, S. Jajodia and Y. Huang, “An adaptive data replication algorithm”, ACM Transactions on Database Systems, vol. 22, no. 2, pp. 255-314, 1997.

[22]   O. Kariv and S. Hakimi, “An Algorithmic Approach to Network Location Problems. II: The p-Medians”, SIAM Journal on Applied Mathematics, vol. 37, no. 3, pp. 539-560, 1979.

[23]   L. Fratta and U. Montanari, “A Recursive Method Based on Case Analysis for Computing Network Terminal Reliability”, IEEE Transactions on Communications, vol. 26, no. 8, pp. 1166-1177, 1978.

[24]   J. Fussell, “How to Hand-Calculate System Reliability and Safety Characteristics”, IEEE Transactions on Reliability, vol. -24, no. 3, pp. 169-174, 1975.

[25]   “On the location of a tree-shaped facility”, Location Science, vol. 5, no. 1, p. 63, 1997.

[26]   M. M. Halldórsson, K. Iwano, N. Katoh, and T. Tokuyama, “Finding Subsets Maximizing Minimum Structures,” SIAM Journal on Discrete Mathematics, vol. 12, no. 3, pp. 342–359, 1999.

[27]   J. Kubiatowicz, C. Wells, B. Zhao, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea and H. Weatherspoon, “OceanStore”, ACM SIGARCH Computer Architecture News, vol. 28, no. 5, pp. 190-201, 2000.

[28]   J. Wylie, M. Bakkaloglu, V. Pandurangan, M. Bigrigg, S. Oguz, K. Tew, C. Williams, G. Ganger, and P. Khosla, “Selecting the right data distribution scheme for a survivable storage system”,  Technical Report CMU-CS-01-120, Carnegie Mellon University, 2000.

[29]   M. Tu, H. Ma, I. Yen, F. Bastani, and D. Xu, “Availability, security, access performance and load balance in P2P data grid”, Journal of Grid Computing, Vol. 11, No. 1, 2013.

[30]   A. Rahmani, Z. Fadaie and A. Chronopoulos, “Data placement using Dewey Encoding in a hierarchical data grid”, Journal of Network and Computer Applications, vol. 49, pp. 88-98, 2015.

[31]   O. Theologou and J. Carlier, “Factoring and reductions for networks with imperfect vertices”, IEEE Transactions on Reliability, vol. 40, no. 2, pp. 210-217, 1991.

Authors 


Michael Tu is a Ph.D. in Computer Science, Associate Professor of Computer Information Technology, director of the Center for Cyber Security and Infrastructure Protection at Purdue University Northwest. Dr. Tu’s areas of expertise are information assurance, digital forensics, and cybersecurity education. He has published over 40 peer reviewed papers in prestigious journals and peer reviewed conference proceedings. Dr. Tu is a Certified Information System Security Professional (CISSP), Certified Ethical Hacker (CEH), Certified Hacking and Forensics Investigator (CHFI) and AccessData Computer Examiner (ACE).

Dianxiang Xu, received the B.S., M.S., and Ph.D. degrees in computer science from Nanjing University, Nanjing, China. He is a Professor with the Department of Computer Science, Boise State University, Boise, ID, USA. His research interests include software security and safety, software engineering, access control, and software-defined systems. Dr. Xu has published over 100 peer reviewed papers in prestigious journals and peer reviewed conference proceedings.

Liangliang Xiao, received the M.S., and Ph.D. degrees in computer science from The University of Texas at Dallas, Richardson, USA. He is an Associate Professor with the Department of Computer Science & Information Technologies, Frostburg State University, Frostburg, MD, USA. His research interests include data security and distributed systems. Dr. Xiao has published over 30 peer reviewed papers in prestigious journals and peer reviewed conference proceedings.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Information

This entry was posted on February 13, 2019 by .
%d bloggers like this: