Manghui Tu^{1}, Liangliang Xiao^{2} and Dianxiang Xu^{3}

^{1}Department of CITG, Purdue University Northwest, Hammond, Indiana

^{2}Department of CSIT, Frostburg State University, Frostburg, Maryland

^{3}Department of Computer Science, Boise State University, Boise, Idaho

**Abstract**

*Mechanism to improve data or service availability is critical for an enterprise to ensure the quality of service in terms of availability. Replication has been used to improve system availability. The number and location of the replicas are two impact factors on availability. In this paper, we will consider the impact of the node and network edge failures on the availability of replicated data or services**. The Effective availability modeling approach is designed and efficient availability computing algorithms are developed to model and compute availability of replicated services for systems with the tree topology. The availability enhancement problem (maximizing the objective function) is transformed to a p-median problem (minimizing the objective function) through re-define the availability enhancement problem. An efficient replica allocation algorithm is developed to improve data availability in tree networks, with a runtime complexity of O(K|V| ^{2}), where K is the number of replicas and |V| is the number of nodes in the tree network. Finally, experimental studies have been conducted to evaluate how efficient and effective the proposed availability computing algorithm and the availability enhancement algorithm on improving the availability of replicated data or services. The results show that the proposed solutions are efficient and effective on availability computing and availability enhancement.*

**Keywords**

*Network Protocols, Wireless Network, Mobile Network, Virus, Worms & Trojon*

**1. Introduction**

It is well known that the utility of the data and service is mainly limited by availability [1, 2, 3, 4], and the dynamic environment and unreliable internet connectivity raise the concerns on availability of distributed systems [5, 6, 7, 8, 9]. Replication and erasure coding techniques [7, 8, 9, 10, 11, 12, 30] have been used to enhance data availability. However, the data object that is replicated also impacts the effective availability. Replicating data at highly available sites will result in high data availability. Also, widely distributing data can result in higher availability than distributing data in local clusters. In [8, 9], it has shown that even 100% local availability does not necessarily provide high availability to end users. The network failures may, for 1.5-2 %of the time, prevent clients from successfully accessing a cluster [5, 6, 7, 13]. Thus, the location of data stored in the system plays a significant role in overall system availability.

Most of the existing research on system availability consider the availability of the storage nodes only [11, 12, 14]. Some other research also considers the impact of data consistency on service availability [8, 9]. All these research do not consider the reach availability of users’ requests to service/data sites. As discussed earlier, availability of the network links also impacts service data availability, especially the reach availability of users’ requests. In [6, 7], network link availability is considered, but the evaluation algorithm itself requires exponential computation time relative to the number of storage nodes even in a tree graph with a single replica. Thus, it is not feasible to compute the availability of distributed systems with replications. Also, the work in [6, 7] does not provide an availability model in a system with more than one replica, i.e, the system has multiple service/data sites available to serve a user’s request.

In this paper, we consider modeling the service/data reach availability (i.e., the availability of service/data to remote requests) in distributed systems in a tree network. Data or services are replicated in the system to improve both performance and availability. Since the system has multiple service/data sites available to serve a user’s request, a single site failure does not lead to the unavailability to user’s request. Therefore, our problem is fundamentally different from network reliability problem (reliable only ensures the access to a single site) [15]. To assess system availability, efficient algorithms are developed to compute system availability considering both node and link failures.

The remainder of the paper is organized as follows. System modeling is described in Section II. An algorithm on computing system availability in a tree network is presented in Section III. Section IV, issues on how to reduce the runtime complexity and how to improve the service/data availability will be discussed. Efficient replica allocation algorithms to improve system availability will be introduced in Section V. Experimental studies conducted will be described in Section VI to give some numerical results of the proposed replica allocation algorithm applied in a system with tree network topology. Section VII briefly discusses some related works in availability modeling and enhancement and Section VIII concludes the paper.

**2. System & Availability Modeling**

The Study shows that user access requests usually follow a constant path routing to data sites and most routes are stable in days or even weeks [16]. So the data access routes in such a distributed system can be modeled as a tree graph, *T* = (*V*, *E*), as shown in Fig.1. Let *e*(*u*, *v*) denote the edge that directly links two nodes *u* and *v*, and *d*(*u*, *v*) denote a single path between *u *and *v*. Note that *e*(*u*, *v*) itself is a path between *u* and *v* with a single edge. The system hosts a set of data objects replicated to different nodes. For simplicity, we consider a single data object *d*. The set of nodes hosting replicas is defined as the** residence set**, *R* = {*u*|* u* Î* V* and *u* holds a replica}, and |*R*| is the number of replicas. A read request needs to access the closest node to the request. If it is not available, then, the request will be further forwarded to another node. Let *r* denote a read request and *A ^{r}*(

**Figure 1**. A sample tree network with replica.

In a distributed system, data or service availability may be affected by other factors such as node up/downtime, system load, data consistency, and network failure or congestion [5, 8, 9]. According to [8, 9], weaker consistency would result in higher availability. Here, we assume that users are allowed to read stale data and, hence, the effect of data inconsistency will have no impact on data availability. Also, due to replication, the overloading problem is not an issue, hence, the impact of system load on data availability is not considered.

**2.1. Data Availability Analysis**

If the preferred node in R is not available, then the read request will be forwarded and served at the next available node. This process repeats until it is finally served by one of the replicas, or finally none of the replica can serve the read request, by choosing different combinations of paths. The access path of such a read request can be modeled as a directed graph, and thus both the network topology and the location of the read requests originally issued will affect data availability. Therefore, to compute data availability, we need to consider requests originated from different nodes.

**2.2. Data Availability Modeling in a Tree Network**

**Figure 2**. A sample tree graph rooted at u.

Subsequently, *S _{A}*(

The algorithm *Tree_S _{A}*(

**Figure 3**. The algorithm Tree_SA(T, R) in tree graph T.

In Theorem 1, we show that the availability *d* in tree *T*, *S _{A}*(

**Theorem 1.** Let *R*_{1} and *R*_{2} denote two resident sets of *T* and *R*_{1} Í *R*_{2}, then *S _{A}*(

Proof: Consider *R*_{1} = *R*_{2}. According to the definition of *S _{A}*(

Consider *R*_{1} Ì *R*_{2}. For each request in a node *u*, it has at least one more node that can provide a replica of data object *d* to access. Without loss of generality, assume that |*R*_{2}| =|*R*_{1}| +1, and *R*_{2} = *R*_{1} È {*v*}. According to the availability model, the unavailability of *v*, 1− *N _{A}*(

We now show that the runtime complexity of our proposed availability computing algorithm in a tree *T* is in the order of the square of the number of nodes in *T*.

**Theorem 2. **The runtime complexity of algorithm *Tree_S _{A}*(

**Proof.** It suffices to prove the correctness of recursive relations.

Let *v _{j}* be a leaf node. If

If *r _{j}^{i}*Î

Let *v _{j}* be an internal node. If

In case 1 we consider that *r _{j}^{i}*=

In order to prove that, we must show (1) the optimal solution appears either in *G*(*v _{j}*,

First, if a node in {*r _{j}*

Second, *A ^{r}*(

In case 2 we consider *r _{j}^{i}*Î

First, if a node in {*r _{j}*

Second, if the associated set of *G*(*v _{j}*

In case 3 we consider *r _{j}^{i}*Î

If *r _{j}^{i}*Î

Otherwise we assume that no nodes in {*r _{j}*

*A ^{r}*(

Finally, the inconsistent values in *A ^{r}*(

**3. System Availability Enhancement**

In this research, we proposed to develop mechanisms to optimize data availability in a system with a fixed number of replicas, *K*, i.e., to find a resident set with a size of *K* such that the overall data availability in a tree graph is optimal. It has been shown in our previous work in [17, 18] that reducing average distance of the paths (number of hops) from user’s requests to service/data sites can result in high availability, as the shorter the distance (the smaller the number of hops), the higher the path availability from a requester to its closest replica. Therefore, a potential approach to improve the overall system availability is to allocate data replicas in the system such the average distance of the paths from user’s requests to service sites is minimal.

**3.1 Availability Enhancement Problem Modeling**

The replica allocation problem to optimize data availability is much more complex than the traditional optimal resident set problems [19, 20, 21]. For traditional resident set problem, each request will be served by a single site (usually the closest site) and the corresponding communication cost will be computed as the final communication cost for such a request. Therefore, each request will be associated with a single data replica site only. However, with the availability model defined in this research, a request may need to find another available site if the current accessing site is not available. This process will repeat till such access is served by a replica site or is not served after all replica sites have been tried but none of them are available. Thus, each request should be associated with all sites within the resident set for data availability computing. To find the resident set with maximal data availability in a tree network, the computation complexity could be too high to be feasible. For example, an intuitive approach is to try every combination of the resident set in the tree and the runtime complexity would be |*V*|^{|R|}. Thus, an approximate solution is to apply the property observed in [17, 18] that the smaller the number of hops from a request to the resident set, the higher the data availability.

First, since a request will be forwarded to a secondary replica only the primary replica node cannot serve such request, and such an event has a very low probability (node availability is usually very high), we can assume that a request is only associated with the primary replica node to reduce the complexity (note that this assumption is only valid for replica allocation purpose, a request is still associated with all replicas when computing availability). Thus, our problem can be reduced to a resident set problem **maximizing** the total of the product of the number of reading requests and the path availability of each request to the resident set. i.e., Σνεv(A^{γ}(ν)*(A_{δ}(ν,R)*N_{A}(ν))), where is the path availability from a request issued at node *v* to a replica site in the resident set *R*.

Second, we all understand that the heterogeneity of the node (replica site) availability can have a big impact on the overall system availability. Here, we will show that the heterogeneity of the edge availability can have big impact on the overall system availability. In a system with line topology shown Fig. 4, the allocation of two replicas to optimize the availability without considering the heterogeneity of edge availability is shown in allocation scheme *A*, and the allocation considering the heterogeneity of edge availability is shown in allocation scheme *B* (with homogeneous node availability of 0.999). In Scheme *B*, the service to requests at node *v*_{1} is available locally with availability of 0.999, and requests need to be forwarded to node *v*_{3 }only if node *v*_{1} is not available (with a probability of 0.001). Requests at other nodes can be served by the replica at node *v*_{3} with very high probability, and only need to be forwarded to node *v*_{1} if node *v*_{3} is not available or edge *e*(*v*_{2}, *v*_{3}) is not available (for requests from node* v*_{2}). The overall normalized data availability in system with replica allocation scheme *B* is around 0.96 by applying the availability computing algorithm described in Fig. 3. In Scheme *A*, the service to requests at node *v*_{1} is not available locally and thus need to be forwarded to node *v*_{2 }or* v*_{3} where the availability of edge *e*(*v*_{1}, *v*_{2}) is 0.5. The overall normalized data availability in system with replica allocation scheme *A* will definitely be lower than 0.90.

**Figure 4**. The impact of the edge availability on system availability.

In our previous work in [18], solutions have been given to availability optimization problem, by considering the homogeneous of node availability or network link availability, or both. However, these solutions are only partial solutions to Problem 1.

**3.2 The Availability Enhancement Algorithm**

Problem 1 is an optimization problem similar to the *p*-median problem [22, 15]. In a *p*-median problem, it is given a graph *G*(*V*, *E*) where *V *= {*v*_{1}, *v*_{2}, …,*v _{n}*}. Each node

Problem 1 and *p*-median problem have some common issues. They both consider weighted tree networks where each node/edge is assigned value(s). They both look for optimal solutions to the objective functions. Problem 1 looks for the maximum value of an objective function while *p*-median problem looks for the minimum value of another objective function. It is therefore possible for us to borrow some ideas of the solution to the *p*-median problem to develop the solution to Problem 1.

However, it is difficult to directly apply the solution to the *p*-median problem to Problem 1 since they are different. Consider the *p*-median problem. Suppose that a node *v _{j}* is selected and the edges

__Problem Transformation and Re-definition____ __

A three step solution is provided to Problem 1:

- Transformation: reduce Problem 1 from an arbitrary tree network to a binary tree network.
- Pre-processing: compute
*A*and_{j}^{i}*r*for each node_{j}^{i}*v*, 1 £_{j}*i*,*j*£*n* - Computing function values: compute the values of two functions
*G*(*v*,_{j}*q*,*r*) and_{j}^{i}*F*(*v*,_{j}*q*,*r*) based on their recursive relations._{j}^{i}

*Step 1. Transformation*

*Given an arbitrary tree network T(V, E), *where *V *= {*v*_{1}, *v*_{2}, …,*v _{n}*},

**Case1. **Suppose that an internal node *v _{j}* has exactly one child node (Fig. 5(a)). Without loss of generality, we assume that

**Figure 5**. Node with a single child node (a) is transformed to (b)

Since *N _{A}*(

**Case 2. **Suppose that an internal node *v _{j}* has more than 3 childe nodes

**Figure 6(a)**. Case 2, the node with over 3 child nodes.

**Figure 6(b)**. Transformed graph for the graph shown in Fig. 6 (a).

First, it can be verified that the success availability for node *v _{j}* (

Based on the previous discussion, Problem 1 can be efficiently transformed from an arbitrary tree network to a binary tree network. The solution to the transformed problem can be easily transformed back to the original problem.

*Step 2. Pre-processing:** *

In this step we compute *A _{j}^{i}* and the corresponding

**Definition 1:** *A _{j}*

Given a binary tree network, we compute the success probability* p _{ij}* for node

*Step 3. **Computing recursive function values*

Let *v _{j}* be a node,

**Definition 2: ***G*(*v _{j}*,

**Definition 3: ***F*(*v _{j}*,

We establish the recursive relations between *G*(*v _{j}*,

**Leaf node v_{i}**

__The Algorithm for the Transformed Problem____ __

We integrate the three steps described in previous subsections to develop the algorithm, *Alg_Enhan*, to solve the transformed Problem 1.

- transforms the problem to a binary tree network if necessary,
- pre-processes the binary tree network and compute
*A*and_{j}^{i}*r*for each node_{j}^{i}*v*, 1 £_{j}*i*,*j*£*n*, - computes the values of
*G*(*v*,_{j}*q*,*r*) and_{j}^{i}*F*(*v*,_{j}*q*,*r*) based on the recursive relations following the post order of the nodes: first compute the function values for the leaf nodes, then the internal nodes, and finally the root node. The associated sets of_{j}^{i}*G*(*v*,_{j}*q*,*r*) and_{j}^{i}*F*(*v*,_{j}*q*,*r*) are recorded at the same time. If two sets of nodes both achieve the optimal value, then the set containing the node in_{j}^{i}*V*from which it is better for_{j}*v*to fetch data will be recorded, and_{j} - Returns
*G*(*v*_{1},*K*,*r*_{1}) together with the associated set, where^{n}*v*_{1}is the root node.

**3.3 A Numerical Example for the Availability Enhancement Algorithm (Alg_Enhan)**

We now use a numerical example to demonstrate the algorithm we developed for Problem 1.

Consider the binary tree network *T*(*V*, *E*) as shown in fig. 7, where *V* = {*v*_{1}, *v*_{2}, v_{3}, *v*_{4}, *v*_{5}} and *E* = {*e*(*v*_{1},* v*_{2}), e(*v*_{1},* v*_{3}), *e*(*v*_{2},* v*_{4}), *e*(*v*_{2},* v*_{5})}. The nodes availabilities *N _{A}*(

**Figure 7**. A sample network for algorithm numerical demonstration.

*Pre-processing:** *

Since it is a binary tree network, transformation is not needed. We compute *A _{j}^{i}* and

**Root node v_{1}**

The success probability to fetch data from *v*_{1} = 0.3.

The success probability to fetch data from *v*_{2} = 0.5×0.1 = 0.05.

The success probability to fetch data from *v*_{3} = 0.4×0.6 = 0.24.

The success probability to fetch data from *v*_{4} = 0.5×0.2×0.7 = 0.07.

The success probability to fetch data from *v*_{5} = 0.5×0.8×0.9 = 0.36.

We sort the success probabilities and record *A*_{1}* ^{i}* and the corresponding

**Table 1**. Pre-processing the data access for root node v1.

**Internal node v_{2}**

The success probability to fetch data from *v*_{1} = 0.5×0.3 = 0.15.

The success probability to fetch data from *v*_{2} = 0.1.

The success probability to fetch data from *v*_{3} = 0.5×0.4×0.6 = 0.12.

The success probability to fetch data from *v*_{4} = 0.2×0.7 = 0.14.

The success probability to fetch data from *v*_{5} = 0.8×0.9 = 0.72.

We sort the success probabilities and record *A*_{2}* ^{i}* and the corresponding

**Table 2**. Pre-processing the data access for internal node v2.

**Leaf node v_{3}**

The success probability to fetch data from *v*_{1} = 0.4×0.3 = 0.12.

The success probability to fetch data from *v*_{2} = 0.4×0.5×0.1 = 0.02.

The success probability to fetch data from *v*_{3} = 0.6.

The success probability to fetch data from *v*_{4} = 0.4×0.5×0.2×0.7 = 0.028.

The success probability to fetch data from *v*_{5} = 0.4×0.5×0.8×0.9 = 0.144.

We sort the success probabilities and record *A*_{3}* ^{i}* and the corresponding

**Table 3**. Pre-processing the data access for leaf node v3.

**Leaf node v_{4}**

The success probability to fetch data from *v*_{1} = 0.2×0.5×0.3 = 0.03.

The success probability to fetch data from *v*_{2} = 0.2×0.1 = 0.02.

The success probability to fetch data from *v*_{3} = 0.2×0.5×0.4×0.6 = 0.024.

The success probability to fetch data from *v*_{4} = 0.7.

The success probability to fetch data from *v*_{5} = 0.2×0.8×0.9 = 0.144.

We sort the success probabilities and record *A*_{4}* ^{i}* and the corresponding

**Table 4**. Pre-processing the data access for leaf node v4.

**Leaf node v_{5}**

The success probability to fetch data from *v*_{1} = 0.8×0.5×0.3 = 0.12.

The success probability to fetch data from *v*_{2} = 0.8×0.1 = 0.08.

The success probability to fetch data from *v*_{3} = 0.8×0.5×0.4×0.6 = 0.096.

The success probability to fetch data from *v*_{4} = 0.8×0.2×0.7 = 0.112.

The success probability to fetch data from *v*_{5} = 0.9.

We sort the success probabilities and record *A*_{5}* ^{i}* and the corresponding

**Table 5**. Pre-processing the data access for leaf node v5.

*Recursive function computing:** *

We now compute the function values based on the pre-processing results and recursive relations developed above. We follow the post order of the nodes, which is *v*_{4} → *v*_{5} → *v*_{2} → *v*_{3} → *v*_{1}.** **

**Leaf node v_{4}**

The G and F function values for node *v*_{4} is summarized in the Table 6. The associated set of nodes is also recorded.

**Table 6**. G and F function values for node v4

**Leaf node v_{5}**

The G and F function values for node *v*_{5} is summarized in the Table 7. The associated set of nodes is also recorded.

**Table 7**. G and F function values for node v5.

**Internal node v_{2}**

The G and F function values for node *v*_{2} is summarized in the Table 8. The associated set of nodes is also recorded.

**Table 8**. G and F function values for node v2

**Leaf node v_{3}**

The G and F function values for node *v*_{3} is summarized in the Table 9. The associated set of nodes is also recorded.

**Table 9**. G and F function values for leaf node v3.

**Root node v_{1}**

The G and F function values for node *v*_{1} is summarized in the Table 10. The associated set of nodes is also recorded.

**Table 10**. G and F function values for root node v1.

According to the entry (*G*(*v*_{1},2,*r*_{1}^{5}) = 953.2; *v*_{4},*v*_{5}) in Table 10 function values for root node *v*_{1}, when selecting nodes *v*_{4} and *v*_{5}, the objective function in Problem 2 achieves the maximum value 953.2.

**3.4 Complexity and Correctness Analysis for the Availability Enhancement Algorithm (Alg_Enhan)**

__Complexity Analysis for the Algorithm (____Alg_Enhan____)____ __

We now analyze the runtime complexity for the availability enhancement algorithm __(____Alg_Enhan____)__. For simplicity, we assume that the computational complexity for arithmetic operations (+, -, ×, /) and comparison is *O*(1).

**Theorem 1:** The computational complexity of the algorithm is *O*(*Kn*^{2}).

**Proof.** In step 1 (transformation), *O*(*n*) nodes and edges may be added to the original tree network. The corresponding parameters (e.g. probabilities and number of read requests) may also be added. So the computational complexity of step 1 is *O*(*n*).

In step 2 (pre-processing), the success probabilities for each node to fetch data from all nodes are computed and sorted. It is possible that the computation of a success probability costs *O*(*n*) multiplications (consider unbalanced binary tree). So the computational complexity of step 2 is *O*(*n*)×(*O*(*n*)×*O*(*n*)+*O*(*n*log*n*)) = *O*(*n*^{3}). However, the computational complexity can be improved to *O*(*n*^{2}) based on the similar technique used in [25, 15].

In step 3 (computing function values), the values of two functions are computed based on the recursive relations. *O*(*n*) tables are computed for *O*(*n*) nodes. Each table contains *O*(*n*)×*O*(*K*) = *O*(*Kn*) entries. In the worst case, the computation of an entry requires *O*(*K*) operations. So the computational complexity of step 3 is *O*(*n*)×*O*(*Kn*)×*O*(*K*) = *O*(*K*^{2}*n*^{2}). However, the computational complexity can be improved to *O*(*Kn*^{2}) based on the similar technique used in [26, 15].

Therefore the computational complexity of the algorithm is *O*(*n*) + *O*(*n*^{2}) + *O*(*Kn*^{2}) = *O*(*Kn*^{2})

__Correctness Analysis for the Algorithm(____Alg_Enhan____)____ __

We now show the correctness in Theorem 2 for the availability enhancement algorithm __(____Alg_Enhan____)__.

**Theorem 2:** The algorithm *Alg_Enhan* outputs the optimal solution of Problem 1.

**Proof.** It suffices to prove the correctness of recursive relations.

**4. Experimental Studies**

**4.1 Evaluating the Effectiveness of Availability Modeling in Tree Networks**

In section II, we have shown that the proposed availability modeling algorithm in a tree network has a complexity of |*V*|^{2}, where |*V*| is the number of nodes in the tree network. In this section, we investigate the practical complexity of our proposed algorithm. In this simulation, a variety of random tree topologies are generated. We use four parameters to control the generation of a tree: the total number of nodes (|*V*|), resident size (|*R*|), the number of reads issued from each node (*A ^{r}*(

**Table 11**. G and F function values for root node v1.

To study actual performance of the algorithm, the actual runtime and memory used are determined. The memory used is dynamic and the maximum memory used is measured in the middle of the recursive algorithm execution process. The runtime is determined as the time between the start of the execution of the algorithm and end of the algorithm. Note that the actual runtime and memory used here do not necessarily mean the lowest runtime and maximum memory used, and the algorithm is implemented in Java. The parameters are set as follows: *degree* = 5, each node can have 0 to 5 children, |*R*| = 6, and |*V*| is ranged from 100 to 100000. As can be seen in Table 11, the memory used by the algorithm linearly increased along the increase of |*V*|, while the run time of the algorithm increases along with the increase of |*V*|, governed by a growth function of the square of |*V*|. The results confirmed our theoretic analysis in Section II such that the availability modeling algorithm in a tree network has a runtime complexity of |*V*|^{2}.

We then test our modeling approach against the simulated distributed environment to see how good it is. The impacts of three factors on *Tree*_*S _{A}*(

**Figure 8(a)**. The impact of resident set size.

Fig.8(a) shows the impact of resident set size on the effectiveness of the three replica allocation algorithms. The parameters are set as follows: |*V*| = 100, *degree* = 5, and |*R*| is increased from 1 to 20. The result shows that data availabilities of the three replica allocation schemes increase along with the increase of |*R*|, which confirms the conclusion of Theorem 1. From Fig. 7(a), the two resident sets computed by the two replica allocation algorithms, *Alg_Hom* and *Alg_Het*, can always achieve higher availabilities than the availability achieved by the resident set that is computed by the randomized replication allocation algorithm. However, the difference on the availabilities of the resident sets between the randomized and the two algorithms, *Alg_Hom* and *Alg_Het*, decreases along with increase of |*R*|. Also, the availability increase rate decreases along with increase of |*R*|. The reason is obvious. With a larger |*R*|, the average distance from a user to service/data site (number of edges in the path) will be reduced and thus the service/data reach availability should be increased. Thus, the space of service availability enhancement will also be reduced when availability reaches a high level.

**Figure 8(b)**. The impact of node degree.

Fig.8(b) shows the impact of node degree on the effectiveness of the three replica allocation algorithms. The parameters are set as follows: |*V*| = 100, |*R*| = 6, and *degree* is increased from 2 to 10. The result shows that data availabilities of the three replica allocation schemes increase along with the increase of *node* *degree*. From Fig. 7(b), the two resident sets computed by the two replica allocation algorithms, *Alg_Hom* and *Alg_Het*, can always achieve higher availabilities than the availability achieved by the resident set that is computed by the randomized replication allocation algorithm. However, the difference on the availabilities of the resident sets between the randomized and the two algorithms, *Alg_Hom* and *Alg_Het*, decreases along with increase of node degree. The availabilities increase rate also slows down when node degree reach a certain size (e.g., *degree* = 5). The reason is obvious. With a larger node degree, the height of the tree will be reduced. Therefore, the average distance from a user to service/data site (number of edges in the path) will be reduced and thus the service/data reach availability should be increased. Thus, the space of service availability enhancement will also be reduced when availability reaches a high level.

**Figure 8(c)**. The impact of graph size on data availability.

Fig.8(c) shows the impact of graph size on the data availability modeled and computed by *Tree*_*S _{A}*(

**4.2 Evaluating the Effectiveness of Availability Enhancement Algorithms in Tree Networks**

In this section, we test two replica allocation algorithms we developed in this research in a simulated distributed environment to see how effective they are on enhancing service availabilities. Specifically, we compare the availability enhanced algorithm, *Alg_Enhan*, with the randomized replica allocation scheme, *Alg_Ran*. In this simulation, a variety of random tree networks are generated. We use to simulate a tree with 30 nodes which have a variety of resident size (|*R*. A random tree is generated in a breadth-first manner, i.e., starting from the root node, at each level, a random number of the node are generated for the next level based on the randomly generated node degrees, until the number of nodes specified is reached. The entire tree hosts |*R*| replicas. The number read on each node is randomly generated following a uniform distribution, each is in the range of (0, 10). Link availabilities of all the edges and node availabilities of all the nodes in the system are generated randomly following a uniform distribution in the range of (0.9, 1.0). For the randomized allocation scheme, replicas are allocated to the randomly selected |*R*| nodes in the system. For algorithms *Alg_Enhan*, we will first implement the algorithm based on a tree network, and then use this algorithm to compute the resident sets in the simulated tree network. Finally, we will apply our availability computing algorithm shown in Fig. 3 to compute the service availabilities for the two resident sets in the simulated tree network.

Our goal is to test which replica allocation scheme is more effective in enhancing overall service availability in a tree network. The impacts of the resident set size |*R*| (from 1 to 6). The experiments are repeated 10 times and the results are shown in Fig. 9.

**Figure 9**. The impact of resident set size on improved availability.

Fig.9 shows the impact of resident set size on the effectiveness of the three replica allocation algorithms. The parameters are set as follows: |*V*| = 30, *degree* = 5, and |*R*| is increased from 1 to 6. The result shows that data availabilities of the two replica allocation schemes increase along with the increase of |*R*|, which confirms the conclusion of Theorem 1. From Fig. 8, the resident set computed by *Alg_Enhan* can always achieve higher availabilities than the availability achieved by the resident set that is computed by *Alg_Ran*. However, the difference on the availabilities of the resident sets between *Alg_Ran* and *Alg_Enhan* decreases along the increase of |*R*| (from 0.05 to 0.002). Also, the availability increase rate decreases along the increase of |*R*|. The reason is obvious. With a larger |*R*|, the average distance from a user to service/data site (number of edges in the path) will be reduced and thus the service/data reach availability should be increased. Thus, the space of service availability enhancement will also be reduced when availability reaches a high level. From the above experimental study, we can conclude that when the replica size is small, the replica location has the bigger impact on availability. Hence, it is more critical to employ a better replica allocation algorithm to improve service availability, such as *Alg_Enhan*.

**5. Related Work**

Network reliability has been well studied and a large amount of works have been conducted on this topic [15, 23, 24, 31]. The modeling and computation of network reliability is similar to (but not the same as) a specific case of network availability modeling for a distributed system, i.e., modeling the availability of a distributed system with no replication. For such a system, the data availability can be computed by computing the reach availability from each node to the destination node (the node with the data). The difference between availability and reliability modeling is that the unavailability of a node does not affect the reaching probability in our data availability modeling (in reliability, the node unavailability will affect the reliability). However, the availability modeling in a distributed system with data replication is very different from reliability modeling. The data availability in the system to a node *w* (actually the clients associated with such node) is affected by multiple nodes (each with a copy of the data) at the same time, i.e., if a copy of data on node *u* is not available to node *w*, then another node *v* can provide availability to *w*. Therefore, computing and modeling data availability to node *w* not only needs to compute the reach probability to multiple destination nodes (for example *u* and *v*), but also needs to consider their impact on each other.

Availability has long been considered as a critical issue for systems providing replicated services. In [8, 9], the authors explored the benefits of dynamically trading consistency for availability using a continuous consistency model. The availability model is a function of consistency, workload, and fault load. If consistency requirement is greatly relaxed, availability can be very high. In data storage systems such as Ocean store [14, 27] and PASIS [28], both pure replication and data fragmentation approaches are considered. Their availability models only consider node availability. [11] and [12] both present an analytical model for determining the overall availability of data stored in a data grid or P2P system. The availability model captures the characteristics of peer-to-peer systems and grid systems. The availability of replica location service (metadata service) of data grids or P2P systems can be considered independent of data storage availability.

All the above works on data availability computation only consider the availability of the service or data sites and do not consider network link availability. In [6, 7], the authors propose a model for studying the availability of replicated systems. The system availability is computed in a stochastic graph, i.e. each node and each link are assigned a specific failure probability. The overall system availability is computed exhaustively using a state enumeration method. However, the runtime complexity of the state enumeration method is exponential in the number of nodes in the system in a tree network. Also, it does not give a solution to compute data availability with multiple replicas in the system. In our previous research in [17, 29], we proposed effective availability modeling and computing approaches that consider both node and network link failures, for systems with the tree, ring, and general graph topologies. In [18], we provided two availability enhancement algorithms for a tree network with homogeneous node availability and link availability, which is a special case for the problem we consider in this research. Hence, a solution is needed for the general case, i.e., each node availability and link (network edge) availability are heterogeneous.

**6. Conclusions**

In this paper, we address the system service or data availability modeling and enhancement approaches for distributed systems with data replication. We first provide an efficient availability computing algorithm for tree networks. We then analyze the availability enhancement problem and formalized availability enhancement in a general tree network as Problem 1. Then, the problems are transformed into a special p-median problem and a recursive algorithm-based solution, Alg_Enhan is developed to maximize the service availabilities for distributed systems with a tree topology. The algorithm Alg_Enhan considers the impact of replica location on overall service availability and the impact of network node and edge heterogeneity. Theorems have shown that the algorithm Alg_Enhan is efficient with a runtime complexity of O(K|V|2), where K is the number of replicas and |V| is the number of nodes in the network. Finally, experimental studies show that the algorithm Alg_Enhan can compute resident sets with higher availabilities than those computed by the randomized algorithm, but it is more effective when the number of replicas is small.

**References**

[1] J. Hennessy, “The future of systems research”, Computer, vol. 32, no. 8, pp. 27-33, 1999.

[2] J. Liu and H. Shen, “A Low-Cost Multi-failure Resilient Replication Scheme for High Data Availability in Cloud Storage”, In High Performance Computing (HiPC), 2016 IEEE 23rd International Conference on (pp. 242-251),2016.

[3] K. Meenakshi and S.B. Singh, “Availability Assessment of Multi-State System by Hybrid Universal Generating Function and Probability Intervals”, International Journal of Performability Engineering, Vol.12, No. 4, pp. 321-339, 2016.

[4] S. Ren, Y. Yang, Y. Chen and Y. Du, “Fluctuation analysis of instantaneous availability under specific distribution”, Neurocomputing, vol. 270, pp. 152-158, 2017.

[5] M. Dahlin, B. Chandra, Lei Gao and A. Nayate, “End-to-end WAN service availability”, IEEE/ACM Transactions on Networking, vol. 11, no. 2, pp. 300-313, 2003.

[6] G. On, J. Schmitt, and R. Steinmetz, “On availability Qos for replicated multimedia service and content”, In Proceedings of International Workshop on Interactive Distributed Multimedia Systems, 2002.

[7] G. On, J. Schmitt, and R. Steinmetz, “Quality of availability: replica placement for widely distributed systems”, In IWQos’03, 2003.

[8] H. Yu and A. Vahdat, “The costs and limits of availability for replicated services”, ACM SIGOPS Operating Systems Review, vol. 35, no. 5, p. 29, 2001.

[9] H. Yu and A. Vahdat, “The costs and limits of availability for replicated services”, ACM Transactions on Computer Systems, vol. 24, no. 1, pp. 70-113, 2006.

[10] M. Aguilera, R. Janakiraman, and L. Xu, “Using Erasure Codes Efficiently for Storage in Distributed System,” 2005 International Conference on Dependable Systems and Networks (DSN05).

[11] K. Ranganathan, A. Iamnitchi, and I. Foster, “Improve data availability through dynamic model-driven replication in large peer-to-peer communities”, In proceedings of 2nd IEEE/ACM International symposium on Cluster Computing and the Grid. 2002.

[12] F. Schintke and A. Reinefeld, “Modeling Replica Availability in Large Data Grids”, Journal of Grid Computing, vol. 1, no. 2, pp. 219-227, 2003.

[13] B. Tang, N. Jaggi, and M. Takahashi, “Achieving data K-Availability in intermittently connected sensor networks”, In Proceedings of the 23rd International Conference on Computer Communication and Networks (ICCCN’14), Shanghai, China, 2014.

[14] H. Weatherspoon and J. Kubiatowicz, “Erasure coding vs. replication: a quantitative comparison”, In proceedings of Peer-to-Peer Systems: First International Workshop (IPTPS), 2002.0

[15] A. Tamir, “An O(pn2) algorithm for the p-median and related problems on tree graphs”, Operations Research Letters, vol. 19, no. 2, pp. 59-64, 1996.

[16] V. Paxson, “End-to-end routing behavior in the Internet”, IEEE/ACM Transactions on Networking, vol. 5, no. 5, pp. 601-615, 1997.

[17] M. Tu, D. Xu, Z. Xia, and J, Fu, “Reach availability of replicated services”, In Proceedings of the 35th IEEE International Computer Software and Application Conference, July 2011.

[18] M. Tu, L. Xiao, and D. Xu, “Maximizing the availability of replicated services in widely distributed systems”, In Proceedings of IEEE International Conference on Software Security and Reliability, Washington DC, June, 2013.

[19] K. Kalpakis, K. Dasgupta and O. Wolfson, “Optimal placement of replicas in trees with read, write, and storage costs”, IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 6, pp. 628-637, 2001.

[20] O. Wolfson and A. Milo, “The multicast policy and its relationship to replicated data placement”, ACM Transactions on Database Systems, vol. 16, no. 1, pp. 181-205, 1991.

[21] O. Wolfson, S. Jajodia and Y. Huang, “An adaptive data replication algorithm”, ACM Transactions on Database Systems, vol. 22, no. 2, pp. 255-314, 1997.

[22] O. Kariv and S. Hakimi, “An Algorithmic Approach to Network Location Problems. II: The p-Medians”, SIAM Journal on Applied Mathematics, vol. 37, no. 3, pp. 539-560, 1979.

[23] L. Fratta and U. Montanari, “A Recursive Method Based on Case Analysis for Computing Network Terminal Reliability”, IEEE Transactions on Communications, vol. 26, no. 8, pp. 1166-1177, 1978.

[24] J. Fussell, “How to Hand-Calculate System Reliability and Safety Characteristics”, IEEE Transactions on Reliability, vol. -24, no. 3, pp. 169-174, 1975.

[25] “On the location of a tree-shaped facility”, Location Science, vol. 5, no. 1, p. 63, 1997.

[26] M. M. Halldórsson, K. Iwano, N. Katoh, and T. Tokuyama, “Finding Subsets Maximizing Minimum Structures,” SIAM Journal on Discrete Mathematics, vol. 12, no. 3, pp. 342–359, 1999.

[27] J. Kubiatowicz, C. Wells, B. Zhao, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea and H. Weatherspoon, “OceanStore”, ACM SIGARCH Computer Architecture News, vol. 28, no. 5, pp. 190-201, 2000.

[28] J. Wylie, M. Bakkaloglu, V. Pandurangan, M. Bigrigg, S. Oguz, K. Tew, C. Williams, G. Ganger, and P. Khosla, “Selecting the right data distribution scheme for a survivable storage system”, Technical Report CMU-CS-01-120, Carnegie Mellon University, 2000.

[29] M. Tu, H. Ma, I. Yen, F. Bastani, and D. Xu, “Availability, security, access performance and load balance in P2P data grid”, Journal of Grid Computing, Vol. 11, No. 1, 2013.

[30] A. Rahmani, Z. Fadaie and A. Chronopoulos, “Data placement using Dewey Encoding in a hierarchical data grid”, Journal of Network and Computer Applications, vol. 49, pp. 88-98, 2015.

[31] O. Theologou and J. Carlier, “Factoring and reductions for networks with imperfect vertices”, IEEE Transactions on Reliability, vol. 40, no. 2, pp. 210-217, 1991.

**Authors**** **

**Michael Tu** is a Ph.D. in Computer Science, Associate Professor of Computer Information Technology, director of the Center for Cyber Security and Infrastructure Protection at Purdue University Northwest. Dr. Tu’s areas of expertise are information assurance, digital forensics, and cybersecurity education. He has published over 40 peer reviewed papers in prestigious journals and peer reviewed conference proceedings. Dr. Tu is a Certified Information System Security Professional (CISSP), Certified Ethical Hacker (CEH), Certified Hacking and Forensics Investigator (CHFI) and AccessData Computer Examiner (ACE).

**Dianxiang Xu,** received the B.S., M.S., and Ph.D. degrees in computer science from Nanjing University, Nanjing, China. He is a Professor with the Department of Computer Science, Boise State University, Boise, ID, USA. His research interests include software security and safety, software engineering, access control, and software-defined systems. Dr. Xu has published over 100 peer reviewed papers in prestigious journals and peer reviewed conference proceedings.

**Liangliang Xiao,** received the M.S., and Ph.D. degrees in computer science from The University of Texas at Dallas, Richardson, USA. He is an Associate Professor with the Department of Computer Science & Information Technologies, Frostburg State University, Frostburg, MD, USA. His research interests include data security and distributed systems. Dr. Xiao has published over 30 peer reviewed papers in prestigious journals and peer reviewed conference proceedings.

%d bloggers like this: