IJCNC 06

INVESTIGATING MULTILAYER OMEGA–TYPE NETWORKS OPERATING WITH THE CUT–THROUGH
TECHNIQUE UNDER UNIFORM OR HOTSPOT TRAFFIC CONDITIONS

Eleftherios Stergiou1, John Garofalakis2,
Dimitrios Liarokapis1 and Spiridoula Margariti1

1Department of Informatics & Telecommunications, University of Ioannina, Arta, Greece 2Department of Communication Engineering and Informatics, University of Patras and Computer Technology Institute and Press– “DIOPHANTUS” (CTI) Patras, Patra, 26504, Greece

ABSTRACT

The continuous increase in the complexity of data networks has motivated the development of more effective Multistage Interconnection Networks (MINs) as important factors in providing higher data transfer rates in various switching divisions. In this paper, semi–layer omega–class networks operating with a cut–through forwarding technique are chosen as test–bed subjects for detailed evaluation, and this network architecture is modelled, inspected, and simulated. The results are examined for relevant single–layer omega networks operating with cut–through or ‘store and forward’ forwarding techniques. Two series of experiments are carried out: one concerns the case of uniform traffic, while the other is related tohotspot traffic. The results quantify the way in which this network outperforms the corresponding single–layer network architectures for the same network size and buffer size. Furthermore, the effects of the dimensions of the switch elements and their corresponding reliability on the overall interconnection system are investigated, and the complexity and the relevant cost are examined. The data yielded by this investigation can be valuable to MIN engineers and can allow them to achieve more productive networks with lower overall implementation costs.

KEYWORDS

Omega network, Performance evaluation, Cut-through, Semi-layer multistage networks, Uniform traffic,Hotspot traffic, Banyan networks.

1. INTRODUCTION

Multistage interconnection networks (MINs) are devices used in datacentres or supercomputers to ensure the satisfactory quality of networking flows [1-4] and can be implemented at a relatively low cost. MIN fabrics consist of parallel stages formed of small-scale Switching Elements (SEs) in a parallel arrangement. Each stage is preceded (or followed) by linked states called permutations. Banyan MINs are devices characterised by the banyan property, meaning that from an arbitrary input to a specified output, there exists a unique path. Omega-type MINs are a specific form of banyan MINs and are self-routing equipments with the banyan characteristics [5]. The ‘perfect shuffle’ mechanism (operating mechanism) is used in omega-type networks. The term self-routing (forwarding algorithm) means that when an arbitrary SE receives a packet at one of its inputs, it can send it to the designated output based on the destination ddress of the packet. This forwarding method functions by shifting the bits of the destination address to the left. Figure 1 shows a common view of an omega-type network, and this is described inmore detail below. The form of the links (permutations) between stages stays the same for allomega-type networkstages (see Fig. 1). This is another key feature of these networks.Multi-layer type of MINs (MLMINs) are recently developed fabrics that can ameliorate theperformance of conventional MINs. Semi-layer MINs (SeLMINs) are a specific type of MLMINswhich have been fully investigated [6-9] (Figs. 3 and4). These researchers explored thebehaviour of SeLMINs operating via the common ‘Store and Forward’ (SaF) mechanism. Figure3 shows how a level branches to form two levels. It can be seen that in the stage at which branching takes place, 2×4 SEs are used. Forwarding mechanisms: In addition to the acclaimed SaF forwarding method, more advanced techniques may also be used to transmit data to the target, like as the wormhole and cut-through (C-T) techniques. A particular combination of a network topology and forwarding method creates an individual environment that requires specific research.

Uniform traffic: Most network surveys have been carried out based on the assumption that the load follows a uniform distribution. This means that the probability of a packet appearing at each input of the fabric is the same. Applying a balanced load helps us to understand the behaviour of a device. Although the use of uniform traffic remains a very useful study tool, mainly due to its symmetry, real-world traffic distributions are highly non-uniform.

Figure 1: A representative topology of an omega-type semi-layer multistage network

Hotspot traffic: Most data packets originating from contemporary applications and arriving at MINs have a non-uniform pattern of traffic. In the MIN architecture studied here, hotspot-type traffic was examined as a case of particular importance and was obtained as a scenario for investigation. In general, hotspot traffic is a pattern of data packets in which the destination of a certain proportion of the packets is a single outlet, called a hotspot. This type of traffic is common, for instance when a server has been set up and multiple clients repeatedly access it to obtain data and services, or where miscellaneous network nodes are connected through trunk ports.

Fig. 2 shows the traffic pattern in an omega-type multistage system with only one hotspot point, called output 2: this output is named the hotspot as it collects a higher share of the total MIN traffic than the rest of the outputs. More widely, if we signify as thevthe probability that a packet coming at input port i has output port j as its destination . Hence, any input port are targeted to a single hotspot point that accepts a higher share of the load.

Figure 2: Α snapshot of hotspot traffic in an omega-type MIN

In hotspot-type traffic, the SEs of the MIN can be subdivided into two parts, Hot-Spot Traffic (HS-T) and No-T, where HS-T represents SEs that accept and serve hotspot traffic, and No-T (normal traffic) indicates those that accept only normal traffic and are free of hotspot load.

Related work

Although the C-T packet routing method has recently been investigated [10-11], but those studies mainly concerned torus interconnection networks, which are different from omega-type networks. In [12] have presented a brief study of omega networks working via the C-T technique. In [13] researched hotspot traffic using an OMNeT++ simulator, but this traffic has been implemented to torus-type networks, while in [14] hotspot traffic was investigated using the SaF forwarding mechanism in conventional single-layer MINs (SiMINs). In [15] also investigated a scenario involving delta-formed networks working via the C-T technique when the arriving load was hotspot-type traffic, and in [16] carried out a recent study investigating the issue of the reliability of MINs. Even though all the above studies are related to data forwarding techniques, they were applied only to classical MIN architectures. Due to the wide range of different types of interconnection fabric, many issues remain open. In this paper, a specific scenario is selected for thorough investigation.

Selecting a testbed fabric for investigation: With the aim of investigating a high-performance fabric that can meet the current demands on contemporary nettings, semi-layer omega-type MINs re been chosen as a basic testbed, and the C-T forwarding method was applied. We then investigated the behavior of current fabrics under conditions of uniform or hotspot traffic.

Contributions of this work: This investigation extends existing research with regard to the following points:

For uniform-type traffic: The throughput is estimated for fabrics with 2×2 SEs as basic units. This performance indicator is quantified demonstrating that the C-T technique is superior in performance and in handling HS traffic compared to the SaF one. The same indications were observed for fabrics using 4×4 SEs as basic units. The latency of a packet is estimated for the above MIN architectures.

For hotspot-type traffic: The throughput metric is estimated for single- and semi-layer omega formed networks with different buffer sizes.

Overview of the Results: A comparison of results is presented for networks of the same size operating with the SaF and C-T forwarding methods. This comparison gives a quantitative estimate of the extent to which the C-T forwarding technique outperforms SaF. The advantage of this approach is revealed in terms of the reliability factor for the fabrics under study. All of the results from the experiments indicate that the current fabric is a very productive device that can provide reliability and high performance to network nodes.

The rest of this paper is structured as follows: Section 2 gives details of the omega-formed and semi-layer MINs, and the C-T forwarding mechanism. Section 3 presents definitions for fundamental performance metrics. Section 4 explains the simulators used and the results of the experiments are portrayed. In the end, Section 5 briefly summarises the work in this paper, with conclusions and directions for future work.

2.PROPERTIES OF OMEGA-TYPE NETWORKS

2.1.Omega-type MINs

An ordinary multistage ( N x N ) MIN is constructed from maximum parallel stages of ( k x k ) SEs, where k represents the size of the SE and each stage consists of ( N / K ) SEs. Thus, the overall number of SEs in a SiMIN is calculated equal to. Hence, there exist interconnections between all the stages, unlike in a crossbar network, that has links. These are self-routing networks and are distinguished by the fact that there exists only a unique route from each inlet to each outlet (the banyan attribute), as illustrated in [1- 4]. The ‘perfect shuffle’ routing method is used by omega networks, in which routing is done by shifting the bits of the destination tag to the left [2]. A version of this method (algorithm) is also utilized by other omega-formed fabrics. Typically, in generalized cube-type interconnection networks, the used routing- tag is produced based on source and destination labels. An additional discreet feature of omega-type networks is the symmetry of the links that exists between the stages.

Figure 3: Creating a simple double-layer structure using 2×4 switch elements in a stage before branching into layers.

A typical configuration for an N x N Omega-type network is illustrated in Fig. 1 and explained below. The entire multistage fabric works synchronously, which means that the time-cycle is based on a global clock-tick. The time-cycle of the fabric consists of two distinct phases (time periods): during the first phase, the queues forward the packets (service), while in the second phase, new packets are accepted by queues.

2.2.C-T forwarding in omega MINs

The omega-type MIN studied here is assumed to operate in accordance with the following conditions:

At any input, the packets’ arrival is considered to follows Poisson distribution. The probability of an arrival at each input of a MIN within an arbitrary clock-cycle is independent of the probabilities for the other
All packets arriving at the MIN have the same
The service-time of the output queues in any switch is constant and is equal to the network- Each network cycle is long enough that the server of the SE can designate routing ways and transmit a packet via the stages to output using the C-T technique.
Each switch input can receive at most one packet in any time
The whole fabric operates in a synchronous manner, meaning that in each network-time, the access of the next outbound link is checked. If it is accessible, a packet starts moving; it goes through as many stages as is feasible (C-T forwarding) and ends up in the most remote buffer that is technically
In the preamble to any time-cycle of the MIN, all SEs transmit backpressure signals to their corresponding upstream SEs to advertise the status of their
A C-T mechanism’s operation (packet movement through buffers) is performed when (a) this is demanded by routing; (b) the intermediate join-links are not busy, and (c) adequate buffer space is
The First Input, First Output (FIFO) service method is applied by all the buffers in the system.
In any time-cycle, only one packet at most is permitted to pass through an output-link of a SE.
The arrival of packets is considered to take place at the end of each time cycle. In other words, the systems process the first elements in the queue before accepting new
If the buffers of the first stage are full, but new packets are arriving at these buffers, then the new packets are lost (buffer overflow).
Any packet at any stage will be blocked if the corresponding buffer at the next stage do not have free
Conflicts between packets are resolved by selecting randomly the packet to proceed, which can also be considered a fair policy.
The output links of the MIN (the links of the last stage) are considered to be non-blocking links. This is because we only wish to study the performance of this
Here, is a random variable that depicts the number of packets coming at the end of a network-cycle at any buffer of SE at the first stage of the MIN, and we use the formula provided by Stergiou in [6]. In this approach, the arrival of packets follows a binomial distribution

where represents the fixed-probability of a packet being generated by a processor in each time-cycle. Represents the probability of ( ) packets being accepted by a first-stage buffer with an average of inputs at any time step. In general, the fabrics under study have SEs with inputs, and less often with In addition to the topologies of MINs, other characteristics make them important as fabrics. For example, the switching techniques and routing mechanisms that they employ make their behaviour extremely unusual. The fabrics investigated here apply a C-T switching-method and a ‘perfect shuffle’ algorithm of routing.

2.3.Semi-layer MINs

SeLMINs belong in a particular category of MLMINs. Their most important characteristic is that they are composed by two distinct segments, where the first segment is a single-layer MIN and the second is a fully developed multi-layer module (Fig. 4(ii)). The first segment of the MIN is made of one exclusive layer that is working via a backpressure blocking mechanism (and hence does not allow for packet loss) and which has a low cost of production. The second module is a full ‘fan-out’ section, which also utilizes a backpressure mechanism but is free of blocking, thus facilitating the output of the packets.

In an arbitrary semi-layer type of MIN (Fig. 4ii), let depicts the number of single-layer stages and let represent the number of stages that have full layer development and may therefore service multicast traffic without blocking. We can therefore write:

If the fabric generally consists of SEs then the last-stage of the single-layer segment (referred to here as the ‘replicated’ stage) is generally manufactured using for l = 1, 2, … and the second segment will have a replication degree (or number of layers) equal to .

A similar development could apply for case the fabric consists of 4×4 SEs yielding number of layers.

Figure 4: Two six-stage omega MINs: (i) a single-layer MIN; (ii) a semi layer MIN (with two-layer fan-out).

If the total length of the second segment is a value of , then the total number of layers() in the full fan-out is equal to , where represents the inputs’ number per SE (for instance, for 2×4 switches, is equal to two). As reported in [7], the outlets of the SEs in the last stage of a multi-layer MIN are multiplexed. Hence, if either the data sinks or the multiplexer has insufficient capacity to absorb the packets, blocking may be observed. Nevertheless, in this study, we assume that the multiplexers at the data sinks have the necessary capacity.

The primary disadvantage of MLMINs is their higher value of cost owing to their increased complexity. The semi-layer type of MINs were originally used due to their balance between performance and cost, and in cases there are increased traffic requirements to be met. Owing to their excellent performance/cost ratio, semi-layer MINs are anticipated to become widely used in the future in Internet interconnection nodes and parallel systems.

2.4.Semi-layer MINs incorporating an omega-type permutation

When the schema of the links between the stages of a SeLMIN is kept the same with an omega-type permutation, then the fabric becomes a semi-layer omega network (omega-type SeLMIN). In this paper, this special type of MIN is selected for a detailed analysis and study of its performance under various traffic conditions. More specifically, this paper considers SiMINs and SeLMINs with 10 and five stages, with a double-layered fan-out at the end. In addition, it is assumed that the SemiLMINs use SEs in the stage where the replication starts and typical SEs in the rest of the stages. Furthermore, an alternative form of SeLMINs is investigated that utilises SEs in the stage where the replication starts and SEs in the other stages.

2.5.Cut-through forwarding mechanism

The C-T mechanism is a forwarding method that can be implemented in miscellaneous networks (including omega-type fabrics), in which the packets (or the frames) may be transferred to their destination points immediately after the routing tags has been examined, without expecting the rest of data to be completely accepted [10-11,16]. The C-T forwarding method generally provides low-latency performance, and operates as follows:

The same clock is utilized for all networks SEs. .
All the SE buffers should have the least a whole packet free space, as happens in the SaF forwarding
When a message has arrived, it takes one-time unit for its header (if it avoids being blocked) to be placed in the MIN internal input
Forthwith after a packet has s arrived in a MIN, rather than expecting for the whole packet to be completely placed, the header continues moving forward through the next stages (‘cut- through’); the routing decision has already been taken, and the relevant channel is working normally (viz without blocking).
Each package is divided into flits, creating a chain of flits following the head of the packet. The chain of flits is forwarded in a snake-line
Each flit of a packet may store when it arrives at a stage, or it also cuts through to the subsequent stage if the next link is verified as being free, following the same route of the packet’s
Forthwith after a packet’s header has been accepted at a stage, if the header can no longer proceed (because of the busy channels), then the packet must be fully received before it is retransmitted. In this situation, all the subsequent data flits are attempting to release their channels which they have been occupy so far. Otherwise, the packet tries to accumulate as close as possible to the packet’s
If there are no packets’ conflicts along a route, each packet will be successfully pipelined through a series of sequential stages, in the form of a loose chain consisting of data flits.
Throughout any C-T transfer operation, any buffer of the stages along the specific routing pathway is closed to another parallel requirement for
The time cycle of the pipelined packets in the C-T method is defined based on the maximum value of the switching time period and the inter-stage time delay once the header arrives at a final
It is supposed that there are no staging delays for the cutting through method until a packet is blocked by a preceding one.

The C-T forwarding technique is credible for packet error handling in which the recovery time of incorrectly targeted packets is restricted since the recovery process can be started more quickly than in the SaF forwarding method.

Proposal: In an equilibrated MIN operating with C-T forwarding, if represents the average number of hops (or average span) and portrays the probability of waiting at a randomly selected channel (i.e. with no cut-through), then the average number (the mean) of cut-throughs in a multistage fabric can be portrayed by the following formula:

In summary, the C-T forwarding method is an elegant and efficient method but is expensive compared with the SaF or wormhole methods that also can be applied in MINs. The C-T forwarding modal is currently utilized in a wide range of switch fabrics, for example in the Cisco Nexus 5000 series of switches [17], the Cisco Nexus 5548P switch [18], and the Juniper QFX3500 switch [19].

2.6.Transition graph for the queue state

The state transition of a queue in an arbitrary fabric containing a 2×2 SE constructed with buffer- size is illustrated in Fig. 5. In each intermediate stage , we consider that the following two states will take place in one clock-unit:

u (i ) is the probability of a buffer containing no packets, and

u (i ) is the probability that a buffer is being held by a packet.

represents the probability that the next queue, positioned in stage , is blocked. The value of m falls into the values’ scale i+1<m<L-1. This says that the next (i+1,…,m-1) consecutive queues are unaffected of blocking facilitating the transition, and only the queue of stage m is blocked.

In addition represent the transitions of zero, one, or two packets, respectively, from the prior stage to stage. These parameters are adjusted to appropriate values for every stage of the fabric.The clock cycle in a MIN contains two time periods:

the first time-period, where flow control information is transferred via the network from the last stage to the first,
the second time-period, where packets flow from their current stages towards all available subsequent stages, in accordance with the current networking

Fig. 5 shows the state-transition graph for a single-buffered queue of an SE in the C-T forwarding method. In this graph, everyarrow represents the transition and may be accompaniedwith formulas which indicates the policy of each possible transition.

Figure 5: State-transition graph for a single-buffer (b = 1) model of a fabric utilising the C-T forwarding method, at an arbitrary stage (i)

The relationships shown in Fig. 5 allow for a roughly analytical approach to the forwarding method used in the system. The state of queue contains two sub-states:

a blocked route and
a C-T

Likewise, the state of the other queue, also contains two sub-states: an empty state and a C-T transition.

Boundary conditions of the system: (i) The necessity for the foregoing MIN’s stage is i = 0 , since there is no foregoing stage, and the probability of an arrival is assumed to be equal(p) ; and (ii) the demand for the last stage(L) is that every packet at an outlet of the last stage is assumed to leave in each case with no difficulty. Hence, the blocking probability is equal to zero.

3.DEFINING THE RELEVANTMETRICS

We will show performance outcomes for semi-layer and single-layer omega-type multistage networks when these fabrics are handling unicast and hotspot types of traffic. Additional metrics are also used for evaluation, such as the reliability metric.

3.1.Performance metrics

The main performance metrics used in this study are as follows:

The average throughput of a single- and semi-layer omega-network: This metric is calculated as the number of packets per time cycle coming out of the fiber’s outputs. Formally, can be defined as:

where represents the number of packets that come to their destination at the time-interval. This is an indication of how efficiently the network is utilised. Here, the throughput is measured using simulations, and this metric is estimated based on the packets’ quantity that arrived at their target over a given number of trials.

Note: The average throughput of a semi-layer omega-type network ( ) can be assumed to be , where represents the throughput of the first part of a SeLMIN. That happens because no blocking is seen in the second (multi-layer) segment.

The normalised throughput of single- and semi-layer omega networks: This is determined as a fraction of the average throughput over the network size N. Typically is defined by the formula:

The average packet latency of a single-layer omega network: This is determined as the number of time-cycles required for all packets of a permutation to reach their final targets. More formal, an be denoted as:

where represents the total number of packets obtained by the sinks within time-intervals and is the total number of network-time slots required by a randomly packet to reach its final target. contains both time periods: the total number of network-time cycles for a packet waiting at any stage, and the total number of network-time cycles required for the packet to involved in real transmission until it reaches at its target. The packet’s latency in a MIN is associated to the overall number of hops (time-cycles) required to distribute a given number of packets to their final targets via links of fabric.

Note: The average packet latency of a semi-layer omega-type MIN is expressed as . If is the packet latency of the first segment, then the packet latency of a SeLMIN can be expressed as . This arises because the packets in the second segment of fabric do not suffer from conflicts.

The normalised packet latency of a single- and semi-layer omega-type network ( ): In general, this is the ratio of the average packet’s latency to the lowest packet’s latency, which is equal to time-cycles. Formally, can be expressed as .

3.2.Additional evaluation metrics

Reliability: The reliability of a MIN may be determined as its capacity to handle all possible unanticipated situations. The reliability of a random path (an arbitrary input–output pathway) is dependent on all the intermediary SEs, since the failure of any SE involves the non-success circumstance (e.g., disaster) of the current routing pathway. The concept of terminal reliability is utilized here to illustrate the above metric and can be defined as the probability of existing at least one failure-free pathway between a given inlet – outlet pair. The term point-to-point reliability may also be used for the same metric.

Terminal reliabilities of SE structures in series and parallel

Estimation of terminal reliability for SEs in series: For n SEs SEs in sequence, each of them with reliability (where the specification of is given by industry), then the overall reliability or terminal reliability of this structural system may calculate using the formula:

Estimation of terminal reliability for SEs in parallel: Supposing we have n SEs in a parallel layout, then the total reliability of this structural system may calculate with formula:

Estimation of terminal reliability for a semi-layer MIN: Regarding the case of a SeLMIN, the overall terminal reliability for an arbitrary pathway may estimate with the formula:

Without losing the generality, and taking the reliability of any SE to be equal to , Equation (6) can be rewritten as:

where S is the span of a single-layer module and is the overall number of fan-out layer’s multiplication. In the present case, the estimations assumed at the reliability of any SE was equal to . For a SE, the reliability was assumed. The worst case of a MIN that has the banyan property and that is implemented using SEs arises more rarely, due to the high cost involved

4.SIMULATIONS AND RESULTS

4.1.Simulations

The performance of multi-layer omega-type networks is evaluated in this research utilizing simulations. The focus of the discussion is on (N x N) multi-layer omega-type fabrics composed of (2 x 2) and (4X4) SEs accompanied with (2 x 4) and ( 4 x 8) SEs, which are implemented in stages with the multiplication of layers. A specific simulator was created for SeLMINs that could handle several types of switches and various load conditions at the data packet scale. This simulator tool was implemented in the C++ language and could run various configurations of networks.

When building the simulator, each SE was represented by two or additional un-shared buffered queues. Each buffer operated on a First Come, First Served (FCFS) basis. The C-T mechanism was utilised to transmit all packets; that is to say, the packets had the potential to be proceed ahead by one or additional stages within a given time cycle. Cases of conflict between packets were randomly resolved with the same chance. Within the simulator, the size of the buffer, the number of inlets and outlets, the probability of packets arriving, the total number of stages, and the total number of layers were originally determined as input parameters. Ultimately, the simulations were executed at the packet level, based on the assumptions that all packets had a fixed size and were transmitted within time spans of equal length. Performance metrics such as throughput and packet latency were assessed at the output of the model. To ensure that the simulator was running in a stable equilibrium state, the first iterations (clock cycles) of each simulation were disregarded.

We have not observed any other limitations or deficiencies regarding the simulation software for this study. However, we need to point out that the simulation software can be used only for Omega-type MINs, and it needs to be modified if there is a need to simulate a different architecture.

4.2.Results

a)Uniform traffic

Fig. 6 shows the throughput of the normalized 10-stage SeLMIN and the corresponding equivalent SiMIN correlated with the load on the MIN inputs. All the test-bed MINs examined here had a buffer size of one, and all the SEs were 2×2 SEs, except those that are in the branching stage. The last stage of the SeLMIN architecture contained a simple two-layer fan-out.

It can be seen from Fig. 6 that the C-T forwarding mechanism provides higher throughput values which are shown by the first two bars in yellow and orange colour respectively compared to the corresponding omega-networks that use the SaF mechanism depicted by the last two bars per offered load value. The second and fourth bars show the results from SiMINs using the C-T and SaF forwarding mechanisms, correspondingly. It can be seen in Fig.6 at the rightmost bundle of bars that for an offered load of 100%, the SeLMIN with the C-T technique provides approx. 20% (0.8-0.6≈0.2) extra throughput compared to the SeLMIN of equivalent size using SaF.

Figure 6: Normalised throughput correlated with load on a 10-stage omega-network operating via the C-T or SaF forwarding methods, correspondingly.

A similar improvement is observed in comparing the SiMIN and SeLMIN fabrics. Furthermore, when the arrived load is lower than ~40%, the type of forwarding technique and architecture do not matter. From the same graph, the normalized throughput’s values divergence owed to architecture, and for the C-T method, this starts when the arrived load is approximately ~40% as shown by the first two bars in the corresponding bundle. On the other side the divergence of the throughput values owed to architecture for the SaF case starts a little bit earlier at an offered load of 40% as can be observed in the third and fourth bars.

Fig. 7 shows the normalised packet latency values for the 10-stage (1024×1024) single- and semi- layer omega-type networks when C-T and classic SaF data forward strategies are utilised. We take as an example the case in which the offered load is equal to 10%. When the SaF mechanism is used by the omega network, the packet normalized latency is close to one which actually means a latency of 10 time-cycles, as this is the total number of stages of the current device. For the C-T mechanism, the corresponding normalized latency is 0.1, meaning that it takes about one time-cycle for a packet to arrive at the output of the device, assuming that the MIN is free of packets and no collisions are presented. The figure clearly demonstrates the superiority of the C- T forwarding technique over the SaF tactic, in terms of packet latency.

Figure 7: Normalised latency for various arrived load on a 1024×1024 single- and semi-layer omega- network operating via the C-T or SaF forwarding methods, correspondingly.

Subsequently, except for the 10-stage (1024×1024) omega-networks that included 2×2 and 2×4 SEs, further simulation trials were also conducted for 5-stage (32×32) single- and semi-layer omega-type MIN, which were constructed of 4 x 4 and 4 x 8SEs. Several MΙΝs with low- latency data transfer have been proposed, such as the scheme in [20]. However, these proposals have certain weaknesses in terms of routing. In addition, most of the contemporary low-latency MIN architectures are more appropriate or are tailored to multiprocessor reconfigurable devices.

Fig. 8 shows the normalized throughput from a five-stage SeLMIN and from the analogous SiMIN in terms of fabric size, versus the arrived load at the inputs. These MINs are also considered to be consisting by one-sized buffers. The second and fourth bars which are shown in blue and yellow respectively, indicate the outcome of the SiMINs, while the first and third bars depicted by grey and orange respectively, represent the results from the SeLMINs with the two- layered fan-out at the end of the fabric.

Fig. 8 shows that a SeLMIN with the C-T mechanism produces higher values of throughput, although this gain is less significant. This conclusion is also supported by the results in Fig. 6.

Figure 8: Normalised throughput versus load for five-stage single-and semi-layer omega-networks operating via the SaF or C-T forwarding methods. Each SE used here has four inputs.

A further basic conclusion is that the use of larger numbers of SEs does not provide significant performance improvements, while at the same time, the cost is greatly increased.

Validation of simulator: The quality of the simulator was checked for certain boundary conditions. More specifically, some of the simulations were validated in particular instances of single-layer MINs (without a fan-out module). The data from these experiments are illustrated in Figs. 6 and 7, by the second and fourth bars. These data exactly coincide with the results reported in Figs. 5 and 6 in [21], which are related to SiΜΙΝs with the C-T forwarding method.

b)Hotspot traffic

In addition to experiments with uniform traffic at the inputs, another series of experiments was carried out with hotspot traffic. This type of traffic is the most common in everyday life since web servers and other client-server-type applications create this type of load. It is important to produce some traffic statistics to understand the upper limits created by the aggregation of flow at hotspots due to the corresponding traffic requirements. As in the previous series of experiments which was concerning uniform traffic, the performance evaluation was carried out when the operation of the MIN was in a stable condition.

Fig. 9 shows the normalized throughput for a single-layer omega-type network with size equal to six, meaning that the MIN block has 64 inputs and 64 outputs ‘working’ with the C-T or SaF techniques with different buffer sizes, for arrived traffic load which has a hotspot ratio (h) of 10%. The three first bars, depicted as grey, yellow, and blue respectively, show devices ‘working’ with the C-T method, and it can be infered that a relatively minor amelioration in the throughput metric is achieved when the buffer size increases. For example, if a maximum offered load with a hotspot traffic ratio equal to ~10%, the normalised throughput is 39% if the system has a buffer size of one. For the same traffic conditions but with a buffer size of 10, the normalised throughput is 41.6% indicating a gain of just 2.6%.

This gain is insignificant in relation to the cost of the hardware required. In other words, increasing the buffer size does not considerably meliorate the performance of the devices, and hence it is not recommended.

Figure 9: Normalized throughput of single-layer omega-MINs ‘working’ with the C-T or SaF method versus arrived load, which has a 10% hotspot ratio and for buffer-sizes of 1, 2, and 10.

However, the usage of the C-T packet’s forwarding technique markedly ameliorates the rate of outcoming traffic by approximately 26%, compared to the SaF method for an omega-type MIN with a buffer size of one.

Fig. 10 shows the normalised throughput for single- layer and semi-layer omega-networks with a network size of six, ‘working’ via the C-T and SaF methods and with different buffer-sizes, versus the various values of the offered packet traffic which has a hotspot ratio (h) of ~10%. The SeLMINs investigated in the current study have a double-layer fan-out at the end, i.e., NoL = 2 . Moreover, Fig.10 shows that increasing the buffer size does not significantly ameliorate the throughput, while the analogous hardware cost increases to a high level. In addition, it is again shown that the C-T forwarding operation significantly enhances the MIN’s throughput compared to the SaF method. In addition, Fig. 10 shows that the SeLMINs are more productive than the analogous SiMINs in terms of the network diameter. More specifically, for an omega-type semi- layer network (SeLMIN) with a network scope of six stages, a buffer size of one and ‘working’ via the C-T method, for a full offered load, the normalized throughput is 39%.

Table 1:Values of normalized throughput for semi-layer and single-layer MINs ‘working’ viathe SaF and C-T methods

For the corresponding single-layer MIN operating via the SaF forwarding method, the normalised throughput is 13%. The first bar in orange represents the results for a SiMIN with the SaF method. The second and the third bars in grey and yellow respectively, represent results for SiLMINs that use the C-T forwarding method and they differ in buffer size. The fourth and fifth bars in light and dark blue respectively, depict results of SemiLMIN that operates via C-T mechanism but differ in buffer size. Table 1 shows the values of the normalized throughput for SemiLMINs and SiMINs using the C-T and SaF mechanisms, respectively.

The normalized throughput for a SiMIN with b = 10 and full offered load is only 1.6% better than the corresponding architecture with b = 2. This is confirmed by looking at the last row in Table 1, where the corresponding difference is: 0.416−0.4 = 0.016. It can be seen from the graph that the SeLMIN with buffer size b = 1 and the C-T method gives a 26% improvement in throughput (0.39–0.13=0.26) compared to the corresponding SiMIN with b = 1.

Figure 10: Normalised throughput versus offered packet traffic having a hotspot portion of 10%, for single- and semi-layer MINs ‘working’ via the SaF and C-T methods and constructed with 1, 2 or 10 buffer sizes.

Fig. 11 depicts the normalized latency versus arrived load on a 6-stage single- and semi-layer omega-network operating via the C-T forwarding methods, respectively versus offered load which have a hotspot portion equal to ~10%. In addition, Fig. 11 depicts how an architecture which operates via C-T method with small buffer size prevails in terms of packet latency. A 6- stage SeLMIN with buffer size equal to 1 that operates via C-T mechanism presents approx. 3 times less packet latency compared with an equivalent SeLMIN accommodated with buffer size equal to 10, when the offered load is full. That is confirmed by estimating the difference in values of the last two bars in Fig. 11. i.e., 4.2-0.79=3,41.

Also, the same SiLMIN consisting of buffer-size equal to 1 that operates via C-T mechanism presents approx. 4 times less packet latency compared with an equivalent SiMIN equipped with buffer size equal to 10 for full offered load. That is confirmed by estimating the difference in values of the last two bars in Fig. 11, i.e., 5.2-0.79=4,41.

Figure 11: Normalised latency versus offered packet load on a 6-stage single- and semi-layer omega- network operating via the C-T forwarding methods, respectively.

The above performance investigation yields quantitative results for the improvement in the efficiency of the MINs due to the multi-layer architecture. Furthermore, is shows that C-T forwarding method provides further increases in the performance of the fabric. A combination of the two techniques gives the best solution, and it seems that this solution works properly even for hotspot traffic.

In addition to the performance evaluation, other aspects of this system were investigated, for example the complexity, the cost, and the reliability factor.

c)Complexity and cost analysis

The key problem in the construction of a MIN is the cost since a performing MIN with a very high cost might not worth it. The MIN’s cost is associated to the complexity of the system and the SEs. Here, we follow the approach which is presented in the studies [22, 23] where is carried out a cost analysis.

The complexity of a MIN depends on the number of SEs involved. An N x N SiMIN consists of N / 2 (log2N) SEs. Furthermore, an N x N SeLMIN with a double -layer fan-out (NoL =2) in the latter stage is made up of N / 2 . ((log2N) – 1) + N SEs in each single-layer segment, and a number of 2. N / 2 = N SEs in the fan-out module.

Table 2: Cost functions for N x N omega-type MINs

The complexity of a MIN directly correlates with the number of input-links connected to the switch together with the sum of cross-points that included in a switch. Hence, a 2 x @SE has an overall cost weight of four units; a 2 x 2 SE has a cost weight and a 2 x 6SE has an overall cost of twelve units.

On the basis of complexity, the functions of cost can be computed, considering the individual cost units of every SE, mentioned as weighting items. The functions of cost for some typical omega-type MINs are reported in Table 2.

Fig. 12 shows the units of cost for typical omega multistage networks versus network size (N). From this diagram, it can be seen that for any MIN with network size N < 128 (total stages ≤ 7), the cost remains quite low, regardless of the architecture, and there are insignificant differences due to the architecture used.

Figure 12: Units of cost versus network size for single- and semi-layer MINs

However, as the network size increases, the cost also increases and can become an important factor for large network sizes. The greatest deviation is observed for a network of size Ν = 1024. This deviation is about 4,000 units of cost, representing a burden of 20% in relation to the overall cost. As a result of the cost examination of MINs, it can be concluded that industrial users should not be dissuaded by the cost aspect. The cost is significant only for sizable networks, and in these instances, the differences in the costs of various architectures (for equivalent network sizes) are relatively small.

d)Reliability of the fabric

The terminal reliability of a semi-layer omega-MIN with double-layer fan-out in the latter stage (NoL = 2 ) versus the simple reliability of a switch-unit is shown in Fig. 13. The simple (or elementary) reliability of a switch-unit is a construction feature and is given by the manufacturer. Theoretically, the elementary reliability ranges between 0.9 and 1.

Figure 13: Overall terminal-reliability of semi-layer omega-MINs with NoL = 2 and different network dimensions, constructed with two-input and four-input SEs, versus the simple reliability of a typical SEs.

The fabrics considered here rely on two-input (2in – SEs) or four-input (4in – SEs ) SEs, forming fabrics with various network diameters. Fig. 13 shows that SeLMINs based on two-input SEs have higher values for the overall terminal reliability (as shown by the first three bars) compared to corresponding fabrics built with four-input SEs (as shown by the last three bars). The first bar (shown in yellow) represents the overall reliability of a SeLMIN with 64×64 inputs/outputs and constructed with 2×2 SEs. Similarly, the next two bars (green and brown) show the overall reliability of the corresponding 256×256 and 1024×1024 SeLMINs. The remaining bars represent interconnection systems based on 4×4 SEs. It is therefore demonstrated with numbers that the bigger the size of the networks, the lower the total terminal-reliability. Ιn summary, it can be inferred from our experiments that semi-layer omega-MINs operating via the C-T technique and based on two-input SEs are more effective in terms of their operation than MINs of equivalent dimensions that utilise an SaF technique or that are constructed with higher numbers of SEs.

Although various reliability studies have been carried out in this area, these studies deal with specific MIN architectures. For example, a reliability analysis was carried out by Prakash et al. [24], but their work referred to a shuffle-exchange gamma logical neighbourhood interconnection network (SEGLNIN). In the same way as these authors, we developed a specific approach to estimating the reliability of the fabrics investigated here. Because the real production of the fabric requires significant time and ηιγη cost, this study offers a practical way of calculating the reliability in advance before the actual fabrication.

e)General Comments

Besides the methods of packet forwarding mentioned earlier, there are new studies appearing in the literature such as the one by Birmpilis et al [25], regarding applications where parallelization is critical. According to this study, packets transmitted through such networks can be interrupted using buffers in order to maximize network usage and minimize the time required for all messages to reach their destination. Also, in order to improve the performance of the network nodes there are new proposed architectures in the literature [26].

The results we have seen so far rely on devices that rely on classical semiconductor memories. Recently, a novel type of memory has been proposed in [27] which can achieve data access speeds on the scale of 10 Gbps. This new form of memory is very promising in terms of allowing interconnection networks to overcome their existing limitations and operate at ultra-fast data rates. That innovation will therefore reignite attention in new architectural structures of MINs.

5.CONCLUSION

In this study, semi-layer omega interconnection-type networks that use the C-T forwarding technique were examined. The outcomes yielded from these experiments were compared with SiMINs with similar network properties that utilize the classical SaF forwarding technique. Two series of experiments were carried out: the first involved a uniform traffic profile, while the other dealt with hotspot traffic. It is apparent that the omega-type SeLMINs are more efficient and robust, despite their slightly higher complexity, in proportional terms, compared to SiMINs. It should also be mentioned that the data derived from the simulations were validated in some marginal cases based on available relevant pre-existing investigations.

The findings of this investigation can be utilized by MIN engineers to build more efficient interconnection networks and configure them appropriately. MIN designers must consider and evaluate the quality of service that they desire to achieve. The omega-type MINs investigated here operate at the ns scale due to their conventional electronic memories.

The current investigation can offer an exemplary approach to study additional traffic patterns such as bursty-type traffic. In addition, other research scenarios could evaluate the performance of similar network architectures utilizing modern routing techniques. We plan to investigate further the application of the cut-through technique in Tandem Multistage Networks.

CONFLICT OF INTEREST

The author declares no conflict of interest.

REFERENCES

A. M. Yunus, M. Othman, Z. Hanapi, Y. Kweh, “Evaluation of replication method in shu_e- exchange network reliability performance,” in Advances in Data and Information Sciences, 271–281, Springer, 2019, doi:10.1007/978-981-13-0277-0_22.
Aleksic, M. Fiorani, “The Future of Switching in Data Centers,” in Optical Switching in Next Generation Data Centers, 301–328, Springer, 2018, doi: 10.1007/978-3-319-61052-8_15.
Faizian, M. A. Mollah, M. S. Rahman, X. Yuan, S. Pakin, M. Lang, “Throughput models of interconnection networks: the good, the bad, and the ugly,” in 2017 IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI), 33–40, IEEE, 2017, doi: 10.1109/HOTI.2017.21.
Prakash, D. K. Yadav, A. Choubey, “A survey of multistage interconnection networks,” Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering), 13(2), 165–183, 2020, doi: 10.2174/1872212113666190215145815.
C. Vasiliadis, G. E. Rizos, C. Vassilakis, E. Glavas, “Routing and performance analysis of double- buffered omega networks supporting multi-class priority traffic,” in 2008 Third International Conference on Systems and Networks Communications, 56–63, IEEE, 2008, doi: 10.1109/ICSNC.2008.10.
Stergiou, “Quality of service study for single and semi layer interconnection networks with relaxing blocking,” WSEAS TRANSACTIONS on COMMUNICATIONS, 12(8), 406–419, 2013, E- ISSN: 2224-2864.
Garofalakis, E. Stergiou, “Performance evaluation for single-and semi-layer multistage interconnection networks servicing multicast traffic by full multicast operation,” International Journal of Communication Systems, 24(4), 415–437, 2011, doi:10.1002/dac.1156.
Stergiou, J. D. Garofalakis “A Simulation Study for Optimizing the Performance of Semi-layer Delta Networks.” in SIMULTECH, 257–265, 2011, ISBN: 978-989-8425-78-2.
Stergiou, D. Liarokapis and J. Garofalakis, “Improving the behavior Interconnection Networks by using multiple internal paths and fan-outs,” in 2017 IEEE Symposium on Computers and Communications (ISCC), 322–326, IEEE, 2017, doi: 10.1109/ISCC.2017.8024550.
B. Levitin, Y. Rykalova, “An analytical model for virtual cut-through routing,” in 2019 28th International Conference on Computer Communication and Networks (ICCCN), 1–7, IEEE, 2019, doi: 10.1109/ICCCN.2019.8846915.
B. Levitin, Y. Rykalova, “Computer interconnection networks with virtual cut-through routing,” Procedia Computer Science, 155, 449–455, 2019, doi: 10.1016/j.procs.2019.08.062.
Stergiou, J. Garofalakis, D. Liarokapis, S. V. Margariti, “A Study of Multilayer Omega Networks Operating with Cut-Through Mechanism under Uniform Trac,” in 2020 5th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDACECNSM), 1–7, IEEE, 2020, doi: 10.1109/SEEDA-CECNSM49515.2020.9221819.
Kumar, V. K. Sehgal, et al., “On Performance of Modified Torus Interconnection Networks,” Thesis, 2019.
Vasiliadis, G. Rizos, C. Vassilakis, “Performance study of multilayered multistage interconnection networks under hotspot traffic conditions,” Journal of Computer Systems, Networks, and Communications, 2010, doi:10.1155/2010/403056.
Stergiou, E. Glavas, S. Margariti, “Performance Evaluation of Delta Networks Operating via Cut- Through Switching under Hotspot Traffic,” in 2020 International Conference on Information Technologies (InfoTech), 1–4, IEEE, 2020, doi: 10.1109/InfoTech49733.2020.9210984.
Shin, S. Choi, H. Kim, “Flit scheduling for cut-through switching: Towards near-zero end-to-end latency,” IEEE Access, 7, 66369–66383, 2019, doi: 10.1109/ACCESS.2019.2916651.
“eXam Answers Search Engine,” Available online https://cisco.com/c/en/us/products/collateral/switches/nexus-5020-switch/white_paper_c11- 465436.html
Services, “Data Center Switches – Cisco Nexus,”. https://images10.newegg.com/UploadFilesForNewegg/itemintelligence/Cisco/qa_c67_618605140032 1443745.pd
“QFX3500 Switch” http://ibest.ee/documentation/Juniper/QFX/QFX3500%20Switch.pdf
Montano, T. Ould-Bachir, J. Mahseredjian, J. P. David, “A low-latency reconfigurable multistage interconnection network,” in 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), 1–4, IEEE, 2019, doi: 10.1109/CCECE.2019.8861540.
L. Liao, W. Lin, “Performance analysis of general cut-through switching on buffered MIN switches,” Journal of Information Science and Engineering, 18(5), 745–762, 2002.
Bansal, R. Joshi, K. Singh, “On a fault-tolerant multistage interconnection network,” Computers & electrical engineering, 20(4), 335–345, 1994, doi:10.1016/0045-7906(94)90047-7.
Bansal, K. Singh, R. Joshi, “Reliability and performance analysis of a modular multistage interconnection network,” Microelectronics reliability, 33(4), 529–534, 1993, doi:10.1016/0026- 2714(93)90322-P.
Prakash, D. K. Yadav, “Design and reliability analysis of fault-tolerant shuffle exchange gamma logical neighborhood interconnection network,” The Journal of Supercomputing, 75(12), 7934–7951, 2019, doi:10.1007/s11227-019-02929-z.
Stavros Birmpilis and Timotheos Aslanidis, “A Critical Improvement On Open Shop Scheduling Algorithm For Routing In Interconnection Networks”, International Journal of Computer Networks & Communications (IJCNC), 9, No.1, January 2017, doi: 10.5121/ijcnc.2017.9101. 1.
Laxminath Tripathy and Chitta Ranjan Tripathy, “A New Interconnection Topology For Network On Chip”, International Journal of Computer Networks and Communications 10(4):37-50,July 2018, doi:10.5121/ijcnc.2018.10403
Tsakyridis, T. Alexoudi, A. Miliou, N. Pleros, C. Vagionas, “10 Gb/s optical random-access memory (RAM) cell,” Optics letters, 44(7), 1821–1824, 2019, doi:10.1364/OL.44.001821.

International Journal of Computer Networks & Communications (IJCNC)

IJCNC 06

Leave a comment Cancel reply

Information

Shortlink

Navigation

International Journal of Computer Networks & Communications (IJCNC)

IJCNC 06

Share this:

Leave a comment Cancel reply

Information

Shortlink

Navigation