DIA-TORUS: A Novel Topology for Network on Chip Design
Deewakar Thakyal and Pushpita Chatterjee
SRM Research Institute, Bangalore
Abstract
Conventional bus architectures fall short in scalability and cannot meet the ever-increasing demand for bandwidth. In addition, feature sizes in the sub-micron domain keep shrinking, making it difficult for bus architectures to fulfill the requirements of modern System on Chip (SoC) designs. Network on Chip (NoC) architectures address these shortcomings by employing a packet-based network for inter-IP communication. A pivotal feature of an NoC system is the topology in which it is arranged. Several topology-dependent parameters, such as hop count, path diversity, and degree, affect system performance. We propose a novel topology for NoC architectures and compare it with existing topologies on the basis of different network parameters.
Keywords
Network on chip, Torus, Topology, NoC
1. Introduction
Traditional bus architectures worked well at first, but they began to pose problems as deep sub-micron devices started showing disparities and the focus shifted to scalability. Scalability, the primary goal, could not be achieved with a bus architecture. These problems called for a well-structured approach, clear programming, and modular design. This led to the rise of Network on Chip (NoC) architectures, which offer promising solutions in terms of scalability and ease of communication. The design focus has shifted from computation to communication and to how communication copes with the scaling of the network. In most NoC architectures, the network is a set of connected routers with processing nodes attached to those routers. Topology plays a crucial role in NoC: it defines how the cores are connected in a network [1]. IP cores can be arranged in different ways, giving rise to a number of topologies that can be used to realize a network. Topology affects performance, so it must be selected with care. A good topology offers a low hop count and high path diversity and supports load balancing. NoC topologies can be grouped into regular and irregular topologies. Regular topologies follow a fixed pattern and can be divided into similar blocks. Irregular topologies do not follow any fixed arrangement of nodes.
In this paper we propose a novel topology, Dia-Torus, for NoC systems. The present work aims at reducing hop count and latency. The topology has diametrical connections that enhance its performance. The proposed work is compared with existing topologies using different network parameters. An RTL implementation is also carried out to analyze the area and power consumption of the proposed work and of other topologies. Related work, together with a description of the preliminary metrics for realizing a good NoC topology, is presented in Section 2. Section 3 discusses the proposed Dia-Torus topology and its routing procedure. Section 4 presents the analysis and the results, in tabular form, of comparing the proposed topology with different topologies to show its efficacy. Section 5 concludes the paper and describes future work.
2. Related Work
Some topology designs have become popular because of their simplicity and the ease with which they can be mapped using current fabrication techniques. Among these are mesh, torus, and folded torus. Mesh is used in a number of NoC architectures, namely Intel’s Teraflops, with 80 cores forming a 10×8 2D mesh [2]; Tilera’s 2D mesh [3], with 64 nodes in an 8×8 mesh; and the TRIPS processor [4], a wormhole-routed 2D mesh with virtual channels. The complexity of a router implementation depends on the degree of the router: a higher degree leads to more complexity. The number of buffers and ports required increases with the degree.
Topologies like torus [5], folded torus [6], and mesh [7] have routers of uniform degree and are easy to implement, whereas Dmesh [8], diametric mesh [9], Tmesh [10], xmesh, xtorus, and xxtorus [11] do not have routers of uniform degree. A new torus network that uses the hypercube Q3 as its basic module has also been proposed [12]. The proposed Hyper-torus has degree 4, is node- and edge-symmetric, and is scalable. Extra links and ports require more arbitration levels and more complex routing algorithms. Extra links can, however, also improve the load-balancing capabilities of a topology.
The performance of 2×4 Network on Chip (NoC) topologies has also been analyzed and compared. Three common topologies, 2D mesh, 2D torus, and hierarchical mesh, are first designed, and their performance is then analyzed and compared in detail [12]. The simplicity of the regular mesh NoC architecture reduces design time and manufacturing cost, but a regular mesh cannot efficiently support cores of different sizes.
2.1 Preliminaries
A network topology is characterized by several parameters that can be measured. On the basis of these parameters, one topology is considered better than another. The various network parameters are explained briefly below.
2.2 Network on Chip (NoC) Topologies
NoC systems can be implemented with many different topologies. The choice of topology depends on requirements and parameters such as the complexity of implementing the topology, the area it requires, and its routing algorithm.
1. Mesh: Mesh is the most basic and most frequently used topology, with a very simple routing algorithm. It is easy to implement and is scalable. Except for the corner and border nodes, every node is connected to its four neighboring nodes. A corner node has degree 2, the other border nodes have degree 3, and all remaining nodes have degree 4, as shown in Fig. 1. Mesh has a larger network diameter because of its higher hop count. For a 4×4 mesh, the network diameter is 6. The diagonally opposite corner nodes contribute the most to the hop count (node 1 to node 16). For routing, an XY routing algorithm can be used, which routes packets based on the difference in coordinates.
2. Torus: Torus is another popular topology. It is formed by connecting the boundary nodes in the same row and column with long wrap-around links, as shown in Fig. 2. These long end-to-end links lead to a smaller network diameter and a lower hop count. For a 4×4 torus, the network diameter is 4, compared to 6 in mesh. Reaching the diagonally opposite corner node takes 6 hops in mesh but only 2 in torus. Compared to mesh, torus requires long wires for the wrap-around links.
3. Folded Torus: Another variant of torus is folded torus. Folded torus has the advantage of shorter link length, which reduces the time a packet requires to traverse the interconnect links. Shorter link length also reduces the interconnect area required for implementation. Folded torus also has more path diversity than torus and is fault tolerant.
4. Xmesh: Xmesh is an enhancement of the mesh topology that adds links in the diagonal direction, as shown in Fig. 2, which reduces the diameter of the topology by half. The hop count of Xmesh is also lower than that of mesh. The additional diagonal links increase the area. The number of extra links added is n for an n×n mesh topology. For a 4×4 topology, Xmesh has a diameter of 3, whereas mesh has a diameter of 6.
5. Xtorus: Xtorus is a relatively new topology that adds links in the diagonal directions, as in Xmesh. Xtorus inherits the long wrap-around links of the torus topology. The diameter of Xtorus, (n-1), is smaller than that of torus, (n).
6. King mesh: King mesh is a topology used in areas where parallel processing is a key aspect. In king mesh, every node is connected to all of its neighbors, as shown in the figure. Nodes at the center of the topology have a higher degree than other nodes. Routers with a higher degree occupy more area because of the extra buffers required for the extra ports. Owing to the greater number of connections, the diameter of king mesh is smaller and the relative hop count is also lower.
7. King torus: King torus is a topology similar to king mesh: it is a torus topology with connections to all neighboring routers. King torus also has larger area values than other regular topologies because of the extra ports in its routers. It offers better performance than king mesh and other topologies as a result of the long connections between end-to-end nodes, both horizontally and diagonally.
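The hop counts and diameters quoted for the 4×4 mesh and torus can be reproduced with a short Python sketch. It is only an illustration, not the authors' implementation: the 0-based (x, y) coordinates and the function names `xy_route`, `mesh_hops`, `torus_hops`, and `diameter` are assumptions made for this example.

```python
from itertools import product


def xy_route(src, dst):
    """Nodes visited by XY routing on a 2D mesh: X offset first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def mesh_hops(a, b):
    """Hop count of the XY route between two mesh nodes."""
    return len(xy_route(a, b)) - 1


def torus_hops(a, b, n):
    """Minimal hop count on an n x n torus; each dimension may wrap around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, n - dx) + min(dy, n - dy)


def diameter(hops, n):
    """Largest minimal hop count over all node pairs of an n x n network."""
    nodes = list(product(range(n), repeat=2))
    return max(hops(a, b) for a in nodes for b in nodes)


n = 4
corner, opposite = (0, 0), (n - 1, n - 1)  # node 1 and node 16 in the text
mesh_corner_hops = mesh_hops(corner, opposite)        # 6
torus_corner_hops = torus_hops(corner, opposite, n)   # 2
mesh_diameter = diameter(mesh_hops, n)                # 6
torus_diameter = diameter(lambda a, b: torus_hops(a, b, n), n)  # 4
```

The same `diameter` helper could be reused for other topologies by swapping in a different hop-count function.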
3. Proposed Work: Dia-Torus Topology
In this paper, we propose a new topology, Dia-Torus, which is a hybrid topology with features of both mesh and torus topologies. In addition to the long vertical and horizontal links of torus, Dia-Torus has diagonal links that connect the four edge nodes with nodes in the second-last row of the topology, as shown in Fig. 4. The edge routers that have diagonal links do not need an additional port. Only the four nodes that are connected to the edge nodes by those diagonal links need 5-port routers. Since only four 5-port routers are used irrespective of the size of the topology, there is only a slight increase in area. The diagonal links reduce the hop count significantly. Moreover, the diameter of the topology is also reduced because of these links. In Fig. 4, nodes 1, 4, 13, and 16 are edge nodes, and nodes 1 to 4, 5, 8, 9, 12, and 13 to 16 are border nodes.
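The edge and border node sets listed above for the 4×4 case can be derived programmatically. The following Python sketch assumes row-by-row numbering starting at 1, which is consistent with the node numbers in the text; the function name `classify_nodes` is hypothetical. It does not model the diagonal links themselves, whose exact endpoints are given only in Fig. 4.

```python
def classify_nodes(n):
    """Split the nodes of an n x n grid into edge (corner) and border nodes.

    Nodes are assumed to be numbered 1..n*n row by row.
    """
    edge, border = [], []
    for node in range(1, n * n + 1):
        row, col = divmod(node - 1, n)
        on_row_boundary = row in (0, n - 1)
        on_col_boundary = col in (0, n - 1)
        if on_row_boundary or on_col_boundary:
            border.append(node)
        if on_row_boundary and on_col_boundary:
            edge.append(node)
    return edge, border


edge_nodes, border_nodes = classify_nodes(4)
# edge_nodes   -> [1, 4, 13, 16]
# border_nodes -> [1, 2, 3, 4, 5, 8, 9, 12, 13, 14, 15, 16]
```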
3.1 Router Architecture
Network on Chip (NoC) architectures are implemented as tile-based structures to ease on-chip communication [13][14][15]. A tile can be a general-purpose processor, a memory subsystem, etc. A router is attached to each tile and connects it to neighboring nodes. All communication with other nodes in the architecture goes through this router. Figure 9(a) shows the tile structure, and the basic router architecture is shown in Figure 9(b). The router has four ports named after the directions east, west, north, and south. Another port, the local port, is dedicated to communication with the processing core. Buffers are usually attached to store incoming data.
The present work uses a router with an additional port for the diagonal connections. Only 4 routers in the topology have this extra port. The extra port slightly increases the area of the router, but it also routes packets efficiently, reducing the overall hop count. The router structure is shown in Figure 10.
3.2 Routing Algorithm
The routing algorithm for the present work is based on the routing algorithm for torus, with the additional conditions necessary for the topology. Four routers have an extra port, named the diagonal port, for the long diametrical connections. Edge routers have their diametrical connections attached to an existing port, and no extra port is added. For an n×n Dia-Torus, the routing algorithm is as follows:
1. If the source node is a border node:
i. The four edge nodes are exceptions: they do not have horizontal wrap-around links, so additional conditions for packet movement are written.
2. For nodes that are not border nodes, if both x and y offset are greater than n/2,
4. Results and Comparison
The proposed work is compared with different topologies on the basis of several network parameters. The topologies mentioned in the preliminaries are used for the comparison.
$$T = H \cdot T_r + \frac{D}{v} + \frac{L}{b}$$
where H is the average hop count from the source node to the destination node; T_r is the routing delay of the router, in cycles/hop; D is the average distance from the source node to the destination node, in hops, which usually equals H; v is the wire transmission speed, in hops/cycle; L is the packet length, in flits; and b is the bandwidth, in flits/cycle. In general, different topologies correspond to different H and D values, while T_r depends on the routing algorithm and the physical implementation of the router.
From Table I, it can be seen that the latency of the proposed work is 26.4% less than that of mesh, 11.08% less than that of torus, and 3.8% less than that of twisted torus. It is also 6.1% less than that of Xmesh and equal to that of Xtorus, but with fewer links.
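The latency expression and the percentage reductions quoted above can be checked with a short Python sketch. The latency values are copied from Table I; the function names and the parameter values passed to `latency` are hypothetical and serve only to exercise the formula, since the paper does not list the values of T_r, v, L, and b that it used.

```python
def latency(H, T_r, D, v, L, b):
    """T = H*T_r + D/v + L/b, in cycles, as defined in the text."""
    return H * T_r + D / v + L / b


def reduction_percent(baseline, proposed):
    """Relative latency reduction of `proposed` with respect to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Latency values copied from Table I.
table_latency = {"Mesh": 14.5, "Torus": 12.0, "Xmesh": 11.37, "Dia-Torus": 10.67}

for name in ("Mesh", "Torus", "Xmesh"):
    saved = reduction_percent(table_latency[name], table_latency["Dia-Torus"])
    print(f"{name}: {saved:.2f}% lower latency with Dia-Torus")

# Hypothetical parameter values, chosen only to exercise the formula.
example = latency(H=2.0, T_r=3.0, D=2.0, v=1.0, L=4.0, b=1.0)  # 12.0 cycles
```

The computed reductions are about 26.4% for mesh, 11.1% for torus, and 6.2% for Xmesh, in line with the figures quoted in the text.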
$$TH \le \frac{2b \cdot B_c}{N}$$
Taking a 4×4 network as an example, it takes four links to divide a mesh into two halves, and each link is bidirectional, so B_c = 8. In the same way, B_c for torus, xmesh, xtorus, and Dia-Torus equals 16, 12, 20, and 20, respectively. Assuming that all other parameters are constant, the ideal throughput of Dia-Torus is 150% higher than that of mesh, 25% higher than that of torus, and about 67% higher than that of xmesh (20/12 ≈ 1.67).
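The throughput comparison reduces to ratios of the B_c values, which the following Python sketch makes explicit. The B_c values are those quoted in the text; the function name, the unit bandwidth b = 1, and the reading of N as the number of nodes are assumptions made for the example and cancel out in the ratios.

```python
def ideal_throughput_bound(b, B_c, N):
    """Upper bound TH <= 2*b*B_c / N from the text."""
    return 2 * b * B_c / N


# B_c values quoted in the text for a 4x4 network.
bisection = {"mesh": 8, "torus": 16, "xmesh": 12, "xtorus": 20, "Dia-Torus": 20}

b, N = 1.0, 16  # assumed: unit channel bandwidth, N taken as the node count
bound = {name: ideal_throughput_bound(b, B_c, N) for name, B_c in bisection.items()}

# Percentage increase of the Dia-Torus bound over each baseline.
gain = {
    name: 100.0 * (bound["Dia-Torus"] / bound[name] - 1.0)
    for name in ("mesh", "torus", "xmesh")
}
# gain["mesh"] -> 150.0, gain["torus"] -> 25.0, gain["xmesh"] -> about 66.7
```

With B_c = 20 for Dia-Torus and 12 for xmesh, the ratio 20/12 gives an increase of roughly 67%.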
Table I: Comparison of the proposed work with different topologies
Topology | Mesh | Torus | Folded Torus | King Mesh | King Torus | Xmesh | Xtorus | Dia-Torus |
Hop Count | 644 | 514 | 468 | 456 | 396 | 460 | 444 | 444 |
Diameter | 6 | 4 | 4 | 3 | 3 | 3 | 3 | 3 |
Latency | 14.5 | 12 | 13.32 | 10.9 | 10.75 | 11.37 | 10.67 | 10.67 |
Path Diversity | 2.96 | 2 | 2.24 | 4.14 | 4.34 | 1.4 | 2.08 | 1.62 |
Table II: Comparison of the number of links of the proposed work with different topologies
Topology | Mesh | Torus | Xmesh | Xtorus | Dia-Torus |
Number of links | 2n(n-1) | 2 | 2 | 2 +2(n-1) | 2 +2 |
Table III: Area comparison of the proposed work with different topologies
Topology | Mesh | Torus | Folded Torus | Dia-Torus |
Total area( ) | 2703521.651525 | 2875182.661041 | 2883018.998952 | 2891311.592451 |
Table IV: Power consumption comparison of the proposed work with different topologies
Topology | Mesh | Torus | Folded Torus | Dia-Torus |
Total Power consumed( ) | 1.4889e+04 | 1.5121e+04 | 1.1415e+04 | 1.4696e+04 |
5. Conclusions and Future Work
From the above results, it can be concluded that the proposed work offers significant advantages in hop count and diameter. Dia-Torus also offers a higher ideal throughput than existing topologies. The latency of the present work is also lower; it is comparable to the latency of king torus and xtorus, which have more links than the proposed work. The proposed work has a trade-off in terms of path diversity: compared with mesh, torus, and other topologies, it loses some path diversity. The proposed topology incurs a slight increase in area, which is negligible when the advantages are considered. The routing algorithm for the proposed work does not handle faulty links. A new fault-tolerant and deadlock-free routing algorithm will be developed in the near future. The proposed topology will also be implemented in ASIC. We are working on implementing the proposed topology on the NoC simulator Noxim [17], and the functionality of Noxim [18] will be extended to evaluate faulty links in a network; network simulation will be carried out with those faulty links to determine how performance is affected.
References
[1] W. J. Dally, B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann Pub., San Francisco, CA, 2004.
[2] S. Vangal et al., “An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS”, In proceedings of IEEE International Solid-State Circuits Conf., Digest of Tech. Papers (ISSCC), pp. 98-10, 2007.
[3] A. Agarwal, L. Bao, J. Brown, B. Edwards, M. Mattina, C.C. Miao, C. Ramey, D. Wentzlaff, “Tile Processor: Embedded Multicore for Networking and Multimedia”, In proceedings of HotChips19, Stanford, CA, Aug. 2007.
[4] P. Gratz, C. Kim, R. McDonald, S. W. Keckler, D. C. Burger, “Implementation and evaluation of on-chip network architectures”, In proceedings of Int. Conf. Computer Design, pp. 477-484, 2006.
[5] S. Kumar, A. Jantsch, J. P. Soininen, M. Forsell, M. Millberg, J. Oberg, K. Tiensyrja, A. Hemani, “A network on chip architecture and design methodology”, In proceedings of IEEE Computer Society Annual Symposium on VLSI, pp. 105-112, 2002.
[6] W. J. Dally and C. L. Seitz, “The Torus routing chip”, Journal of Distributed Computing, pp. 187-196, 1986.
[7] W. J. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks”, In proceedings of DAC, pp. 684-689, 2001.
[8] Chifeng Wang, Wen-Hsiang Hu, SeungEun Lee, Nader Bagherzadeh, “Area and power-efficient innovative congestion-aware Network-on-Chip architecture”, Journal of Systems Architecture, pp. 24-38, 2011.
[9] M. Reshadi, A. Khademzadeh, A. Reza, M. Bahmani, “A Novel Mesh Architecture for On-Chip Networks”, Design and Reuse Industry Articles, 2013.
[10] Quansheng Yang, Zhekai Wu, “An Improved Mesh Topology and Its Routing Algorithm for NoC”, In proceedings of International Conference on Computational Intelligence and Software Engineering (CiSE), pp. 1-4, 2010.
[11] Liu Yu-hang, Zhu Ming-fa, Wang Jue, Xiao Li-min, Gong Tao, “Xtorus: An Extended Torus Topology for On-Chip Massive Data Communication”, In proceedings of IEEE 26th International Symposium Workshops & PhD Forum on Parallel and Distributed Processing, pp. 2061-2068, 2012.
[12] Woo-seo Ki, Hyeong-Ok Lee, Jae-Cheol Oh, “The New Torus Network Design Based On 3-Dimensional Hypercube”, In proceedings of 11th International Conference on Advanced Communication Technology, ICACT 2009, Vol:01, 2009.
[13] A. Hemani et al., “Network on a chip: An architecture for billion transistor era,” in Proc. IEEE NorChip Conf., Nov. 2000, pp. 166–173.
[14] S. Kumar et al., “A network on chip architecture and design methodology,” in Proc. Symp. VLSI, Apr. 2002, pp. 105–112.
[15] W. J. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks,” in Proc. Design Automation Conf., Jun. 2001, pp. 684–689.
[16] Juyeob Kim, Miyoung Lee, Wonjong Kim, Junyoung Chang, Younghwan Bae and Hanjin Cho, “Performance analysis of NoC structure based star mesh topology”, In proceedings of SoC Design Conference, ISOCC ’08, Vol:02, 2008.
[17] F. Fazzino, M. Palesi, D. Patti, “Noxim: Network-on-Chip Simulator”, url: http://sourceforge.net/projects/noxim/ accessed on April, 2016.
[18] K. Swaminathan, D. Thakyal, S. G. Nambiar, G. Lakshminarayanan and Seok-Bum Ko, “Enhanced Noxim simulator for performance evaluation of network on chip topologies,” In proceedings of Recent Advances of Engineering and Computational Sciences (RAECS), 2014, pp. 1-5, 2014.
Authors
Pushpita Chatterjee has been working as a research lead at SRM Research Institute, Bangalore, since 2012. She completed her PhD at the Indian Institute of Technology Kharagpur, India, in 2012. She has over 15 publications to her credit in international journals and conferences. Her research interests are mobile computing, distributed and trust computing, wireless ad hoc and sensor networks, network on chip, information-centric networking, and software-defined networking.
Deewakar Thakyal has been working as a research member at SRM Research Institute, Bangalore, since 2015. He received his Master’s degree from the National Institute of Technology, Tiruchirappalli, India, in 2014. His research interests are network on chip, Internet of Things, sensor networks, and cyber-physical systems.