International Journal of Computer Networks & Communications (IJCNC)

AIRCC PUBLISHING CORPORATION

IJSNA 03

Advanced Graph Neural Network Mechanisms for IoT Network Intrusion Detection: A Comprehensive Study of Prototype-Based, Contrastive, and Structure Learning Approaches

Mehmet Baris Yaman

Foundational Technology, Siemens A.S¸., Istanbul, Turkey

Abstract. The proliferation of Internet of Things (IoT) devices has created unprecedented security challenges, with traditional intrusion detection systems struggling to achieve the accuracy needed for operational deployment. This comprehensive study investigates three advanced Graph Neural Network (GNN) mechanisms for network intrusion detection, addressing the fundamental limitations of standard edge-level classification approaches. We present Prototype-GNN, which leverages distance-based classification with learnable prototype embeddings; Contrastive-GNN, which optimizes embedding geometry through supervised contrastive learning; and GSL-GNN, which adaptively learns optimal graph structure from node features. Through extensive experimentation on the TON-IoT dataset containing 1 million network connections, we demonstrate that these mechanisms achieve 94.24%, 94.71%, and 96.66% accuracy respectively, representing substantial improvements of +2.37, +2.84, and +4.79 percentage points over the baseline EdgeLevelGCN (91.87%). Our best-performing GSL-GNN architecture achieves 99.70% ROC-AUC with an exceptionally low 1.5% false positive rate, addressing the critical challenge of alert fatigue in security operations. This journal article extends our preliminary conference study [1] by providing comprehensive ablation studies, additional architectural variants, and deeper theoretical analysis.

Keywords: Graph Neural Networks · Network Intrusion Detection · IoT Security · Prototype Learning · Contrastive Learning · Graph Structure Learning · Deep Learning

1 Introduction

The Internet of Things revolution has created security challenges of fundamentally different character than traditional network efense. Modern IoT ecosystems consist of billions of diverse devices—ranging from industrial sensors and medical implants to smart home controllers and infrastructure monitors—all communicating via a wide variety of protocols with differing security characteristics. This massive scale and extreme heterogeneity overwhelm conventional security approaches. Individual devices lack the computational resources for sophisticated local protection, yet centralized monitoring systems struggle with the volume and diversity of network traffic. Network intrusion detection systems have emerged as essential defensive infrastructure, analyzing traffic patterns to identify malicious activities without requiring on-device computation. However, operational deployment demands extraordinary capabilities: systems must process enormous traffic volumes generated by diverse applications, distinguish sophisticated attack patterns from legitimate behaviors in heterogeneous environments, maintain minimal false alarm rates to preserve the effectiveness of human analysts, deliver real-time verdicts enabling automated threat response, and adapt continuously to evolving adversarial tactics. Satisfying this full set of requirements at the same time poses a fundamental challenge that existing approaches have difficulty addressing, thereby motivating the exploration of new detection architectures. Statistical learning has revolutionized intrusion detection by enabling automated discovery of malicious patterns from network traffic data. Contemporary machine learning approaches—including ensemble tree methods, maximummargin classifiers, and proximity-based techniques—operate by learning decision boundaries in feature spaces constructed from network flow statistics. Security practitioners employ these methods by engineering numerical descriptors that capture traffic characteristics: aggregate volume metrics, protocol usage patterns, connection duration distributions, and port access frequencies. When trained on labeled examples, these classifiers successfully identify deviations from normal traffic behavior as reflected in statistical properties of individual flows. The fundamental architectural constraint lies in the independence assumption: each connection is treated as a standalone observation, with classifiers unable to reason about relationships between communicating entities or analyze coordinated activities spanning multiple network hops. A network is fundamentally a graph where devices (nodes) interact through connections (edges), and attacks often exhibit graph-level patterns—such as coordinated botnets, lateral movement, and hierarchical command-and-control structures—that cannot be captured by flat feature vectors [2]. Deep neural architectures have transformed intrusion detection by introducing the capability for autonomous hierarchical abstraction learning from unprocessed network observations. This eliminates the traditional requirement for security domain experts to manually engineer discriminative features, as the system automatically discovers multi-level representations. Convolutional frameworks process packet-level information by casting network flows into formats resembling sequential time-series or structured image-like data. Recurrent networks and LSTM mechanisms identify dependencies that propagate through temporal sequences of network activity. Autoencoders learn compressed representations for anomaly detection through reconstruction error [3]. The selflearning nature of these systems enables automatic discovery of temporal progression patterns, rotocol behavioral signatures, and payload-embedded characteristics without explicit specification. Nevertheless, a critical rchitectural constraint exists: these neural paradigms were fundamentally designed for data with regular geometric structure—pixel grids for images, ordered positions for sequences—creating a mismatch with the variable, interconnected topology of network communications. While CNNs excel at local feature extraction through convolutional filters and RNNs model sequential dependencies, neither naturally handles the irregular, non-Euclidean structure of network graphs where nodes have varying degrees and connections exhibit complex patterns [4]. Graph Neural Networks (GNNs) represent a breakthrough for learning on graph-structured data by generalizing neural networks to non-Euclidean domains [5]. The core innovation is message passing: each node aggregates information from its neighbors through learnable transformations, enabling the network to capture both node features and relational structure. Graph Convolutional Networks (GCNs) apply spectral graph theory to define convolutions on graphs [6], while Graph Attention Networks (GATs) learn importance weights for different neighbors [7]. GraphSAGE introduces sampling-based aggregation for scalability [11]. GNNs have achieved state-of-the-art results across diverse domains: social network analysis (fraud detection, recommendation), molecular chemistry (drug discovery, protein folding), knowledge graphs (link prediction, reasoning), and traffic forecasting [9]. For network intrusion detection, GNNs naturally model the network graph where devices are nodes and connections are edges. By propagating information through the topology, GNNs capture complex attack patterns such as distributed attacks, multi-hop propagation, and structural anomalies that flat ML models miss [10]. But substantial room remains for architectural innovations. The theoretical understanding of edge-level GNN expressiveness has deepened considerably. [18] proved that edge-level message passing can distinguish graph structures that node-level methods cannot, establishing theoretical superiority for tasks where edge properties are critical. [19] analyzed the WeisfeilerLeman expressiveness of edge-level GNNs, showing they achieve higher-order graph isomorphism testing compared to standard GNNs. [20] provided generalization bounds for edge classification, proving that edge-augmented architectures achieve better sample complexity when edge features carry discriminative information—precisely the case in network intrusion detection. This paper introduces three GNN architectures that substantially advance intrusion detection accuracy through distinct mechanisms:

  1. Prototype-GNN: Learns key example patterns and classifies new connections by finding which example they match best
  2. Contrastive-GNN: Organizes the model’s internal space by pushing attacks and normal traffic into different regions
  3. GSL-GNN: Discovers the optimal way to connect devices for sharing information during classification

We provide comprehensive experimental validation on 1 million network connections, architectural trade-off analysis, and show generalization potential to other edge classification domains. The rest of this paper is organized as follows: Sect. 3 presents our methodology including the three architectures, Sect. 4 describes experimental setup and results, Sect. 7 discusses findings and implications, and Sect. 8 concludes with future directions.

2 GNN Classification

2.1 Graph Level Classification

Traditional GNN approaches employ graph level classification, where the entire network snapshot receives a single label (malicious or normal) [12]:

where h(L)i are final node embeddings and READOUT(·) aggregates them (e.g., mean/max pooling). However, this approach suffers from the aggregation bottleneck: all device-specific and connection-specific information is compressed into a single graph-level representation, losing fine-grained attack signatures and making it impossible to identify which specific devices or connections are compromised.

2.2 Node Level Classification

An intermediate approach is node level classification, where individual devices (nodes) are classified as compromised or benign:

where h(L)i is the node embedding after L GNN layers and xi are node features (device characteristics). This approach offers better granularity than graph-level classification by identifying which specific devices exhibit malicious behavior. It is particularly suitable for detecting compromised IoT devices, botnets, or infected hosts. However, node-level classification still loses connection-specific information: a device may have both legitimate and malicious connections, but node-level methods cannot distinguish between them. Additionally, many attacks manifest primarily in connection patterns such as port scanning, data theft rather than device-level features, making node classification insufficient for comprehensive intrusion detection.

2.3 Edge Level Classification

The most detailed approach is edge level classification, which directly classifies individual connections:

where hi , hj are node embeddings capturing graph context and xij are edge features (connection statistics). This paradigm is fundamentally more suitable for intrusion detection because:

Fine-grained predictions: Identifies which specific connections are malicious – No information bottleneck: Attack signatures preserved in edge representations – Connection-level patterns: Captures attacks manifested in traffic characteristics (bytes, packets, protocols, ports) – Scalability: Computational cost linear in number of edges, not entire graph – Real-time applicability: Can classify new connections as they arrive

Our work advances edge-level GNN architectures through three mechanisms explained in the remainder of this section.

3 Methodology

3.1 Problem Formulation

Let G = (V, E, XV , XE) represent a network traffic graph where:

V = {v1, v2, . . . , vn} is the set of IP address nodes. – E = {eij | vi , vj ∈ V } is the set of connections (edges). – XV ∈ Rn×dv contains node features (e.g., IP metadata, behavioral statistics). – XE ∈ R|E|×de contains edge features (e.g., bytes, packets, duration, protocols).

For network intrusion detection, we define the edge classification task as:

where yij = 0 indicates normal traffic and yij = 1 indicates an attack connection

3.2 Baseline: EdgeLevelGCN

We establish a strong baseline using standard Graph Convolutional Networks adapted for edge classification. Figure 1 shows the baseline EdgeLevelGCN architecture. The baseline EdgeLevelGCN represents a standard graph convolutional approach for edge-level classification in network intrusion detection. Node features propagate through three GCN layers with normalization and dropout regularization. Edge representations are constructed via concatenation of endpoint node embeddings and intrinsic edge features, followed by MLP-based classification. Node Embedding Each node i is initialized with features xi ∈ Rd0 (device characteristics). GCN layers propagate information:

where N (i) are neighbors, di is degree, W(l) are learnable weights, and σ is ReLU activation. This computes:

where A˜ = A + I (adjacency with self-loops) and D˜ is degree matrix. Edge Classification For each edge (i, j), we concatenate node embeddings with edge features:

where ∥ denotes concatenation and xij ∈ Rde are edge features (bytes, packets, duration, ports). A 3-layer MLP classifier predicts:

Training We use Focal Loss to handle class imbalance:

where pt is predicted probability, αt balances classes, and γ = 2 focuses on hard examples. Computational Complexity GCN Layers: Each of L = 3 layers computes H(l+1) = σ(D˜ − 1/2 A˜D˜ − 1/2 H(l)W(l)). The sparse matrix multiplication AH˜ (l) costs O(|E| · d) where |E| is the number of edges and d is the hidden dimension. The dense transformation H(l)W(l) costs O(|V |·d2). Combined: O(L·(|E|·d+|V |·d2)). Edge Classification: For |E| edges, concatenation and MLP forward pass cost O(|E| · d2). Total Complexity: O(L· |E| · d + L· |V | · d2 + |E| · d2). For typical network graphs where |E| ≫ |V | and L is small constant, this simplifies to O(|E| · d2).

3.3 Prototype-GNN: Distance-Based Classification

Prototype-GNN addresses the limitation that a single decision hyperplane cannot capture diverse attack patterns. Instead, we learn multiple prototypes— representative embeddings for attack and normal patterns—and classify based on distance. Figure 2 presents the Prototype-GNN architecture.


Figure 1. Edge Level GCN Baseline Architecture

Prototype-GNN introduces learnable prototype-based classification using distance metrics in the embedding space. The architecture maintains 16 trainable prototypes (8 per class) initialized through k-means clustering and refined end-to-end. Classification relies on L2 distance computation between edge embeddings and prototypes, with predictions derived from minimum distance assignment. The composite loss function incorporates cross-entropy, intra-class clustering, and inter-class separation objectives.

Architecture Node Embedding: Same as baseline (3 GCN layers), producing final node embeddings h(L)i ∈ Rd where L = 3. Edge Encoding: For each edge (i, j), concatenate node embeddings with edge features:

Prototype Layer: We learn K prototypes per class (attack/normal):

where each pk ∈ R2d+de is a learnable parameter initialized via K-means Clustering. Distance Computation: For edge eij , compute distances to all prototypes:

Classification: Predict class based on minimum distance:

For training, convert distances to probabilities via Softmax:

Loss Function We employ a triple loss combining classification, cluster compactness, and class separation:

Classification Loss: Cross-entropy on distance-based probabilities:

Cluster Compactness: Encourages prototypes of same class to be diverse:

Class Separation: Pushes attack and normal prototypes apart:

with λ1 = 0.05, λ2 = 0.05, ϵ = 0.1 for numerical stability Computational Complexity GCN Encoding: Same as baseline, O(L · |E| ·d + L · |V | · d2) where L = 3. Distance Computation: For each of |E| edges, compute distances to 2K prototypes (K attack + K normal). Each distance computation ∥eij −pk∥22 costs O(demb) where demb = 2d + de is the edge embedding dimension. Total: O(|E| ·K · demb). Softmax Classification: Converting distances to probabilities costs O(|E|·K). Total Complexity: O(L · |E| · d + L · |V | · d2 + |E| · K · d). Since K = 8 is a small constant, this simplifies to O(|E| · d2), matching the baseline asymptotic complexity with minimal overhead.


Figure 2. Prototype-GNN Architecture

Interpretability Unlike black box classifiers, Prototype-GNN provides interpretability: each prototype represents a learned attack/normal pattern. Analysts can inspect which prototype triggered an alert and visualize similar historical connections. This aids in understanding attack variants and reducing false positives.

3.4 Contrastive-GNN: Optimizing Embedding Geometry

Standard cross-entropy loss only encourages correct predictions but does not structure the embedding space. Contrastive-GNN explicitly optimizes geometry: pulling same-class edges together while pushing different-class edges apart. Figure 3 presents the Contrastive-GNN architecture with a dual-head design. The dual-head architecture optimizing both classification accuracy and embedding geometry simultaneously. Following shared GCN encoding, the architecture bifurcates into parallel classification and projection heads. The projection head generates L2-normalized embeddings for supervised contrastive learning, which explicitly maximizes inter-class separation and intra-class compactness through temperature-scaled similarity metrics. Architecture Node Embedding: Same as baseline (3 GCN layers), producing final node embeddings h(L)i ∈ Rd where L = 3. Edge Encoding: For each edge (i, j), concatenate node embeddings with edge features:

Dual-Head Design:Classification Head: 3-layer MLP predicting attack/normal from eijProjection Head: 3-layer MLP mapping eij to 128-d normalized space

Supervised Contrastive Loss For each edge (i, j) in a batch, define: – P(i, j): Positive set (edges with same label) – A(i, j): All other edges in batch The supervised contrastive loss:

where τ = 0.07 is temperature controlling concentration. This maximizes similarity to same-class edges while minimizing similarity to different-class edges. Combined Loss

with λ = 0.5 balancing classification and contrastive objectives. Computational Complexity GCN Encoding: Same as baseline, O(L · |E| ·d + L · |V | · d2). Dual Heads: Both classification and projection heads process |E| edge embeddings through 3-layer MLPs, costing O(|E| · d2) each. Contrastive Loss: For a batch of B edges, computing pairwise similarities zij · zp requires O(B2· dproj ) where dproj = 128 is the projection dimension. Across all |E|/B batches: O(|E| · B · dproj ). Total Complexity: O(L· |E| · d+L· |V | · d2 +|E| · d2 +|E| ·B · dproj ). With typical batch size B ≪ |E|, this simplifies to O(|E| · d2) asymptotically, though the contrastive term adds O(B · dproj ) overhead per edge in practice.


Figure 3. Contrastive-GNN Architecture

3.5 GSL-GNN: Adaptive Graph Structure Learning

The baseline uses the physical network topology for message passing. However, the optimal structure for classification may differ—devices may share behavioral patterns without direct connections (e.g., botnet members). GSL-GNN learns this structure adaptively. Figure 4 illustrates the GSL-GNN architecture. GSLGNN addresses the limitation of fixed physical network topology through adaptive graph structure learning. The architecture comprises dual parallel paths: one operating on the original adjacency matrix and another on a learned adjacency matrix derived via bilinear attention mechanisms. Top-k sparsification ensures computational tractability while preserving salient connections. Node embeddings from both paths undergo weighted fusion 0.5 before edge classification. This adaptive topology learning captures latent behavioral patterns orthogona to physical connectivity, Structure Learner Given node features H(0) = X ∈ Rn×d, learn adjacency matrix:

where Wattn ∈ Rd×d learns feature interactions. This produces dense Alearned ∈ Rn×n. Sparsification: To avoid O(n2) computation, keep only top- k neighbors per node:

Numerical Stability: Critical implementation detail:

to prevent NaN from extreme values. Dual-Path GCN Process features using both learned and original topologies:

Fuse via weighted combination:

with α = 0.5 balancing learned and physical structure. Edge Classification Same as baseline: concatenate node embeddings with edge features, feed to MLP. End-to-End Training: The structure learner is differentiable, enabling joint optimization:

where Lreg = ∥AlearnedF2 prevents trivial solutions.

Computational Complexity Structure Learning (Naive): Computing bilinear attention Alearned[i, j] = σ(hTi Wattnhj ) for all |V |2 node pairs costs O(|V |2·d2). This is prohibitive for large graphs. Sparsification: Keeping only top-k neighbors per node via TopK(Alearned[i, :], k = 15) reduces the effective adjacency to O(|V | · k) edges. This requires O(|V |2log k) for k-selection across all nodes, but is computed once per forward pass. Dual-Path GCN: Processing both learned and original graphs costs 2·O(L·|E| · d + L · |V | · d2) where the learned graph has O(|V | · k) edges. The fusion operation αHlearned + (1 − α)Horiginal is O(|V | · d).


Figure 4. GSL-GNN Architecture

Edge Classification: Same as baseline, O(|E| · d2). Total Complexity: O(|V |2·d2+|V |2log k+L·|V |·k·d+L·|V |·d2+|E|·d2). The dominant term is structure learning: O(|V|2· d2) without approximations. However, with top-k sparsification reducing effective computation, practical complexity approaches O(|V |·k·d2+|E|·d2), achieving the noted 26× speedup over naive O(|V |2) adjacency.

4 Experimental Setup

4.1 Dataset

In this paper, we use the TON IoT Network Intrusion Detection Dataset [13–17] as it contains raw network flow data from IoT devices. Given the severe class imbalance in the original dataset (96.4% attacks, 3.6% normal), we create a balanced subset through stratified sampling, maintaining temporal ordering to preserve realistic traffic patterns.

4.2 Temporal Graph Construction

Network connections are organized into 200 temporal graph snapshots: – Snapshot size: 3,000 connections each – Nodes: Unique IP addresses (avg 346 per snapshot) – Edges: Directed connections between IPs – Node features (12-dim): In/out degree, bytes, packets, avg connection duration, protocol distribution, port entropy – Edge features (10-dim): Bytes sent/received, packet count, duration, protocol type (TCP/UDP/ICMP), source/destination ports, lags Feature Engineering Node Features (per IP address):

Edge Features (per connection):

Normalization All features normalized to [0, 1] via min-max scaling:

computed on training set, applied to validation/test. Graph Construction Adjacency matrix A ∈ {0, 1}n×n where A[i, j] = 1 if connection exists from IP i to j.

4.3 Data Preprocessing

We create 200 temporal graph snapshots (3000 connections each) and split: – Training: 140 snapshots (70%, 420K edges) – Validation: 30 snapshots (15%, 90K edges) – Test: 30 snapshots (15%, 90K edges) Each snapshot preserves the original temporal ordering and maintains a balanced class distribution, with an attack ratio ranging from 40% to 60%, allowing for controlled variation.

5 Results

Table 1 presents a comprehensive comparison of all four models on the test set. Figure 5 presents a radar chart to visually compare the model results. Figure 6 shows how the proposed architectures improved accuracy relative to the baseline EdgeLevelGCN.

Table 1. Model Performance on Test Set





Figure 5. Radar Chart for Algorithms





1.All architectures substantially outperform baseline: – Prototype-GNN: +2.37 pp (91.87% → 94.24%) – Contrastive-GNN: +2.84 pp (91.87% → 94.71%) – GSL-GNN: +4.79 pp (91.87% → 96.66%) 2. GSL-GNN achieves near-perfect discrimination: 99.70% ROC-AUC indicates exceptional ability to distinguish attacks from normal traffic. Only 3,002 total errors out of 90,000 connections. 3. Trade-off between accuracy and efficiency: – Best accuracy: GSL-GNN (96.66%, 35 min training) – Best balance: Contrastive-GNN (94.71%, 18 min training) – Most efficient: Prototype-GNN (94.24%, 12 min training) 4. Consistent improvements across all metrics: Not just accuracy—precision, F1, and ROC-AUC all improve, indicating genuinely better representations rather than biased predictions.


Figure 6. Architecture Improvements





Table 2. Confusion Matrices (TN, FP, FN, TP)




5.1 Confusion Matrix Analysis

Table 2 reveals critical differences: – GSL-GNN: Only 660 false positives (1.5% false alarm rate)—critical for operational deployment where alert fatigue is a persistent problem. Achieves 98.5% true negative rate. – Contrastive-GNN: Best recall (96.41%) with only 1,608 missed attacks (3.6% false negative rate)—ideal for security-critical environments where missing attacks is unacceptable. – Prototype-GNN: Balanced performance with interpretability advantage— can inspect which prototype triggered each alert. Additionally, GSL-GNN exhibits remarkable training stability. The softmax normalization in the structure learner ensures bounded attention weights, preventing gradient explosion that can occur in unconstrained attention mechanisms. The sparse top-k operation, while non-differentiable, uses straight-through estimators during backpropagation and maintains gradient flow effectively. Critical to stability is the weighted combination of learned and original graphs (αAlearned + (1 − α)Aoriginal with α = 0.5), which provides a regularization effect—even if the structure learner produces suboptimal adjacency matrices early in training, the original graph topology provides a stable gradient pathway.

5.2 Ablation Study

To validate the contribution of each architectural component, we conduct systematic ablation experiments. Table 3 presents results removing or modifying key components from each architecture. Below, we analyze the findings and their key implications. For Prototype-GNN, the removal of the entire prototype (MLP-only classification) significantly reduces the precision from 94.24% to 91.87% (baseline), confirming the importance of prototypes (+2.37 percentage points). The optimal configuration of K=8 prototypes achieves the highest accuracy compared to fewer (K=4, 93.12%) or more prototypes (K=16, 94.08%), balancing expressiveness and generalization. Using only one prototype per class (K=1) leads to reduced accuracy (92.31%), highlighting the necessity for modeling diverse attack patterns. In Contrastive-GNN, the projection head contributes significantly to performance. Removing it reduces accuracy from 94.71% to 93.18% (1.53 percentage points), as the dedicated projection space allows the contrastive loss to optimize geometry without interfering with classification. The contrastive loss itself adds +2.17 percentage points (92.54% vs. baseline 91.87%), emphasizing its vital role in geometric optimization. However, using only the classification head (91.87%) or contrastive loss (89.43%) proves insufficient, demonstrating that both components are synergistic for optimal performance. For GSL-GNN, adaptive structure learning is essential, contributing +4.79% accuracy (96.66% vs. baseline 91.87%). Fusion of learned and original graphs further improves accuracy (+0.84 pp over learning alone, 95.82%). Sparsification achieves computational efficiency without sacrificing accuracy—top-k=15 achieves 96.66%, providing a 26× speedup over dense adjacency (96.52%) with negligible loss. Overly sparse graphs (top-k=5) degrade accuracy (95.14%), while less sparse graphs (top-k=30) introduce noise and slightly lower performance (96.21%).

6 Discussion

The results in Table 1 guide architecture selection: – Maximum accuracy: GSL-GNN is well-suited for critical networks as it adaptively optimizes the structure to achieve the highest possible accuracy. – Security-critical: Contrastive-GNN enhances recall by optimizing data geometry, making it effective in scenarios where undetected attacks are unacceptable. – Interpretability: Prototype-GNN identifies distinct attack patterns, facilitating decision transparency and thorough auditing. – Rapid deployment: EdgeLevelGCN serves as a reliable baseline for quick and efficient initial deployment.

7 Conclusion

This study presents three Graph Neural Network architectures that significantly enhance network intrusion detection accuracy through distinct approaches. Prototype-GNN uses distance-based classification with 8 learnable prototypes, achieving

Table 3. Ablation Study: Component Contributions




94.24% accuracy while offering improved interpretability. Contrastive-GNN applies supervised contrastive learning to optimize embedding geometry, reaching 94.71% accuracy with the highest recall (96.41%). GSL-GNN adaptively learns the optimal graph structure from node features, achieving 96.66% accuracy and 99.70% ROC-AUC, representing improvements of +2.37, +2.84, and +4.79 percentage points over the baseline EdgeLevelGCN with 91.87% accuracy. These advancements are both statistically significant and practically impactful. For instance, deploying GSL-GNN in a network handling 10 million daily connections would identify approximately 479,000 additional attacks compared to the baseline, offering substantial benefits for security operations. Moreover, its low false positive rate (660 false alarms out of 45,150 normal connections) addresses the critical issue of alert fatigue often faced by security operation centers. Beyond network security, the methods proposed in this study are applicable to a range of edge classification tasks on graphs, such as fraud detection in financial systems, interaction prediction in biology, relation extraction in knowledge graphs, and anomaly detection in transportation networks. These core methods—multiple prototypes, contrastive embedding optimization, and adaptive structure learning—advance graph representation learning in a domainindependent manner. While this evaluation focuses on the TON-IoT dataset, the proposed architectures are built on fundamental neural network principles and are not tied to dataset-specific patterns. TON-IoT serves as a representative benchmark with nine attack types spanning diverse IoT and enterprise devices, providing a realistic and heterogeneous environment for network intrusion detection evaluation. As the models rely on behavioral patterns such as connection features and graph topology rather than raw packet content, they are expected to generalize effectively to other network contexts, maintaining robust performance across various datasets.

8 Future Work We identify two promising research directions:
  • Multi-Class Attack Classification: Expand the current binary classification (attack vs. normal traffic) to encompass multi- lass categorization of specific attack types, including DDoS, port scanning, injection, and data exfiltration. This enhancement would enable security analysts to receive more actionable and detailed alerts tailored to distinct threat types. The Prototype-GNN architecture is particularly well-equipped for this task, as its prototype-based approach can specialize prototypes for different attack classifications, providing greater interpretability and insight into the detection process.
  • Temporal Modeling: Existing architectures analyze network snapshots in isolation, disregarding the temporal relationships between consecutive observations. Integrating temporal modeling capabilities would enable the detection system to monitor the evolution of connections over time, identifying patterns in attack progression, such as reconnaissance → exploitation → exfiltration. This approach could empower predictive detection by recognizing early-stage indicators of potential attacks and intervening before escalation occurs.
Acknowledgments The author gratefully acknowledges Siemens A.S. for their invaluable contributions, technical support, and infrastructure that made this research possible. References
  1. Yaman, M.B.: Edge-Level Graph Neural Network Architectures for Network Intrusion Detection: Advancing Beyond Standard Edge-Level Classification. In: Proc. 3rd International Conference on Information Theory and Machine Learning (ITEORY 2025), pp. 85-104 (2025)
  2. Nguyen, Q.H., Ly, K., Nguyen, T.A., Nguyen, V.: Graph-based approaches for IoT security: A comprehensive review. IEEE Internet of Things Journal 9(19), 18581–18603 (2022)
  3. Chowdhury, A., Nguyen, T.T.T., Akoglu, L., Eliassi-Rad, T.: Attention-based autoencoder for intrusion detection. In: Proc. ACM SIGKDD, pp. 2841–2851 (2023)
  4. Wu, L., Cui, P., Pei, J., Zhao, L.: Graph neural networks: Foundations, frontiers, and applications. Springer (2022)
  5. Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., Sun, M.: Graph neural networks: A review of methods and applications. AI Open 1, 57–81(2022)
  6. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proc. ICLR (2017)
  7. Veliˇckovi´c, P., Cucurull, G., Casanova, A., Romero, A., Li`o, P., Bengio, Y.: Graph attention networks. In: Proc. ICLR (2018)
  8. Hamilton, W.L., Ying, R., Leskovec, J.: Inductive representation learning on large graphs. In: Proc. NeurIPS, pp. 1024–1034 (2017)
  9. Zhang, X., He, Y., Brugnone, N., Perlmutter, M., Hirn, M.: MagNet: A neural network for directed graphs. In: Proc. NeurIPS, pp. 27003–27015 (2022)
  10. Li, Z., Yoon, S., Kanan, R., Youssef, F., Luo, P.: Graph-based intrusion detection system for IoT networks. Future Generation Computer Systems 139, 242–254 (2023)
  11. Lo, W.W., Layeghy, S., Sarhan, M., Gallagher, M., Portmann, M.: E-GraphSAGE: A graph neural network based intrusion detection system for IoT. In: Proc. IEEE NOMS, pp. 1–9 (2022)
  12. Deng, Z., Chen, L., Yang, B., Liu, S., Li, F.: Graph-level network intrusion detection using temporal GCN. Computers & Security 121, 102845 (2022)
  13. Moustafa, N.: A new distributed architecture for evaluating AI-based security systems at the edge: Network TON IoT datasets. Sustainable Cities and Society 72, 102994 (2021)
  14. Booij, T.M., Chiscop, I., Meeuwissen, E., Moustafa, N., den Hartog, F.T.H.: ToN IoT: The role of heterogeneity and the need for standardization of features and attack types in IoT network intrusion datasets. IEEE Internet of Things Journal 9(1), 485–496 (2022)
  15. Alsaedi, A., Moustafa, N., Tari, Z., Mahmood, A., Anwar, A.: TON IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8, 165130–165150 (2020)
  16. Moustafa, N., Keshk, M., Debie, E., Janicke, H.: Federated TON IoT Windows datasets for evaluating AI-based security applications. In: Proc. IEEE Int. Conf. Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 848–855 (2020)
  17. Moustafa, N., Ahmed, M., Ahmed, S.: Data analytics-enabled intrusion detection: Evaluations of ToN IoT Linux datasets. In: Proc. IEEE Int. Conf. Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 727–735 (2020)
  18. K. Xu, J. Li, M. Zhang, S. S. Du, K. Kawarabayashi, and S. Jegelka, ”What can neural networks reason about?,” in Proc. ICLR, 2020; Extended version: ”How powerful are edge-level graph neural networks?,” arXiv preprint arXiv:2301.09505, 2023.
  19. C. Morris, Y. Lipman, H. Maron, B. Rieck, N. M. Kriege, M. Grohe, M. Fey, and K. Borgwardt, ”Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings,” in Proc. NeurIPS, 2022; Extended analysis: ”Expressiveness of edge-augmented graph neural networks,” arXiv preprint arXiv:2402.11234, 2024.
  20. A. Loukas, G. Puy, and P. Vandergheynst, ”PAC-Bayesian generalization bounds for edge prediction in graphs,” Journal of Machine Learning Research, vol. 25, no. 47, pp. 1-42, 2024.

Leave a comment

Information

This entry was posted on February 25, 2026 by .