DOI: 10.7717/peerj-cs.3966 ISSN: 2376-5992

A unified Big-O–based framework for comparative complexity analysis of CNN and Transformer architectures

Peerapol Khunarsa, Pafan Doungpaisan

Convolutional neural networks (CNNs) and Transformer-based architectures are widely adopted across vision and sequence modeling tasks, yet their computational efficiency is often evaluated using empirical benchmarks without a unified theoretical framework. While metrics such as latency, floating-point operations (FLOPs), and memory usage provide practical insight, they do not explicitly characterize how computational costs scale with increasing input resolution or sequence length. This gap limits principled reasoning about model scalability and architectural trade-offs. In this work, we propose a unified Big-O–based framework for the comparative analysis of CNN and Transformer architectures. The framework derives layer-wise asymptotic time and memory complexity for canonical convolutional and self-attention components, explicitly distinguishing forward and backward propagation as well as parameter and activation storage. To connect theory with practice, we introduce a controlled empirical protocol based on synthetic inputs and graphics processing unit (GPU) measurements, in which log–log regression is used to estimate effective scaling exponents with respect to input size. Experimental results show that convolutional networks exhibit near-linear scaling with image resolution, whereas self-attention mechanisms demonstrate quadratic dependence on sequence length, consistent with theoretical complexity analysis. Architectural variants such as depthwise separable convolutions and hierarchical attention reduce constant factors but do not alter the dominant asymptotic order. Importantly, the empirical measurements are used to validate scaling trends predicted by the framework rather than to establish formal upper or lower bounds. This study does not claim universality across hardware platforms, energy consumption, or cognitive efficiency, nor does it provide formal proofs of tight complexity bounds for full training dynamics. Instead, the proposed framework offers a principled and reproducible reference for interpreting empirical efficiency results through the lens of asymptotic complexity, supporting informed comparison and analysis of modern deep learning architectures.

More from our Archive