GAER: Graph Auto-encoders for Unsupervised Software Architecture Recovery
Rakhshanda Jabeen, Morgan Ericsson, Jonas Nordqvist, Anna Wingkvist
Recovering the modular architecture of software systems from source code remains challenging when documentation is incomplete or outdated. Manual recovery is labor-intensive and does not scale to large systems. Although automated techniques have been proposed, many rely on handcrafted heuristics. They may focus on a limited set of inputs (such as dependencies or text), or combine multiple inputs using fixed rules, rather than integrating structural and semantic cues in a data-driven way.
We introduce GAER, an unsupervised architecture recovery approach based on graph autoencoders. GAER models a system as a heterogeneous dependency graph with typed relations and combines dependency information, folder hierarchy, and code semantics as node features. We study two factors that influence recovery outcomes: the choice of encoder (Graph Attention Network [GAT] vs. Graph Convolutional Network [GCN]) and the number of clusters used for the final decomposition. We evaluate GAER on 10 open-source systems and benchmark it against established architecture recovery baselines, using standard recovery measures computed against ground-truth mappings. Overall, GAER is competitive with strong baselines and often achieves higher agreement with the ground truth, with the GAT variant generally performing best. By integrating multiple architectural cues in a graph learning framework, GAER produces architectural views that support system comprehension and reduce the effort required to maintain architectural documentation.
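To make the pipeline described above concrete, the following is a minimal sketch of a generic graph autoencoder followed by clustering. It is not GAER's implementation and rests on several assumptions. Typed relations are collapsed into a single untyped adjacency matrix. The encoder is a plain two-layer GCN with random, untrained weights. The node features are random placeholders rather than dependency, folder, or semantic features. The clustering step is a basic k-means. The toy graph and all function names (`normalize_adjacency`, `gcn_encode`, `decode`, `kmeans`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_encode(adj, features, w1, w2):
    """Two-layer GCN encoder: Z = A_norm * ReLU(A_norm * X * W1) * W2."""
    a_norm = normalize_adjacency(adj)
    hidden = [[max(v, 0.0) for v in row] for row in matmul(matmul(a_norm, features), w1)]
    return matmul(matmul(a_norm, hidden), w2)


def decode(z):
    """Inner-product decoder: edge probability = sigmoid(z_i . z_j)."""
    z_t = [list(col) for col in zip(*z)]
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in matmul(z, z_t)]


def kmeans(points, k, iters=20, seed=0):
    """Basic k-means over embedding rows; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy system (assumption): six source files forming two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0.0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1.0

rng = random.Random(0)
features = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]  # placeholder features
w1 = [[rng.gauss(0, 0.5) for _ in range(3)] for _ in range(4)]      # untrained weights
w2 = [[rng.gauss(0, 0.5) for _ in range(2)] for _ in range(3)]

z = gcn_encode(adj, features, w1, w2)  # node embeddings, one row per file
recon = decode(z)                      # reconstructed edge probabilities
labels = kmeans(z, k=2)                # candidate module assignment per file
```

In a trained autoencoder, the weights would be fitted so that `recon` approximates the observed adjacency, and the number of clusters `k` would be one of the factors under study. Because the weights here are random, the resulting `labels` illustrate only the data flow, not recovery quality.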