DOI: 10.3390/sym18071083 ISSN: 2073-8994

Exploiting Structural Symmetry of SM4 for an Asymmetric Hardware Architecture: Design and Open-Source Verification on the RISC-V LicheePi 4A Platform

Jianxin Wang, Zixuan Wang, Runze Zhou, Chaoen Xiao, Lei Zhang

Reproducing SM4 (GB/T 32907-2016) hardware-accelerator results on open-source RISC-V platforms is difficult, because most published designs depend on proprietary FPGA toolchains. This paper contributes an asymmetric dual-channel SM4 architecture together with a fully reproducible open-source verification framework; physical on-board acceleration is not claimed and is left as future work. The architecture exploits two algorithmic symmetries of SM4 (encryption and decryption differ only in round-key order, and the round transform T shares the byte-wise S-box τ with the key-expansion transform T′) but maps them onto an asymmetric workload: bulk encryption is throughput-bound, whereas key expansion runs once per session. Accordingly, a 32-stage fully unrolled encryption pipeline (one 128-bit block per cycle in steady state) is paired with a single round function reused iteratively for the key schedule, and encryption and decryption share one datapath via round-key reversal. Because the TH1520 SoC on LicheePi 4A does not expose the Xuantie C910 RoCC port, we verify the design in three reproducible tiers on the board itself: (T1) RTL co-simulation of an sm4_rocc wrapper passes 1040/1040 vectors for both the standalone datapath and the full system; (T2) a pure-C reference model passes 10/10 GB/T 32907-2016 vectors on the real C910 at a measured 291.9 Mbps; and (T3) a Linux illegal-instruction trap-and-emulate prototype confirms ISA-level and OS-level semantics. Open-source synthesis (Yosys + SkyWater Sky130) gives a measured area of 133 kGE and a switching-dominated post-synthesis power estimate of ≈0.28 W at 100 MHz (≈22 pJ/bit, ≈46 Gbps/W). At 100 MHz, one 128-bit block per cycle corresponds to a steady-state throughput of 12.8 Gbps in RTL simulation, about 43.9× the software baseline of 291.9 Mbps. Every reported number is reproducible with open-source tools only (Icarus Verilog, GTKWave, GCC, Yosys, Sky130 PDK).
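The two symmetries named in the abstract can be illustrated with a small software model. The sketch below is a plain Python rendering of SM4 as specified in GB/T 32907-2016, not the authors' RTL or their C reference model; all function names (tau, t_round, t_key, expand_key, crypt_block) are chosen here for illustration. It shows that T and T′ call the same byte-wise substitution τ and differ only in the linear layer, and that decryption is the encryption datapath fed with the round keys in reverse order.

```python
# Illustrative SM4 model (GB/T 32907-2016). Names are hypothetical and this is
# not the paper's implementation; it is neither optimized nor constant-time.
SBOX = bytes.fromhex(
    "d690e9fecce13db716b614c228fb2c05" "2b679a762abe04c3aa44132649860699"
    "9c4250f491ef987a33540b43edcfac62" "e4b31ca9c908e89580df94fa758f3fa6"
    "4707a7fcf37317ba83593c19e6854fa8" "686b81b27164da8bf8eb0f4b70569d35"
    "1e240e5e6358d1a225227c3b01217887" "d40046579fd327524c3602e7a0c4c89e"
    "eabf8ad240c738b5a3f7f2cef96115a1" "e0ae5da49b341a55ad933230f58cb1e3"
    "1df6e22e8266ca60c02923ab0d534e6f" "d5db3745defd8e2f03ff6a726d6c5b51"
    "8d1baf92bbddbc7f11d95c411f105ad8" "0ac13188a5cd7bbd2d74d012b8e5b4b0"
    "8969974a0c96777e65b9f109c56ec684" "18f07dec3adc4d2079ee5f3ed7cb3948"
)
FK = (0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC)
# Byte j of CK[i] is (4*i + j) * 7 mod 256.
CK = [int.from_bytes(bytes(((4 * i + j) * 7) % 256 for j in range(4)), "big")
      for i in range(32)]


def rol(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def tau(x):
    """Byte-wise S-box layer, shared by the round and key-expansion transforms."""
    return int.from_bytes(bytes(SBOX[b] for b in x.to_bytes(4, "big")), "big")


def t_round(x):
    """Round transform T: tau followed by the linear layer L."""
    b = tau(x)
    return b ^ rol(b, 2) ^ rol(b, 10) ^ rol(b, 18) ^ rol(b, 24)


def t_key(x):
    """Key-expansion transform T': the same tau, followed by L'."""
    b = tau(x)
    return b ^ rol(b, 13) ^ rol(b, 23)


def expand_key(key):
    """Derive the 32 round keys from a 16-byte key, one round per iteration."""
    k = [int.from_bytes(key[4 * i:4 * i + 4], "big") ^ FK[i] for i in range(4)]
    for i in range(32):
        k.append(k[i] ^ t_key(k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ CK[i]))
    return k[4:]


def crypt_block(block, round_keys):
    """Single datapath used for both directions; only the key order differs."""
    x = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(4)]
    for rk in round_keys:
        x = x[1:] + [x[0] ^ t_round(x[1] ^ x[2] ^ x[3] ^ rk)]
    return b"".join(w.to_bytes(4, "big") for w in reversed(x))


def encrypt_block(block, key):
    return crypt_block(block, expand_key(key))


def decrypt_block(block, key):
    # Decryption reuses crypt_block with the round keys reversed.
    return crypt_block(block, expand_key(key)[::-1])
```

As a check, the example in the standard uses the same 16 bytes, 0123456789abcdeffedcba9876543210, as both key and plaintext and lists 681edf34d206965e86b3e94f536e4246 as the ciphertext; calling decrypt_block on that ciphertext with the same key should return the plaintext. In hardware terms, the loop in crypt_block is what a 32-stage unrolled pipeline would lay out as one stage per round, while the loop in expand_key corresponds to a single round function reused over 32 iterations.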
