Energy-Aware Edge–Cloud Collaboration for Learning Varieties of English: Cache-Assisted Inference and Evidence-Grounded Feedback
Shiwei Zhang, Cunqian You, Lu Qi, Miao Wei, Xiaojun Wang, Huijuan Lu

Speech-first English tutors are increasingly expected to correct pronunciation, explain regional usage, and keep the interaction fast enough for rehearsal. These requirements make energy use, latency, and feedback fidelity part of the same design problem. We revise the edge-cloud tutor as a formally specified, edge-first system for five English varieties: American, British, Indian, Australian, and Canadian English. The system selects among semantic-cache reuse, local retrieval-augmented generation, local generation with pronunciation scoring, and cloud fallback by maximizing a quality-energy-latency utility under explicit constraints. Its semantic cache stores only validated, evidence-versioned feedback; its inference cache reuses hot weights and short-context key-value states; and its feedback composer is restricted to evidence selected from a curated variety knowledge base. A controlled prototype evaluation over matched five-minute tutoring sessions shows that the hybrid cached design reduces total session energy by 41.0% and median latency by 44.3% relative to a cloud-only baseline, while preserving near-cloud variety fidelity (94.2% versus 95.0%). Compared with the same hybrid pipeline without caching, it reduces energy by 16.3% and median latency by 16.6%. Human-rated feedback evaluation further shows higher evidence support and lower answer leakage than a cloud NLP feedback baseline. The results do not claim that edge-first tutoring is always best: edge-only remains the lowest-energy mode, but loses fidelity and diagnostic depth. The main contribution is a transparent operating point for real-time English-variety learning where energy, latency, privacy, and pedagogical quality are jointly reported rather than treated as separate afterthoughts.
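
The mode selection summarized above (choosing among cache reuse, local retrieval-augmented generation, local generation with pronunciation scoring, and cloud fallback) can be illustrated with a minimal sketch. It assumes a linear utility and two hard constraints (a latency ceiling and a quality floor); the weights `w_q`, `w_e`, `w_l`, the `Mode` type, the function names, and every per-mode number are hypothetical placeholders chosen for illustration, not the formulation or measurements of this work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mode:
    """One candidate inference path with illustrative per-turn estimates."""
    name: str
    quality: float     # expected feedback quality in [0, 1] (hypothetical)
    energy_j: float    # expected energy per turn in joules (hypothetical)
    latency_ms: float  # expected latency per turn in ms (hypothetical)


def utility(m: Mode, w_q: float = 1.0, w_e: float = 0.02,
            w_l: float = 0.0005) -> float:
    """Assumed linear utility: reward quality, penalize energy and latency."""
    return w_q * m.quality - w_e * m.energy_j - w_l * m.latency_ms


def select_mode(modes: List[Mode], max_latency_ms: float,
                min_quality: float) -> Optional[Mode]:
    """Return the feasible mode with the highest utility, or None."""
    feasible = [m for m in modes
                if m.latency_ms <= max_latency_ms and m.quality >= min_quality]
    if not feasible:
        return None
    return max(feasible, key=utility)


# Placeholder estimates for the four modes named in the abstract.
MODES = [
    Mode("cache_reuse", quality=0.90, energy_j=0.5, latency_ms=40.0),
    Mode("local_rag", quality=0.92, energy_j=3.0, latency_ms=250.0),
    Mode("local_gen_pron", quality=0.93, energy_j=5.0, latency_ms=400.0),
    Mode("cloud_fallback", quality=0.95, energy_j=9.0, latency_ms=700.0),
]

if __name__ == "__main__":
    choice = select_mode(MODES, max_latency_ms=500.0, min_quality=0.85)
    print(choice.name if choice else "no feasible mode")
```

In this sketch, tightening the quality floor pushes the choice toward the cloud, while tightening the latency ceiling pushes it toward the edge. A cache miss could be modeled by omitting the cache-reuse entry from the candidate list before calling `select_mode`.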