Microsoft Research Asia

MobiCom 2024 highlights from Microsoft Research Asia: Exploring innovations in wireless mobile technology and applications

Published February 9, 2025 | Updated March 13, 2025

MobiCom is one of the premier international academic conferences in mobile computing and wireless networking. This article highlights several papers from Microsoft Research Asia that were accepted at MobiCom 2024. They cover a diverse range of topics, including mobile task automation, remote auscultation, DNN inference, gas sensing, passive sensing, and wireless sensing.

AutoDroid: LLM-powered Task Automation in Android

Mobile task automation is an attractive technique that aims to enable voice-based, hands-free user interaction with smartphones. However, existing approaches scale poorly because of their limited language understanding and the non-trivial manual effort they require from developers or end users. Recent advances in large language models (LLMs) for language understanding and reasoning inspire a rethinking of the problem from a model-centric perspective, in which task preparation, comprehension, and execution are handled by a unified language model.

In this work, researchers introduce AutoDroid, a mobile task automation system that can handle arbitrary tasks on any Android application without manual effort. The key insight is to combine the commonsense knowledge of LLMs with the domain-specific knowledge of apps through automated dynamic analysis. Key components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the LLM’s app-specific domain knowledge, and a multi-granularity query optimization module that reduces the cost of model inference.

Researchers integrate AutoDroid with off-the-shelf LLMs, including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrate that AutoDroid generates actions with an accuracy of 90.9% and completes tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%, respectively.
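To make the idea of bridging the UI with the LLM more concrete, the minimal Python sketch below shows one way a screen could be flattened into indexed, HTML-like text and combined with app knowledge gathered during exploration. The `UIElement` class, the tag mapping, the prompt wording, and the function names are all hypothetical illustrations and should not be read as AutoDroid's actual interfaces.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class UIElement:
    """Hypothetical, simplified description of one on-screen widget."""
    kind: str                 # e.g. "button", "input", "text"
    text: str
    interactive: bool = False


# Assumed mapping from widget kinds to HTML-like tags (illustrative only).
TAGS = {"button": "button", "input": "input", "text": "p"}


def render_ui(elements):
    """Render a screen as HTML-like lines; interactive widgets get an id."""
    lines = []
    next_id = 0
    for el in elements:
        tag = TAGS.get(el.kind, "div")
        if el.interactive:
            # The id lets the model answer with a compact element reference.
            lines.append(f"<{tag} id={next_id}>{escape(el.text)}</{tag}>")
            next_id += 1
        else:
            # Static text adds context but is not an action target.
            lines.append(f"<{tag}>{escape(el.text)}</{tag}>")
    return "\n".join(lines)


def build_prompt(task, elements, app_hints):
    """Combine the task, exploration-derived hints, and the current screen."""
    hints = "\n".join(f"- {hint}" for hint in app_hints)
    return (
        f"Task: {task}\n"
        f"Known app behavior:\n{hints}\n"
        f"Current screen:\n{render_ui(elements)}\n"
        "Reply with the id of the element to act on."
    )


screen = [
    UIElement("text", "New contact"),
    UIElement("input", "Name", interactive=True),
    UIElement("button", "Save", interactive=True),
]
print(build_prompt("Add a contact named Alice", screen,
                   ["Tapping 'Save' returns to the contact list"]))
```

Numbering only the interactive elements keeps the prompt short and gives the model an unambiguous way to name its next action, which is one plausible reason to prefer a compact textual representation over a raw view hierarchy.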

*Figure: The workflow of AutoDroid*

Exploring the Feasibility of Remote Cardiac Auscultation Using Earphones

Remote video consultations offer convenient and accessible medical assessments from home. However, they cannot currently assess heart health through cardiac auscultation. To solve this problem, in the paper “Exploring the Feasibility of Remote Cardiac Auscultation Using Earphones,” researchers introduce Asclepius, which transforms earphones into stethoscopes, allowing doctors to hear heart sounds (phonocardiogram, or PCG, signals) during video calls.

*Figure: Overview of Asclepius*

Asclepius uses a low-cost peripheral to convert earphone speakers into microphones, capturing faint PCG signals in the ear canal. The system includes signal processing algorithms to eliminate reverberation and correct distortions. It uses the MAX5402EUA chip for impedance matching and voltage detection, ensuring compatibility with various earphones and devices.

Asclepius is implemented on a double-layer PCB and, following IRB protocols, was tested with 30 volunteers. The results show that it effectively recovers PCG signals from different earphones through signal preprocessing, segmentation, and two-stage recovery using UNet models. This technology could enhance remote medical services by enabling cardiac auscultation through ordinary earphones.

FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices

Deep neural network (DNN) models are increasingly deployed on edge devices such as smartphones, autonomous vehicles, robots, and drones. However, the limited growth in memory capacity, combined with memory sharing in multi-application environments, has made memory overhead a significant bottleneck for DNN deployment. Because of memory fragmentation and delays in dynamic memory management, existing DNN inference systems often load all model parameters into memory sequentially, an approach that struggles to fit within the available memory.

To address this challenge, researchers introduce FlexNN, a DNN inference framework designed for memory-constrained devices with dynamic memory-hierarchy management. FlexNN formalizes the problem as a time-space 2D bin packing problem and breaks traditional tensor boundaries by employing a fine-grained “slice-load-compute” strategy. This enables concurrent disk loading and computation, drastically reducing memory overhead. Experimental results show that FlexNN reduces memory consumption by 93.81% with only a 3.64% increase in latency, without compromising model accuracy. FlexNN has earned four artifact evaluation (AE) badges for reproducibility and reusability.
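The overlap of loading and computation can be illustrated with a small Python sketch. It is only a toy under stated assumptions: a list slice stands in for a disk read, a dot product stands in for a layer's computation, and the names `load_slice`, `sliced_dot`, `slice_size`, and `max_resident` are hypothetical. It does not model FlexNN's bin-packing planner or its static memory layout.

```python
import queue
import threading


def load_slice(weights, start, size):
    """Stand-in for reading one weight slice from disk (here: a list slice)."""
    return weights[start:start + size]


def sliced_dot(weights, inputs, slice_size=4, max_resident=2):
    """Dot product that keeps at most `max_resident` weight slices queued."""
    buffer = queue.Queue(maxsize=max_resident)

    def loader():
        for start in range(0, len(weights), slice_size):
            # put() blocks while the buffer is full, which caps how far
            # loading can run ahead of computation.
            buffer.put((start, load_slice(weights, start, slice_size)))
        buffer.put(None)  # sentinel: nothing left to load

    threading.Thread(target=loader, daemon=True).start()

    total = 0.0
    while True:
        item = buffer.get()
        if item is None:
            break
        start, chunk = item
        # Compute on this slice while the loader fetches the next one.
        window = inputs[start:start + len(chunk)]
        total += sum(w * x for w, x in zip(chunk, window))
    return total


weights = [float(i) for i in range(10)]
inputs = [1.0] * 10
print(sliced_dot(weights, inputs, slice_size=3))
```

The bounded queue is the design choice that matters here: instead of holding every parameter in memory, the consumer only ever sees a few slices at a time, trading a small amount of synchronization overhead for a much smaller resident footprint.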

*Figure: Comparison of memory management strategies. Compared with the conventional approaches (a) and (b), FlexNN's preloading-aware static memory management (c) reduces both memory fragmentation and I/O waiting time. The numbers in the tensor blocks indicate the order of allocation.*

FlexNN is a collaboration between AIR at Tsinghua University and the Heterogeneous Computing (HEX) group at Microsoft Research Asia. It is part of HEX's broader effort to design new virtual memory systems for deep learning models, alongside earlier work such as Pre-gated MoE and Ripple.

Gastag: A Gas Sensing Paradigm using Graphene-based Tags

To address the challenges of high costs and complex maintenance in traditional methods for detecting explosive and toxic gases, this paper introduces Gastag, a novel gas sensing approach using passive RFID tags. Gastag embeds a small piece of gas-sensitive material into cheap RFID tags. Changes in gas concentration alter the material’s conductivity, affecting the tag’s impedance and received signal, enabling precise gas concentration measurement.

To enhance sensitivity and detection range, the research team developed a new gas-sensitive material with high sensitivity and a large surface area, redesigned the tag antenna, and optimized the placement of the gas-sensitive material to achieve impedance matching. Experiments show low error in gas concentration measurements and an operating range of up to 8.5 meters, enabling large-scale deployment.

Gastag’s innovation lies in transforming commercial RFID tags into gas sensors by quantifying the relationship between gas concentration and signal phase variations. It maintains the tag-reader range and leverages RFID signal frequency diversity to improve sensing accuracy. Tests confirm Gastag’s robust performance in various environments, orientations, and interference conditions, showcasing its effectiveness and wide application potential.
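As a rough illustration of how a phase-to-concentration relationship and frequency diversity might be used together, the Python sketch below fits one calibration line per frequency channel and then combines the per-channel estimates with a median. The linear model, the calibration numbers, and the function names `fit_line` and `estimate_concentration` are assumptions made for this example; the article does not specify the model Gastag actually uses.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_concentration(phase_shifts, calibrations):
    """Invert each channel's calibration line, then take the median."""
    estimates = [
        (phase - intercept) / slope
        for phase, (slope, intercept) in zip(phase_shifts, calibrations)
    ]
    # The median tolerates a minority of channels with corrupted readings.
    return median(estimates)


# Made-up calibration data: known concentrations (ppm) and the phase shift
# (radians) observed on three frequency channels.
known_ppm = [0.0, 100.0, 200.0]
channel_phases = [
    [0.00, 0.50, 1.00],
    [0.02, 0.62, 1.22],
    [0.01, 0.41, 0.81],
]
calibrations = [fit_line(known_ppm, phases) for phases in channel_phases]

# A new reading taken at roughly 150 ppm; the last channel is an outlier.
reading = [0.75, 0.92, 2.00]
print(estimate_concentration(reading, calibrations))
```

Using several channels means a single disturbed frequency does not dominate the result, which is the intuition behind treating frequency diversity as a source of accuracy.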

*Figure: Operation of an RFID reader and a tag.*

GPSense: Passive Sensing with Pervasive GPS Signals

Wireless sensing systems have used signals such as Wi-Fi, ultra-wideband (UWB), and acoustic waves for various tasks. However, these systems face challenges such as limited range and interference. This study proposes sensing with GPS signals instead, which are broadcast continuously and do not interfere with communication technologies.

The GPSense system achieves passive wireless sensing through GPS signals, reconstructing amplitude and phase information from raw data collected by commercial GPS receiver modules. Researchers developed models tailored to GPS signals and introduced distributed sensing, enhancing performance by fusing signals from multiple satellites.

*Figure: The sensing system based on pervasive and interference-free signals from GNSS satellites*

Extensive testing under various conditions verified the system’s robustness. Notably, GPS sensing technology was extended to indoor environments using a low-cost GPS repeater. These experiments demonstrate the GPSense system’s potential in human activity sensing, passive trajectory tracking, and respiration monitoring, proving its effectiveness and adaptability.

MSense: Boosting Wireless Sensing Capability Under Motion Interference

In wireless sensing, a major limitation is that devices and targets must remain stationary during the sensing process, hindering real-life applications. To address this issue, the researchers developed MSense, an innovative solution that enhances wireless sensing under motion interference.

MSense uses commercial millimeter-wave (mmWave) radars and digital beamforming technology to improve reflection signals from the target area. By comparing signals from different body areas, MSense eliminates interference from body and device motions, accurately extracting target motion information. This method works for both periodic and non-periodic motion sensing tasks.
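The intuition behind comparing signals from different body areas can be shown with a deliberately simplified Python sketch. It assumes that the phase of each beamformed area is already unwrapped, that the interfering motion is identical in both areas, and that the radar operates near 77 GHz. The function names and the synthetic signals are hypothetical, and the sketch is not the algorithm from the paper.

```python
import math

WAVELENGTH_M = 0.0039  # assumed carrier near 77 GHz (wavelength = c / f)


def phase_to_displacement(phase_rad, wavelength_m=WAVELENGTH_M):
    """Convert a round-trip phase change to radial displacement in meters."""
    return wavelength_m * phase_rad / (4 * math.pi)


def displacement_to_phase(displacement_m, wavelength_m=WAVELENGTH_M):
    """Inverse of phase_to_displacement, used to build the synthetic demo."""
    return 4 * math.pi * displacement_m / wavelength_m


def cancel_common_motion(target_phases, reference_phases):
    """Subtract the reference area's phase to remove motion shared by both."""
    return [
        phase_to_displacement(t - r)
        for t, r in zip(target_phases, reference_phases)
    ]


# Synthetic demo: the target area moves with breathing plus whole-body sway,
# while a nearby reference area sees only the sway.
times = [i / 20 for i in range(200)]  # 10 s sampled at 20 Hz
sway = [0.002 * math.sin(2 * math.pi * 0.1 * t) for t in times]
breathing = [0.0005 * math.sin(2 * math.pi * 0.25 * t) for t in times]

target = [displacement_to_phase(s + b) for s, b in zip(sway, breathing)]
reference = [displacement_to_phase(s) for s in sway]

recovered = cancel_common_motion(target, reference)
worst_error = max(abs(r - b) for r, b in zip(recovered, breathing))
print(f"worst-case error: {worst_error:.2e} m")
```

In practice the shared motion is unlikely to be identical across areas, so a real system needs more than a plain subtraction; the sketch only shows why a second observation of the same interference is useful.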

*Figure: MmWave radar-based sensing primer.*

Experimental results demonstrate MSense’s effectiveness in various applications. In vehicles, it significantly improved the accuracy of detecting driver fatigue indicators like eye blinks, yawns, and nods, while reducing false alarms. For monitoring respiration during motion, MSense accurately estimated respiratory rates in home and gym environments, even detecting changes while running on a treadmill. In gesture recognition on mobile devices, MSense achieved over 93% accuracy. These results validate MSense’s potential in advancing wireless sensing technology.

MuDiS: An Audio-independent, Wide-angle, and Leak-free Multi-directional Speaker

In some public places, the demand for personalized audio experiences is growing. For example, museum visitors may want one-on-one explanations closely tied to the exhibits they are viewing, while gym users may prefer to enjoy their own music without wearing headphones. Traditional speaker systems cannot meet these needs due to audio interference and poor sound direction. Acoustic metasurface technology, with its advanced ability to steer acoustic waves, offers a promising solution by taking sound control to a new level.

Building on this innovation, researchers developed a multi-directional speaker (MuDiS). By using a specially designed acoustic metasurface, MuDiS overcomes the limitations of traditional parametric arrays, such as transducer size and wavefront shape. It generates multiple highly focused sound beams whose directions can be adjusted flexibly. The system also converts ultrasonic waves into audible sound using air nonlinearity, a mechanism in which the nonlinear interaction of two or more ultrasonic waves in the air generates audible difference-frequency waves.
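The difference-frequency mechanism can be checked numerically with a short Python sketch. It uses a purely quadratic response as a toy stand-in for air nonlinearity, and the 40 kHz and 41 kHz tones, the sample rate, and the function names are assumptions chosen for illustration. Squaring the sum of two cosines produces a term at the difference of their frequencies, here 1 kHz, which falls in the audible range.

```python
import math


def two_tone(f1_hz, f2_hz, sample_rate_hz, n_samples):
    """Sum of two unit-amplitude cosines, sampled uniformly."""
    return [
        math.cos(2 * math.pi * f1_hz * i / sample_rate_hz)
        + math.cos(2 * math.pi * f2_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]


def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component, computed as a single DFT bin."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


SAMPLE_RATE_HZ = 400_000
N_SAMPLES = 4_000  # 10 ms, a whole number of cycles for every tone involved

ultrasound = two_tone(40_000, 41_000, SAMPLE_RATE_HZ, N_SAMPLES)
# Toy stand-in for air nonlinearity: a purely quadratic response.
after_air = [x * x for x in ultrasound]

print(tone_amplitude(ultrasound, 1_000, SAMPLE_RATE_HZ))  # essentially zero
print(tone_amplitude(after_air, 1_000, SAMPLE_RATE_HZ))   # close to 1
```

The original two-tone signal contains no energy at 1 kHz, while the squared signal does, which mirrors how inaudible ultrasonic carriers can yield an audible tone along the beam.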

MuDiS has three core functions: independent beam playback, wide-angle digital steering, and leakage suppression. First, the unique metasurface design allows MuDiS to connect to ultrasonic transducers and reshape their sound waves into an approximately spherical wavefront, enabling a wider dynamic steering angle. Second, an optimized beamforming algorithm suppresses the sound interference of traditional multibeam systems (used in many speakers and conference room audio systems), improving the user experience. Finally, a nonlinear distortion reduction scheme enhances sound quality.

*Figure: Experimental setup of MuDiS*

In performance evaluations, researchers verified MuDiS’s effectiveness and generalizability. Its performance reached the level of commercial single-beam projection directional speakers. Compared with the traditional method of using parametric arrays to create multibeams, MuDiS offers significantly improved steering angles and sound fidelity.

In addition to providing personalized audio tours for museum exhibits and customized sound delivery in gyms, MuDiS has a wider range of applications. It could be combined with existing audio-based monitoring systems such as driver fatigue detection in vehicles or respiratory monitoring during exercise. With its precise sound direction and ability to deliver highly focused audio, MuDiS could enhance these applications by reducing interference from background noise, improving the accuracy of audio-based monitoring systems and enabling clearer, more targeted interactions. For example, its ability to focus sound beams could isolate audio signals needed to detect subtle breathing changes during exercise.