The Role of Direct Memory Access (DMA) in Reducing CPU Load on High-Speed Data Transfer: A Literature Review


Authors: Rizky Parlika, Fathur Rahman, Yoga Ari Tofan
Year: 2026
Publisher: International Journal of Information Technology Science (INTENS)
Type: Journal
DOI / URL: –


Abstract

Direct Memory Access (DMA) plays a crucial role in reducing CPU load in high-speed data transfer systems. This literature review examines how DMA enables direct data transfer between memory and I/O devices without per-word CPU intervention, thereby improving overall system efficiency. The primary issue identified across the literature is the CPU bottleneck that arises in conventional transfer methods such as Programmed I/O and Interrupt-driven I/O, which degrades performance in applications such as High-Performance Computing (HPC) and the Internet of Things (IoT). Drawing on literature from various scientific sources, including IEEE and ACM journals, this review finds that DMA implementation has been reported to reduce CPU load by up to 80% in gigabit-speed transfer scenarios and to significantly decrease processor time compared to conventional methods. These findings underscore the importance of DMA integration in the design of future computing systems, and the review recommends further research on hybrid architectures and DMA implementation in heterogeneous computing.

Keywords: Direct Memory Access, DMA, data transfer, CPU load, computer architecture
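
The contrast the abstract draws between Programmed I/O, Interrupt-driven I/O, and DMA can be illustrated with a toy counting model. This is only a sketch under stated assumptions, not a method or result from the reviewed literature: the names `TransferCost`, `programmed_io`, `interrupt_driven_io`, and `dma`, the 4-byte word size, and the 4096-byte DMA block size are all hypothetical, and the model counts CPU-side events rather than measuring real load.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class TransferCost:
    """CPU-side events charged to one transfer in this toy model."""
    cpu_copies: int   # words the CPU moves itself
    interrupts: int   # interrupts the CPU has to service
    setups: int       # DMA transfers the CPU has to program

    @property
    def cpu_events(self) -> int:
        return self.cpu_copies + self.interrupts + self.setups


def programmed_io(total_bytes: int, word_bytes: int = 4) -> TransferCost:
    # Assumption: the CPU copies every word itself (status polling is ignored).
    words = ceil(total_bytes / word_bytes)
    return TransferCost(cpu_copies=words, interrupts=0, setups=0)


def interrupt_driven_io(total_bytes: int, word_bytes: int = 4) -> TransferCost:
    # Assumption: the device raises one interrupt per word, and the CPU
    # still copies that word inside the handler.
    words = ceil(total_bytes / word_bytes)
    return TransferCost(cpu_copies=words, interrupts=words, setups=0)


def dma(total_bytes: int, block_bytes: int = 4096) -> TransferCost:
    # Assumption: the CPU programs the controller once per block and
    # services one completion interrupt per block; it copies no data.
    blocks = ceil(total_bytes / block_bytes)
    return TransferCost(cpu_copies=0, interrupts=blocks, setups=blocks)


if __name__ == "__main__":
    size = 1 << 20  # 1 MiB, an arbitrary illustrative transfer size
    for name, model in [("programmed I/O", programmed_io),
                        ("interrupt-driven I/O", interrupt_driven_io),
                        ("DMA", dma)]:
        print(f"{name:>22}: {model(size).cpu_events} CPU events")
```

In this model the CPU's work under DMA scales with the number of blocks rather than the number of words, which is the qualitative effect the abstract describes. The absolute counts depend entirely on the assumed word and block sizes and should not be read as performance figures.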


Citation

A. Aljumah and M. A. Ahmed, “AMBA Based Advanced DMA Controller for SoC,” Int. J. Adv. Comput. Sci. Appl. IJACSA, vol. 7, no. 3, Mar. 2016, doi: 10.14569/IJACSA.2016.070326.

S. K. Bhadrayya and V. B. Ravishankar, “Central Processing Unit Load Reduction Through Application Code Optimization and Memory Management,” Int. J. Reconfigurable Embed. Syst. IJRES, vol. 14, no. 1, pp. 79–88, Mar. 2025, doi: 10.11591/ijres.v14.i1.pp79-88.

A. C. Fauzan, U. S. A. Baqi, A. A. Auladi, and A. Z. M. Sph, “Direct Memory Access untuk Menghitung Waktu Prosesor Intel Celeron N2840 dan AMD A8-7410 dalam Menangani Transfer Data,” ILKOMNIKA, vol. 1, no. 1, pp. 1–6, Aug. 2019, doi: 10.28926/ilkomnika.v1i1.4.

A. Cilardo, “Evaluation of HPC Acceleration and Interconnect Technologies for High-Throughput Data Acquisition,” Sensors, vol. 21, no. 22, p. 7759, Jan. 2021, doi: 10.3390/s21227759.

K. Taranov, F. Fischer, and T. Hoefler, “Efficient RDMA Communication Protocols,” Dec. 20, 2022, arXiv: arXiv:2212.09134. doi: 10.48550/arXiv.2212.09134.

X. Wei, R. Cheng, Y. Yang, R. Chen, and H. Chen, “Characterizing Off-path SmartNIC for Accelerating Distributed Systems,” presented at the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), 2023, pp. 987–1004. Accessed: Mar. 28, 2026. [Online]. Available: https://www.usenix.org/conference/osdi23/presentation/wei-smartnic

Y. Yuan et al., “ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications,” Oct. 17, 2022, arXiv: arXiv:2203.08906. doi: 10.48550/arXiv.2203.08906.

P. Czarnul, “Investigation of Parallel Data Processing Using Hybrid High Performance CPU + GPU Systems and CUDA Streams,” Comput. Inform., vol. 39, no. 3, pp. 510–536, Dec. 2020, doi: 10.31577/cai_2020_3_510.

C. C. Dobrescu, I. González, D. Carneros-Prado, J. Fontecha, and C. Nugent, “Direct Memory Access-Based Data Storage for Long-Term Acquisition Using Wearables in an Energy-Efficient Manner,” Sensors, vol. 24, no. 15, p. 4982, Jan. 2024, doi: 10.3390/s24154982.

A. (ITEC) Engelhart, “ITEC-OS Staff – An Analysis of DMA Interference Using Synthetic Load from an NVMe Device.” Accessed: Mar. 28, 2026. [Online]. Available: https://os.itec.kit.edu/21_3192.php

M. R. Hidayatulloh, F. A.-H. S. Bahri, A. Muqtashida, R. Gunawan, and S. Azahra, “Studi Sistem Input/Output: Perangkat, Interface, dan Optimalisasi Kinerja Komputer,” J. Ris. Multidisiplin Edukasi, vol. 2, no. 6, pp. 363–369, Jun. 2025, doi: 10.71282/jurmie.v2i6.451.

H. Ather, J. L. Bez, C. Wang, H. Childs, A. D. Malony, and S. Byna, “Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey,” Dec. 31, 2024, arXiv: arXiv:2501.00203. doi: 10.48550/arXiv.2501.00203.

A. Ruzhanskaia, P. Xu, D. Cock, and T. Roscoe, “Rethinking Programmed I/O for Fast Devices, Cheap Cores, and Coherent Interconnects,” Apr. 24, 2025, arXiv: arXiv:2409.08141. doi: 10.48550/arXiv.2409.08141.

K. Salah, K. El-Badawi, and F. Haidari, “Performance Analysis and Comparison of Interrupt-Handling Schemes in Gigabit Networks,” Comput. Commun., vol. 30, no. 17, pp. 3425–3441, Nov. 2007, doi: 10.1016/j.comcom.2007.06.013.

E. Joelianto, F. Ramdhani, and E. M. Budi, “Analisis Pengaruh Waktu Latensi Terhadap Akurasi Sistem SCADA Bacaan Metering Listrik Waktu Nyata Melalui Jaringan Internet,” J. Rekayasa Elektr., vol. 16, no. 3, Dec. 2020, doi: 10.17529/jre.v16i3.16465.

W. M. Zabołotny, “Versatile DMA Engine for High-Energy Physics Data Acquisition Implemented with High-Level Synthesis,” Electronics, vol. 12, no. 4, p. 883, Jan. 2023, doi: 10.3390/electronics12040883.

K. Cheng, W. Liu, Q. Shen, and S. Liao, “Design and Implementation of High-throughput PCIe with DMA Architecture between FPGA and PowerPC,” Sep. 17, 2018, arXiv: arXiv:1809.07702. doi: 10.48550/arXiv.1809.07702.

C. K. Shaila, M. J. Arthur, and G. Manoj, “Optimizing IoT Applications with RTL-Based DMA Controller for Data Transfers,” presented at the 2025 International Conference on Advanced Research in Electronics and Communication Systems (ICARECS-2025), Atlantis Press, Jun. 2025, pp. 756–764. doi: 10.2991/978-94-6463-754-0_66.

S. Larsen, S. Larsen, and B. Lee, “Platform IO DMA Transaction Acceleration,” in CACHES. ACM, 2011. Accessed: Mar. 28, 2026. [Online]. Available: https://www.semanticscholar.org/paper/Platform-IO-DMA-Transaction-Acceleration-Larsen-Larsen/847955ec5e2777f771e9ad757638b28671fc5f25

T. Benz et al., “A High-Performance, Energy-Efficient Modular DMA Engine Architecture,” IEEE Trans. Comput., vol. 73, no. 1, pp. 263–277, Jan. 2024, doi: 10.1109/TC.2023.3329930.

C. Lu, H. Yang, and Q. Wu, “Design and Implementation of a Direct Memory Access Controller Based on Microcontroller Unit,” J. Phys. Conf. Ser., vol. 2221, no. 1, p. 012016, May 2022, doi: 10.1088/1742-6596/2221/1/012016.

D. Tang, Y. Bao, Y. Chen, W. Hu, and M. Chen, “Exploiting the Produce-Consume Relationship in DMA to Improve I/O Performance,” 2009. Accessed: Mar. 28, 2026. [Online]. Available: https://www.semanticscholar.org/paper/Exploiting-the-Produce-Consume-Relationship-in-DMA-Tang-Bao/62ea2cab84e5099edbc5dfa283fa1aa230ce6d8b

D. Lee, L. Subramanian, R. Ausavarungnirun, J. Choi, and O. Mutlu, “Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM,” in 2015 International Conference on Parallel Architecture and Compilation (PACT), Oct. 2015, pp. 174–187. doi: 10.1109/PACT.2015.51.

M. A. A. Ahmad Abdullah Aljumah, M. Gulam, “Design and Implementation of a Direct Memory Access Controller for Embedded Applications,” IJTech – Int. J. Technol., vol. 10, no. 2, pp. 309–319, 2019, doi: 10.14716/ijtech.v10i2.795.

D. Tang, Y. Bao, W. Hu, and M. Chen, “DMA cache: Using On-Chip Storage to Architecturally Separate I/O Data From CPU Data for Improving I/O Performance,” in HPCA – 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture, Jan. 2010, pp. 1–12. doi: 10.1109/HPCA.2010.5416638.

F. Kong, Y. Deng, X. Yi, R. Antonio, and M. Verhelst, “XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs,” presented at the 2025 IEEE 43rd International Conference on Computer Design (ICCD), IEEE Computer Society, Nov. 2025, pp. 690–693. doi: 10.1109/ICCD65941.2025.00104.

S. Saidi, P. Tendulkar, T. Lepley, and O. Maler, “Optimizing Explicit Data Transfers for Data Parallel Applications on The Cell Architecture,” ACM Trans. Archit. Code Optim., vol. 8, no. 4, pp. 37:1–37:20, Jan. 2012, doi: 10.1145/2086696.2086716.

A. Mera, Y. H. Chen, R. Sun, E. Kirda, and L. Lu, “D-Box: DMA-enabled Compartmentalization for Embedded Applications,” in Proceedings 2022 Network and Distributed System Security Symposium, San Diego, CA, USA: Internet Society, 2022. doi: 10.14722/ndss.2022.24053.

J. Zhu, L. Wang, L. Xiao, and G. Qin, “uDMA: An Efficient User-Level DMA for NVMe SSDs,” Appl. Sci., vol. 13, no. 2, p. 960, Jan. 2023, doi: 10.3390/app13020960.

J. Li et al., “Fastmove: A Comprehensive Study of On-Chip DMA and its Demonstration for Accelerating Data Movement in NVM-based Storage Systems,” ACM Trans. Storage, vol. 20, no. 3, pp. 19:1–19:30, Jun. 2024, doi: 10.1145/3656477.

D. Li, W. Zhang, M. Dong, and K. Ota, “DMA-Assisted I/O for Persistent Memory,” IEEE Trans. Parallel Distrib. Syst., vol. 35, no. 5, pp. 829–843, May 2024, doi: 10.1109/TPDS.2024.3373003.

D. Syrivelis, A. Reale, K. Katrinis, and C. Pinto, “A Software-defined SoC Memory Bus Bridge Architecture for Disaggregated Computing,” in Proceedings of the 3rd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems, in AISTECS ’18. New York, NY, USA: Association for Computing Machinery, Jan. 2018, pp. 1–4. doi: 10.1145/3186608.3186611.

J.-H. Chae, “High-Bandwidth and Energy-Efficient Memory Interfaces for the Data-Centric Era: Recent Advances, Design Challenges, and Future Prospects,” IEEE Open J. Solid-State Circuits Soc., vol. 4, pp. 252–264, 2024, doi: 10.1109/OJSSCS.2024.3458900.

S. Otani, H. Kondo, I. Nonomura, T. Hanawa, S. Miura, and T. Boku, “Peach: A Multicore Communication System on Chip with PCI Express,” IEEE Micro, vol. 31, no. 6, pp. 39–50, Nov. 2011, doi: 10.1109/MM.2011.93.

A. K. Abousamra, R. G. Melhem, and A. K. Jones, “Déjà Vu Switching for Multiplane NoCs,” in Proceedings of the 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, in NOCS ’12. USA: IEEE Computer Society, May 2012, pp. 11–18. doi: 10.1109/NOCS.2012.9.

C.-T. Axinte, A. Stan, and V.-I. Manta, “Embedded Streaming Hardware Accelerators Interconnect Architectures and Latency Evaluation,” Electronics, vol. 14, no. 8, p. 1513, Jan. 2025, doi: 10.3390/electronics14081513.

C. Chen, X. Zhao, G. Cheng, Y. Xu, S. Deng, and J. Yin, “Next-Gen Computing Systems with Compute Express Link: a Comprehensive Survey,” Feb. 20, 2025, arXiv: arXiv:2412.20249. doi: 10.48550/arXiv.2412.20249.

M. Kwon et al., “From Block to Byte: Transforming PCIe Solid-State Devices With Compute Express Link Memory Protocol and Instruction Annotation,” IEEE Micro, vol. 45, no. 6, pp. 46–55, Nov. 2025, doi: 10.1109/MM.2025.3581448.

X. Zhang, K. Liu, Y. Chang, K. Zhang, and M. Chen, “DFabric: Scaling Out Data Parallel Applications with CXL-Ethernet Hybrid Interconnects,” Oct. 30, 2024. doi: 10.5555/3768039.3768113.

Y. Fujii, T. Azumi, N. Nishio, S. Kato, and M. Edahiro, “Data Transfer Matters for GPU Computing,” in 2013 International Conference on Parallel and Distributed Systems, Dec. 2013, pp. 275–282. doi: 10.1109/ICPADS.2013.47.

A. Agrawal, S. Aga, S. Pati, and M. Islam, “ConCCL: Optimizing ML Concurrent Computation and Communication with GPU DMA Engines,” presented at the 2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE Computer Society, May 2025, pp. 1–11. doi: 10.1109/ISPASS64960.2025.00018.

Elavarasan, “The Future of Heterogeneous Computing: Integrating CPUs, GPUs, and FPGAs for High-Performance Applications,” Int. J. Artif. Intell. Data Sci. Mach. Learn., vol. 5, no. 1, pp. 20–31, Mar. 2024, doi: 10.63282/3050-9262.IJAIDSML-V5I1P103.

M. Peng, H. Chen, Y. Zhang, and S. Liu, “HyperDMA: Enhancing High-Performance Computing and AI Workflows with Advanced Data Transfer Capabilities,” in 2024 9th International Conference on Integrated Circuits and Microsystems (ICICM), Oct. 2024, pp. 636–644. doi: 10.1109/ICICM63644.2024.10814280.

D. Vaquerizo-Hdez, P. Muñoz, D. F. Barrero, and M. D. R-Moreno, “Continuous Energy Consumption Measure Approach Using a DMA Double-Buffering Technique,” EURASIP J. Wirel. Commun. Netw., vol. 2021, no. 1, p. 172, Aug. 2021, doi: 10.1186/s13638-021-02043-w.

J.-H. Jean and D.-S. Kim, “Hardware-Assisted Low-Latency NPU Virtualization Method for Multi-Sensor AI Systems,” Sensors, vol. 24, no. 24, p. 8012, Jan. 2024, doi: 10.3390/s24248012.

Y. Hong and D. Kim, “Performance and Efficiency Gains of NPU-Based Servers over GPUs for AI Model Inference,” Systems, vol. 13, no. 9, p. 797, Sep. 2025, doi: 10.3390/systems13090797.

J. Wang, G. Lv, Z. Liu, and X. Yang, “Programmable Deterministic Zero-Copy DMA Mechanism for FPGA Accelerator,” Appl. Sci., vol. 12, no. 19, p. 9581, Jan. 2022, doi: 10.3390/app12199581.

S. Gener et al., “RIMMS: Runtime Integrated Memory Management System for Heterogeneous Computing,” Jul. 28, 2025, arXiv: arXiv:2507.20514. doi: 10.48550/arXiv.2507.20514.

L. Zulberti, A. Monorchio, M. Monopoli, G. Mystkowska, P. Nannipieri, and L. Fanucci, “SmartDMA: Adaptable Memory Access Controller for CGRA-based Processing Systems,” in 2024 27th Euromicro Conference on Digital System Design (DSD), Aug. 2024, pp. 306–313. doi: 10.1109/DSD64264.2024.00048.

S. Pati, M. Islam, S. Aga, and M. A. Ibrahim, “DMA Collectives for Efficient ML Communication Offloads,” Nov. 10, 2025, arXiv: arXiv:2511.06605. doi: 10.48550/arXiv.2511.06605.

B. Shan, M. Araya-Polo, and B. Chapman, “DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP,” in Proceedings of the SC ’25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, in SC Workshops ’25. New York, NY, USA: Association for Computing Machinery, Nov. 2025, pp. 1289–1301. doi: 10.1145/3731599.3767505.

