Hoang-Gia Vu

Verified email at lqdtu.edu.vn

Department of Microprocessor Engineering
Le Quy Don Technical University




Reconfigurable computing
Embedded system
computing architecture


Scopus Publications

  • Encoder-based Many-Pattern Matching on FPGAs
    Hoang-Gia Vu and Ngoc-Dai Bui

    25th IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL Chips 2022 - Proceedings, Published: 2022 IEEE
    Many-pattern matching is one of the most essential algorithms in many application domains, such as data mining, network security, and bioinformatics. Such high-throughput application domains require high-performance matching engines, leading to the deployment of the algorithm on hardware. However, such hardware deployment consumes a large number of hardware resources. This challenge becomes more critical when scaling the number of patterns as well as the data throughput. In this paper, we first propose an encoder-based hardware architecture for many-pattern matching on FPGAs. The matching architecture includes two parts: an encoder-based filter and a matching block. We also propose an algorithm to simplify the structure of the encoder-based filter, thus reducing the hardware utilization. The hardware architecture is scalable with the number of patterns and the input data throughput. We evaluated our matching architecture and our algorithm with 2048 32-byte patterns abstracted from Snort rules for malware. The evaluation on a Xilinx Zedboard shows that at 2.16 Gbps throughput, the proposed architecture achieves higher hardware efficiency at 0.05 LUTs per character, block RAM consumption of 10% of the total device, and almost no flip-flop consumption, while the maximum clock frequency and the latency are 270 MHz and 11 ns, respectively.

  • Efficient hardware task migration for heterogeneous FPGA computing using HDL-based checkpointing
    Hoang-Gia Vu, Takashi Nakada, and Yasuhiko Nakashima

    Integration, ISSN: 01679260, Pages: 180-192, Published: March 2021 Elsevier BV
    Task migration plays an important role in load balancing and energy savings in data centers. It also challenges service providers to minimize service interruptions during task migration. FPGA computing requires checkpointing as an essential function for hardware task migration. However, the current methods of implementing such a function for FPGAs have a high cost in hardware resources and cause significant degradation in performance. To overcome these problems, in this paper we propose a system using checkpointing at the hardware description language (HDL) level for hardware task migration. First, we propose a hardware task migration scheme in which checkpointing procedures and context transfer can overlap to reduce the service downtime. Second, we present a new checkpointing architecture for FPGAs that flattens the structure of nested modules at the HDL level. Third, we propose a static analysis of the original HDL source code to reduce the cost of hardware. Fourth, we introduce a Python-based tool to generate the checkpointing architecture at the HDL level. We evaluated our checkpointing architecture and the migration scheme using four application benchmarks running on a heterogeneous FPGA cluster. Our evaluations showed that the migration downtime was minimized at only 1.251 ms in the S-Search benchmark. When compared with a tree-based checkpointing architecture, the proposed architecture with the static analysis can reduce the LUT overhead by up to 50%, on the average. The checkpointing hardware caused a small degradation in the maximum clock frequency (1.66% on average) and had a small memory footprint. Other comparisons with the previous hardware task migration scheme highlight the advantages of our migration scheme.

  • Performance Evaluation of Quine-McCluskey Method on Multi-core CPU
    Hoang-Gia Vu, Ngoc-Dai Bui, Anh-Tu Nguyen, and Thanh Bang Le

    Proceedings - 2021 8th NAFOSTED Conference on Information and Computer Science, NICS 2021, Pages: 60-64, Published: 2021 IEEE
    The Quine-McCluskey method is an algorithm to minimize Boolean functions. Although the method can be programmed on computers, it takes a long time to return the set of prime implicants, thus slowing the analysis and design of digital logic circuits. As a result, it slows down the dynamic reconfiguration process of programmable logic devices. In this paper, we first propose a data representation for storing implicants in memory to reduce the cache misses of the program. We then propose an algorithm to find all prime implicants of a Boolean function. The algorithm aims to reuse the data available on cache, thus decreasing cache misses. After that, we propose an algorithm for step 2 of the Quine-McCluskey method to select the minimal number of essential prime implicants. The evaluation shows that our proposals achieve much higher performance than the original Quine-McCluskey method. The number of essential prime implicants is a low percentage, less than 50%, of the total prime implicants generated in step 1 of the method.
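For readers unfamiliar with step 1 of the Quine-McCluskey method mentioned in the abstract above, the following is a minimal sketch of the textbook procedure for finding all prime implicants. It is not the cache-aware data representation or algorithm proposed in the paper; the function names `combine` and `prime_implicants` and the string encoding of implicants (`'0'`, `'1'`, `'-'`) are assumptions made for illustration.

```python
from itertools import combinations


def combine(a, b):
    """Merge two implicants that differ in exactly one defined bit.

    Implicants are strings over '0', '1', '-' (illustrative encoding).
    Returns the merged implicant, or None if the pair cannot be merged.
    """
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + '-' + a[i + 1:]
    return None


def prime_implicants(minterms, n_bits):
    """Textbook step 1: repeatedly merge implicants until none can merge."""
    current = {format(m, f'0{n_bits}b') for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        # Implicants that took part in no merge are prime.
        primes |= current - used
        current = merged
    return primes
```

For example, `prime_implicants([0, 1, 3], 2)` yields the two implicants `0-` and `-1`. The pairwise comparison here is quadratic in each round, which hints at why memory layout and cache reuse matter for larger functions.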

  • An Approach to Design a Multi-Protocol Gateway Device for Internet of Things System
    Thanh Bang Le, Hoang-Gia Vu, Hai Duong Nguyen, and The Son Vu

    Proceedings - 2021 8th NAFOSTED Conference on Information and Computer Science, NICS 2021, Pages: 452-457, Published: 2021 IEEE
    Due to the expansion of the Internet of Things (IoT) system in both the number of devices and the connection technologies, new and extremely demanding requirements for data processing speed, bandwidth, and security have emerged. One solution for meeting those requirements is to use a device that integrates multiple communication protocols (an IoT gateway) at the edge of the IoT system. The paper introduces an approach to designing such a device based on an embedded computing platform that connects the different protocols of ZigBee and LoRa sensor networks. The experimental results demonstrate the feasibility of the proposed method.

  • Prefix-based multi-pattern matching on FPGA
    Hoang-Gia Vu and Yen Hoang Thi

    Proceedings - 2020 International Conference on Green and Human Information Technology, ICGHIT 2020, Pages: 68-69, Published: February 2020 IEEE
    Multi-pattern matching refers to the search for multiple patterns in a given text at the same time. On FPGAs, the hardware consumption of such matching is expected to scale with the number of patterns. In this paper, we propose a matching architecture that compares the prefixes of multiple patterns with the prefix of the matching window in parallel. The comparison continues with the body of each pattern only if the corresponding prefix is matched. This architecture is called the prefix-based multi-pattern matching architecture. Our implementation on FPGA shows that the proposed matching architecture achieves much higher performance than the implementation on CPU, while the hardware cost is low.
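The prefix-then-body idea in the abstract above can be illustrated in software. The sketch below is only a sequential analogue under stated assumptions, not the authors' FPGA architecture: the fixed prefix length `PREFIX_LEN`, the dictionary lookup standing in for the parallel prefix comparators, and the function names are all hypothetical.

```python
from collections import defaultdict

PREFIX_LEN = 2  # assumed prefix length; the abstract does not state one


def build_prefix_table(patterns, prefix_len=PREFIX_LEN):
    """Group patterns by their first `prefix_len` characters."""
    table = defaultdict(list)
    for p in patterns:
        if len(p) < prefix_len:
            raise ValueError(f"pattern {p!r} is shorter than the prefix length")
        table[p[:prefix_len]].append(p)
    return table


def find_matches(text, patterns, prefix_len=PREFIX_LEN):
    """Return (position, pattern) pairs for every occurrence in `text`."""
    table = build_prefix_table(patterns, prefix_len)
    matches = []
    for i in range(len(text) - prefix_len + 1):
        window_prefix = text[i:i + prefix_len]
        # Only patterns whose prefix matched go on to the body comparison.
        for p in table.get(window_prefix, ()):
            if text.startswith(p[prefix_len:], i + prefix_len):
                matches.append((i, p))
    return matches
```

Calling `find_matches("abcabd", ["abc", "abd", "bca"])` reports `abc` at position 0, `bca` at position 1, and `abd` at position 3. In hardware the prefix comparisons would run concurrently for all patterns, whereas this sketch replaces them with a single table lookup per window.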

  • A tree-based checkpointing architecture for the dependability of FPGA computing
    Hoang-Gia Vu, Shinya Takamaeda-Yamazaki, Takashi Nakada, and Yasuhiko Nakashima

    IEICE Transactions on Information and Systems, ISSN: 09168532, eISSN: 17451361, Volume: E101D, Pages: 288-302, Published: February 2018 Institute of Electronics, Information and Communications Engineers (IEICE)

  • Efficient multitasking on FPGA using HDL-based checkpointing
    Hoang-Gia Vu, Takashi Nakada, and Yasuhiko Nakashima

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN: 03029743, eISSN: 16113349, Volume: 10824 LNCS, Pages: 590-602, Published: 2018 Springer International Publishing
    Multitasking on FPGA is a method allowing multiple users to share a reconfigurable fabric, thus improving the flexibility of hardware task management. However, current multitasking schemes bring with them considerable performance degradation and several other issues that can be solved. In this paper, we first present a multitasking scheme based on checkpointing at the hardware description language (HDL) level. The scheme can eliminate the need for reading the bitstream back, thus reducing the task switch latency. We then propose a new HDL-based checkpointing architecture for FPGA computing. Third, we propose a static analysis of the original HDL source code in order to reduce the hardware overhead caused by the checkpointing insertion. Our evaluations show that the proposed architecture with the static analysis can reduce the LUT overhead by up to 50% compared with the tree-based checkpointing architecture. The checkpointing architecture causes a small degradation in maximum clock frequency (1.65% on average), while it has a small memory footprint. Comparisons with previous multitasking schemes highlight the advantages of our scheme.

  • CPRring: A structure-aware ring-based checkpointing architecture for FPGA computing
    Hoang Gia Vu, Shinya Takamaeda-Yamazaki, Takashi Nakada, and Yasuhiko Nakashima

    Proceedings - IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2017, Pages: 192, Published: 30 June 2017 IEEE
    In this paper, we present a new architecture for FPGA checkpointing along with an efficient mechanism. We then provide a static analysis of original HDL source code to reduce the cost of hardware for checkpointing functionality. Our evaluations show that with the proposals, checkpointing hardware causes small degradation in maximum clock frequency (less than 10%). The LUT overhead varies from 14.4% (Dijkstra) to 103.84% (Matrix Multiplication).

  • CPRtree: A tree-based checkpointing architecture for heterogeneous FPGA computing
    Proceedings - 2016 4th International Symposium on Computing and Networking, CANDAR 2016, Pages: 57-66, Published: 13 January 2017