Hoang-Gia Vu

Scopus Publications

Boolean-Function-based IP Lookup on FPGAs
Dai-Do Tran, Hoang-Gia Vu
International Conference on Advanced Technologies for Communications, 2023
This paper presents a new approach to IP address lookup based on Boolean functions. Specifically, our method treats an IP address lookup system as a set of Boolean functions sharing a set of input variables. The output vector of these functions gives the identifier of the input IP address. We then propose a mapping scheme that uses each LUT6 as two LUT5s to implement two 5-input Boolean functions, thus increasing the resource efficiency of IP address lookup systems. Finally, we propose an algorithm for partially mapping Boolean operations onto LUT5s on FPGAs. Synthesis results on a Xilinx Artix-7 device show that our mapping scheme saves up to 50% of LUT consumption compared with the non-mapping scheme. Compared to the latest prior work, which also implements IP address lookup systems using LUTs, our approach achieves 5x better hardware resource efficiency than binary content-addressable memory (BiCAM) while keeping the same latency.
Encoder-based Many-Pattern Matching on FPGAs
Hoang-Gia Vu, Ngoc-Dai Bui
25th IEEE Symposium on Low Power and High Speed Chips and Systems Cool Chips 2022 Proceedings, 2022
Many-pattern matching is one of the most essential algorithms in many application domains, such as data mining, network security, and bioinformatics. Such high-throughput application domains require high-performance matching engines, leading to the deployment of the algorithm on hardware. However, such hardware deployment consumes a large amount of hardware resources. This challenge becomes more critical when scaling the number of patterns as well as the data throughput. In this paper, we first propose an encoder-based hardware architecture for many-pattern matching on FPGAs. The matching architecture includes two parts: an encoder-based filter and a matching block. We also propose an algorithm to simplify the structure of the encoder-based filter, thus reducing the hardware utilization. The hardware architecture is scalable with the number of patterns and the input data throughput. We evaluate our matching architecture and our algorithm with 2048 32-byte patterns extracted from Snort rules for malware. The evaluation on a Xilinx ZedBoard shows that at 2.16 Gbps throughput, the proposed architecture achieves higher hardware efficiency at 0.05 LUTs per character, a block RAM consumption of 10% of the device total, and almost no flip-flop consumption, while the maximum clock frequency and the latency are 270 MHz and 11 ns, respectively.
Mapping Boolean Functions onto Lookup-Tables on FPGAs
Hoang-Gia Vu, Dai-Do Tran, Ngoc-Dai Bui, Thanh-Bang Le, Hai-Duong Nguyen
Proceedings, 2022 RIVF International Conference on Computing and Communication Technologies (RIVF 2022), 2022
This paper presents a lookup-table sharing scheme for implementing Boolean functions on Xilinx FPGAs. The scheme aims to exploit each LUT6 primitive on FPGAs as two Boolean functions sharing five input variables. The proposed algorithm searches for the sets of five input variables that appear most frequently in the prime implicants of the Boolean function. These sets are then selected for mapping onto the shared five inputs of the two LUT5s inside a LUT6. Synthesis results from Vivado for Xilinx Virtex-7 show that our mapping scheme achieves better hardware resource utilization in many cases compared to the non-mapping designs. Our proposals also achieve higher maximum clock frequencies on FPGAs than the non-mapping design for complex Boolean functions.
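The search for frequently occurring five-variable sets can be illustrated with a small sketch. This is not the paper's algorithm: it assumes, for illustration only, that "frequency" means the number of prime implicants whose variables contain a given five-variable subset. The function name `most_frequent_variable_sets` and the sample implicants are hypothetical.

```python
from collections import Counter
from itertools import combinations


def most_frequent_variable_sets(implicants, k=5, top=1):
    """Count how often each k-variable subset occurs across implicants.

    Each implicant is given as an iterable of the variable names it uses.
    Assumption: a subset "appears" in an implicant if all of its
    variables are used by that implicant.
    """
    counts = Counter()
    for variables in implicants:
        # Sort so the same subset always yields the same tuple key.
        for subset in combinations(sorted(set(variables)), k):
            counts[subset] += 1
    return counts.most_common(top)


# Hypothetical prime implicants, described only by the variables they use.
implicants = [
    {"a", "b", "c", "d", "e", "f"},
    {"a", "b", "c", "d", "e"},
    {"a", "b", "c", "d", "e", "g"},
    {"b", "c", "d", "f", "g"},
]
best = most_frequent_variable_sets(implicants)
print(best)
```

In this toy input the subset (a, b, c, d, e) is contained in three implicants, so it would be the first candidate for the shared inputs of a LUT5 pair. Enumerating all subsets grows combinatorially with implicant size, so this sketch suits small examples only.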
A Novel In-memory Matching Circuit Based on Non-volatile Resistive Memory
Quang-Kien Trinh, Quang-Manh Duong, Xuan-Tien Do, Van-Phuc Hoang, Hoang-Gia Vu, et al.
Proceedings of 2022 IEEE International Conference on IC Design and Technology ICICDT 2022, 2022
This paper presents a novel in-memory matching circuit for content-addressable memory (CAM) applications, based on non-volatile resistive memory and a 2T-2R bit-cell structure that provides reliable lookup operations. Evaluations extended to different NV-RAM types (RRAM, PCRAM, and MRAM) demonstrate the high applicability of our design architecture. The advantages of the CAM matching circuit are verified by Monte Carlo simulations using a 65 nm CMOS process technology. Compared to conventional approaches, our proposed design reaches relatively low sensing latencies, varying from 0.14 to 0.24 ns, while maintaining a good level of search error rates.
Efficient hardware task migration for heterogeneous FPGA computing using HDL-based checkpointing
Hoang-Gia Vu, Takashi Nakada, Yasuhiko Nakashima
Integration, 2021
Task migration plays an important role in load balancing and energy savings in data centers. It also challenges service providers to minimize service interruptions during task migration. FPGA computing requires checkpointing as an essential function for hardware task migration. However, the current methods of implementing such a function for FPGAs have a high cost in hardware resources and cause significant degradation in performance. To overcome these problems, in this paper we propose a system using checkpointing at the hardware description language (HDL) level for hardware task migration. First, we propose a hardware task migration scheme in which checkpointing procedures and context transfer can overlap to reduce the service downtime. Second, we present a new checkpointing architecture for FPGAs that flattens the structure of nested modules at the HDL level. Third, we propose a static analysis of the original HDL source code to reduce the hardware cost. Fourth, we introduce a Python-based tool to generate the checkpointing architecture at the HDL level. We evaluated our checkpointing architecture and the migration scheme using four application benchmarks running on a heterogeneous FPGA cluster. Our evaluations showed that the migration downtime was reduced to only 1.251 ms in the S-Search benchmark. When compared with a tree-based checkpointing architecture, the proposed architecture with the static analysis can reduce the LUT overhead by up to 50% on average. The checkpointing hardware caused a small degradation in the maximum clock frequency (1.66% on average) and consumed a small memory footprint. Other comparisons with the previous hardware task migration scheme highlight the advantages of our migration scheme.
Performance Evaluation of Quine-McCluskey Method on Multi-core CPU
Hoang-Gia Vu, Ngoc-Dai Bui, Anh-Tu Nguyen, Thanh-Bang Le
Proceedings, 2021 8th NAFOSTED Conference on Information and Computer Science (NICS 2021), 2021
The Quine-McCluskey method is an algorithm to minimize Boolean functions. Although the method can be programmed on computers, it takes a long time to return the set of prime implicants, thus slowing the analysis and design of digital logic circuits. As a result, it slows down the dynamic reconfiguration process of programmable logic devices. In this paper, we first propose a data representation for storing implicants in memory to reduce the cache misses of the program. We then propose an algorithm to find all prime implicants of a Boolean function. The algorithm aims to reuse the data available on cache, thus decreasing cache misses. After that, we propose an algorithm for step 2 of the Quine-McCluskey method to select the minimal number of essential prime implicants. The evaluation shows that our proposals achieve much higher performance than the original Quine-McCluskey method. The number of essential prime implicants is a low percentage, less than 50%, of the total prime implicants generated in step 1 of the method.
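For orientation, step 1 of the textbook Quine-McCluskey method (finding all prime implicants) can be sketched as below. This is the plain baseline, not the cache-aware data representation or algorithm proposed in the paper. The `(value, mask)` encoding and the names `prime_implicants` and `to_pattern` are illustrative assumptions.

```python
from itertools import combinations


def prime_implicants(minterms):
    """Return prime implicants as (value, mask) pairs.

    A set bit in mask marks a don't-care position; the corresponding
    bit in value is always kept at 0.
    """
    current = {(m, 0) for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(current, 2):
            if a[1] != b[1]:
                continue  # only implicants with the same don't-cares can merge
            diff = a[0] ^ b[0]
            if diff and (diff & (diff - 1)) == 0:  # exactly one bit differs
                merged.add((a[0] & ~diff, a[1] | diff))
                used.update((a, b))
        primes |= current - used  # anything that could not merge is prime
        current = merged
    return primes


def to_pattern(value, mask, width):
    """Render an implicant as a string such as '1-1' (MSB first)."""
    return "".join(
        "-" if (mask >> i) & 1 else str((value >> i) & 1)
        for i in reversed(range(width))
    )


patterns = sorted(to_pattern(v, m, 3) for v, m in prime_implicants([0, 1, 5, 7]))
print(patterns)
```

The pairwise comparison in the inner loop is the part whose memory access pattern dominates run time for large functions, which is the cost the abstract's cache-oriented proposals target.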
An Approach to Design a Multi-Protocol Gateway Device for Internet of Things System
Thanh Bang Le, Hoang-Gia Vu, Hai Duong Nguyen, The Son Vu
Proceedings, 2021 8th NAFOSTED Conference on Information and Computer Science (NICS 2021), 2021
As Internet of Things (IoT) systems expand in both the number of devices and the range of connection technologies, they face increasingly demanding requirements for data processing speed, bandwidth, and security. One way to meet those requirements is to place a device that integrates multiple communication protocols (an IoT gateway) at the edge of the IoT system. This paper introduces an approach to designing such a device on an embedded computing platform that bridges the different protocols of ZigBee and LoRa sensor networks. The experimental results demonstrate the feasibility of the proposed method.
Prefix-based multi-pattern matching on FPGA
Hoang-Gia Vu, Yen Hoang Thi
Proceedings, 2020 International Conference on Green and Human Information Technology (ICGHIT 2020), 2020
Multi-pattern matching refers to the search for multiple patterns in a given text at the same time. This matching on FPGA is expected to scale with the number of patterns in hardware consumption. In this paper, we propose a matching architecture that compares the prefixes of multiple patterns with the prefix of the matching window in parallel. The comparison will continue with the body of each pattern if the corresponding prefix is matched. This architecture is called the prefix-based multi-pattern matching architecture. Our implementation on FPGA shows that the proposed matching architecture achieves much higher performance than the implementation on CPU, while the hardware cost is low.
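The two-stage idea (compare prefixes first, then compare the pattern body only on a prefix hit) can be modeled in software as follows. This is a sequential sketch for illustration, whereas the paper's architecture performs the prefix comparisons in parallel in hardware. The function name `prefix_match` and the parameter `prefix_len` are hypothetical.

```python
from collections import defaultdict


def prefix_match(text, patterns, prefix_len=2):
    """Return (position, pattern) for every pattern occurrence in text.

    Stage 1: look up the window prefix among the pattern prefixes.
    Stage 2: compare the full pattern only for patterns whose prefix hit.
    """
    by_prefix = defaultdict(list)
    for pattern in patterns:
        if len(pattern) < prefix_len:
            raise ValueError("pattern shorter than prefix length")
        by_prefix[pattern[:prefix_len]].append(pattern)

    hits = []
    for i in range(len(text) - prefix_len + 1):
        window_prefix = text[i:i + prefix_len]
        for pattern in by_prefix.get(window_prefix, ()):
            if text.startswith(pattern, i):  # body comparison
                hits.append((i, pattern))
    return hits


hits = prefix_match("abcabd", ["abc", "abd", "bca"])
print(hits)
```

Grouping patterns by prefix means most window positions are rejected after a single short comparison, which is the filtering effect the prefix stage is meant to provide.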
A tree-based checkpointing architecture for the dependability of FPGA computing
Hoang-Gia Vu, Shinya Takamaeda-Yamazaki, Takashi Nakada, Yasuhiko Nakashima
IEICE Transactions on Information and Systems, 2018
Efficient multitasking on FPGA using HDL-based checkpointing
Hoang-Gia Vu, Takashi Nakada, Yasuhiko Nakashima
Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 2018
Multitasking on FPGAs is a method that allows multiple users to share a reconfigurable fabric, thus improving the flexibility of hardware task management. However, current multitasking schemes bring considerable performance degradation and several other issues that can be solved. In this paper, we first present a multitasking scheme based on checkpointing at the hardware description language (HDL) level. The scheme eliminates the need to read the bitstream back, thus reducing the task switch latency. Second, we propose a new HDL-based checkpointing architecture for FPGA computing. Third, we propose a static analysis of the original HDL source code to reduce the hardware overhead caused by checkpointing insertion. Our evaluations show that the proposed architecture with the static analysis can reduce the LUT overhead by up to 50% compared with the tree-based checkpointing architecture. The checkpointing architecture causes a small degradation in maximum clock frequency (1.65% on average) and has a small memory footprint. Comparisons with previous multitasking schemes highlight the advantages of our scheme.
CPRring: A structure-aware ring-based checkpointing architecture for FPGA computing
Hoang Gia Vu, Shinya Takamaeda-Yamazaki, Takashi Nakada, Yasuhiko Nakashima
Proceedings, IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2017), 2017
CPRtree: A tree-based checkpointing architecture for heterogeneous FPGA computing
Proceedings, 2016 4th International Symposium on Computing and Networking (CANDAR 2016), 2017
