Hoang-Gia Vu

@lqdtu.edu.vn

Department of Microprocessor engineering
Le Quy Don Technical University



           

https://researchid.co/giavh

RESEARCH INTERESTS

Reconfigurable computing
Embedded systems
Computing architecture

12

Scopus Publications

  • Boolean-Function-based IP Lookup on FPGAs
    Dai-Do Tran and Hoang-Gia Vu

    IEEE
    This paper presents a new approach to IP address lookup based on Boolean functions. Specifically, our method considers an IP address lookup system as a set of Boolean functions sharing a set of input variables. The output vector of such functions indicates the identifier of the input IP address. We then propose a mapping scheme to use each LUT6 as two LUT5s to implement two 5-input Boolean functions, thus increasing the resource efficiency in the IP address lookup systems. Finally, we propose an algorithm for partially mapping Boolean operations onto LUT5s on FPGA. The synthesis results on a Xilinx Artix-7 device show that our mapping scheme brings a much higher hardware efficiency, compared to the non-mapping scheme. Our scheme can save up to 50% of LUT consumption compared with the non-mapping scheme. Compared to the latest prior work, which also implements IP address lookup systems by LUTs, our approach achieves hardware resource efficiency 5x better than Binary content-addressable memory (BiCAM) while keeping the same latency.

  • Mapping Boolean Functions onto Lookup-Tables on FPGAs
    Hoang-Gia Vu, Dai-Do Tran, Ngoc-Dai Bui, Thanh-Bang Le, and Hai-Duong Nguyen

    IEEE
    This paper presents a lookup-table sharing scheme for implementing Boolean functions on Xilinx FPGAs. The scheme aims to exploit each LUT6 primitive on FPGAs as two Boolean functions sharing five input variables. The proposed algorithm searches for sets of five input variables appearing most frequently in the prime implicants of the Boolean function. These sets are then selected for mapping onto the shared five inputs of the two LUT5s inside an LUT6. The synthesis results on Vivado for Xilinx Virtex 7 show that our mapping scheme achieves better hardware resource utilization in many cases compared to the non-mapping designs. For complex Boolean functions, our proposals also achieve higher maximum clock frequencies on FPGAs than the non-mapping design.

  • A Novel In-memory Matching Circuit Based on Non-volatile Resistive Memory
    Quang-Kien Trinh, Quang-Manh Duong, Xuan-Tien Do, Van-Phuc Hoang, Hoang-Gia Vu, Van-Ngoc Dinh, and Xuan-Uoc Dao

    IEEE
    This paper presents a novel in-memory matching circuit realizing the CAM applications based on Non-volatile resistive memory and 2T-2R bit cell structure that provides reliable lookup operations. The evaluations extended to different NV-RAM types (RRAM, PCRAM, and MRAM) demonstrate the high applicability of our design architecture. The advantages of the CAM matching circuit are verified by Monte Carlo simulations using the 65nm CMOS process technology. Compared to other conventional approaches, our proposed design can reach relatively low sensing latencies, varying from 0.14 to 0.24 ns while maintaining a good level of search error rates.

  • Encoder-based Many-Pattern Matching on FPGAs
    Hoang-Gia Vu and Ngoc-Dai Bui

    IEEE
    Many-pattern matching is one of the most essential algorithms in many application domains, such as data mining, network security, and bioinformatics. Such high-throughput application domains require high-performance matching engines, leading to the deployment of the algorithm on hardware. However, such hardware deployment consumes a large number of hardware resources. This challenge becomes more critical when scaling the number of patterns as well as the data throughput. In this paper, we first proposed an encoder-based hardware architecture for many-pattern matching on FPGAs. The matching architecture includes two parts: an encoder-based filter and a matching block. We also proposed an algorithm to simplify the structure of the encoder-based filter, thus reducing the hardware utilization. The hardware architecture is scalable with the number of patterns and the input data throughput. We evaluated our matching architecture and our algorithm with 2048 32-byte patterns abstracted from Snort rules for malware. The evaluation on Xilinx Zedboard shows that, at 2.16 Gbps throughput, the proposed architecture achieves higher hardware efficiency at 0.05 LUTs per character, a block RAM consumption of 10% of the device total, and almost no flip-flop consumption, while the maximum clock frequency and the latency are 270 MHz and 11 ns, respectively.

  • Efficient hardware task migration for heterogeneous FPGA computing using HDL-based checkpointing
    Hoang-Gia Vu, Takashi Nakada, and Yasuhiko Nakashima

    Elsevier BV
    Task migration plays an important role in load balancing and energy savings in data centers. It also challenges service providers to minimize service interruptions during task migration. FPGA computing requires checkpointing as an essential function for hardware task migration. However, the current methods of implementing such a function for FPGAs have a high cost in hardware resources and significant degradation in performance. To overcome these problems, in this paper we propose a system using checkpointing at the hardware description language (HDL) level for hardware task migration. First, we propose a hardware task migration scheme in which checkpointing procedures and context transfer can overlap to reduce the service downtime. Second, we present a new checkpointing architecture for FPGAs that flattens the structure of nested modules at the HDL level. Third, we propose a static analysis of the original HDL source code to reduce the cost of hardware. Fourth, we introduce a Python-based tool to generate the checkpointing architecture at the HDL level. We evaluated our checkpointing architecture and the migration scheme using four application benchmarks running on a heterogeneous FPGA cluster. Our evaluations showed that the migration downtime was minimized at only 1.251 ms in the S-Search benchmark. When compared with a tree-based checkpointing architecture, the proposed architecture with the static analysis can reduce the LUT overhead by up to 50%, on average. The checkpointing hardware caused a small degradation in the maximum clock frequency (1.66% on average) and consumed a small memory footprint. Other comparisons with the previous hardware task migration scheme highlight the advantages of our migration scheme.

  • Performance Evaluation of Quine-McCluskey Method on Multi-core CPU
    Hoang-Gia Vu, Ngoc-Dai Bui, Anh-Tu Nguyen, and Thanh-Bang Le

    IEEE
    The Quine-McCluskey method is an algorithm to minimize Boolean functions. Although the method can be programmed on computers, it takes a long time to return the set of prime implicants, thus slowing the analysis and design of digital logic circuits. As a result, it slows down the dynamic reconfiguration process of programmable logic devices. In this paper, we first propose a data representation for storing implicants in memory to reduce the cache misses of the program. We then propose an algorithm to find all prime implicants of a Boolean function. The algorithm aims to reuse the data available on cache, thus decreasing cache misses. After that, we propose an algorithm for step 2 of the Quine-McCluskey method to select the minimal number of essential prime implicants. The evaluation shows that our proposals achieve much higher performance than the original Quine-McCluskey method. The number of essential prime implicants is a low percentage, less than 50%, of the total prime implicants generated in step 1 of the method.

  • An Approach to Design a Multi-Protocol Gateway Device for Internet of Things System
    Thanh Bang Le, Hoang-Gia Vu, Hai Duong Nguyen, and The Son Vu

    IEEE
    Due to the expansion of Internet of Things (IoT) systems in both the number of devices and the connection technologies, extremely demanding new requirements for data processing speed, bandwidth, and security have emerged. One solution for meeting those requirements is to use a device that integrates multiple communication protocols (an IoT gateway) at the edge of the IoT system. The paper introduces an approach to designing this device based on an embedded computing platform that connects different protocols of ZigBee and LoRa sensor networks. The experimental results demonstrate the feasibility of the proposed method.

  • Prefix-based multi-pattern matching on FPGA
    Hoang-Gia Vu and Yen Hoang Thi

    IEEE
    Multi-pattern matching refers to the search for multiple patterns in a given text at the same time. On FPGA, the hardware consumption of such matching is expected to scale with the number of patterns. In this paper, we propose a matching architecture that compares the prefixes of multiple patterns with the prefix of the matching window in parallel. The comparison continues with the body of each pattern if the corresponding prefix is matched. This architecture is called the prefix-based multi-pattern matching architecture. Our implementation on FPGA shows that the proposed matching architecture achieves much higher performance than the implementation on CPU, while the hardware cost is low.

  • A tree-based checkpointing architecture for the dependability of FPGA computing
    Hoang-Gia Vu, Shinya Takamaeda-Yamazaki, Takashi Nakada, and Yasuhiko Nakashima

    Institute of Electronics, Information and Communications Engineers (IEICE)

  • Efficient multitasking on FPGA using HDL-based checkpointing
    Hoang-Gia Vu, Takashi Nakada, and Yasuhiko Nakashima

    Springer International Publishing
    Multitasking on FPGA is a method allowing multiple users to share a reconfigurable fabric, thus improving the flexibility of hardware task management. However, current multitasking schemes bring with them considerable performance degradation and several issues that can be solved. In this paper, we first present a multitasking scheme based on checkpointing at the hardware description language (HDL) level. The scheme can eliminate the need for reading the bitstream back, thus reducing the task switch latency. Second, we propose a new HDL-based checkpointing architecture for FPGA computing. Third, we propose a static analysis of the original HDL source code in order to reduce the hardware overhead caused by the checkpointing insertion. Our evaluations show that the proposed architecture with the static analysis can reduce the LUT overhead by up to 50%, compared with the tree-based checkpointing architecture. The checkpointing architecture causes a small degradation in maximum clock frequency (1.65% on average), while consuming a small memory footprint. Comparisons with previous multitasking schemes highlight the advantages of our scheme.

  • CPRring: A structure-aware ring-based checkpointing architecture for FPGA computing
    Hoang Gia Vu, Shinya Takamaeda-Yamazaki, Takashi Nakada, and Yasuhiko Nakashima

    IEEE
    In this paper, we present a new architecture for FPGA checkpointing along with an efficient mechanism. We then provide a static analysis of original HDL source code to reduce the cost of hardware for checkpointing functionality. Our evaluations show that with the proposals, checkpointing hardware causes small degradation in maximum clock frequency (less than 10%). The LUT overhead varies from 14.4% (Dijkstra) to 103.84% (Matrix Multiplication).
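
Several entries above (the Quine-McCluskey evaluation and the LUT-mapping papers) build on the prime implicants of a Boolean function. As background only, the following is a minimal sketch of the textbook step 1 of the Quine-McCluskey method. It is not the cache-aware data representation or algorithm proposed in the paper, and the names `combine`, `prime_implicants`, and `to_pattern` are hypothetical, chosen here for illustration.

```python
from itertools import combinations


def combine(a, b):
    """Merge two implicants that differ in exactly one fixed bit.

    An implicant is a pair (value, mask); bits set in mask are don't-cares
    and are kept at 0 in value. Returns None if the pair cannot be merged.
    """
    if a[1] != b[1]:                      # don't-care positions must agree
        return None
    diff = a[0] ^ b[0]
    if diff and diff & (diff - 1) == 0:   # exactly one differing bit
        return (a[0] & ~diff, a[1] | diff)
    return None


def prime_implicants(minterms):
    """Return the set of prime implicants for the given minterms."""
    current = {(m, 0) for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used          # anything unmerged is prime
        current = merged
    return primes


def to_pattern(implicant, n_vars):
    """Render an implicant as a string of '0', '1' and '-' (MSB first)."""
    value, mask = implicant
    return "".join(
        "-" if mask >> i & 1 else str(value >> i & 1)
        for i in range(n_vars - 1, -1, -1)
    )


if __name__ == "__main__":
    # f(A, B, C) with minterms 0, 1, 2, 5, 6, 7
    for imp in sorted(prime_implicants([0, 1, 2, 5, 6, 7])):
        print(to_pattern(imp, 3))
```

This sketch compares every pair of implicants at each level, which is simple but quadratic; it makes no attempt at the cache reuse or the essential-prime-implicant selection (step 2) that the abstract describes.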