Mixed-precision Quantisation for Graph Neural Network Acceleration

In recent times, Graph Neural Networks (GNNs) have attracted great attention due to their classification performance on non-Euclidean data. FPGA acceleration proves particularly beneficial for GNNs given their irregular memory access patterns, resulting from the sparse structure of graphs. These unique compute requirements have been addressed by several FPGA and ASIC accelerators, such as HyGCN and GenGNN.

Additionally, quantisation has been widely explored as a method for reducing model complexity and computational latency in neural networks. Networks can benefit from low-precision numerical representations through Quantization-Aware Training (QAT), which aims to minimize accuracy loss in quantised models. Degree-Quant, proposed by Tailor et al., was one of the first suggested approaches in applying QAT to GNNs. After demonstrating high-degree nodes are the predominant source of quantisation error, the authors address this issue by stochastically applying a protection mask at each layer following the Bernoulli distribution. High-degree nodes are computed in a low-precision formats, significantly improving quantised model accuracies.

AGILE (Accelerated Graph Inference Logic Engine) is an FPGA accelerator enabling real-time GNN inference for large graphs, introduced during an FYP project last year (see GitHub). One of its main contributions was a multi-precision node dataflow inspired by DegreeQuant. The accelerator extends the DegreeQuant paradigm, by enabling GNN inference at an arbitrary number of numerical representations, with arbitrary bit widths. As the first GNN accelerator with hardware support for multi-precision computation, significant improvements were observed in throughput and device resource usage. However, it is still an open challenge to support training software for multi-precision inference and demonstrate the accuracy benefits at the software level.

This project involves:

Support the multi-precision GNN paradigm in software by writing PyTorch training code.
Experiment with several multi-precision quantisation set-ups, optimising for inference latency and accuracy across a range of models and datasets.

Potential extension tasks:

Integrate training software into MASE, the Circuits and Systems (CAS) group’s in-house tool chain for ML exploration.
Contribute to the design and verification of AGILE to fully support the proposed quantisation set-ups.

Proposed reading

Graph Neural Networks

A Gentle Introduction to Graph Neural Networks https://distill.pub/2021/gnn-intro
Semi-Supervised Classification with Graph Convolutional Networks https://arxiv.org/abs/1609.02907v4