SparrowHawk: Memory Safety Flaw Detection via Data-driven Source Code Annotation

1. Introduction

SparrowHawk is an automated, data-driven annotation system that annotates specific types of functions in source code.

Currently, SparrowHawk is implemented to annotate memory allocation/deallocation functions and adopts a code analyzer to detect bugs.


2. Source code

2.1 Environment Prerequisites

  1. TensorFlow == 2.2
  2. Clang Static Analyzer
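
Before running the scripts, it can help to confirm that both prerequisites are visible from Python. The snippet below is a minimal sketch, not part of SparrowHawk. The function name check_environment is hypothetical. It assumes TensorFlow was installed under the distribution name "tensorflow" (variants such as tensorflow-gpu would need a different name) and that the Clang Static Analyzer is reachable through its scan-build front end on the PATH.

```python
import shutil
from importlib import metadata


def check_environment(required_tf="2.2"):
    """Report whether the two prerequisites appear to be installed.

    Hypothetical helper: it only inspects package metadata and the PATH,
    so it does not import TensorFlow or run the analyzer.
    """
    try:
        # Assumes the distribution is named "tensorflow".
        tf_version = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        tf_version = None

    # Accept "2.2" itself or any patch release such as "2.2.3".
    tf_ok = tf_version is not None and (
        tf_version == required_tf or tf_version.startswith(required_tf + ".")
    )

    return {
        "tensorflow_version": tf_version,
        "tensorflow_ok": tf_ok,
        # scan-build is the command-line front end of the Clang Static Analyzer.
        "scan_build_path": shutil.which("scan-build"),
    }
```

Calling check_environment() returns a small dictionary. A tensorflow_ok value of False or a scan_build_path of None suggests the corresponding prerequisite is missing or installed under a different name.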

2.2 Source code structure

SparrowHawk is implemented as Python scripts using TensorFlow. The source code is available here: source code.

  1. train.py: Retrain your Siamese network.
  2. test_other_funcs.py: Use your trained Siamese network to infer similarity for unseen functions.
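
To illustrate what "inferring similarity" with a Siamese network means, here is a minimal, dependency-free sketch. It is not the code in train.py or test_other_funcs.py. The encoder is a randomly initialized (untrained) token-embedding table, and the choice of function-name sub-tokens as input is an assumption made only for this example. All names (make_encoder, cosine, siamese_similarity) are hypothetical. The one property it is meant to show is that both inputs pass through the same encoder before their embeddings are compared.

```python
import math
import random
import zlib


def make_encoder(vocab_size=64, dim=8, seed=0):
    """Build one shared encoder: an embedding table with mean pooling.

    The weights are random here; a trained model would learn them.
    """
    rng = random.Random(seed)
    table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(vocab_size)]

    def encode(name):
        # Split an identifier such as "my_cache_alloc" into sub-tokens.
        tokens = [t for t in name.lower().split("_") if t]
        if not tokens:
            return [0.0] * dim
        # Map each token to a table row with a deterministic hash.
        rows = [table[zlib.crc32(t.encode("utf-8")) % vocab_size]
                for t in tokens]
        # Mean-pool the token vectors into one embedding.
        return [sum(col) / len(rows) for col in zip(*rows)]

    return encode


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_similarity(encode, left, right):
    # Both inputs go through the same encoder, i.e. the weights are shared.
    return cosine(encode(left), encode(right))
```

For example, with encode = make_encoder(), the call siamese_similarity(encode, "my_pool_alloc", "kmalloc") yields a score between -1 and 1. With untrained weights the score is not meaningful; training is what would pull functions of the same kind closer together.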


3. Dataset

The dataset used in our evaluation is available here: Dataset