I have worked on rapidly and scalably detecting vulnerabilities in the source code of smart contracts written in Solidity. Previously crawled NVD (National Vulnerability Database) data proved useful for collecting a dataset of Solidity-related vulnerable source code. I have made the following contributions:
I have collected a comprehensive dataset of real-world smart contract source code covering approximately 1.94 million contracts. The dataset is roughly 40 GB in size and is stored in a MySQL database.
GitHub: yjkellyjoo/etherscanSourcecodeScraper, a scraper for Ethereum smart contracts' source code
I have built a complete ANTLR4 parser for Solidity that runs in Python and successfully parses source code written for any Solidity compiler version. To my knowledge, none of the currently existing Solidity parsers can parse source code of all Solidity versions.
I present a new method for scanning vulnerable Solidity code clones rapidly and at scale, with a better detection rate than existing tools.
I show that many smart contract vulnerabilities propagate through code clones, and that existing vulnerable code clone detectors are incompatible with the Solidity language.
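When working with contracts written for many different compiler versions, one practical preprocessing step is reading each file's declared `pragma solidity` version, for example to group a large dataset by version when checking that a parser really covers all of them. The sketch below is an illustration under that assumption, not the method used in this work; the function names `strip_comments`, `declared_version`, and `count_by_minor_version` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Captures the constraint part of e.g. "pragma solidity ^0.4.24;" or
# "pragma solidity >=0.5.0 <0.7.0;".
PRAGMA_RE = re.compile(r"pragma\s+solidity\s+([^;]+);")
# Matches a version number such as 0.4.24 (the patch component is optional).
VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def strip_comments(source: str) -> str:
    """Remove /* block */ and // line comments so commented-out pragmas are ignored.

    Approximation: a "//" inside a string literal is also treated as a comment.
    """
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", source)


def declared_version(source: str) -> Optional[Tuple[int, int, int]]:
    """Return the first version number in the first solidity pragma, or None."""
    pragma = PRAGMA_RE.search(strip_comments(source))
    if pragma is None:
        return None
    version = VERSION_RE.search(pragma.group(1))
    if version is None:
        return None
    major, minor, patch = version.groups()
    return int(major), int(minor), int(patch or 0)


def count_by_minor_version(sources: Iterable[str]) -> Counter:
    """Count files per (major, minor) version; the key None means no pragma was found."""
    counts: Counter = Counter()
    for source in sources:
        version = declared_version(source)
        counts[version[:2] if version else None] += 1
    return counts
```

The sketch reads only the first number in the constraint, which is the lower bound for `^` and `>=` constraints but not for every possible constraint form (e.g. a bare `<0.7.0`), so it is a rough bucketing aid rather than a full version-range resolver.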
The research results are published in my master’s thesis: ProSmart: A Precise Mechanism for Scanning Propagated Vulnerable Smart Contracts at Large Scale.
Abstract - As blockchain ecosystems and smart contracts grow in popularity, so do security concerns, in particular the propagation of vulnerable code. Due to the immutability of smart contracts and the high financial value that blockchain platforms hold today, preventing the replication of vulnerable code in smart contracts is an urgent problem. To accurately identify vulnerable smart contract code propagated across the real-world blockchain ecosystem, we present ProSmart, a precise mechanism for scanning Propagated vulnerable Smart contracts at large scale. ProSmart proposes a novel approach to detecting smart contract vulnerabilities at large scale, aiming to be the cornerstone of vulnerability detection in the fast-changing smart contract environment.
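ProSmart's actual detection pipeline is not described here. As a rough illustration of the general idea behind scanning for propagated vulnerable code, the sketch below uses a common clone-detection technique instead: normalize each function by abstracting identifiers and literals, hash the result, and look the hash up in an index built from known-vulnerable functions. It assumes function bodies have already been extracted (for instance by a parser); the `KEEP` token set, the function names, and the sample contracts in the tests are all hypothetical.

```python
import hashlib
import re
from typing import Dict, List, Tuple

# Illustrative, deliberately incomplete set of tokens kept verbatim during
# normalization; every other identifier is abstracted to "ID".
KEEP = {
    "function", "returns", "return", "if", "else", "for", "while", "require",
    "assert", "msg", "sender", "value", "public", "external", "internal",
    "private", "view", "pure", "payable", "uint", "uint256", "int", "address",
    "bool", "mapping", "memory", "storage", "emit", "true", "false", "this",
    "transfer", "send", "call", "delegatecall",
}

# String literals, hex/decimal numbers, identifiers, or any other single character.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'|0x[0-9a-fA-F]+|\d+(?:\.\d+)?'
    r"|[A-Za-z_$][A-Za-z0-9_$]*|\S"
)


def strip_comments(code: str) -> str:
    # Approximation: a "//" inside a string literal is also treated as a comment.
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", code)


def normalize(code: str) -> str:
    """Abstract identifiers and literals so renamed or reformatted copies match."""
    out = []
    for tok in TOKEN_RE.findall(strip_comments(code)):
        if tok[0] in "\"'":
            out.append("STR")
        elif tok[0].isdigit():
            out.append("NUM")
        elif tok[0].isalpha() or tok[0] in "_$":
            out.append(tok if tok in KEEP else "ID")
        else:
            out.append(tok)
    return " ".join(out)


def fingerprint(code: str) -> str:
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def build_index(vulnerable: Dict[str, str]) -> Dict[str, str]:
    """Map fingerprint -> vulnerability label for known-vulnerable functions."""
    return {fingerprint(code): label for label, code in vulnerable.items()}


def scan(functions: Dict[str, str], index: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (function name, vulnerability label) for every exact normalized match."""
    hits = []
    for name, code in functions.items():
        label = index.get(fingerprint(code))
        if label is not None:
            hits.append((name, label))
    return hits
```

Because lookups are hash-table hits, scanning scales linearly with the number of target functions. The trade-off is that exact fingerprint matching only catches copies that differ in formatting, comments, identifier names, or literal values; copies with inserted or removed statements would need a fuzzier comparison.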