Findings of Halborn’s Summer Research on AI-Powered Chatbot, ChatGPT

Yesterday, Halborn, a leading blockchain security firm, unveiled the findings of its summer research, which involved an analysis of 134 smart contracts to assess the capabilities of the top AI-powered chatbot, ChatGPT. The scope of the study was broad, aiming to address various questions regarding the potential of ChatGPT to replace human smart contract auditors. Halborn’s team sought to answer questions about the utility of ChatGPT for students learning about smart contracts, its effectiveness in solving Capture the Flag (CTF) tasks commonly assigned in cybersecurity competitions, and its ability to identify vulnerabilities in code, among other considerations.

Curating Smart Contracts for Analysis

One aspect scrutinized by Halborn was ChatGPT’s proficiency in detecting fundamental textbook vulnerabilities. The team curated a set of sample contracts typically used to illustrate various types of attacks, vulnerabilities, and coding pitfalls. After preprocessing, which removed samples whose vulnerabilities were similar enough to be considered duplicates, the team narrowed the set down to 134 contracts.

Evaluating ChatGPT’s Performance

Because a chatbot can give different answers to the same question, and those answers vary with the wording of the prompt, Halborn split its assessments into two categories: one in which ChatGPT was questioned directly, and another in which the chatbot was asked to play the role of a cybersecurity auditor.
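The two prompt categories can be pictured with a short Python sketch. This is only an illustration: the article does not reproduce Halborn's prompts, so the wording of both templates, the function names, and the sample contract below are hypothetical, and no chatbot API is called.

```python
# Hypothetical prompt templates; the article does not quote Halborn's wording.
DIRECT_TEMPLATE = (
    "Does the following smart contract contain a vulnerability? "
    "If so, name it.\n\n{source}"
)
ROLE_PLAY_TEMPLATE = (
    "You are a smart contract security auditor. Review the following "
    "contract and report any vulnerability you find.\n\n{source}"
)

TEMPLATES = {"direct": DIRECT_TEMPLATE, "role-play": ROLE_PLAY_TEMPLATE}


def build_runs(contracts):
    """Return one (contract name, prompt style, prompt text) tuple per pairing.

    `contracts` maps a contract name to its source code.
    """
    runs = []
    for name, source in contracts.items():
        for style, template in TEMPLATES.items():
            runs.append((name, style, template.format(source=source)))
    return runs


# Assumed example input: a single placeholder contract.
sample = {"Reentrancy.sol": "contract Vault { /* ... */ }"}
runs = build_runs(sample)
for name, style, prompt in runs:
    print(name, style, len(prompt))
```

Keeping both templates in one table means every contract is paired with every prompt style, so the two categories are compared on the same inputs.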

Results of ChatGPT’s Performance

“Regarding the rate of detection of vulnerabilities, ChatGPT 3.5 is capable of detecting correctly 73.1% of them with the direct question, while ChatGPT 4 is capable of detecting 76.1%,” Halborn concludes, adding that “With the role-playing prompt the numbers are slightly lower, 70.1% for ChatGPT 3.5 and 67.9% for ChatGPT 4.”
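To put those percentages in terms of contracts, the short Python sketch below converts each reported rate into the nearest whole number of detections. It assumes that every rate was measured over the full sample of 134 contracts, which the article does not state explicitly, so the counts are implied estimates rather than figures published by Halborn.

```python
TOTAL_CONTRACTS = 134  # sample size reported in the article

# Detection rates quoted by Halborn, keyed by (model, prompt style).
reported_rates = {
    ("ChatGPT 3.5", "direct"): 73.1,
    ("ChatGPT 4", "direct"): 76.1,
    ("ChatGPT 3.5", "role-play"): 70.1,
    ("ChatGPT 4", "role-play"): 67.9,
}


def implied_count(rate_percent, total=TOTAL_CONTRACTS):
    """Nearest whole number of contracts consistent with a percentage.

    Assumption: the rate was computed over all `total` contracts.
    """
    return round(rate_percent * total / 100)


implied = {key: implied_count(rate) for key, rate in reported_rates.items()}

for (model, style), count in implied.items():
    # Recompute the percentage from the count as a consistency check.
    check = round(100 * count / TOTAL_CONTRACTS, 1)
    print(f"{model:12s} {style:10s} ~{count}/{TOTAL_CONTRACTS} ({check}%)")
```

Under that assumption, the gap between the best and worst configuration (ChatGPT 4 with the direct question versus ChatGPT 4 with the role-playing prompt) comes to roughly 11 contracts out of 134.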

ChatGPT’s Proficiency in Solving Capture the Flag (CTF) Challenges

To assess ChatGPT’s proficiency in solving Capture the Flag (CTF) challenges, Halborn employed tasks primarily sourced from three CTF repositories: Ethernaut, Damn Vulnerable DeFi, and Capture the Ether.

ChatGPT’s Application for Cybersecurity Specialists and Hackers

Halborn also references earlier attempts by other cybersecurity teams to assess ChatGPT’s ability to scan code for malicious fragments. Those attempts suggested that its effectiveness was questionable, but their conclusions may have been shaped by the relatively limited scope of the studies.

