Report by Backslash Security
Can AI “Vibe Coding” Be Trusted? It Depends…
7 Findings · Published Apr 24, 2025

Key Findings
Prompts that specified a need for security or requested OWASP best practices produced more secure results, yet still yielded vulnerable code for 5 of the 7 LLMs tested.
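The study's exact prompts are not reproduced here, but the contrast it draws between "naive" and security-focused prompting can be sketched with a hypothetical pair of prompts. Both strings below are illustrative assumptions, not wording from the report.

```python
# Hypothetical illustration of the two prompt styles the report contrasts.
# Neither string is taken from the study itself.

# A "naive" prompt: states the task only, with no security requirements.
naive_prompt = (
    "Write a Python function that logs a user in "
    "given a username and password."
)

# A security-focused prompt: the same task, but explicitly requesting
# secure coding practices (e.g., OWASP guidance).
secure_prompt = (
    "Write a Python function that logs a user in given a username and "
    "password. Follow OWASP secure coding best practices: use "
    "parameterized queries, hash passwords with a modern algorithm, "
    "and never log credentials."
)

# Per the report, the second style produced more secure code, but still
# left vulnerabilities in the output of 5 of the 7 LLMs tested.
print("OWASP" in secure_prompt, "OWASP" in naive_prompt)
```

The point of the contrast is that security guidance must be stated explicitly; the models tested did not reliably apply it on their own.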
Even when prompted to generate secure code, GPT-4o still produced outputs vulnerable to 8 of the 10 issues tested.
In response to simple, “naive” prompts, all LLMs tested generated insecure code vulnerable to at least 4 of the 10 common CWEs.
With naive prompts, ChatGPT scored 1.5/10 for secure code.
Claude 3.7 Sonnet scored 6/10 for secure code with naive prompts.
OpenAI’s GPT-4o performed worst, scoring 1/10 for secure code with naive prompts.
Claude 3.7 Sonnet scored a perfect 10/10 for secure code with security-focused prompts.