As a company that offers Offensive Cyber Security solutions as part of our Response Forward methodology, in addition to our Incident Response and Proactive Monitoring services, we have extensively researched AI and implemented it wherever it is safe and cost-effective to do so.
To fully explain AI's shortcomings in this space, we first need to properly categorise the basics of Offensive Security, as the waters have been muddied by unscrupulous marketing and low-quality Vulnerability Assessments (VA):
Vulnerability Assessment - The Bottom of The Barrel
Vulnerability Assessments are largely automated, bare-minimum security assessments designed to provide a low-level, initial view of a company's threat landscape. Unfortunately, many companies are sold a VA under the guise of a Penetration Test (Pentest), often for the price of a full Pentest.
In its most basic form, a Vulnerability Assessment consists of an automated scan, using a tool such as Nessus, Qualys or OpenVAS, of the IPs and URLs provided by the customer. Unscrupulous companies will happily label this service as a Pentest, hand the customer the auto-generated report from their tooling of choice and move on to their next victim customer.
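To illustrate just how little sits behind that bottom tier, here is a minimal sketch of what such an "assessment" amounts to, using nmap's vulnerability script category as a stand-in for the commercial scanners (the targets file and output path are hypothetical):

```python
import subprocess

# Hypothetical input file: one customer-supplied IP or hostname per line.
TARGETS_FILE = "targets.txt"

# Version-detect each host and run nmap's "vuln" script category, writing
# machine-readable XML that a report generator then regurgitates verbatim.
# In a bottom-tier VA, this single command is effectively the entire service.
subprocess.run(
    [
        "nmap",
        "-sV",                # probe open ports for service/version banners
        "--script", "vuln",   # NSE scripts that flag known vulnerabilities
        "-iL", TARGETS_FILE,  # read targets from the customer-supplied list
        "-oX", "scan_results.xml",
    ],
    check=True,
)
```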
A more in-depth VA may involve travel to site or deployment of an agent to perform an internal, IP-based audit of the environment, adding some value by scanning internal hosts.
A good, quality VA will also provide some human-derived context around the automated findings, delivered in a report alongside actionable recommendations.
If Cyber Security is a checkbox exercise for you, and being compromised down the line is an acceptable risk, then a VA might be right for you.
Penetration Test - What Most Companies Need
A Penetration Test (Pentest) combines the automated discovery of a VA with manual enumeration, Open Source Intelligence gathering and manual exploitation of discovered vulnerabilities, with the intent of actively compromising the in-scope assets. This is what most organizations with a reasonable level of cybersecurity maturity require at least twice a year to ensure gaps are closed and vulnerabilities addressed.
A Pentest provides an accurate representation of the level of effort a reasonably competent attacker would go to in order to gain access to the systems tested.
Most real-world compromises occur as a result of automated/scripted attacks launched against every reachable IP address on the internet; the attacker then follows up manually on those systems identified as exploitable (this is how the vast majority of Ransomware attacks start). A Pentest is a reasonable measure of a company's ability to withstand such attacks at that moment in time.
Red Team - You Will Lose
Red Teaming seeks to mirror the Tactics, Techniques and Procedures of highly motivated, well-funded threats (e.g. Advanced Persistent Threats), encompassing all of the above plus bespoke tooling and highly trained, motivated operators who will actively exploit your environment to gain a foothold, escalate privileges, move laterally and achieve full network takeover.
Red Teaming is highly effective at revealing your true paths to compromise in the event that a competent attacker gains access to your environment and is able to defeat your current tech stack. This is a highly specialised service and, to ensure real value to the customer, should only be undertaken after multiple Pentests have been conducted and the resulting recommendations and remediation activity have been completed.
Most companies don't need Red Teaming (despite how cool it sounds) until they have been through several iterations of both internal and external Pentesting and actioned all recommendations and remediation actions. A customer who engages a Red Team before reaching that level of maturity from a cyber resilience perspective will gain little value from the exercise, as the Red Team will fully compromise the environment within days.
So What?
Now that we have categorised Offensive testing into three distinct groups, it is easier to understand where AI-based testing can add value and where it is marketing hype designed to increase shareholder value.
AI Pentesting vendors promise customers the same level of testing as human-driven Pentesting, delivered in an automated fashion for a lower price. In reality, they currently offer little more than an AI-driven summary of an automated Vulnerability Assessment, with little to no added value beyond that.
A Large Language Model (LLM) can be hugely effective at taking the findings of a basic VA and presenting that information in a human-readable format, but to add any evidential context it would need access to vast volumes of Pentest research data. And here's the kicker: Pentest companies don't release this data, because it A) violates customer confidentiality and B) hurts their bottom line if their competitors can use all of their hard work.
There is merit in having an LLM with a RAG (Retrieval-Augmented Generation) database of CVEs to provide context around the findings of a VA, but this is not a Pentest.
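To show how thin that layer is, here is a minimal, assumption-laden sketch of the retrieval half of such a system. It uses TF-IDF similarity in place of a proper embedding model, and the two-entry cve_corpus is a hypothetical stand-in for a real CVE database:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: CVE identifiers mapped to their public descriptions.
cve_corpus = {
    "CVE-2021-44228": "Apache Log4j2 JNDI features allow remote code execution ...",
    "CVE-2017-0144": "The SMBv1 server in Microsoft Windows allows remote code execution ...",
}

vectorizer = TfidfVectorizer().fit(cve_corpus.values())
cve_vectors = vectorizer.transform(cve_corpus.values())
cve_ids = list(cve_corpus)

def retrieve_context(finding: str, top_k: int = 3) -> list[str]:
    """Return the CVE IDs whose descriptions best match a scanner finding."""
    scores = cosine_similarity(vectorizer.transform([finding]), cve_vectors)[0]
    ranked = sorted(zip(cve_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [cve_id for cve_id, score in ranked[:top_k] if score > 0]

# The retrieved descriptions get pasted into the LLM prompt so the summary
# quotes real CVE text rather than hallucinating details. Useful dressing
# around a scan - but nobody exploited anything.
print(retrieve_context("Outdated Log4j library detected on host 10.0.0.5"))
```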
So how do these new AI Pentest startups with no history of Pentesting gain this information to provide an effective report?
Let's suppose we live in a cyber utopia where all Offensive Security companies have hearts of gold and make all their previous findings freely available for anyone to train their own models. Even then, the running costs of training and maintaining such an LLM would push these companies' operating costs to the point where pricing for their services would need to match or exceed that of human-driven, expert-led Pentesting, for nothing more than an AI-powered VA (as many AI startups are now finding out the hard way).
The only other viable option is to query the findings against an existing LLM trained on a huge dataset, such as ChatGPT. This approach means your VA results get pushed into a ChatGPT API query, which in turn feeds the model, raising huge Data Privacy concerns: Threat Actors using the same tool can draw on the same dataset to model their attack campaigns. It's worth noting that some providers offer API options that allegedly don't train the model on your inputs (e.g., OpenAI's enterprise-tier products). However, this still requires strict controls on data sanitization and retention policies, which many startups skip in their rush to market.
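To make the missing control concrete, here is a minimal sketch of the redaction step that should run before anything leaves the customer's network. It assumes the official OpenAI Python client; the model name, regexes and example finding are illustrative only and nowhere near sufficient for production use:

```python
import re
from openai import OpenAI

def sanitize(finding: str) -> str:
    """Strip the most obvious identifiers before the text leaves the network.

    Real sanitization needs far more than two regexes (usernames, internal
    URLs, serial numbers, credentials...) - this is a floor, not a control.
    """
    finding = re.sub(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", "<REDACTED-IP>", finding)
    finding = re.sub(r"\b[\w.-]+\.internal\.example\.com\b", "<REDACTED-HOST>", finding)
    return finding

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

finding = "CVE-2017-0144: SMBv1 enabled on 10.0.0.5 (dc01.internal.example.com)"

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "system", "content": "Summarise this scanner finding for a client report."},
        {"role": "user", "content": sanitize(finding)},
    ],
)
print(response.choices[0].message.content)
```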
If you are willing to accept the enormous Data Privacy risks, then the potential value of adding a human-readable wrapper around an automated VA report might be appealing. But you could achieve the same result by uploading your own VA report to ChatGPT and asking it to do this for you, without involving a middleman (please don't do this).
And this is where current-generation AI's usefulness in a VA ends. What AI cannot do is add real context around which of the VA findings actually apply to your environment, nor can it identify and remove the large number of false positives from the report. There is value in using AI to consult a RAG containing common CVEs to add context around findings, for example, but this is not Pentesting; it's AI-enabled Vulnerability Scanning.
When we compare AI-driven VAs to a real, competent Pentest, the gap gets even wider. We've seen wild claims that various solutions can extrapolate the results of a scan, build exploit code on the fly and then test said code against the in-scope assets. Without human validation, this is comparable to the age-old technique of Fuzzing, where random data is fired at a device until it exhibits a response, in the hope of finding potential vulnerabilities. We've seen business-critical devices crash spectacularly when subjected to Fuzzing, ranging from Windows Domain Controllers through to Programmable Logic Controllers (PLCs) used to control dams and reactors (highly not recommended).
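For readers unfamiliar with the technique, a naive fuzzer really is this simple, which is exactly the problem. The sketch below uses a placeholder TEST-NET address; it should never be pointed at live infrastructure, for the reasons described above:

```python
import os
import socket

# Naive "dumb" fuzzer: fire random bytes at a service and watch for it to
# stop responding. Without human validation, "AI-generated exploits" fired
# blindly at in-scope assets behave much the same way - and this is how
# fragile targets such as Domain Controllers and PLCs end up falling over.
TARGET = ("192.0.2.10", 445)  # placeholder lab address (TEST-NET-1)

for attempt in range(1000):
    payload = os.urandom(1024)
    try:
        with socket.create_connection(TARGET, timeout=2) as sock:
            sock.sendall(payload)
    except OSError:
        # A dropped connection might mean a crash - or just a busy service.
        # Telling the difference is precisely the judgement call AI lacks.
        print(f"attempt {attempt}: target unresponsive")
        break
```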
As of the time of writing, AI is hugely ineffective at finding, creating and leveraging exploits, as it still lacks the critical thinking and reasoning of a human being. AI currently lacks the human creativity needed to push a square peg into a round hole, or to drive the wrong way up a one-way street just to see what happens - it lacks the tenacity to just “try harder”.
Sean Heelan came close in his research into CVE-2025-37899 (published May 22nd 2025), but he still concludes that a large amount of training and steering was required to see results:
https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/
There is no doubt that within 3-5 years AI will play a serious role in both Offensive testing and real adversarial activity, but its current limitations and cost of ownership outweigh any advantage it provides.
AI undoubtedly holds promise in cybersecurity, but when it comes to offensive testing, the gap between promise and practice is still wide. Current tools are better suited to supporting analysts, not replacing them. Whether it’s contextualizing a vulnerability, discerning false positives, or exploiting novel attack paths, human expertise remains essential.
That doesn’t mean AI has no place. Used properly, it can enhance triage, summarize reports, or automate repetitive steps. But marketed as a full replacement for skilled pentesters? That’s more fairy tale than fact, for now.
So before you trade your security posture for a line of buzzword beans, ask yourself: do you want flashy automation, or real protection?
If you are still interested in jumping on the AI Pentesting bandwagon after reading this blog, then maybe I can interest you in some AI-enabled magic beans to grow your own AI Pentesting beanstalk that might just take root in the Cloud. Or perhaps you'd be interested in some seed funding?