AI-Powered Pentesting: Evolving Cybersecurity Strategies

The $18/Hour Pentester: What Security Leaders Need to Tell Their Teams Right Now

January 2026

I’ve been staring at Stanford’s Trinity research paper for three days now, and I keep coming back to one number: $18.21 per hour.

That’s what it costs to run ARTEMIS—their AI-powered penetration testing agent that just outperformed 80% of professional pentesters in a live enterprise environment. Not a CTF. Not a lab. A real university network with 8,000+ hosts, actual users, and production systems.

And it placed second overall against 10 cybersecurity professionals.

If you’re leading a security team in 2026 and this number doesn’t fundamentally change how you’re thinking about offensive security, we need to talk.

This Isn’t About Replacement—It’s About Evolution

Let me be clear upfront: I’m not writing this to tell you AI is coming for your job. I’m writing this because I need my team to evolve faster than our adversaries do, and right now, adversaries are already using these tools.

Anthropic documented nation-state actors using AI in cyber operations. OpenAI reported similar patterns. The offensive AI revolution isn’t a future threat—it’s current reality.

The question isn’t whether AI will transform how we do security. The question is: are we evolving our teams’ capabilities as fast as the threat landscape is evolving theirs?

What the Trinity Study Actually Proves

Stanford ran a controlled experiment: 10 professional pentesters vs. AI agents (including ARTEMIS, Codex, CyAgent, and others) against the same target environment. Same scope, same time constraints, same rules of engagement.

The results:

  • ARTEMIS (A1 config): 9 valid vulnerabilities, 82% accuracy, $18.21/hour
  • ARTEMIS (A2 ensemble): 11 valid findings, 82% accuracy, $59/hour
  • Human participants: 3-13 vulnerabilities each, varying accuracy rates, ~$60/hour

But here’s what matters more than the leaderboard: when given targeted hints about where to look, ARTEMIS found every single vulnerability humans discovered. Its bottleneck wasn’t technical execution—it was pattern recognition and target selection.

That gap? It’s closing. Fast.

The Capabilities Gap My Team Needs to Close

I’ve spent the past week thinking about what this means for how we build and develop cybersecurity teams. Here’s what keeps me up at night:

1. We’re Still Operating at Human Serial Processing Speed

ARTEMIS hit a peak of 8 concurrent sub-agents executing simultaneous exploitation attempts. Most cyber teams? We’re sequential. One person, one target, one exploit chain at a time.

When an AI agent can parallelize reconnaissance across dozens of hosts while my team is still waiting for nmap to finish on host #1, we have a fundamental throughput problem.

What we need to be telling our teams: Learn to orchestrate parallel operations. Use automation not as a replacement for thinking, but as a force multiplier for execution. If you’re waiting on scan results, you should have three other investigations running concurrently.
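
To make this concrete, here is a minimal sketch (mine, not from the Trinity paper) of fanning reconnaissance out across a thread pool instead of scanning hosts one at a time. The hosts, worker count, and nmap options are placeholders, and it assumes nmap is on the PATH:

import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

hosts = ["10.0.0.5", "10.0.0.12", "10.0.0.37"]  # placeholder targets

def scan(host):
    # -sV: service/version detection; stdout is captured per host
    result = subprocess.run(["nmap", "-sV", host], capture_output=True, text=True)
    return host, result.stdout

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(scan, h) for h in hosts]
    for future in as_completed(futures):
        host, output = future.result()
        print(f"--- {host} ---\n{output}")

While those eight workers grind through service detection, you’re free to chase the findings that actually need a human.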

2. We’re Not Thinking in “Sessions” Yet

ARTEMIS runs for 16+ hours continuously through session management—summarizing progress, clearing context, resuming where it left off. It doesn’t suffer from context switching, meeting fatigue, or “I’ll get back to this tomorrow” syndrome.

Most cyber teams? We lose 30 minutes every time we context switch. We forget where we were. We duplicate work.

What we need to be telling our teams: Document like you’re creating resumption checkpoints. Your notes should allow anyone (including future you) to pick up exactly where you left off in 90 seconds. Treat long-term investigations like marathon runners treat pacing – sustainable progress over time, not heroic sprints.
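
As a trivial illustration (my own sketch, not a prescribed format), a resumption checkpoint can be as simple as structured notes that a teammate, a script, or future you can reload. Every field below is a placeholder:

import json
from datetime import datetime, timezone

checkpoint = {
    "engagement": "internal-web-app",
    "updated": datetime.now(timezone.utc).isoformat(),
    "done": ["subnet sweep", "default-credential checks on admin panels"],
    "in_progress": "auth flow review on 10.0.0.12",
    "next": ["test password-reset token reuse", "review exposed /metrics endpoint"],
}

# Written to disk, this is the 90-second pickup point for whoever resumes the work
with open("checkpoint.json", "w") as f:
    json.dump(checkpoint, f, indent=2)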

3. We’re Still Doing What AI Already Does Better

Every ARTEMIS variant systematically found:

  • Default credentials
  • Misconfigured services
  • Exposed management interfaces
  • Known CVE exploitation
  • Network enumeration patterns

These aren’t the vulnerabilities where human intuition adds value. These are the “table stakes” findings AI agents discover in the first 2 hours, every time, at scale.

What we need to be telling our teams: Stop competing on what AI does better. Specialize in what it struggles with:

  • Business logic flaws that require understanding intent vs. implementation
  • Complex attack chains that span multiple systems with organizational context
  • Social engineering vectors that exploit human behavior patterns
  • Zero-day research that requires creative hypothesis generation
  • Adversarial ML understanding for AI-native attack surfaces

If your current skillset is “I’m really good at running nuclei and reviewing the output,” you’re competing with $18/hour automation. That’s not a winning position.

The Uncomfortable Conversation About the Human ROI

AI agents can find vulnerabilities. But they also submit false positives they can’t contextualize. They miss business logic flaws. They can’t explain why a finding matters to our specific business risk. And when the board asks “what does this mean for our Q2 product launch,” the AI agent doesn’t have an answer.

We have to build hybrid models – AI agents for systematic coverage, human experts for contextual analysis, prioritization, and strategic guidance. The AI agent finds the vulnerability. The human team determines if it’s exploitable in their specific environment, what business impact it has, and what the remediation priority should be given their awareness of the release calendar and risk appetite.

What is clear here, though: we can’t justify humans doing work AI does cheaper and better. We need to justify humans doing work AI can’t do yet.

The Harsh Truth About False Positives

One finding from the Trinity study that doesn’t get enough attention: ARTEMIS had a higher false positive rate than human participants. It reported successful authentication after seeing “200 OK” HTTP responses that were actually login page redirects.

Why? Lack of business context.

The AI agent understands HTTP status codes. It doesn’t understand that your authentication flow returns 200 on failed login attempts because your frontend framework handles routing client-side.
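
Here is a toy sketch of that failure mode. The URL, form fields, cookie name, and error string are all assumptions for illustration:

import requests

resp = requests.post("https://app.example.com/login",
                     data={"username": "admin", "password": "admin"})

# Naive check - effectively what the agent did. This is True even for a failed
# login when the frontend routes errors client-side and still returns 200.
naive_success = resp.status_code == 200

# Context-aware check - look for an authenticated-session signal instead
# (the "session" cookie and error string are assumed app-specific markers).
real_success = "session" in resp.cookies and "Invalid credentials" not in resp.text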

This is where human expertise remains critical—not in finding vulnerabilities, but in validating them within business context and prioritizing them against organizational risk.

What I’m telling my team: Your value isn’t in being faster than AI at running exploits. Your value is in understanding what a vulnerability means for our specific business, how it chains with other weaknesses, and what realistic attack scenarios exist given our threat model.

If you can’t articulate business impact and remediation priority better than an AI agent reading CVSS scores, you need to upskill urgently.

The Skills to Hire for in Cybersecurity Going Forward

When I’m reviewing resumes now, here’s what I’m looking for:

Red flags (AI-replaceable skills):

  • “Expert in vulnerability scanning tools”
  • “Extensive experience with automated testing frameworks”
  • “Proficient in running Metasploit/Burp/etc.”

These are fine to have, but they’re not differentiators anymore.

Green flags (AI-resistant skills):

  • “Discovered novel authentication bypass in OAuth implementation by understanding business logic intent vs. specification”
  • “Chained three medium-severity findings into critical-impact attack scenario based on organizational context”
  • “Developed custom exploitation techniques for previously unknown attack surface”
  • “Translated technical vulnerability findings into business risk language for executive stakeholders”
  • “Experience orchestrating AI/automated tools within security workflows”

Notice the difference? It’s not about knowing tools—it’s about applying creative thinking, contextual understanding, and strategic judgment that AI agents don’t have yet.

What Your Team Should Be Doing Monday Morning

If you’re a security leader reading this, here’s my recommendation for your next team meeting:

1. Acknowledge the Reality

Don’t sugarcoat it. AI agents cost $18/hour and are already competitive with professional pentesters on systematic vulnerability discovery. Your team needs to understand the competitive landscape they’re operating in.

2. Reframe the Value Proposition

Your team’s value isn’t in discovering vulnerabilities anymore—it’s in:

  • Understanding which vulnerabilities matter in your specific business context
  • Developing novel exploitation techniques for your unique attack surfaces
  • Providing strategic guidance that connects technical findings to business risk
  • Explaining to non-technical stakeholders what findings actually mean

3. Invest in Differentiation

Allocate training budget toward:

  • Advanced exploitation techniques
  • Business logic vulnerability research
  • Threat intelligence and adversary tradecraft analysis
  • Communication and risk articulation skills
  • AI/ML security (both attacking and defending AI systems)

4. Experiment with Hybrid Models

Run a pilot: Use open-source AI agents (ARTEMIS is public) for reconnaissance on a non-critical internal application. Have your team do the same manually. Compare results, cost, and time investment.

Then discuss: Where did AI excel? Where did humans add unique value? How do we structure workflows that leverage both?

5. Build AI Literacy

Your team needs hands-on experience with AI agents to understand their capabilities and limitations. This isn’t theoretical anymore—these tools exist and adversaries are using them. Your team should be proficient in using, configuring, and orchestrating AI security agents.

The Meta-Question: Can We Afford NOT to Adapt?

Here’s what haunts me: While we’re debating whether to adopt AI agents, adversaries are already using them.

Anthropic reported nation-state actors leveraging AI in offensive operations. That means somewhere, right now, hostile actors are running AI-powered reconnaissance against targets at scale, at speeds human defenders can’t match.

The question isn’t “should we adopt AI agents in our security program?”

The question is: “Can we afford to defend at human speed against adversaries operating at AI speed?”

I don’t think we can.

The Bottom Line for Security Leaders

If you’re leading a security team in 2026, you need to answer three questions honestly:

1. What work is my team doing that AI agents already do better and cheaper?

If the answer is “a lot,” you have an urgent prioritization problem. That work should be automated now, freeing your human experts for higher-value activities.

2. What capabilities is my team developing that will remain valuable when AI agents mature further?

If the answer is “we’re focused on tool expertise,” you have an urgent skills development problem. Your team needs to specialize in areas where human judgment, context, and creativity remain critical.

3. How am I preparing my team for a future where $18/hour AI agents are baseline capability?

If the answer is “we’re not,” you have an urgent strategic planning problem. The future isn’t coming—it’s here. ARTEMIS exists, it’s open source, and adversaries are adopting these capabilities faster than defenders.

A Personal Note

I’m not writing this as a doomsayer. I’m optimistic about where this goes. But optimism requires preparation.

The security professionals on my team who embrace AI agents as force multipliers, who specialize in areas where human expertise remains critical, who learn to orchestrate hybrid human-AI workflows—they’re going to thrive. They’ll be more effective, more impactful, and more valuable than ever.

The ones who resist, who insist that “AI can’t replace human intuition” while doing work that AI demonstrably does better and cheaper—they’re going to struggle.

I know which team I want to build. I know which team I want to be part of.

The question is: which team are you building?


What’s your organization doing to prepare for AI-augmented offensive security? I’m genuinely curious—find me on LinkedIn and let’s talk about it.

An excellent series on Human-Centric AI is on LinkedIn: The Frankenstein Stitch part 2: Why ‘Micro-team’ as Human Navigators Are AI’s True North


Ramping up your product security team for an AI first-world – post 1

Continuous assessment of AI systems from a cybersecurity perspective is crucial to ensure that any organizational AI implementations are robust, secure, and resilient against evolving cyber threats.

In a rapidly evolving digital landscape, the integration of Artificial Intelligence (AI) is becoming increasingly mainstream. As businesses harness the power of AI to enhance their products and services, the role of Product Security teams has never been more critical. 

These teams must continually ramp up their skills and adapt to the dynamic AI-first environment to safeguard against potential threats. 

This blog post is the first of a three-post series providing a template for ramping up an organization’s cybersecurity team for an AI-first world. We will explore why Product Security teams need to stay ahead of the curve and discuss the pivotal role they play in securing AI-driven technologies.

Setting up virtualenvwrapper

Stumbled too many times to get this to work.

Errors and resolutions:

Error:

-bash: /usr/local/bin/python: No such file or directory
virtualenvwrapper.sh: There was a problem running the initialization hooks.
If Python could not import the module virtualenvwrapper.hook_loader,
check that virtualenvwrapper has been installed for
VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python and that PATH is
set properly.

Resolution:

Add the following line to the ~/.bash_profile file, making sure it appears before the line that sources virtualenvwrapper.sh (the script reads this variable during initialization):

export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3

On a Mac, the default path /usr/bin/python is the one used by the system installation. For Python installed using brew, the path is /usr/local/bin/python.
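
For completeness, the working ~/.bash_profile setup ends up looking something like the below. The script path assumes pip installed virtualenvwrapper against the Homebrew Python – adjust it to wherever virtualenvwrapper.sh landed on your system:

export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh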

Employee onboarding at Checkr

I recently joined Checkr as a security engineer and had the opportunity to complete its week-long onboarding program.

In my opinion, new employee onboarding is vital to any organization for a few simple reasons:

  1. It is educational for the new joiners – while they have already decided to jump on board by accepting the offer, a good onboarding experience helps reinforce that decision.

  2. It helps showcase the value-add that the new members will bring to the table and its alignment with the overall mission of the organization.

  3. Life is quite a bit (if not all!) about first impressions, and a good onboarding program is just that. For a new employee, it’s the first true insight into how a company really functions.

While it sounds simple enough, it is tough to get right. Before joining Checkr, I worked for various organizations (large, medium, and small) and had my share of new-joiner trainings, sessions, seminars, onboarding programs, and the like. A common thread between these programs is that, almost always, they are impersonal. Usually they cover things like setting up benefits, payroll, and computers, plus other administrative chores like providing a rundown of the dos and don’ts at the company. Very little emphasis goes into explaining what it means to come on board and what is needed to be successful at that company. To me, onboarding programs are perhaps the most boring, monotonous, and impersonal activity one does when starting at a company.

So, I was surprised (pleasantly!) at Checkr!

The onboarding program here is different. Not only is it laid out in a very thoughtful way (so much so that curiosity piqued by one session was addressed by the very next session – such was the cohesion of the flow), but it is also very conversational. It reflects the core principle of transparency that Checkr operates on and, for a new joiner, provides a great platform to get started here.

This sounds surprising when one considers that Checkr started in 2014 and currently has just around 135 employees. Typically, in fast-growing startups, the focus is on making new joiners productive from day one. The idea of making them spend a week learning about the company, its mission, its people, and its plans sounds strange. But at Checkr, the emphasis on this week-long program comes right from the top of the management chain, as evidenced by sessions from the CEO, the CTO, and various VPs.

The program is 1 week (5 days) long and covers sessions on each aspect of the company, from how it started to where it wants to go and how.

Most of the new hires here, like myself, have no experience in the background check industry. The onboarding sessions were a perfect introduction to the complex world of background records, the court data management/retrieval systems, and the painful inconsistencies in timelines as one moves across state/county lines.

This program helps visualize the direct impact of the technology developed at Checkr on the lives of job seekers across the country.

A few of the interesting aspects of the program for me included:

  1. Sessions with early employees of Checkr – getting their firsthand perspective on how the company has grown fast while holding tight to its mission helped set my own perspective on how Checkr works.

  2. Two sessions with the Checkr CEO talking about the company roadmap and history. The level of transparency he provides in terms of roadmap, challenges and priorities is amazing.

  3. Every one of the new joiners (yes – everyone!) has to complete the NAPBS FCRA Basic Certification. I learned an amazing amount about the whole BGC industry while training for this certification.

  4. Best part – I got to go to the courthouse to see firsthand how the record retrieval process works in the US court systems. This happens on the last day of the onboarding program and is a fitting culmination of all the learnings from the previous 4 days.

In summary, Checkr is on a mission to modernize the background screening industry. To be successful here, each employee has to connect to that mission and understand how the role they play counts toward fulfilling it. Checkr’s onboarding program facilitates this understanding by showcasing how the company functions.

The program made a great impression that will stay with me for a long time.

“please check gdb is codesigned” – macOS Sierra

Running GDB on macOS Sierra failed with the error below:
Starting program: /Users/gaurabb/Desktop/Coding-Projects/CLang/a.out
Unable to find Mach task port for process-id 68306: (os/kern) failure (0x5).
 (please check gdb is codesigned - see taskgated(8))
Steps I followed to address this (based on references at the end):
Step 1: Codesign GDB. This means creating a self-signed code-signing certificate – named “gdb-cert”, per Step 4 below – in Keychain Access and trusting it for code signing. (The original post walked through this in screenshots for steps 1.1 through 1.6, which are no longer available.)
Step 2:
    Step 2.1: Create a file named .gdbinit in /Users/<username> (your home directory)
    Step 2.2: Add the following to the file: set startup-with-shell off
                    This disables the shell that GDB normally uses to start the program on Unix-based systems
Step 3: Open a terminal and run:
 sudo killall taskgated
taskgated is a system daemon that implements a policy for the task_for_pid system service. When the kernel is asked for the task port of a process, and preliminary access control checks pass, it invokes this daemon (via launchd) to make the decision.
Step 4: In the terminal, run:
 codesign -f -s "gdb-cert" /usr/local/bin/gdb 
These steps should address the error.
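
To verify the signature took effect, run:
 codesign -dvv /usr/local/bin/gdb
and check that gdb-cert shows up in the Authority line (my own suggested verification step, not from the references below).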

References:

Concurrency tidbit to GO

Consider the code snippet below – it creates a chat room struct with the following fields:

  1. a channel (messageFlow) to forward incoming messages to the room
  2. a channel (joinChat) to queue clients who want to join the room
  3. a channel (quitChat) for clients who want to leave the room
  4. a map of clients who are currently in the room

 

type room struct {
    // messageFlow - channel that forwards incoming messages to the room
    messageFlow chan []byte
    // joinChat - channel for clients wanting to join the chat
    joinChat chan *client
    // quitChat - channel for clients wanting to leave a room
    quitChat chan *client
    // currentClients - a map that holds all current clients in a room
    currentClients map[*client]bool
}

Quick Go channel refresher – channels are a typed conduit through which we can send and receive values with the channel operator, <-. All channels must be created before use, and by default, sends and receives block until the other side is ready. This allows goroutines to synchronize without explicit locks or condition variables.
More details on channels in Go are here: https://tour.golang.org/concurrency/2

 

Quick Go map refresher – a map maps keys to values. More on maps in Go: https://tour.golang.org/moretypes/19

The usual problem with code like the one above is that two goroutines may try to modify the map at the same time, leaving the currentClients map in an unpredictable state.

To help mitigate this kind of race, Go provides a powerful statement called select. As defined here (https://tour.golang.org/concurrency/5), the select statement lets a goroutine wait on multiple communication operations. The select statement can be used whenever we need to perform operations on shared memory, or actions that depend on activity across several channels.

In the context of the code snippet above, we can use the select statement to monitor the three channels: messageFlow, joinChat, and quitChat. When a message arrives on any of the channels, the select statement runs the code for that particular case. Only one case runs at any given time – which is what keeps the map operations synchronized. The select code will look something like:

for {
    select {
    case client := <-room.joinChat:
        // a client wants to join: add it to the map
        room.currentClients[client] = true
    case client := <-room.quitChat:
        // a client is leaving: remove it from the map
        delete(room.currentClients, client)
    case chatMsg := <-room.messageFlow:
        // forward the message to every client currently in the room
        for client := range room.currentClients {
            client.receive <- chatMsg // assumes a receive channel on the client type
        }
    }
}

This code should run indefinitely in the background (as a goroutine) until the chat program is terminated.

References:
1) Go Programming Blueprints – Mat Ryer
2) https://tour.golang.org/

AWS Boto – Key pair creation – Regions matter!!

I was trying to create an EC2 key-pair using AWS Python SDK’s (Boto) create_key_pair() method, something like:

from boto.ec2.connection import EC2Connection

objEC2 = EC2Connection()  # region comes from Boto's config defaults (this matters later!)
key_name = 'BlockChainEC2InstanceKeyPair-1'

def create_new_key_pair(key_name):
    newKey = objEC2.create_key_pair(key_name)  # returns a boto.ec2.keypair.KeyPair
    newKey.save(dir_to_save_new_key)  # dir_to_save_new_key: target directory for the .pem

The keys were created as expected – I was able to fetch them using Boto’s get_all_key_pairs() method, like below:

def get_all_keypairs():
    try:
        keys = objEC2.get_all_key_pairs()  # lists key pairs visible in the connected region
    except:
        raise
    return keys

The get_all_key_pairs() method returns a result like the one below, showing that the key pair exists:

<DescribeKeyPairsResponse xmlns="http://ec2.amazonaws.com/doc/2014-10-01/">
    <requestId>8d3faa7d-70c2-4b7c-ad18-810f23230c22</requestId>
    <keySet>
        <item>
            <keyName>BlockChainEC2InstanceKeyPair-1</keyName>
            <keyFingerprint>30:51:d4:19:a5:ba:11:dc:7e:9d:ca:49:10:01:30:34:b5:7e:9b:8a</keyFingerprint>
        </item>
        <item>
            <keyName>BlockChainEC2InstanceKeyPair-1.pem</keyName>
            <keyFingerprint>18:7e:ba:2c:44:67:44:a7:06:c4:68:3a:47:00:88:8f:31:98:27:e6</keyFingerprint>
        </item>
    </keySet>
</DescribeKeyPairsResponse>

The problem: when I logged in to the AWS console of the same account whose access keys I used to create the key pairs, I didn’t see the newly created keys.

I posted this question to the ever helpful folks at Stack Overflow (here).

Based on the response, I realized that Boto was creating the keys in its default configured region of US East, while I was defaulting to US West when logging in to the AWS console. I was able to view the newly created keys once I changed the region in the AWS console [EC2 >> Key Pairs].

The fix was to add the following code snippet to the boto.cfg file:

[Boto]
ec2_region_name = us-west-2
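
Alternatively – a sketch of the same fix in code, if you would rather not depend on boto.cfg – pin the region when creating the connection:

import boto.ec2

# Bind the connection explicitly to us-west-2 so the key pairs land in
# the same region the AWS console is showing
objEC2 = boto.ec2.connect_to_region('us-west-2')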

 

ISC2 Certified Cloud Security Professional (CCSP) – My take

I recently passed ISC2’s Certified Cloud Security Professional (CCSP) certification.
While preparing for the certification, I found that there are hardly any reviews shared by individuals who have already taken the test for the benefit of those who plan to take it and want a test taker’s perspective.
So, here is my take, in a Q&A format.

How long did I prepare for the exam?

Focused study of around 40 hours, spread over 4 weeks.
I already had the following background, which helped a lot in covering major aspects of the CCSP material:
  1. Cloud Security Alliance’s CCSK
  2. ISC2 – CISSP
  3. More than 10 years of software/cloud security engineering and related professional experience.

What materials did I use for preparation?

1) The Official CBK – the first edition. I read a lot of bad reviews about the book, but as far as providing relevant information goes, I found it to be enough.
2) CCSK V3 Prep Guide: I read this for the following 4 domains:
  1. Architecture
  2. Operations
  3. Platform and Infrastructure
  4. Data Security
This alone will not be enough to clear the CCSP exam, but it’s a good, quick “day before the exam” kind of refresher.

Is the exam worth the time and money?

It’s not a hands-on exam; rather, it checks theoretical understanding of cloud engineering concepts and the ability to apply those concepts to scenario-based questions.
In my opinion, theory and concepts should always precede actual hands-on work, and so yes, this is a worthy investment.