AI-Powered Pentesting: Evolving Cybersecurity Strategies

The $18/Hour Pentester: What Security Leaders Need to Tell Their Teams Right Now

January 2026

I’ve been staring at Stanford’s Trinity research paper for three days now, and I keep coming back to one number: $18.21 per hour.

That’s what it costs to run ARTEMIS—their AI-powered penetration testing agent that just outperformed 80% of professional pentesters in a live enterprise environment. Not a CTF. Not a lab. A real university network with 8,000+ hosts, actual users, and production systems.

And it placed second overall against 10 cybersecurity professionals.

If you’re leading a security team in 2026 and this number doesn’t fundamentally change how you’re thinking about offensive security, we need to talk.

This Isn’t About Replacement—It’s About Evolution

Let me be clear upfront: I’m not writing this to tell you AI is coming for your job. I’m writing this because I need my team to evolve faster than our adversaries do, and right now, adversaries are already using these tools.

Anthropic documented nation-state actors using AI in cyber operations. OpenAI reported similar patterns. The offensive AI revolution isn’t a future threat—it’s current reality.

The question isn’t whether AI will transform how we do security. The question is: are we evolving our teams’ capabilities as fast as the threat landscape is evolving theirs?

What the Trinity Study Actually Proves

Stanford ran a controlled experiment: 10 professional pentesters vs. AI agents (including ARTEMIS, Codex, CyAgent, and others) against the same target environment. Same scope, same time constraints, same rules of engagement.

The results:

  • ARTEMIS (A1 config): 9 valid vulnerabilities, 82% accuracy, $18.21/hour
  • ARTEMIS (A2 ensemble): 11 valid findings, 82% accuracy, $59/hour
  • Human participants: 3-13 vulnerabilities each, varying accuracy rates, ~$60/hour

But here’s what matters more than the leaderboard: when given targeted hints about where to look, ARTEMIS found every single vulnerability humans discovered. Its bottleneck wasn’t technical execution—it was pattern recognition and target selection.

That gap? It’s closing. Fast.

The Capabilities Gap My Team Needs to Close

I’ve spent the past week thinking about what this means for how we build and develop cybersecurity teams. Here’s what keeps me up at night:

1. We’re Still Operating at Human Serial Processing Speed

ARTEMIS hit a peak of 8 concurrent sub-agents executing simultaneous exploitation attempts. Most cyber teams? We’re sequential. One person, one target, one exploit chain at a time.

When an AI agent can parallelize reconnaissance across dozens of hosts while my team is still waiting for nmap to finish on host #1, we have a fundamental throughput problem.

What we need to be telling our teams: Learn to orchestrate parallel operations. Use automation not as a replacement for thinking, but as a force multiplier for execution. If you’re waiting on scan results, you should have three other investigations running concurrently.

2. We’re Not Thinking in “Sessions” Yet

ARTEMIS runs for 16+ hours continuously through session management—summarizing progress, clearing context, resuming where it left off. It doesn’t suffer from context switching, meeting fatigue, or “I’ll get back to this tomorrow” syndrome.

Most cyber teams? We lose 30 minutes every time we context switch. We forget where we were. We duplicate work.

What we need to be telling our teams: Document like you’re creating resumption checkpoints. Your notes should allow anyone (including future you) to pick up exactly where you left off in 90 seconds. Treat long-term investigations like marathon runners treat pacing – sustainable progress over time, not heroic sprints.

3. We’re Still Doing What AI Already Does Better

Every ARTEMIS variant systematically found:

  • Default credentials
  • Misconfigured services
  • Exposed management interfaces
  • Known CVE exploitation
  • Network enumeration patterns

These aren’t the vulnerabilities where human intuition adds value. These are the “table stakes” findings AI agents discover in the first 2 hours, every time, at scale.

What we need to be telling our teams: Stop competing on what AI does better. Specialize in what it struggles with:

  • Business logic flaws that require understanding intent vs. implementation
  • Complex attack chains that span multiple systems with organizational context
  • Social engineering vectors that exploit human behavior patterns
  • Zero-day research that requires creative hypothesis generation
  • Adversarial ML understanding for AI-native attack surfaces

If your current skillset is “I’m really good at running nuclei and reviewing the output,” you’re competing with $18/hour automation. That’s not a winning position.

The Uncomfortable Conversation About the Human ROI

AI agents can find vulnerabilities. But they also submit false positives they can’t contextualize. They miss business logic flaws. They can’t explain why a finding matters to our specific business risk. And when the board asks ‘what does this mean for our Q2 product launch,’ the AI agent doesn’t have an answer.

We have to build hybrid models – AI agents for systematic coverage, human experts for contextual analysis, prioritization, and strategic guidance. The AI agent finds the vulnerability. The human team determines if it’s exploitable in their specific environment, what business impact it has, and what the remediation priority should be given their awareness of the release calendar and risk appetite.

What is clear, though: we can’t justify humans doing work AI does cheaper and better. We need to justify humans doing work AI can’t do yet.

The Harsh Truth About False Positives

One finding from the Trinity study that doesn’t get enough attention: ARTEMIS had a higher false positive rate than human participants. It reported successful authentication after seeing “200 OK” HTTP responses that were actually login page redirects.

Why? Lack of business context.

The AI agent understands HTTP status codes. It doesn’t understand that your authentication flow returns 200 on failed login attempts because your frontend framework handles routing client-side.

This is where human expertise remains critical—not in finding vulnerabilities, but in validating them within business context and prioritizing them against organizational risk.

What I’m telling my team: Your value isn’t in being faster than AI at running exploits. Your value is in understanding what a vulnerability means for our specific business, how it chains with other weaknesses, and what realistic attack scenarios exist given our threat model.

If you can’t articulate business impact and remediation priority better than an AI agent reading CVSS scores, you need to upskill urgently.

The Skills To Hire For in Cybersecurity Going Forward

When I’m reviewing resumes now, here’s what I’m looking for:

Red flags (AI-replaceable skills):

  • “Expert in vulnerability scanning tools”
  • “Extensive experience with automated testing frameworks”
  • “Proficient in running Metasploit/Burp/etc.”

These are fine to have, but they’re not differentiators anymore.

Green flags (AI-resistant skills):

  • “Discovered novel authentication bypass in OAuth implementation by understanding business logic intent vs. specification”
  • “Chained three medium-severity findings into critical-impact attack scenario based on organizational context”
  • “Developed custom exploitation techniques for previously unknown attack surface”
  • “Translated technical vulnerability findings into business risk language for executive stakeholders”
  • “Experience orchestrating AI/automated tools within security workflows”

Notice the difference? It’s not about knowing tools—it’s about applying creative thinking, contextual understanding, and strategic judgment that AI agents don’t have yet.

What Your Team Should Be Doing Monday Morning

If you’re a security leader reading this, here’s my recommendation for your next team meeting:

1. Acknowledge the Reality

Don’t sugarcoat it. AI agents cost $18/hour and are already competitive with professional pentesters on systematic vulnerability discovery. Your team needs to understand the competitive landscape they’re operating in.

2. Reframe the Value Proposition

Your team’s value isn’t in discovering vulnerabilities anymore—it’s in:

  • Understanding which vulnerabilities matter in your specific business context
  • Developing novel exploitation techniques for your unique attack surfaces
  • Providing strategic guidance that connects technical findings to business risk
  • Explaining to non-technical stakeholders what findings actually mean

3. Invest in Differentiation

Allocate training budget toward:

  • Advanced exploitation techniques
  • Business logic vulnerability research
  • Threat intelligence and adversary tradecraft analysis
  • Communication and risk articulation skills
  • AI/ML security (both attacking and defending AI systems)

4. Experiment with Hybrid Models

Run a pilot: Use open-source AI agents (ARTEMIS is public) for reconnaissance on a non-critical internal application. Have your team do the same manually. Compare results, cost, and time investment.

Then discuss: Where did AI excel? Where did humans add unique value? How do we structure workflows that leverage both?

5. Build AI Literacy

Your team needs hands-on experience with AI agents to understand their capabilities and limitations. This isn’t theoretical anymore—these tools exist and adversaries are using them. Your team should be proficient in using, configuring, and orchestrating AI security agents.

The Meta-Question: Can We Afford NOT to Adapt?

Here’s what haunts me: While we’re debating whether to adopt AI agents, adversaries are already using them.

Anthropic reported nation-state actors leveraging AI in offensive operations. That means somewhere, right now, hostile actors are running AI-powered reconnaissance against targets at scale, at speeds human defenders can’t match.

The question isn’t “should we adopt AI agents in our security program?”

The question is: “Can we afford to defend at human speed against adversaries operating at AI speed?”

I don’t think we can.

The Bottom Line for Security Leaders

If you’re leading a security team in 2026, you need to answer three questions honestly:

1. What work is my team doing that AI agents already do better and cheaper?

If the answer is “a lot,” you have an urgent prioritization problem. That work should be automated now, freeing your human experts for higher-value activities.

2. What capabilities is my team developing that will remain valuable when AI agents mature further?

If the answer is “we’re focused on tool expertise,” you have an urgent skills development problem. Your team needs to specialize in areas where human judgment, context, and creativity remain critical.

3. How am I preparing my team for a future where $18/hour AI agents are baseline capability?

If the answer is “we’re not,” you have an urgent strategic planning problem. The future isn’t coming—it’s here. ARTEMIS exists, it’s open source, and adversaries are adopting these capabilities faster than defenders.

A Personal Note

I’m not writing this as a doomsayer. I’m optimistic about where this goes. But optimism requires preparation.

The security professionals on my team who embrace AI agents as force multipliers, who specialize in areas where human expertise remains critical, who learn to orchestrate hybrid human-AI workflows—they’re going to thrive. They’ll be more effective, more impactful, and more valuable than ever.

The ones who resist, who insist that “AI can’t replace human intuition” while doing work that AI demonstrably does better and cheaper—they’re going to struggle.

I know which team I want to build. I know which team I want to be part of.

The question is: which team are you building?


What’s your organization doing to prepare for AI-augmented offensive security? I’m genuinely curious—find me on LinkedIn and let’s talk about it.

An excellent series on Human Centric AI is on LinkedIn: The Frankenstein Stitch part 2: Why ‘Micro-team’ as Human Navigators Are AI’s True North


Setting up virtualenvwrapper

Stumbled too many times to get this to work.

Errors and resolutions:

Error:

-bash: /usr/local/bin/python: No such file or directory
virtualenvwrapper.sh: There was a problem running the initialization hooks.
If Python could not import the module virtualenvwrapper.hook_loader,
check that virtualenvwrapper has been installed for
VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python and that PATH is
set properly.

Resolution:

Add the following line to the ~/.bash_profile file:

export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3

On a Mac, the default path /usr/bin/python is the one used by the system installation. For Python installed using brew, the path is /usr/local/bin/python.

Employee onboarding at Checkr

I recently joined Checkr as a security engineer and had the opportunity to complete its week-long onboarding program.

In my opinion, new employee onboarding is vital to any organization for a few simple reasons:

  1. It is educational for the new joiners – while they have already decided to jump onboard by accepting the offer, a good onboarding experience helps reinforce the decision.

  2. It helps showcase the value add that the new members will bring to the table and its alignment to the overall mission of the organization.

  3. Life is quite a bit (if not all!) about first impressions, and a good onboarding program is just that. For a new employee, it’s the first true insight into how a company really functions.

While it sounds simple enough, it’s tough to get right. Before joining Checkr, I worked for various organizations (large, medium, and small) and had my share of new joiner trainings, sessions, seminars, and onboarding programs. A common thread between these programs is that, almost always, they are impersonal. Usually they cover things like setting up benefits, payroll, and computers, plus other administrative chores like providing a rundown of the dos and don’ts at the company. Very little emphasis goes into explaining what it means to come onboard and what is needed to be successful at that company. To me, onboarding programs had been perhaps the most boring, monotonous, and impersonal activity one does when starting at a company.

So, I was surprised (pleasantly!) at Checkr!

The onboarding program here is different. Not only is it laid out in a very thoughtful way (so much so that the curiosity piqued by one session was addressed by the immediately following session – such was the cohesion of the flow), but it is also very conversational. It reflects the core principle of transparency that Checkr operates on, and for a new joiner it provides a great platform to get started here.

This sounds surprising when one considers that Checkr started in 2014 and is currently just around 135 employees. Typically in fast growing startups, the focus is on making the new joiners productive from day one. The idea of making them spend a week learning about the company, its mission, its people and plans sounds strange. But at Checkr, the emphasis on this week-long program comes right from the top of the management chain as evidenced by sessions from the CEO, the CTO and various VPs.

The program is 1 week (5 days) long and covers sessions on each aspect of the company, from how it started to where it wants to go and how.

Most of the new hires here, like myself, have no experience in the background check industry. The onboarding sessions were a perfect introduction to the complex world of background records, the court data management and retrieval systems, and the painful inconsistencies in timelines as one moves across state and county lines.

This program helps visualize the direct impact of the technology developed at Checkr on the lives of job seekers across the country.

A few of the interesting aspects of the program for me included:

  1. Sessions with early employees of Checkr. Getting their firsthand perspective on how the company has grown fast while holding tight to its mission helped set my own perspective on how Checkr works.

  2. Two sessions with the Checkr CEO talking about the company roadmap and history. The level of transparency he provides in terms of roadmap, challenges and priorities is amazing.

  3. Every one of the new joiners (yes – everyone!!) has to complete the NAPBS FCRA Basic Certification. I learned an amazing amount about the whole BGC industry while training for this certification.

  4. Best part – I got to go to the courthouse to see firsthand how the record retrieval process works in the US court systems. This happens on the last day of the onboarding program and is a fitting amalgamation of all the learnings from the previous four days.

In summary, Checkr is on a mission to modernize the background screening industry. To be successful here, each employee has to connect to that mission and understand how the role they play counts towards fulfilling it. Checkr’s onboarding program facilitates this understanding by showcasing how the company functions.

The program made a great impression that will stay with me for a long time.

Concurrency tidbit to GO

Consider the code snippet below – it creates a chat room struct with the following fields:

  1. a channel (messageFlow) to forward incoming messages to the room
  2. a channel (joinChat) to queue clients who want to join the room
  3. a channel (quitChat) for clients who want to leave the room
  4. a map of clients who are currently in the room

 

type room struct {
    // messageFlow - channel to forward incoming messages to the room
    messageFlow chan []byte
    // joinChat - channel for clients wanting to join the chat
    joinChat chan *client
    // quitChat - channel for clients wanting to leave a room
    quitChat chan *client
    // currentClients - a map that holds all current clients in a room
    currentClients map[*client]bool
}

Quick GO channel refresher – Channels are a typed conduit through which we can send and receive values with the channel operator, "<-". All channels must be created before use. And by default, sends and receives block until the other side is ready. This allows goroutines to synchronize without explicit locks or condition variables.
More details on channels in GO is here: https://tour.golang.org/concurrency/2

 

Quick GO map refresher – A map maps keys to values. More on maps in GO – https://tour.golang.org/moretypes/19
The usual problem with code like the one above is that two goroutines may try to modify the map at the same time, resulting in an unpredictable state for the currentClients map.

To help mitigate this kind of setup, GO provides a powerful statement called select. As defined here (https://tour.golang.org/concurrency/5), the select statement lets a goroutine wait on multiple communication operations. The select statement can be used whenever we need to perform operations on shared memory, or actions that depend on various activities within the channels.

To address the case in the context of the code snippet above, we can use the select statement to monitor the channels messageFlow, joinChat, and quitChat. As and when a message arrives on any of the channels, the select statement will run the code for that particular case. Only the case related to one channel will be run at any particular time – thus helping synchronize the operations. The select code will look something like:

    for {
        select {
        case client := <-room.joinChat:
            // do something to allow the client to join in
        case client := <-room.quitChat:
            // do something to allow the client to leave
        case chatMsg := <-room.messageFlow:
            // forward chatMsg to all clients in the room
        }
    }

This code should run indefinitely in the background (as a goroutine) until the chat program is terminated.

References:
1) GO Programming Blueprints – Mat Ryer
2) https://tour.golang.org/

IAPP – Privacy Technologist Credential Quick Notes

In the days of connected living, a lot of amazing new products and features are released every day. Being part of the grid helps encourage innovation, effective collaboration, and possibly, a better way of living in general!

The rush to roll out the products and/or features that enable this connected existence has a strong inclination to dissipate focus on one important area concerning the ENTITY at the center of it – the human and his/her right to privacy.

Most of these products take a “will this put me in a legal soup?” approach, and push the limits to the maximum, rather than being designed with the privacy protections of the end users built in. As with security, the general thought around privacy is that of hindrance in reaping maximum profitability out of the products.

I have been heavily involved in secure software development lifecycle projects in my career. So, in order to get a better insight into privacy focused software development lifecycle, I decided to pursue the CIPT credentials from IAPP.

My take was that unless the technology folks are made to understand the importance of privacy (and of course security), real long-term resolution of the privacy/security crisis will not be possible. The goal was to get a structured understanding of what the technologists, not the management/leaders, need to know to make knowledgeable decisions related to data privacy as they build a product.

While working on my preparations, I realized that there is a lot of CIPP information available (it’s the most popular of the privacy credentials) but not much on CIPT. Hence, I am including a short summary of my plan below.

My only reference for the certification was the book “Privacy in Technology – Standards and Practices for Engineers and Security and IT Professionals” by JC Cannon. The book is well written, and for someone with a technical background, it is the only book needed for CIPT.

Individuals with no knowledge of technical concepts around network security, cryptography, and authentication schemes will find this test to be a little tough. On a scale of 0-5, one must have at least a 1.5-2 knowledge of the aforementioned concepts to be comfortable with the type of questions the exam has.

Reading up freely available articles on the technical concepts mentioned should suffice in understanding the concepts highlighted in the book.

The course covers lots of good information on privacy focused architecture and development practices, privacy notices and tools.

Did I find the course worthy of the dollars/time spent? – Yes! In a world where most do not understand the importance of data privacy and confuse data privacy with data security, the materials covered in this course are refreshingly to the point.

Whether one will get a promotion because he/she got a CIPT, well, that depends :)

Diffie-Hellman key exchange

Diffie-Hellman – Layman terms

Basic info (table 1):

(2^x)^y = 2^(xy) = (2^y)^x

For DH, x and y are very large numbers.

Step 1: GB selects a large random number, x.

Step 2: GB raises 2 to the power of x and obtains, say, G (= 2^x).

Step 3: GB sends G to SB.

Step 4: SB selects a large random number, y.

Step 5: SB raises 2 to the power of y and obtains, say, S (= 2^y).

Step 6: SB sends S to GB.

Step 7: The following calculations are performed:

SB’s calculation: G^y = (2^x)^y = 2^(xy), using table 1 and the G received in Step 3.
GB’s calculation: S^x = (2^y)^x = 2^(yx) = 2^(xy), using table 1 and the S received in Step 6.

Step 8: Both SB and GB now have the same shared secret, 2^(xy), without actually having to transfer the key.

HTTP Secure Headers – How prevalent are these?

Recently Twitter added Public Key Pinning to their SecureHeaders Ruby Gem. There are 8 security headers now.

I wanted to check the prevalence of these secure HTTP headers amongst the top websites to get a sense of the awareness around these very efficient mechanisms for addressing a plethora of security related issues.

For reference, CSP is documented here.

I checked most of the sites on the publicly available 2014 list of the top 500 companies from Fortune.com for this purpose, and the stats for the 8 headers that the SecureHeaders Ruby Gem covers are:

  • CSP: 2
  • HTTP Strict Transport Security (HSTS): 5
  • X-Frame-Options (XFO): 81
  • X-XSS-Protection: 12
  • X-Content-Type-Options: 26
  • X-Download-Options: 0
  • X-Permitted-Cross-Domain-Policies: 1
  • Public Key Pinning: 0

This is not a comprehensive test (and is possibly not error free), but these numbers do point towards a possible lack of adoption of these gradually improving (and easy to use) security enforcement mechanisms.

Part of the reason for this may be the slight unreliability in the way browsers enforce these checks (for example, X-Download-Options is supported only on Internet Explorer), but considering that these do not break anything if used sensibly (like CSP and Public Key Pinning’s report-only settings), they can be used to gradually improve the security stance of most websites without much effort.

Note: Tristan Waldear has created a Python-Flask package for the same headers; it is hosted here.

Drag Microsoft Office Excel Conditional format…

For the umpteenth time, I spent more than two hours figuring out a way to drag my custom format in an incremental way across Excel rows.

Here is the use case:

I have an excel spreadsheet that contains columns that look like below:

ExcelBlog-Pic-1

The custom format that I needed was:

1) Fill green if the value in the cells in B, C, and D is greater than or equal to the value in the cell in A for that row.

2) Fill yellow if the value in the cells in B, C, and D is less than the value in the cell in A for that row.

Exact Requirement: I want to create the formatting for the cells in one row, drag it down and expect Excel to do the incremental adjustments to the cell values as needed.

By default when I create the formula using the “Conditional Formatting” option it creates something like this:

ExcelBlog-Pic-2

If I “Format Paint” other cells, the “Cell Value < $C$1” reference remains static. I wanted it to change based on the row it is on.

The fix was simple (I think there are better ways too!):

1) In the formula, remove the $ from the “Cell Value…” reference for the value that needs to reflect the changes. When I updated the formula as below, I was able to format paint it over the other cells:

ExcelBlog-Pic-3

In retrospect, that was simple…