Two years ago at OWASP Global AppSec, I presented a vision for a framework to both evaluate and train security testers, introducing a structured curriculum and objective metrics to a historically arcane field. After two years of careful contemplation and – more recently – a few months of an almost non-stop grind working on the implementation, I will finally be sharing the project.
This project, which I am about to present today at OWASP Global AppSec in Vienna (I am currently sitting in the OpenCRE talk), is called the OSASTS: Objective Structured Assessment of Security Tester Skills. The name comes from its inspiration in surgical training: the OSATS, an exercise in which an experienced surgeon observes a trainee’s operation (or simulation) and evaluates both their adherence to proper procedure and their performance of the skills involved. This approach has been evaluated empirically and found to be an effective way to assess skills and provide feedback. The exercise can be both formative (developing knowledge/skills) and summative (assessing knowledge/skills).
Applying this approach to the training of security testers was inspired largely by the gap between how we train security testers and how more mature professions (such as medicine) train and evaluate their practitioners. I experienced this firsthand as a novice security practitioner in 2016. One activity that I do not remember fondly was attempting CTF-style exercises that I lacked the knowledge and skills to solve. These exercises consumed a substantial amount of my early training time despite failing to meet the standards of contemporary education science. The OSASTS is different.
Rather than dropping trainees into an environment where they must identify an unknown vulnerability with little or no guidance, the OSASTS requires trainees to apply every appropriate test case within a constrained domain (a full procedure or specific methodology). Rather than receiving feedback limited to whether or not they solved the exercise, trainees have the opportunity to receive feedback on every test case and on how they approached the challenge. This is accomplished with the aid of a relatively novel approach in security tester training: the introduction of an expert observer/mentor.
One of the distinguishing features of OSASTS assessments compared to every other cyber range or CTF-style exercise is the participation of an expert who both observes and optionally guides throughout the exercise. There are a number of benefits to this approach:
The observer can identify knowledge gaps and inefficiencies in both methodology and approach.
The observer can confirm whether the trainee completed a test case intentionally or incidentally.
The observer can actively assist the trainee, filling in gaps to promote learning and successful completion of the exercise.
The observer can also learn from trainees (even as a veteran in the field, I continue to learn from novice testers who apply novel approaches).
Perhaps the greatest challenge in implementing the OSASTS is capturing the full range of expected test cases for a specific scenario and then implementing their detection within a simulated environment. To create an environment for a pilot study, I decided to focus on the testing of JSON Web Tokens (JWTs). JWTs are a great candidate because test cases do not vary substantially across applications (in theory). Unfortunately, enumerating the test cases that security testers should know and master was still a significant challenge.
Ideally, comprehensive testing methodology would be captured by a project like the OWASP WSTG. The unfortunate reality is that the project lags behind contemporary security testing practice in many domains, including JWT testing. As this is an open-source project, we are all guilty of producing this reality, but there is only so much time in a day. As is, the WSTG was simply one of the resources that I used to assemble a comprehensive set of test cases, which is captured in our (to be published shortly) application security curriculum.
I am largely satisfied with the set of test cases captured, but our field is substantially limited by the lack of prevalence and incidence data. Without that data, it is very challenging to separate the test cases that should be prioritized from those that were relevant at one time but will never again produce a legitimate finding. This challenge is compounded by the fact that the prevalence and incidence of various findings likely evolve substantially over time (this is true in fields like medicine as well, but at a much slower rate of change). Consider, for example, the Psychic Signature vulnerability that exclusively impacts Java 15-18. One day, there won’t be unpatched Java 15-18 systems (I assume).
With test cases assembled, I constructed a lab environment. Unlike a traditional CTF, the environment does not expose vulnerabilities to be identified, or flags to be collected. Instead, it implements detection logic to verify that specific test cases have been applied. When trainees successfully apply a test case, their success is logged and indicated to both them and the observer. Naturally, the test cases are not secret. Trainees are supplied a list of all relevant test cases and should have familiarity with them from prior instruction. The lab itself is kept private (available only during the exercise) as there is still some possibility to over-fit solutions to the lab environment, which should be discouraged.
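As a rough illustration of what "detection logic" of this kind could look like, here is a minimal Python sketch that inspects a submitted JWT and reports which test cases it appears to exercise. It is not the lab's actual implementation: the function names, the test-case IDs, and the three checks chosen (an `alg` of `none`, a stripped signature, and an all-zero ECDSA signature as a Psychic Signature probe) are assumptions made purely for illustration.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(header: dict, payload: dict, signature: bytes) -> str:
    """Helper for the examples: assemble a token with an arbitrary signature."""
    segments = [
        b64url_encode(json.dumps(header).encode()),
        b64url_encode(json.dumps(payload).encode()),
        b64url_encode(signature),
    ]
    return ".".join(segments)


def detect_test_cases(token: str) -> set[str]:
    """Return the (hypothetical) test-case IDs that a token appears to apply."""
    detected: set[str] = set()
    parts = token.split(".")
    if len(parts) != 3:
        return detected
    try:
        header = json.loads(b64url_decode(parts[0]))
        signature = b64url_decode(parts[2])
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return detected
    if not isinstance(header, dict):
        return detected

    alg = header.get("alg")
    alg = alg.lower() if isinstance(alg, str) else ""

    # Test case: unsecured token, including case variations such as "nOnE".
    if alg == "none":
        detected.add("JWT-ALG-NONE")
    # Test case: signature removed while a signing algorithm is still declared.
    elif len(signature) == 0:
        detected.add("JWT-SIG-STRIPPED")
    # Test case (assumed probe shape): ECDSA signature consisting only of zeros.
    elif alg.startswith("es") and not any(signature):
        detected.add("JWT-ECDSA-ZERO-SIG")
    return detected
```

In a lab along these lines, each request carrying a token would be passed through something like `detect_test_cases`, and any newly detected IDs would be logged and shown to both the trainee and the observer. Note that a check like this only records that a test case was applied; whether it was applied deliberately is still a judgment for the observer.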
In designing a pilot study to evaluate the OSASTS, there were two main considerations. First: what do I hope to achieve with this approach? There are a number of specific questions that could be asked, but – at the highest level – there are two:
1. Can the exercise be used to effectively improve skills?
2. Can the exercise be used to effectively evaluate skills?
I decided to begin with a focus on (1) and use the platform as an opportunity for trainees to practice skills and obtain feedback to improve them. The second consideration in designing a pilot is simply: what is the purpose of a pilot study? A pilot study is generally accepted to carry the lowest level of methodological and statistical rigor, and is intended to prove nothing other than the feasibility of running an experiment at all. I am OK with that (for now), so I put together a single-group pretest-posttest design where the primary study endpoints include trainee self-efficacy evaluation and feedback about the OSASTS exercise itself (both Likert-style).
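For readers unfamiliar with the design, a single-group pretest-posttest comparison boils down to pairing each trainee's rating before the exercise with their rating after it. The Python sketch below shows one plausible descriptive summary of such paired Likert ratings. It is an assumption about how the responses might be summarized, not the analysis used in the pilot, and the function name and the example scores are made up for illustration.

```python
from statistics import mean, median


def summarize_pretest_posttest(pre: list[int], post: list[int]) -> dict:
    """Summarize paired Likert ratings (one pre/post pair per trainee)."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and the same length")

    # Per-trainee change: positive means the rating went up after the exercise.
    diffs = [after - before for before, after in zip(pre, post)]

    return {
        "n": len(diffs),
        # Medians are reported because Likert responses are ordinal.
        "median_pre": median(pre),
        "median_post": median(post),
        "mean_change": mean(diffs),
        "improved": sum(d > 0 for d in diffs),
        "unchanged": sum(d == 0 for d in diffs),
        "declined": sum(d < 0 for d in diffs),
    }


# Made-up ratings on a 1-5 scale, only to show the call; NOT pilot data.
example_pre = [2, 3, 3, 4]
example_post = [4, 3, 5, 3]
print(summarize_pretest_posttest(example_pre, example_post))
```

With no control group, a summary like this can show that ratings moved, but it cannot attribute the change to the exercise, which is consistent with the modest goals of a pilot.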
To save myself the effort of copying and reformatting, here is a slide from my upcoming talk detailing the study design:
[Slide: study design]
A significant challenge in designing experiments in this area is that we are starting from almost nothing. For example, what is the present standard for evaluating tester skills? There is none, so it is non-trivial to implement a pretest-posttest with objective measures to demonstrate changes in skill. Indeed, the OSASTS is intended to solve this problem as well, but we must first address questions of validity (can the exercise measure what we intend it to measure, namely tester skill?). Lucky for us, the surgical profession recently navigated these challenges, so we can learn from them here as well (more on that later). For now, we have a simple pilot study.
The results? You will have to watch my talk, which I assume will be posted to YouTube soon after the conference. As for future work, my next task is to plan and conduct a rigorous study designed to measure objective outcomes from the use of this approach. If you work in the area of application security testing and are able to participate as an expert observer and recruit trainees of your own to participate, please contact me.
Connect
Respond to this email to reach me directly.

