Real-Time Input Risk Detection and Adversarial Testing Platform for GenAI

Dr. Michael Rivera and his team are developing SentinelPrompt, a framework designed to strengthen the safety and reliability of generative AI systems. The project focuses on identifying vulnerabilities and improving defenses against adversarial prompts and manipulative user inputs.  By integrating insights from human-AI interaction research with practical defense mechanisms, the team aims to create tools that enhance AI security, governance, and resilience—laying the groundwork for safer use of AI in defense and other high-stakes settings.

Background

As enterprises rapidly adopt generative AI, they face a growing threat from prompt injection attacks—malicious or manipulative inputs that cause AI systems to produce unsafe, biased, or confidential outputs.

Current defenses often focus on filtering outputs rather than detecting risky inputs, leaving organizations vulnerable to data leakage, reputational damage, and compliance failures in regulated sectors like healthcare, finance, and law.

Technology Overview

Developed by researchers at Lehigh University, SentinelPrompt is a risk management platform that safeguards generative AI systems by addressing vulnerabilities at the input stage. The system integrates two components:

  • SentinelScan – a real-time API that evaluates incoming prompts using behavioral and linguistic risk signals such as emotional entropy and linguistic concreteness.
  • SentinelPenTest – a simulation suite that enables organizations to test their AI systems against adversarial scenarios and identify weaknesses before deployment.

Grounded in empirical research (Emotional Agents Research Study) analyzing over 5,000 real-world prompt injection attempts, SentinelPrompt translates behavioral science insights into a scalable, proactive defense. It integrates seamlessly into enterprise workflows with customizable policies, scoring, and reporting features.

Benefits

  • Proactive defense: Detects high-risk prompts before harmful outputs are generated
  • Behavioral-science foundation: Uses validated features (emotional diversity, concreteness) that correlate with adversarial success
  • Customizable policies: Allows organizations to set rules for warning, rewriting, or blocking prompts
  • Enterprise-ready: Supports API integration, real-time monitoring, and compliance reporting
  • Dual-layer protection: Combines real-time scanning with adversarial penetration testing

Applications

  • Healthcare – preventing the disclosure of sensitive patient data and ensuring HIPAA compliance
  • Finance – safeguarding confidential financial models and regulatory reporting
  • Legal & Compliance – protecting attorney–client privileged information and sensitive case records
  • Enterprise AI deployment – securing internal chatbots, digital assistants, and knowledge management systems
  • AI governance & auditing – providing measurable safety metrics for regulators and policymakers