Cognitive Evaluation Lead

Lead the design and execution of evaluation frameworks that ensure AI systems are trustworthy, reliable, and aligned with human intent.

Rational Exponent is seeking a strategic Cognitive Evaluation Lead to transform our AI evaluation practice from reactive to systematic. This role owns the evaluation strategy for all AI cognitive functions, designs experiments to test and optimize prompts and configurations, and provides clear, actionable recommendations that drive measurable improvements in our AI features.

TECH STACK

Python, LangChain, LlamaIndex, MLflow, Svelte/SvelteKit/TypeScript, MongoDB, Qdrant, FastAPI, Kubernetes, Terraform, AWS (EKS, Lambda, S3, Bedrock, etc.), Azure Cognitive Services, REST, GraphQL, OpenAI and HuggingFace APIs, Anthropic Claude API, scikit-learn, pandas, numpy, prompt engineering frameworks, evaluation libraries, A/B testing tools, statistical analysis tools.

RESPONSIBILITIES

Create rigorous experiments to test prompt variations, hyperparameters, and agentic tooling configurations
Define success criteria and quality gates for AI features before development begins
Interpret evaluation results and identify systematic patterns in failures and successes
Make data-driven go/no-go decisions on feature readiness
Drive prompt engineering improvements based on systematic testing and iteration
Recommend specific changes to cognitive functions: prompt adjustments, parameter tuning, tool selection
Provide statistical rigor to experiment design (sample sizes, significance testing, holdout sets)
Transform metrics into actionable insights with clear next steps for developers
Lead weekly evaluation standups and present findings to stakeholders
Mentor team members on evaluation best practices and ML principles
Document evaluation frameworks and build institutional knowledge

QUALIFICATIONS

Advanced degree in Computer Science, Machine Learning, Statistics, or related field (MS/PhD preferred)
3+ years of hands-on experience with large language models and prompt engineering
Strong background in applied machine learning, particularly NLP or generative AI
Deep understanding of evaluation methodologies: metrics selection, dataset design, statistical testing
Experience with experiment design: A/B testing, hyperparameter optimization, systematic variation testing
Proficiency with LLM APIs and prompt engineering frameworks
Strong programming skills in Python and ML libraries
Practical experience optimizing AI systems for production use cases
Understanding of agentic AI architectures: tool use, function calling, multi-step reasoning
Proven ability to translate technical findings into clear, actionable recommendations
Strong decision-making skills: comfortable making go/no-go calls based on data
Excellent communication and cross-functional collaboration abilities
Self-directed and proactive problem solver
Experience building evaluation infrastructure or MLOps tooling (desirable)
Background in RLHF or constitutional AI (desirable)
Published research or blog posts on LLM evaluation or prompt engineering (desirable)

ABOUT RATIONAL EXPONENT

Rational Exponent is a new company whose experienced team has a track record of successfully building, scaling, and exiting enterprise software and services companies, with a particular focus on finance, banking, capital markets, healthcare, and other highly regulated industries. Our mission is to provide the tools and controls that allow complex, regulated entities to confidently deploy applications based on state-of-the-art foundation model AI. We develop systems and frameworks that let our customers create trustworthy cognitive applications that control risks, evidence compliance, and deliver business value.

COMPREHENSIVE BENEFITS THAT SUPPORT YOU AND YOUR FAMILY

We provide a competitive benefits package designed to support your health, financial future, and work-life balance. It includes multiple national medical, dental (with a $0 employee option), and vision plans with company premium contributions; tax-advantaged accounts such as an HSA, Healthcare FSA, and Dependent Care FSA; and life and disability coverage, all within a flexible, remote-first work environment.

Job Details

Job Type: Full-time
Salary: competitive + equity
Vacancy: 1 Person
Years of Experience: 8+
Education: Bachelor's Degree
Deadline: 2026-12-31