Cognitive Evaluation Lead

Lead the design and execution of evaluation frameworks that ensure AI systems are trustworthy, reliable, and aligned with human intent.

Rational Exponent is seeking a strategic Cognitive Evaluation Lead to transform our AI evaluation practice from reactive to systematic. This role owns the evaluation strategy for all AI cognitive functions, designs experiments to test and optimize prompts and configurations, and provides clear, actionable recommendations that drive measurable improvements in our AI features.

TECH STACK

Python, LangChain, LlamaIndex, MLflow, Svelte/SvelteKit/TypeScript, MongoDB, Qdrant, FastAPI, Kubernetes, Terraform, AWS (EKS, Lambda, S3, Bedrock, etc.), Azure Cognitive Services, REST, GraphQL, OpenAI and HuggingFace APIs, Anthropic Claude API, scikit-learn, pandas, numpy, prompt engineering frameworks, evaluation libraries, A/B testing tools, statistical analysis tools.

RESPONSIBILITIES

  • Create rigorous experiments to test prompt variations, hyperparameters, and agentic tooling configurations
  • Define success criteria and quality gates for AI features before development begins
  • Interpret evaluation results and identify systematic patterns in failures and successes
  • Make data-driven go/no-go decisions on feature readiness
  • Drive prompt engineering improvements based on systematic testing and iteration
  • Recommend specific changes to cognitive functions: prompt adjustments, parameter tuning, tool selection
  • Provide statistical rigor to experiment design (sample sizes, significance testing, holdout sets)
  • Transform metrics into actionable insights with clear next steps for developers
  • Lead weekly evaluation standups and present findings to stakeholders
  • Mentor team members on evaluation best practices and ML principles
  • Document evaluation frameworks and build institutional knowledge

QUALIFICATIONS

  • Advanced degree in Computer Science, Machine Learning, Statistics, or related field (MS/PhD preferred)
  • 3+ years of hands-on experience with large language models and prompt engineering
  • Strong background in applied machine learning, particularly NLP or generative AI
  • Deep understanding of evaluation methodologies: metrics selection, dataset design, statistical testing
  • Experience with experiment design: A/B testing, hyperparameter optimization, systematic variation testing
  • Proficiency with LLM APIs and prompt engineering frameworks
  • Strong programming skills in Python and ML libraries
  • Practical experience optimizing AI systems for production use cases
  • Understanding of agentic AI architectures: tool use, function calling, multi-step reasoning
  • Proven ability to translate technical findings into clear, actionable recommendations
  • Strong decision-making skills: comfortable making go/no-go calls based on data
  • Excellent communication and cross-functional collaboration abilities
  • Self-directed and proactive problem solver
  • Experience building evaluation infrastructure or MLOps tooling desirable
  • Background in RLHF or constitutional AI desirable
  • Published research or blog posts on LLM evaluation or prompt engineering desirable

ABOUT RATIONAL EXPONENT

Rational Exponent is a new company whose experienced team has a track record of successfully building, scaling, and exiting enterprise software and services companies, with a particular focus on finance, banking, capital markets, healthcare, and other highly regulated industries. Our mission is to provide the tools and controls that allow complex, regulated entities to confidently deploy applications based on state-of-the-art foundation model AI. We develop systems and frameworks that let our customers create trustworthy cognitive applications that control risks, evidence compliance, and deliver business value.

JOB DETAILS


  • Job Type: Full-time
  • Salary: competitive + equity
  • Vacancy: 1 Person
  • Years of Experience: 8+
  • Education: Bachelor's Degree
  • Deadline: 2026-12-31