Principal AI Architect

CoAdvantage

8 days ago

Remote

Full Time

Intermediate or Experienced

Bradenton, Florida, United States

About the job

CoAdvantage is an HCM company providing payroll, ASO, and PEO services to 16,000 clients. We deliver payroll, benefits, HR compliance, time/PTO, and risk management solutions, and we are building a governed AI platform that will become a primary source of differentiation versus AI-native competitors.

We are standing up a parallel AI architecture: a microservice plane that talks to legacy systems via ETL contracts, anchored on three substrates (engineering knowledge graph, analytics feature store, customer knowledge store) and a governed multi-agent harness. The Principal AI Architect owns the technical shape of that platform.

What You'll Own

You are the most senior individual contributor on the AI program. You make the load-bearing technical decisions that everyone else builds against.

• Own the reference architecture for the AI plane end-to-end: data substrates, model layer, agent harness, orchestration, application surfaces, observability, and the ETL boundary to core platform and adjacent systems (payroll, benefits, claims, ADO).

• Drive build-vs-buy decisions on the substrate trio: graph (Neo4j vs. TigerGraph vs. managed), vector (pgvector vs. Pinecone vs. Azure AI Search), and feature store (Feast vs. Databricks Feature Store vs. Tecton), as well as warehouse direction (Microsoft Fabric vs. Snowflake), with written, defensible recommendations.

• Specify the multi-agent harness: agent envelopes (planning, tools, reflection, memory), reward functions and reward-hacking watchlists, HITL triggers, handoffs, capability gradients, and the deterministic Orchestrator that controls them.

• Define tenant isolation, identity resolution, and AuthZ for the Customer Knowledge Store across three access tiers (Client Admin, WSE, Internal CoAd). Cross-tenant leakage is a catastrophic-tier risk, and you own the controls that prevent it.

• Set the assurance program: input/output guardrails, red-team plan, evals, circuit breakers, immutable lineage, and the production → KG feedback loop.

• Partner with the Head of AI on the 18-month roadmap and the platform consolidation thesis.

• Own the runtime execution architecture for the agentic platform, including orchestration topology, state management, workflow durability, retry semantics, and failure recovery strategies across distributed multi-agent systems.

• Define platform-wide evaluation and reliability standards, including groundedness metrics, hallucination detection, behavioral regression testing, drift monitoring, degraded operating modes, and production resiliency requirements.

• Establish governance standards for AI platform operations, including inference cost optimization, data contracts and lineage, tenant-safe memory architecture, and security controls for regulated multi-tenant environments.

• You will write specs, code, and prototypes. You will not be a deck-only architect.

How We Work

• AI-first coding. Claude Code, Copilot, and successor tools are the default development surface. Hand-coding without AI assistance is the exception, not the norm.

• Build your own agentic workflows. You will compose and operate your own multi-step agent pipelines for code generation, spec mining, design review, eval authoring, and migration analysis. If a workflow doesn't yet exist for a job you do twice, you build it.

• Every workflow is testable. Every agent, every chain, every prompt that touches production has an eval harness, a regression suite, and a defined success criterion before it ships.

• Ambiguity is the job. You will get problems framed at the strategy level and return a sequenced, costed, testable plan with a working prototype.

• You estimate. Every recommendation comes with a timeline, a headcount ask, a confidence interval, and a list of the assumptions that could blow it up.

• You suggest the tools. Strong opinions, loosely held, written down.

First 90 Days

• Publish a v1.0 reference architecture for the AI plane that the Head of AI signs and the CIO endorses.

• Deliver a build-vs-buy recommendation, with cost model and 18-month TCO, for the substrate trio.

• Stand up a working prototype of one agent in the harness (Codi or Tespi from the v0.4 spec) with its eval suite, reward function, and HITL path, running end-to-end on a real CoAd repo or ticket stream.

• Define the cross-tenant leakage red-team protocol for the Customer Knowledge Store and run the first round against the prototype.

• Co-author the implementation timeline that converts the current planning blueprint into a sequenced, dependency-ordered build plan.

Required Skills & Experience

• 10+ years building production software; 5+ years architecting ML, LLM, or AI platforms at scale.

• Demonstrated ownership of a multi-team AI platform, including substrate decisions (vector, graph, feature store, warehouse), orchestration, and serving, through one or more production releases.

• Working fluency with agentic coding tools (Claude Code, Cursor, Copilot, or equivalent) as a daily driver, with a body of work (repos, specs, postmortems) produced with them.

• Production experience with at least one multi-agent system: orchestration patterns, tool calling, memory tiers, reward design, HITL, and the failure modes (collusion, reward hacking, loop runaway, capability creep).

• Hands-on with vector search, knowledge graphs, and RAG at tenant-scoped, regulated-data scale.

• Strong opinions on eval design: regression suites, behavioral evals, red-team protocols, training-serving skew detection, drift monitors.

• Comfort writing executable timelines and headcount estimates and defending them in front of a CIO and a board.

• Cloud-native infrastructure depth (Azure preferred given the current CoAd footprint; AWS or GCP acceptable).

• Excellent technical writing (specs, ADRs, ConOps, RFCs) that non-engineers can follow.

Preferred Experience

• Prior architect-level work in PEO, HCM, payroll, benefits, insurance, or another regulated multi-tenant SaaS.

• Experience operating under HIPAA, SOC 2, and state-level payroll/tax regimes.

• Public work on agent safety, governance, or assurance (writing, OSS, research).

• Familiarity with Databricks, Azure AI Search, Snowflake, and modern observability for LLM systems (Langfuse, Phoenix, OpenTelemetry-for-LLMs).

• Experience leading a platform migration or consolidation under a hard deadline.

EEO

CoAdvantage is committed to providing equal employment opportunities to all employees and applicants without regard to race, color, religion, national origin, ancestry, citizenship status, age, sex (including pregnancy, childbirth, breastfeeding, and pregnancy-related medical conditions), gender, gender identity or expression, sexual orientation, marital status, uniformed service member and veteran status, disability, genetic information, or any other characteristic protected by applicable federal, state, or local laws and ordinances.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Matching

Paid Time Off (PTO)

Paid Holidays

Remote Work

Bonus

Life Insurance
