Engineering

A model that owns your domain — in your VPC, on your evals.

Twelve weeks from kickoff to a fine-tuned model your eval harness accepts and your CISO will sign off on. No prompts leave your boundary, no weights leave your repo, no surprise renewal at month thirteen.

Talk to engineering · See the 22-week plan

12 wks

From kickoff to a model your eval gates accept.

−43%

Median latency vs. the off-the-shelf baseline we replaced.

0

Customer prompts in third-party hands. VPC-only by default.

What's included

Four artefacts. All yours at the end.

Eval harness, owned by your team

We co-author your evals on day three. Goldens, adversarial sets, drift checks, and a CI gate that blocks any deploy that regresses past your tolerance.
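As a rough illustration of the CI gate described above, the sketch below compares a candidate model's per-suite pass rates with the current baseline and fails the build when any suite drops by more than an agreed tolerance. The suite names, the scores, the 0.02 tolerance, and the function names are assumptions made up for this example, not the harness delivered in an engagement.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    suite: str
    baseline: float
    candidate: float
    passed: bool


def check_regressions(baseline, candidate, tolerance=0.02):
    """Compare per-suite pass rates (0.0 to 1.0).

    A suite fails the gate when the candidate scores more than
    `tolerance` below the baseline. Hypothetical helper for illustration.
    """
    results = []
    for suite, base_score in sorted(baseline.items()):
        # A suite missing from the candidate run counts as a full regression.
        cand_score = candidate.get(suite, 0.0)
        passed = (base_score - cand_score) <= tolerance
        results.append(GateResult(suite, base_score, cand_score, passed))
    return results


def gate_exit_code(results):
    """Return 0 when every suite passes, 1 otherwise (a CI job would exit with this)."""
    return 0 if all(r.passed for r in results) else 1


# Illustrative numbers only.
baseline_scores = {"goldens": 0.94, "adversarial": 0.81, "drift": 0.90}
candidate_scores = {"goldens": 0.95, "adversarial": 0.76, "drift": 0.89}

gate_results = check_regressions(baseline_scores, candidate_scores, tolerance=0.02)
for r in gate_results:
    status = "ok" if r.passed else "REGRESSION"
    print(f"{r.suite:12s} {r.baseline:.2f} -> {r.candidate:.2f}  {status}")
print("exit code:", gate_exit_code(gate_results))
```

In a pipeline, the deploy job would run only when the gate's exit code is 0, so a regression on any single suite is enough to block the release.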

Fine-tuning + alignment loop

LoRA / QLoRA against your task. Preference data collection, reward modelling, DPO/IPO when the use case warrants. We tell you when not to fine-tune.
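For readers who have not met DPO, here is a minimal sketch of the per-pair loss it optimises, written in plain Python so the arithmetic is visible. It assumes you already have summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under both the model being tuned and a frozen reference model. The function names and the beta value of 0.1 are illustrative assumptions; a real training run would use a library implementation over batches of tensors.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    where each term is a summed log-probability of a full response.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)


# Policy identical to the reference: the loss is log(2), the starting point of training.
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))

# Policy has shifted probability toward the chosen response: the loss drops below log(2).
print(dpo_loss(-4.0, -9.0, -5.0, -7.0))
```

The loss falls only when the tuned model raises the chosen response relative to the reference more than it raises the rejected one, which is why the quality of the preference data matters more than its volume.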

Trust boundary diagrams

Where prompts come from, where outputs go, what runs in your VPC vs. ours. Auditors and CISOs leave the kickoff with the diagram they wanted.

Deployment + observability

Deploys inside your VPC (vLLM, TGI, Triton, or hosted; your call). Metrics, traces, and prompt-replay built in. PagerDuty wired to your existing rotation.

How we work

Twenty-two weeks. Five gates. One pager handle.

01
Discovery (week 1)
Telemetry pull, cost model, capability gap analysis. We leave with a written hypothesis and a kill-criterion both sides agreed to up front.
02
Eval harness first (weeks 2-3)
Before any training, we author the gates. Goldens, adversarial sets, drift detectors, latency budgets. If the harness is wrong, the model is wrong.
03
Baseline + iteration (weeks 4-7)
Best-in-class off-the-shelf model becomes the floor. Iterate against your evals. Each run produces a leaderboard your team can read.
04
Hardening (weeks 8-10)
Adversarial robustness, jailbreak defence, PII redaction, prompt injection mitigations. Soak tests against synthetic and real traffic.
05
Production cutover + 90-day operate (weeks 11-22)
Canary deploy, then full rollout behind a feature flag. We hold the pager for 90 days while your team takes over the runbook with us.
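The canary step in gate 05 comes down to a promotion decision against the latency budget and error tolerance written into the harness in gate 02. A minimal sketch of that decision follows, assuming a p95 latency budget and a maximum error rate; the thresholds, sample data, and function names are hypothetical and stand in for whatever your observability stack reports.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer from 1 to 100)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_promote(latencies_ms, errors, requests,
                   p95_budget_ms, max_error_rate):
    """Promote the canary only if it stays inside both budgets.

    Hypothetical decision rule for illustration: p95 latency at or under the
    budget and error rate at or under the agreed maximum.
    """
    if requests == 0 or not latencies_ms:
        return False  # no traffic observed, so there is no evidence to promote on
    p95 = percentile(latencies_ms, 95)
    error_rate = errors / requests
    return p95 <= p95_budget_ms and error_rate <= max_error_rate


# Illustrative canary window: 100 requests with latencies of 1..100 ms and one error.
canary_latencies = list(range(1, 101))
print(should_promote(canary_latencies, errors=1, requests=100,
                     p95_budget_ms=100, max_error_rate=0.02))
print(should_promote(canary_latencies, errors=1, requests=100,
                     p95_budget_ms=90, max_error_rate=0.02))
```

Keeping the rule this explicit means the feature flag flips on numbers both sides agreed to in advance, not on a judgment call made during the rollout.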

Engagement shapes

Pick the depth that matches the problem.

	Sprint	Engagement (most picked)	Production
Discovery + capability map	●	●	●
Eval harness co-authored	—	●	●
Custom model fine-tuning	—	●	●
VPC deployment + observability	—	●	●
Adversarial hardening	—	—	●
On-call coverage (90 days)	—	—	●
Quarterly upgrade cadence	—	—	●

“The eval harness is the artefact we still measure every deploy against, eighteen months later. That alone paid for the engagement twice over.”

Daniel Okafor

Head of Platform · Northwind Apparel

−61%

Latency vs. baseline at p95

3.2×

Eval-pass rate vs. baseline

18 mo

In production without rollback

FAQ

The questions every CISO asks first.

Do you bring a preferred model?

We're model-agnostic. Llama, Mistral, Qwen, Claude, GPT — whichever your eval harness picks. We tell you when an off-the-shelf API is the right answer instead of fine-tuning.

Where does training happen?

Your VPC by default (AWS, GCP, Azure, on-prem). We bring the infrastructure-as-code, you bring the credentials. Customer prompts and gradients never leave your boundary.

How do you price?

Fixed price per phase, each with a written kill-criterion. No multi-month retainers, no scope-creep tax. The numbers are in the SOW up front, not buried in the back half.

What happens after the engagement ends?

You own the repo, weights, evals, dashboards, and runbook. We hand off via a two-week parallel rotation. Quarterly upgrade cadence is optional, paid per quarter.

Do you sign a BAA / data processing agreement?

Yes. We've signed BAAs, DPAs, and customer-specific security agreements. Bring your paper.

Can you ramp faster than the listed weeks?

Sometimes. We compress when the eval harness can be authored in parallel with discovery. We won't compress at the cost of the eval harness: it's the artefact your team will still be measuring deploys against long after the engagement ends.

Bring us the model decision.

Twenty-minute call with engineering. We'll tell you in plain terms whether to fine-tune, prompt, distill, or stay with the API.

Talk to engineering · Browse all services
