AI agents · March 4, 2026 · 7 min read

AI agents in production: a 2026 checklist

Over twelve months we shipped nine agents for retail, FinTech and logistics. Here is what breaks most often, which stack to pick, and why a GPT wrapper isn't an agent.

Creastra Digest

An agent = tools + memory + checklists; a prompt alone is not an agent
You spend 80% of effort on guardrails and observability, not the model
Quality measurement: a 50-case golden set, refreshed each sprint

As of early 2026 we run nine agents in production. One triages support tickets at a bank, another writes marketplace product copy, a third reconciles warehouse inventory in real time. None of them is a GPT wrapped in a system prompt; each is a full service with its own lifecycle.

1. How an agent differs from a chatbot

An agent solves a task; a chatbot holds a conversation. Solving a task takes three things: tools (search, code execution, database access), memory (short-term and long-term) and an explicit checklist of success criteria. Skip any one of those and you have a demo, not a service.
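
To make the three parts concrete, here is a minimal sketch in plain Python. It is an illustration, not our production code: the `Agent` class, the `plan` argument and the toy tools are hypothetical names, and the tool sequence is fixed by hand where a real agent would let the model choose it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # Tools: named callables the agent may invoke (hypothetical toy versions).
    tools: dict[str, Callable[[str], str]]
    # Checklist: explicit success criteria applied to the final output.
    checklist: list[Callable[[str], bool]]
    # Memory: short-term log of what happened during this run.
    memory: list[str] = field(default_factory=list)

    def run(self, task: str, plan: list[str]) -> tuple[str, bool]:
        """Apply the tools named in `plan` in order, then verify the result."""
        result = task
        for tool_name in plan:
            result = self.tools[tool_name](result)
            self.memory.append(f"{tool_name} -> {result}")
        passed = all(check(result) for check in self.checklist)
        return result, passed


agent = Agent(
    tools={"strip": str.strip, "upper": str.upper},
    checklist=[lambda out: out.isupper(), lambda out: len(out) < 40],
)
answer, ok = agent.run("  refund order 1234  ", plan=["strip", "upper"])
```

Remove the checklist and `run` can no longer tell a finished task from a plausible-looking string, which is the gap between a demo and a service.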

2. The stack that survived production

Models: GPT-5 + Claude Opus 4.6 + local Qwen 3 — each on its own task class
Orchestration: LangGraph for long-running processes, Inngest for cron agents
Memory: pgvector for semantics, Redis for episodes, Postgres for facts
Observability: Langfuse + our own grader agent
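
The memory split above can be sketched as a small router that sends each kind of memory to its own store. This is a sketch under assumptions: `MemoryKind`, `MemoryRouter` and `InMemoryStore` are hypothetical names, and the in-memory class stands in for real pgvector, Redis and Postgres clients so the example runs on its own.

```python
from enum import Enum
from typing import Protocol


class MemoryKind(Enum):
    SEMANTIC = "semantic"   # embeddings, similarity lookups
    EPISODIC = "episodic"   # what happened in recent runs
    FACT = "fact"           # durable, structured records


class Store(Protocol):
    name: str

    def write(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for a real backend client; keeps the sketch runnable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.items[key] = value


class MemoryRouter:
    def __init__(self, stores: dict[MemoryKind, Store]) -> None:
        self._stores = stores

    def remember(self, kind: MemoryKind, key: str, value: str) -> str:
        """Write to the store that owns this kind and return its name."""
        store = self._stores[kind]
        store.write(key, value)
        return store.name


stores = {
    MemoryKind.SEMANTIC: InMemoryStore("pgvector"),
    MemoryKind.EPISODIC: InMemoryStore("redis"),
    MemoryKind.FACT: InMemoryStore("postgres"),
}
router = MemoryRouter(stores)
used = router.remember(MemoryKind.EPISODIC, "run-17", "ticket escalated")
```

Keeping the mapping in one place means a backend can be swapped without touching the code that decides what to remember.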

3. What breaks most often

Not the model: frontier models drift by single-digit percentages per quarter. What breaks is everything around it: integrations (an API changed shape), policies (a new rule landed but the prompt wasn't updated) and user expectations (yesterday the task was "describe", today it is "sell").
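
For the integration case, one cheap defence is to check the shape of every API response before the agent reads it. The sketch below uses only the standard library; the `shape_errors` helper and the order fields are hypothetical examples, and a real service would likely use a schema library instead.

```python
# Hypothetical contract for an order payload: field name -> expected type.
EXPECTED_ORDER_FIELDS: dict[str, type] = {"id": int, "status": str, "total": float}


def shape_errors(payload: dict, expected: dict[str, type]) -> list[str]:
    """Return a list of human-readable mismatches; empty means the shape is OK."""
    errors = []
    for name, expected_type in expected.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            actual = type(payload[name]).__name__
            errors.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    return errors


good = shape_errors({"id": 1, "status": "paid", "total": 9.5}, EXPECTED_ORDER_FIELDS)
# The upstream API started sending ids as strings and dropped "total".
bad = shape_errors({"id": "1", "status": "paid"}, EXPECTED_ORDER_FIELDS)
```

A non-empty error list is a reason to stop the run and alert, rather than let the model improvise around a payload it no longer understands.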

4. How to measure quality

Take 50 real inputs from production, not synthetic ones. Label a gold answer for each. Run the agent against them daily and score every output on two axes: correctness and completeness. Refresh the golden set every sprint: it is the quality contract you offer the client.
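
A daily golden-set run can be sketched in a few lines. The definitions here are assumptions made for the example, not the only way to score: each answer is reduced to a set of facts, correctness is the share of the answer's facts that appear in the gold answer, and completeness is the share of gold facts the answer covers. `GoldenCase`, `score_case` and `run_golden_set` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    input_text: str
    gold_facts: set[str]  # facts the labelled gold answer must contain


def score_case(answer_facts: set[str], gold_facts: set[str]) -> tuple[float, float]:
    """Return (correctness, completeness) for one case."""
    overlap = answer_facts & gold_facts
    correctness = len(overlap) / len(answer_facts) if answer_facts else 0.0
    completeness = len(overlap) / len(gold_facts) if gold_facts else 1.0
    return correctness, completeness


def run_golden_set(
    agent: Callable[[str], set[str]], cases: list[GoldenCase]
) -> dict[str, float]:
    """Average both axes over the whole golden set."""
    if not cases:
        raise ValueError("golden set is empty")
    scores = [score_case(agent(c.input_text), c.gold_facts) for c in cases]
    return {
        "correctness": sum(s[0] for s in scores) / len(scores),
        "completeness": sum(s[1] for s in scores) / len(scores),
    }


# Toy agent: misses a fact on the first case, adds a wrong one on the second.
def toy_agent(text: str) -> set[str]:
    return {"a"} if text == "q1" else {"c", "d"}


cases = [GoldenCase("q1", {"a", "b"}), GoldenCase("q2", {"c"})]
report = run_golden_set(toy_agent, cases)
```

Tracking the two averages separately shows whether a regression comes from wrong statements or from missing ones, which usually point to different fixes.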