About us
At Programize, we partner with teams of all sizes - from startups to established enterprises - across industries and continents to create innovative, high-impact software products. We don’t just implement requirements; we turn ambitious ideas into marketable software solutions we are genuinely proud to put our names on. With 200+ successfully delivered projects behind us, we’ve tackled everything from greenfield architectures to complex, large-scale platforms.
Our vision is to become the go-to company for entrepreneurs and engineers who want to design and develop impactful, scalable software systems.
To achieve that, we need talented professionals to join our team and share our enthusiasm for technology and innovation.
The Role
We are looking for a Software Engineer to join our team and collaborate with an international organization building an advanced AI agent evaluation platform.
The platform is designed to test, evaluate, and benchmark AI agents at scale, helping organizations understand and improve the behavior, reliability, and performance of AI-powered systems. You will design and build production-grade services, APIs, and data-driven systems that power AI evaluation workflows. While you will work closely with LLMs and agent-based applications, the focus of the role is on engineering excellence, system reliability, and building scalable software solutions.
You will have the opportunity to own features end-to-end, collaborate with experienced engineers, and contribute to the next generation of AI-powered products.
What You'll Do
- Backend services & APIs. Build and maintain the services, data models, and APIs that power the platform - designed for correctness, testability, and scale.
- Simulation & orchestration. Work on the systems that coordinate complex, multi-step interactions between AI agents and external systems, improving their reliability and throughput.
- Evaluation & scoring. Design systems that grade agent outputs, combining deterministic checks with model-assisted judgment - and make scoring reliable, explainable, and reproducible.
- Data pipelines. Build pipelines that generate, transform, and quality-check large volumes of structured data and benchmark content.
- Quality & reliability. Add the tests, instrumentation, and safeguards needed to trust outputs from systems that are inherently non-deterministic.
What We're Looking For
- 4+ years building and shipping production software, with strong proficiency in Python.
- Deep software engineering fundamentals: system and API design, data modeling, concurrency/async, testing strategy, debugging, and code review. You can own a non-trivial end-to-end service.
- Experience designing and operating distributed or service-oriented systems (queues, workers, APIs), not just calling them.
- Comfort designing schemas and working with relational databases, plus the migrations and performance concerns that come with them.
- Working knowledge of LLM APIs, orchestration, structured outputs, and handling non-determinism. We expect you to use LLMs effectively, but this is not a prompt-engineering role.
- Ability to reason about correctness of probabilistic systems: how to test, measure, and trust outputs that aren't byte-for-byte deterministic.
- High quality bar: you write tests, types, and docs by default, and you keep changes small and reviewable.
Nice to Have
- Experience building agentic or multi-agent systems, tool-use, or orchestration frameworks.
- Background in evaluation / benchmarking of ML or LLM systems (rubrics, golden datasets, model-as-judge, inter-rater reliability).
- Experience with distributed task queues and async workloads.
- Modern Python tooling and typed codebases (e.g. type checkers, linters, Pydantic, FastAPI).
- Retrieval / search experience and familiarity with data ingestion pipelines.
- Some comfort with the infra side (Docker, CI/CD) so you can ship what you build.
What to expect from us
Programize was founded on the values of respect and appreciation for customers and colleagues alike. We believe in equal opportunity, diversity, flexibility, hard work, and continuous improvement in all aspects of our company. We want our people to feel happy, creative, productive, and motivated. At Programize, you will find the following:
- Friendly, respectful, and appreciative working environment.
- Competitive remuneration package.
- On-site and remote working options.
- Indefinite term contract.
- Lab-like, collaborative, and engaging environment.
- Continuous learning and growth opportunities.
- International working environment.
- Work-life balance.
- Private health insurance plan, including dependents.
- A place to grow. Programize helps ideas come to life!
- Team building activities.