Principal Python + LangGraph Engineer
Engagement: Atlas (US health-tech, AI-driven mobile coaching platform)

Engagement context
We are partnering with a US-based health-tech company to take over a production AI-powered mobile coaching platform. The platform is built around a Python AI core (atlas-ai), which runs:
- A FastAPI chat surface with intent routing and tool-using agents
- A LangGraph-based agent framework with multi-agent orchestration
- A Celery + Redis task queue for asynchronous agent flows
- MongoDB for fitness-plan storage, Redis for conversation state, and Postgres for LangGraph checkpoints
- Several agent personas: onboarding, chat, plan creation, in-workout smart adjust, plan smart adjust, and habit formation
- Direct OpenAI integration for the LLM layer

This is the product. If the AI core breaks, the product breaks. If LLM token cost runs away, our margin runs away. If the agent flows behave unpredictably under load, real users get bad coaching.

Role summary
As Principal AI Engineer, you are the technical owner of the AI core. You lead the audit of the existing codebase, define the architecture we evolve toward, build the test and evaluation harness that lets us ship changes safely, and are the engineer who is paged when the AI surface misbehaves in production. You will be the deciding voice on whether we can responsibly take this service into our scope or whether we keep the client's original team embedded for longer. The audit conclusions you write in the first 4 weeks will inform the contractual scope of the entire engagement.
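Keeping token cost from "running away" starts with attributing it. A minimal sketch, assuming each LLM call is logged with its user session, agent flow, model, and input/output token counts (for example, taken from the usage data the API returns with each completion). The `LLMCall` record, the flow labels, the model names, and the prices are all hypothetical placeholders, not atlas-ai's actual schema or any provider's price list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Placeholder prices in USD per 1M tokens. Model names and numbers are
# hypothetical; real values come from the provider's current price list.
PRICE_PER_M_TOKENS = {
    "model-large": {"input": 5.00, "output": 15.00},
    "model-small": {"input": 0.50, "output": 1.50},
}


@dataclass(frozen=True)
class LLMCall:
    """One logged LLM call (hypothetical record shape)."""
    session_id: str     # the user session the call belongs to
    flow: str           # agent flow / persona, e.g. "chat" or "plan_creation"
    model: str          # key into PRICE_PER_M_TOKENS
    input_tokens: int   # prompt tokens reported for the call
    output_tokens: int  # completion tokens reported for the call


def call_cost(call: LLMCall) -> float:
    """Cost of a single call in USD under the placeholder price table."""
    price = PRICE_PER_M_TOKENS[call.model]
    return (call.input_tokens * price["input"]
            + call.output_tokens * price["output"]) / 1_000_000


def cost_breakdown(calls: Iterable[LLMCall]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Sum cost per agent flow and per user session in a single pass."""
    per_flow: Dict[str, float] = defaultdict(float)
    per_session: Dict[str, float] = defaultdict(float)
    for call in calls:
        cost = call_cost(call)
        per_flow[call.flow] += cost
        per_session[call.session_id] += cost
    return dict(per_flow), dict(per_session)
```

For instance, a call with 8,000 input and 2,000 output tokens on the placeholder large model costs (8,000 × 5 + 2,000 × 15) / 1,000,000 = 0.07 USD. Grouping by flow shows which persona dominates spend; grouping by session gives the per-user figure that margin calculations depend on.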
What you'll do

First 90 days
- Audit atlas-ai: agent flows, LangGraph state machines, Celery topology, datastore usage, and OpenAI integration patterns
- Produce a written assessment of operational risk: failure modes, race conditions, retry semantics, idempotency, and checkpoint integrity
- Quantify token cost per agent flow and per user session
- Identify the highest-risk subsystems and propose stabilisation plans
- Build (or harden) an evaluation harness for the agent flows: golden cases, regression suites, and hallucination/safety tests
- Lead the knowledge-transfer sessions from the client's AI team

Ongoing
- Set the technical direction for the AI core
- Lead design for new agent flows and major changes to existing ones
- Own the production health of the AI surface (with platform/SRE support)
- Hire and mentor the rest of the AI squad (~10 engineers at full scale)
- Represent the AI core in cross-team architecture conversations with the client

Must-have skills
- 7+ years of production Python at senior level or above
- Deep LangGraph experience: state graphs, checkpoints, interrupts, multi-agent supervision, subgraphs
- Strong knowledge of the LangChain ecosystem (chains, tools, memory, output parsers, callbacks)
- Production FastAPI: streaming responses, dependency injection, middleware, async patterns
- Celery + Redis broker in production: task ordering, retries, idempotency, priority queues, dead-letter handling
- Concurrency in Python: asyncio (gather, structured concurrency, cancellation), threading boundaries, and mixing sync and async code safely
- Multi-datastore operations: MongoDB, Redis, and Postgres in a single service, including transaction boundaries across them
- OpenAI API at scale: rate limits, retries with exponential backoff, fallback model routing, streaming, tool/function calling
- Agent design patterns: ReAct, plan-and-execute, supervisor patterns, tool-use loops, multi-turn state, interrupt resumption
- Disciplined prompt engineering: evaluation, A/B testing, version control of prompts, regression detection
- Token cost optimisation: prompt caching (Anthropic-style), model tiering, context-window trimming, summary memory
- Production LLM observability: per-route token spend, prompt-level tracing, drift monitoring
- Testing discipline: pytest (including pytest-asyncio), property-based testing, snapshot tests for prompts, eval-based tests for agents
- Pydantic v2 fluency and type-hinted code throughout

Nice-to-have
- Production RAG experience (vector stores: Pinecone, Qdrant, pgvector)
- Production incident command for LLM-powered systems
- ML engineering background (model serving, feature engineering)
- Anthropic / Claude API experience in addition to OpenAI
- Data pipeline experience (Airflow, Dagster, Prefect)
- Domain knowledge in fitness / health / wearables
- Experience working with JSI / native bridges across teams (the Python core integrates with a mobile JSI layer)

Offer
- Location: 100% remote
- Engagement: B2B
- Rate: up to PLN 180 per hour
- Start: July
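The "retries with exponential backoff, fallback model routing" requirement can be illustrated with a minimal, standard-library-only sketch. It assumes a hypothetical `call_model(model, prompt)` callable and a hypothetical `TransientLLMError` class standing in for retryable failures (rate limits, timeouts, server errors); a real integration would map the SDK's own exception types onto it. Model names are placeholders. Each model is retried a few times with capped, randomly jittered delays before the next model in the list is tried.

```python
import random
import time
from typing import Callable, Optional, Sequence


class TransientLLMError(Exception):
    """Hypothetical stand-in for retryable failures (rate limits, timeouts, 5xx)."""


def call_with_backoff_and_fallback(
    call_model: Callable[[str, str], str],
    prompt: str,
    models: Sequence[str],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying transient errors with exponential backoff."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(max_attempts):
            try:
                return call_model(model, prompt)
            except TransientLLMError as err:
                last_error = err
                if attempt == max_attempts - 1:
                    break  # retries exhausted for this model; fall back to the next one
                # Exponential delay capped at max_delay, with random ("full") jitter
                # so that many clients do not retry in lockstep.
                delay = min(max_delay, base_delay * 2 ** attempt)
                sleep(random.uniform(0, delay))
    raise RuntimeError("all models failed") from last_error
```

Injecting `sleep` keeps the function testable without real waiting. Because a fallback model can produce different output than the primary, the fallback path arguably deserves its own coverage in the evaluation harness.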