Compiler Optimization Engineer / Remote or on-site / Well-funded startup

A rare opportunity to join a well-funded startup building a hardware-agnostic AI compiler that allows teams to deploy to any accelerator architecture from a single codebase. We are looking for a core engineer to join the team behind our graph optimisation layer. In this role, you will have a direct hand in shaping how the next generation of AI models scale across diverse hardware.

About the role:
You'll design, implement, and maintain graph-level optimisation passes, including operator fusion, layout propagation, tiling, dead code elimination, and constant folding
You'll define and evolve the intermediate representation (IR) to support new optimisation opportunities as ML model architectures advance
You'll analyse real performance data to identify gaps and drive measurable improvements in throughput and latency
You'll build and contribute to testing and validation infrastructure to ensure correctness across optimisation passes
You'll collaborate closely with frontend and code generation teams to maintain clean IR interfaces and well-structured pipelines
You'll propose and prototype new optimisation strategies in response to advances in model design and hardware capabilities

Key Requirements:
You'll have a degree in CS or Computer Engineering (BS, MS, or PhD)
You'll bring strong C/C++ experience across performance-critical codebases
You'll have a deep understanding of graph-level compiler optimisation: fusion, tiling, layout transformations, and dead code elimination (DCE)
You'll be able to speak concretely about how your work translated into measurable performance improvements

It's a big plus if:
You've worked with MLIR, XLA, or similar graph-level IR frameworks
You're familiar with ML framework internals: PyTorch eager/compile mode, JAX/XLA, or TensorRT
You've explored polyhedral models or affine analysis for loop and tensor optimisation
You understand hardware memory hierarchies and how layout decisions affect GPU/accelerator performance
You've worked with quantisation, sparsity, or model-level optimisation techniques
You've contributed to open-source compiler or ML infrastructure projects