Help shape the future of AI by designing challenging math problems that test and improve reasoning in cutting-edge models.

Location: Remote
Type: Contract / Part-time
Commitment: 20 hours per week
Compensation: Up to 40 USD / hr
Project duration: 2 months, with potential extension
Availability: Immediate start

About the role

We create high-quality STEM training data for frontier AI models. Our data is used directly in training and evaluation pipelines at leading AI labs to improve model reasoning in technical domains.

We are looking for experts in Mathematics to design rigorous, deterministic problems that are genuinely challenging for state-of-the-art AI systems. Each problem must have exactly one verifiable correct answer and be submitted together with a complete, verified solution.

What you’ll do

Design advanced mathematical problems for frontier AI training and evaluation
Create deterministic problems with exactly one correct answer
Write complete, verified solutions and clearly document the reasoning process
Develop problems that test deep mathematical reasoning, not just memorization
Where relevant, use Python or specialized tools to build computational workflows
Ensure all outputs are technically precise, reproducible, and well-written in English

What we’re looking for

Master’s or PhD in Mathematics or a closely related field
Strong research or industry experience involving mathematical modeling, proof-based reasoning, applied mathematics, or computational mathematics
Strong Python skills; comfort with libraries such as numpy, scipy, pandas, or similar
Solid grasp of algorithms, numerical methods, and computational approaches
Ability to design original, difficult problems that mirror real mathematical workflows
Excellent attention to detail and technical writing skills in English

Nice to have

Experience with symbolic math systems, theorem provers, optimization solvers, or other mathematical software
Background in olympiad-level, graduate-level, or research-level problem design
Experience evaluating model reasoning, benchmarking, or technical assessment design