We compress large language models (LLMs). Our information-theoretic structural pruning and quantization algorithm shrinks model footprints by over 80% without retraining, in hours rather than weeks.
This is an applied research role. You'll develop new compression methods and ship them, not write papers about them. The cycle is short: read the literature, prototype, benchmark on real models, integrate into our pipeline, iterate with customers running compressed models in production.
You'll own significant technical scope from day one. Expect to work across the stack: pruning algorithms, quantization, evaluation infrastructure, and the production code that customers actually use.
Send your CV, a list of representative papers, and any other relevant information to [email protected]. Tell us in two paragraphs what you'd want to work on at Ora and why. We respond within a week.