AI/ML Platform Engineer
AI/ML · Remote (Global) · Full-time · Senior
Ready to apply? Send us your details and a short note.
Apply for this role
You'll own the platform that runs open-weight models and autonomous agents in production: GPU scheduling, inference serving with vLLM, vector stores, and the secure runtimes our clients deploy agents on.
What you'll do
- Operate model-serving infrastructure (vLLM, Ollama) on GPU clusters
- Build agent runtimes with sandboxing, tool gateways, and queues
- Design retrieval pipelines and vector store integrations
- Optimize inference latency, throughput, and cost
What we're looking for
- 4+ years in ML infrastructure or backend platform engineering
- Hands-on with GPU serving and open-weight LLMs (Llama, Qwen, DeepSeek)
- Strong Python and containerized deployment skills
- Comfortable with Kubernetes and distributed systems
Nice to have
- Experience with RAG and vector databases
- Contributions to inference/serving OSS
Think you're a fit?
Apply for this role