AI/ML Platform Engineer

AI/ML · Remote (Global) · Full-time · Senior

You'll own the platform that runs open-weight models and autonomous agents in production — GPU scheduling, inference serving with vLLM, vector stores, and the secure runtimes our clients deploy agents on.

What you'll do

Operate model-serving infrastructure (vLLM, Ollama) on GPU clusters
Build agent runtimes with sandboxing, tool gateways, and queues
Design retrieval pipelines and vector store integrations
Optimize inference latency, throughput, and cost

What we're looking for

4+ years in ML infrastructure or backend platform engineering
Hands-on experience with GPU serving and open-weight LLMs (Llama, Qwen, DeepSeek)
Strong Python and containerized deployment skills
Comfortable with Kubernetes and distributed systems

Nice to have

Experience with RAG and vector databases
Contributions to open-source inference or serving projects

Think you're a fit?

Apply for this role