AI/ML Platform Engineer

AI/ML · Remote (Global) · Full-time · Senior

Ready to apply? Send us your details and a short note.

Apply for this role

You'll own the platform that runs open-weight models and autonomous agents in production — GPU scheduling, inference serving with vLLM, vector stores, and the secure runtimes our clients deploy agents on.

What you'll do

  • Operate model-serving infrastructure (vLLM, Ollama) on GPU clusters
  • Build agent runtimes with sandboxing, tool gateways, and queues
  • Design retrieval pipelines and vector store integrations
  • Optimize inference latency, throughput, and cost

What we're looking for

  • 4+ years in ML infrastructure or backend platform engineering
  • Hands-on with GPU serving and open-weight LLMs (Llama, Qwen, DeepSeek)
  • Strong Python and containerized deployment skills
  • Comfortable with Kubernetes and distributed systems

Nice to have

  • Experience with RAG and vector databases
  • Contributions to inference/serving OSS
