Job Title: Senior ML Infrastructure Engineer
Location: Abu Dhabi, United Arab Emirates (full relocation package provided)

Job Overview
We are seeking a skilled ML Infrastructure Engineer to join our growing AI/ML platform team. This role is ideal for someone who is passionate about large-scale machine learning systems and has hands-on experience deploying LLMs/SLMs with advanced inference engines such as vLLM. You will play a critical role in designing, deploying, optimizing, and managing ML models and the infrastructure around them, spanning inference, fine-tuning, and continued pre-training.

Key Responsibilities
· Deploy large and small language models (LLMs/SLMs) using inference engines (e.g., vLLM, Triton).
· Collaborate with research and data science teams to fine-tune models and build automated fine-tuning pipelines.
· Extend inference capabilities by integrating advanced features such as multi-modality, real-time inference, model quantization, and tool-calling.
· Evaluate and recommend optimal hardware configurations (GPU, CPU, RAM) based on model size and workload patterns.
· Build, test, and optimize LLM inference pipelines for consistent model deployment.
· Implement and maintain infrastructure-as-code to manage scalable, secure, and elastic cloud-based ML environments.
· Ensure seamless orchestration of the MLOps lifecycle, including experiment tracking, model registry, deployment automation, and monitoring.
· Manage the ML model lifecycle on AWS (preferred) or other cloud platforms.
· Apply LLM architecture fundamentals to design efficient scaling strategies for both inference and fine-tuning.

Required Skills
Core Skills:
· Proven experience deploying LLMs or SLMs using inference engines such as vLLM, TGI, or similar.
· Experience fine-tuning language models or building automated pipelines for model training and evaluation.
· Deep understanding of LLM architecture fundamentals (e.g., attention mechanisms, transformer layers) and how they influence infrastructure scalability and optimization.
· Strong understanding of matching hardware resources to ML inference and training workloads.

Technical Proficiency:
· Programming experience in Python and C/C++, especially for inference optimization.
· Solid understanding of the end-to-end MLOps lifecycle and related tooling.
· Experience with containerization, image building, and deployment (e.g., Docker); Kubernetes experience is a plus.

Cloud & Infrastructure:
· Hands-on experience with AWS services for ML workloads (e.g., SageMaker, EC2, EKS) or equivalent services in Azure/GCP.
· Ability to manage cloud infrastructure for high availability, scalability, and cost efficiency.

Nice-to-Have
· Experience with ML orchestration platforms such as MLflow, SageMaker Pipelines, Kubeflow, or similar.
· Familiarity with model quantization, pruning, and other performance optimization techniques.
· Exposure to distributed training and fine-tuning frameworks such as Unsloth, DeepSpeed, Accelerate, or FSDP.