Posted May 7, 2026
Role Overview:
As a part of Uniphore, you will be involved in the end-to-end development of production-grade ASR and Speech AI systems for real-time voice agents. Your role will also include designing and optimizing streaming architectures, focusing on LLM fine-tuning, building Retrieval-Augmented Generation (RAG) pipelines, and driving integration and benchmarking of allied modules such as VAD, LID, speaker diarization, and paralinguistic modeling. Collaborating with engineering teams to deploy scalable, low-latency, and cost-effective speech/NLP systems will be a significant part of your responsibilities.

Key Responsibilities:
- End-to-end development of production-grade ASR and Speech AI systems for real-time voice agents.
- Design and optimize streaming architectures, including end-pointing, end-of-turn detection, and conversational turn-taking.
- Focus on LLM fine-tuning and on building Retrieval-Augmented Generation (RAG) pipelines and agentic systems for enterprise conversational AI use cases.
- Drive integration and benchmarking of allied modules such as VAD, LID, speaker diarization, and paralinguistic modeling (emotion, prosody).
- Own the benchmarking strategy against third-party (3P) ASR/LLM systems and define evaluation frameworks (accuracy, latency, cost).
- Collaborate with engineering teams to deploy scalable, low-latency, and cost-effective speech/NLP systems.

Qualifications:
- Master's or Ph.D. in Computer Science, Electrical/Computer Engineering, or a related field.
- A minimum of about 2 years of industry experience in AI, NLP, Vision, or ASR for candidates with a Master's degree.
- Expertise in Python and deep learning frameworks such as PyTorch or TensorFlow, with experience using modern ML tooling (e.g., Hugging Face, vLLM).
- Experience optimizing streaming ASR (RTF, decoding strategies, end-pointing latency).
- Ability to drive benchmarking, system optimization, and deployment readiness.
- Good publication record in leading conferences.
- [Good to have] Prior experience in pre-training, fine-tuning, agentic systems, and reinforcement learning.