Job Description
What the Role Entails
Lead core technology R&D for the post-training stage of large language models (LLMs), including the design and optimization of high-quality reward systems. Continuously improve the model’s capabilities in complex instruction following, logical reasoning, and value alignment through Reward Modeling (RM) and Reinforcement Learning (RL) algorithms.
Conduct in-depth research on and optimize post-training algorithms such as RLHF to improve training stability and overall model performance.
Take ownership of data synthesis and management in the post-training stage. Design efficient data-flywheel mechanisms, leverage techniques such as Supervised Fine-Tuning (SFT) and Self-Instruct to generate high-quality training data, and build a closed-loop system that turns user-feedback signals into model iteration.
Be responsible for comprehensive evaluation and analysis of post-trained models, establish scientific evaluation metrics, stay up to date with cutting-edge research, and rapidly translate the latest advances into business value.
Who We Look For
Master’s degree or above in Computer Science, Software Engineering, Artificial Intelligence, or a related field.
Strong understanding of Transformer architecture and the training principles of large language models. Deep research and hands-on experience in at least one of the following areas: LLM alignment, RLHF, Reward Modeling, or other post-training techniques.
Solid foundation in algorithms and strong engineering skills. Proficient in Python and familiar with deep learning frameworks such as PyTorch or TensorFlow.
Hands-on experience with distributed training. Familiarity with large-scale training and inference frameworks such as Megatron-LM, DeepSpeed, and vLLM is preferred. Experience training or fine-tuning models with tens or hundreds of billions of parameters is a strong plus.
Strong research capability. Candidates with publications at top-tier conferences such as NeurIPS, ICLR, ICML, ACL, or EMNLP, or with high-impact contributions to open-source communities such as Hugging Face, will be preferred.
Strong passion for technology and self-motivation, with the ability to analyze and solve complex problems, as well as excellent teamwork and communication skills.
Location State(s)
US-New York State-New York
The expected base pay range for this position in the location(s) listed above is $182,500.00 to $343,200.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience.
Employees hired for this position may be eligible for a sign-on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis.
Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life, and disability benefits, and for participation in the Company’s 401(k) plan. Employees are also eligible for 15 to 25 days of vacation per year (depending on tenure), up to 13 paid holidays throughout the calendar year, and up to 10 days of paid sick leave per year.
Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.
Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.