
NLP & LLM Data Scientist – Healthcare & Life Sciences

Location: United States



About Citeline

Citeline is one of the world's leading providers of data and intelligence on clinical trials, drug treatments, medical devices and what's new in the regulatory and commercial landscape. Our customers, who rely on us for a vital advantage when making critical R&D and commercial decisions, include more than 3,000 of the world's leading pharmaceutical companies, contract research organizations (CROs), medical technology, biotechnology and healthcare service providers, including the top 10 global pharma companies and CROs.
From drug and device discovery and development to regulatory approval, and from product launch to lifecycle management, we provide the intelligence and insight to help our customers seize opportunities, mitigate risk and make business-critical decisions faster. As the pharma and healthcare sector faces unparalleled upheaval, customers rely on our independent advice to cut through the clutter and make sense of changing drug development, regulatory and competitive landscapes.
Now, Citeline is proud to be a part of Norstella, an organization that consists of market-leading pharmaceutical solutions providers united under one goal: to improve patient access to life-saving therapies. Within this organization, Citeline plays a key role in helping clients connect the dots from pipeline to patient. 

Job Description

Norstella Real World Data (RWD) is seeking a skilled NLP Data Scientist with a focus on language models to join our AI & Life Sciences Solutions team. Your expertise in processing and understanding natural language data, along with your knowledge of Electronic Health Records (EHR) and laboratory report analysis, will be instrumental in driving our data science initiatives and innovations. This is particularly true for the development of rich, multimodal real-world datasets that expedite RWD-driven drug development in pharma.

Key duties & responsibilities

  • Apply NLP techniques and open-source language models (e.g., Llama 2, Mixtral, BERT) to extract, process, and interpret unstructured medical data from diverse sources such as EHRs, medical notes, and laboratory reports.
  • Collaborate with clinical scientists and data scientists to build efficient NLP models for healthcare, demonstrating an understanding of both the technical and medical aspects of the data.
  • Conduct data cleaning, preprocessing, and validation to maintain the accuracy and reliability of insights gathered from NLP processes.
  • Validate data findings and present them to stakeholders clearly and effectively.

Key requirements

  • Master's or Ph.D. degree in Computer Science, Data Science, Computational Linguistics, or a related analytical field.
  • Required: 2+ years of direct experience handling and interpreting either Electronic Health Records (EHR) and laboratory test results, or genetic test results, with a deep understanding of this data.
  • Proven experience (2+ years) in NLP, with strong knowledge of NLP techniques such as named entity recognition (NER), text summarization, and topic modeling, and of their applied use in healthcare.
  • Expert-level understanding of, and practical experience (1+ years) with, open-source large language models (Llama 2/3, Mixtral, etc.), including prompt engineering, inference, and fine-tuning.
  • Proficient in Python and SQL, with strong experience in NLP libraries such as NLTK, spaCy, and Hugging Face Transformers, and in deep learning libraries such as PyTorch and TensorFlow.
  • Familiarity with common data science and ML practices, e.g., version control systems, agile methodologies, and documentation.
  • Experience working in the AWS cloud environment and with large databases (e.g., Amazon Redshift).
  • Experience managing the ML lifecycle using open-source tools (e.g., MLflow).
  • Detail-oriented with strong analytical and problem-solving abilities.
  • Excellent verbal and written communication skills, with the ability to present complex data to non-technical audiences.

Nice to have 

  • Experience dealing with protected health information (PHI) and familiarity with healthcare-related data privacy laws such as HIPAA.
  • Familiarity with standard healthcare codes and terminologies such as ICD-10, CPT, LOINC, and SNOMED CT.
  • Experience with Retrieval-Augmented Generation (RAG) and vector stores for storing and querying large volumes of unstructured healthcare documents.

Our guiding principles for success at Norstella

01:  Bold, Passionate, Mission-First

We have a lofty mission to Smooth Access to Life-Saving Therapies, and we will get there by being bold and passionate about the mission and our clients. Our clients and our mission must be at the forefront of our minds in everything we do.
02:  Integrity, Truth, Reality  

We make promises that we can keep, and goals that push us to new heights.  Our integrity offers us the opportunity to learn and improve by being honest about what works and what doesn’t.  By being true to the data and producing realistic metrics, we are able to create plans and resources to achieve our goals.    
03:  Kindness, Empathy, Grace  

We will empathize with everyone's situation, provide positive and constructive feedback with kindness, and accept opportunities for improvement with grace and gratitude.  We use this principle across the organization to collaborate and build lines of open communication.    
04:  Resilience, Mettle, Perseverance  

We will persevere, even in difficult and challenging situations. Our ability to recover from missteps and failures in a positive way will help us succeed in our mission.
05:  Humility, Gratitude, Learning  

We will be true learners by showing humility and gratitude in our work.  We recognize that the smartest person in the room is the one who is always listening, learning, and willing to shift their thinking.   

Benefits

  • Medical and Prescription Drug Benefits 
  • Health Savings Accounts (HSA) or Flexible Spending Accounts (FSA)
  • Dental & Vision Benefits 
  • Basic Life and AD&D Benefits 
  • 401k Retirement Plan with Company Match 
  • Company Paid Short & Long-Term Disability
  • Paid Parental Leave 
  • Education Reimbursement   
  • Paid Time Off & Company Holidays 
The expected base salary for this position ranges from $130,000 to $160,000. It is not typical for offers to be made at or near the top of the range. Salary offers are based on a wide range of factors including relevant skills, training, experience, education, and, where applicable, licensure or certifications obtained. Market and organizational factors are also considered. In addition to base salary and a competitive benefits package, successful candidates are eligible to receive a discretionary bonus.
Norstella is an equal opportunities employer and does not discriminate on the grounds of gender, sexual orientation, marital or civil partner status, pregnancy or maternity, gender reassignment, race, colour, nationality, ethnic or national origin, religion or belief, disability or age. Our ethos is to respect and value people's differences, to help everyone achieve more at work as well as in their personal lives so that they feel proud of the part they play in our success. We believe that all decisions about people at work should be based on the individual's abilities, skills, performance and behaviour and our business requirements. Norstella operates a zero-tolerance policy towards any form of discrimination, abuse or harassment.
We know that sometimes the 'perfect candidate' doesn't exist, and that sometimes the best opportunities are hidden by self-doubt. We disqualify ourselves before we have the opportunity to be considered. Regardless of where you came from, how you identify, or the path that led you here, you are welcome. If you read this job description and feel engaged and excited, we’d love to see you apply.   

Interested in a career at Citeline?
Join our Talent Network today!
