Associate NLP Text Mining Data Scientist jobs in Pune at Evolent Health

Job title: Associate NLP & Text Mining Data Scientist

Years of experience: 0-2 years

Work Location: Pune, MH, India

No. of Positions: 1

Your Future Evolves Here

Evolent Health has a bold mission to change the health of the nation by changing the way health care is delivered. Our pursuit of this mission is the driving force that brings us to work each day. We believe in embracing new ideas, challenging ourselves and failing forward. We respect and celebrate individual talents and team wins. We have fun while working hard and Evolenteers often make a difference in everything from scrubs to jeans.

Are we growing? Absolutely—56.7% in year-over-year revenue growth in 2016. Are we recognized? Definitely. We have been named one of “Becker’s 150 Great Places to Work in Healthcare” in 2016 and 2017, and one of the “50 Great Places to Work” in 2017 by Washingtonian, and our CEO was number one on Glassdoor’s 2015 Highest-Rated CEOs for Small and Medium Companies. If you’re looking for a place where your work can be personally and professionally rewarding, don’t just join a company with a mission. Join a mission with a company behind it.

Position Summary

The Associate NLP and Text Mining Scientist will support the building of Data Science & AI products, in Agile fashion, that empower healthcare payers, providers and members to quickly process medical data, make informed decisions and reduce overall health care costs. As a junior research scientist/engineer on the Data Science and Artificial Intelligence team, you will work primarily on unstructured text data to build machine learning models for information retrieval applications. These applications include, but are not limited to, optical character recognition, understanding the contents of medical documents using natural language processing, and integrating processes into the overall AI pipeline to mine healthcare and medical information with high recall and other relevant metrics. We ingest claims, medical charts, etc. from providers; the unstructured data they contain will be transformed into structured data to support automated entry into our storage layers for downstream applications. The results will be used both for real-time operational processes, with automated and human-based decision making, and to help reduce healthcare administrative costs. We work with the offerings of all major cloud and big data vendors (including but not limited to Azure, AWS, Google and IBM) to achieve AI goals in healthcare and support Evolent business.

Essential Functions

The Jr. NLP Text Mining Scientist / Engineer will have the opportunity to learn and to help shape team culture and operating norms in the fast-paced environment of a new, high-growth organization.

0-2 years of industry experience primarily related to unstructured text data and NLP (PhD work and internships related to unstructured text will be counted towards industry experience)

Develop natural language comprehension products for medical/healthcare documents to support Evolent Health business objectives and products, improve processing efficiency, and reduce overall healthcare costs

Gather external data sets; build synthetic data and label data sets as needed for NLP/NLR/NLU

Apply software engineering skills to build Natural Language products that improve automation and user experiences, leveraging unstructured data storage, entity recognition, POS tagging, ontologies, taxonomies, data mining, information retrieval techniques, machine learning approaches, and distributed and cloud computing platforms

Assist owners of the Natural Language and Text Mining products, from platforms to systems for model training, versioning, deployment, storage, and testing, and from real-time feedback with a human in the loop to fully automated services

Work closely and collaborate with Data Scientists, Machine Learning engineers, IT teams and Business stakeholders spread out across various locations in the US and India to achieve business goals

Strong understanding of mathematical concepts including but not limited to linear algebra, advanced calculus, partial differential equations, and statistics including Bayesian approaches

Good programming experience, including an understanding of concepts in data structures, algorithms, compression techniques, high performance computing, distributed computing, and various computer architectures, gained through education and internships

Good understanding of and experience with traditional data science approaches such as sampling techniques, feature engineering, classification and regression, SVMs, trees, and model evaluation

Additional course work, projects, research participation and/or publications in natural language processing, reasoning, and understanding, information retrieval, text mining, search, computational linguistics, ontologies, or semantics are preferred

Hands-on knowledge of developing in two or more of the following languages: Python, C++, Java, Scala

Strong Unix/Linux background and at least exposure to one of the following cloud vendors: AWS, Azure, or Google

Knowledge of or hands-on experience with one or more high-performance and distributed computing technologies such as Spark, Dask, Hadoop, or CUDA distributed GPU

Basic understanding of deep learning architectures and hands-on experience with one or more frameworks such as TensorFlow, PyTorch, or Keras

Knowledge of or experience with one or more libraries and tools such as spaCy, NLTK, Stanford CoreNLP, or Gensim

Understand business use cases and translate them into actual work

Identify enhancements and follow best practices that can help to improve the productivity of the team.

Nice to Have

Knowledge of Medical concepts with codes from standard ontologies (SNOMED CT, LOINC, RxNorm, ICD, etc.)

Knowledge of or experience with Docker

Knowledge of REST APIs

Participation in open source community projects

Academic Qualification

Master’s degree or above in Computer Science, Computational Linguistics, Mathematics, Physics or Electrical Engineering, with research experience and a thesis, from a strong academic program (postgraduate diplomas and undergraduate degrees do not qualify)

Completion of a thesis/research is required as part of graduation in Computer Science, Artificial Intelligence, Mathematics, Physics, Electrical Engineering or Statistics

A PhD degree from a strong academic program in Computer Science, Artificial Intelligence, Computational Linguistics, Machine Learning, or a related technical field is preferred

Publication record in top NLP conferences (NIPS, ICLR, ACL, NAACL, EMNLP, SIGIR, WWW, etc.) is preferred