As a Data Scientist, you will be part of our growing team of data scientists and experts.
You will be responsible for expanding and optimizing our data models, prediction algorithms, and correlation algorithms, as well as our text analytics models.
You will support our software developers and data engineers in building and enhancing models. You must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products.
The right candidate will be excited by the prospect of optimizing or even redesigning our real-time anomaly detection, correlation, and forecasting models.
You will be working as a Data Scientist.
The solutions you develop must be highly scalable and need to run in a Kubernetes container environment.
Responsibilities
Some of the solutions we work on involve the following:
Real-time anomaly detection solutions that proactively identify service-impacting incidents and prevent system downtime.
This is done by leveraging an ensemble of deep learning and LSTM models.
Natural Language Processing for entity extraction, topic clustering, and relationship extraction
Text analytics on human-generated tickets and correlation with event tickets for event noise reduction
Apply Natural Language Classification and RNN algorithms to automatically route tickets
Log analysis: text mining, message clustering, templatization
Logs to metrics: anomaly detection, event annotation, and sequencing
Learn the log message sequence for each mainframe batch job, identify anomalies during job runs using sequence mining techniques, and provide early warning alerts
Cloud migration: pattern-based discovery optimization; identify potential business application boundaries from Cloudscape data using an algorithmic approach
Wave planner: employ goal-based reasoning from AI planning capabilities for server affinity, cost, time, blackout windows, etc.
To power the above use cases, we have a Big Data system that can handle 23 TB of data daily, and we manage a data lake that is 15 PB in size.
Required Technical and Professional Expertise
Degree in Statistics, Mathematics, Computer Science, or another quantitative field, with 4 to 6 years of experience in manipulating data sets and building statistical models
Experience using statistical computer languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets
Experience creating and using advanced machine learning algorithms and statistics, such as regression, simulation, scenario analysis, modelling, clustering, decision trees, neural networks, etc.
Experience visualizing or presenting data for stakeholders using Excel, PowerBI, Tableau, etc.
Experience with distributed data or computing tools such as Hive, Spark, MySQL, etc.
Strong knowledge of Java or Python and general software development skills (source code management, debugging, testing, deployment, etc.)
Preferred Technical and Professional Expertise
Bachelor's degree in Computer Science, Mathematics, Physics, Computational Linguistics, or a related field
Experience with open-source distributed data processing frameworks such as Spark
Experience working in a Linux environment
Experience working on a development team building products
Experience presenting complex data science processes and information to non-data scientists
Experience with Information Retrieval and relevant tools such as Lucene, Elasticsearch, and Solr
Experience conducting projects from requirements generation, annotation, and modelling through NLP output deliverables, and managing internal and external clients
Prioritization skills: ability to manage ad hoc requests in parallel with ongoing projects