Role - HPC + Linux
Base Location - Pune/Bangalore/Chennai
Site Location - Mandi/Bangalore/Roorkee/Gandhinagar/Trichy/Mohali
Mode of Hire - Permanent
Educational Qualification - Only B.E/B.Tech or MCA
Responsibilities / Technical skills
Monitoring, management and optimisation of the data centre facility and HPC clusters (including hardware and software).
Provide user support for technical issues, application queries, data management, etc.
System administration and development work on the HPC system.
Operational and scheduled maintenance of servers and the HPC system.
Hardware monitoring, troubleshooting, replacement and
Troubleshooting of installed software, patch updates and customer application installation.
Regular node health checks, including performance analysis and temperature monitoring.
Monitoring performance aspects of the water loop on the customer premises, such as flow, temperature and blockages.
InfiniBand and Ethernet troubleshooting, including cables, controllers, drivers, IP address clashes, reassignment, etc.
Lustre storage maintenance and backup policies.
Documentation of the HPC environment and of system administration policies and procedures (weekly report generation).
Experience in administering Red Hat/CentOS Linux systems, including the installation, configuration and maintenance of Linux services (DNS, DHCP, LDAP, NFS, NTP, etc.) and Linux networking (TCP/IP protocols).
Hands-on hardware and software troubleshooting experience (preferably in large-scale high-performance computing environments).
Experience with Linux kernel modules, preferably for NVIDIA GPUs and Mellanox InfiniBand cards.