We are looking for a PySpark developer with an ETL background to design and build solutions for one of our customer programs. The aim is to build a standardized data curation layer that will integrate data across internal and external sources, provide analytical insights, and integrate with the customer's critical systems.

Roles and Responsibilities:
- Design, build, and unit test applications in Spark/PySpark
- In-depth knowledge of Hadoop, Spark, and similar frameworks
- Ability to understand existing ETL logic and convert it into Spark/PySpark
- Good implementation experience with OOP concepts
- Knowledge of Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs
- Experience processing large amounts of structured and unstructured data, including integrating data from multiple sources
- Experience working with Bitbucket and CI/CD processes
- Knowledge of Agile methodology for delivering projects
- Good communication skills

Skills:
- Minimum 2 years of extensive experience in the design, build, and deployment of PySpark-based applications
- Expertise in handling complex, large-scale Big Data environments
- Minimum 2 years of experience with Hive, YARN, and HDFS
- Experience working with ETL products such as Ab Initio, Informatica, and DataStage
- Hands-on experience writing complex SQL queries and exporting and importing large amounts of data using utilities
Location: Pune

Note: We will also consider candidates with strong hands-on experience in Spark and Scala.