Responsibilities:
1. Collaborate with the data engineering team to design, develop, and maintain data pipelines and ETL (Extract, Transform, Load) processes
3. Utilize the Python programming language and the Pandas library to implement efficient data manipulation and transformation tasks
3. Work closely with stakeholders to understand data requirements and ensure data integrity, quality, and reliability
4. Develop and optimize SQL queries, particularly in Impala SQL, to query and analyze large datasets stored in distributed systems
5. Assist in troubleshooting data-related issues and performance bottlenecks in data processing pipelines
6. Contribute to the documentation of data processes, including data dictionaries, data lineage, and workflow diagrams
7. Stay updated on emerging technologies and best practices in data engineering and apply them to improve existing processes and systems
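The core of the pipeline work described above is extract-transform-load with Pandas. The following is a minimal sketch of that pattern, assuming a hypothetical order schema (`order_id`, `customer`, `amount`); in practice the extract step would read from a file or an Impala query rather than an in-memory frame:

```python
import pandas as pd

def transform_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw order records and aggregate spend per customer (hypothetical schema)."""
    out = df.dropna(subset=["order_id"]).copy()            # drop rows missing the key
    out["amount"] = out["amount"].fillna(0).astype(float)  # normalize missing amounts
    return (
        out.groupby("customer", as_index=False)["amount"].sum()  # total per customer
    )

# Extract: stand-in for a file read or Impala query result
raw = pd.DataFrame({
    "order_id": [1, 2, None, 3],
    "customer": ["a", "b", "b", "a"],
    "amount":   [10.0, 5.0, 7.0, None],
})

# Transform; the "load" step (e.g. writing to a warehouse table) is omitted here
result = transform_orders(raw)
print(result)
```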
Key Skills:
1. Proficiency in the Python programming language, with experience in data manipulation and analysis
2. Strong understanding of data engineering concepts, including data pipelines, ETL processes, and data warehousing
3. Experience with Impala SQL or similar SQL dialects for querying and analyzing large datasets
4. Familiarity with data manipulation libraries in Python, such as Pandas and NumPy
5. Ability to work independently and collaboratively in a fast-paced environment, with excellent problem-solving and communication skills
6. Attention to detail and a commitment to delivering high-quality, scalable data solutions
Skill(s) required
Python
Who can apply
Only candidates who meet the following criteria can apply:
1. are available for a full-time (in-office) internship
2. have relevant skills and interests
Number of openings
1