Build, deploy, and manage data warehouse solutions that can handle the needs of a rapidly growing, data-driven company.
Lead the development of systems, architectures, and platforms that can scale to the 3 Vs of Big Data (volume, velocity, variety).
Build out scalable and reliable ETL pipelines and processes to ingest data from a large number and variety of data sources.
Maintain and optimize the performance of our data analytics infrastructure to ensure accurate, reliable, and timely delivery of key insights for decision-making.
Streamline data access and security so that data scientists and analysts can easily access data whenever they need it.
Bachelor's degree in Computer Science, Software Engineering, Information Technology, Electrical Engineering, or other quantitative fields.
Strong programming skills in SQL and Python.
Deep understanding of databases and engineering practices, including building deterministic and human-fault-tolerant pipelines, scaling up, error handling and logging, and data cleansing.
Excellent communication skills to coordinate the development of data pipelines and/or any new product features built on top of data analysis results.
Experience with Hadoop, Delta Lake, Apache Spark, Apache Airflow, Apache Hive, or Presto/Trino will be a plus point.
Experience in handling large datasets (hundreds of TBs) and working with structured datasets will be a plus point.
Available to attend a minimum six-month internship program, with a time commitment of approximately 40 hours per week (Mon-Fri), and committed to work and be based…