We are seeking a highly experienced and versatile Lead Data Engineer to join the engineering team at our company. The role spans the full software development lifecycle and calls for strong expertise in microservices, Python, PySpark, and AWS cloud services. You will be responsible for developing scalable enterprise applications within a microservices architecture, integrating databases, and ensuring high code quality through rigorous testing and engineering best practices.

**Key Responsibilities:**
- Contribute across the entire technology stack: frontend, backend, and database layers.
- Design and develop enterprise-grade services using Python, PySpark, and AWS.
- Implement microservices-based architecture to build scalable and maintainable solutions.
- Apply software engineering principles to enhance the reliability, scalability, and maintainability of the codebase.
- Incorporate automated testing and ensure robust code coverage as part of daily development.
- Collaborate closely with cross-functional teams in an Agile development environment.
- Design and implement ETL/ELT pipelines for automated data extraction, transformation, and loading.
- Develop parameterized queries, job configurations, and data ingestion logic for diverse datasets.
- Build and maintain data orchestration workflows using tools like Apache Airflow.
- Partner with the Full Stack team to integrate backend APIs with ETL jobs and job status monitoring.
- Support integration and load testing, and optimize job execution across environments.
- Work on metadata ingestion, job audit logging, and traceability for compliance.
- Participate in production deployment, job monitoring, and troubleshooting post go-live.
- Document data models, pipelines, and configurations; support training and knowledge transfer.

**Must-Have Skills:**
- 10+ years of experience in data engineering and ETL development in complex enterprise environments.
- Strong object-oriented programming knowledge.
- Proficient in Python 3.6+.
- Strong debugging and performance tuning skills across the stack.
- Strong experience in Python and Django for backend API development.
- Hands-on experience with AWS services (EC2, S3, Lambda, etc.).
- Experience integrating and managing SQL databases.
- Hands-on experience with Apache Airflow for workflow orchestration.
- Hands-on experience with Apache Spark for big data processing and analytics.
- Hands-on experience in building ETL pipelines with tools like Python, SQL, or similar.
- Exposure to containerization and orchestration tools (e.g., Docker, Kubernetes).
- CI/CD pipeline experience with tools like Jenkins, GitLab CI/CD, or AWS CodePipeline.
- Expertise in SQL Server: complex queries, stored procedures, performance tuning.
- Familiarity with RESTful API integration for triggering and monitoring jobs.
- Proficiency with job parameterization, scheduling, and configuration-driven pipelines.
- Exposure to data validation, quality checks, and error handling mechanisms.
- Experience with Git-based version control, CI/CD, and deployment processes.
- Knowledge of metadata management and data cataloging concepts.
- Familiarity with unit testing (preferably with pytest).
- Proven track record of working in fast-paced, agile teams.

Our benefits and rewards program has been thoughtfully designed to recognize your skills and contributions, elevate your learning and upskilling experience, and provide care and support for you and your loved ones. As an Apexon Associate, you will receive continuous skill-based development, opportunities for career advancement, and access to comprehensive health and well-being benefits and assistance.