WHAT DOES A DATA ENGINEER DO?

Published: Jun 03, 2025 - The Data Engineer develops and optimizes cloud-based data pipelines for Janssen’s Commercial Omnichannel operations, applying advanced knowledge in data engineering technologies. This position collaborates with cross-functional teams to enhance data architecture, support ML model deployment, and engineer features for accurate and scalable data solutions. This role drives innovation by identifying new data sources, improving ML model performance, and building long-term standards using tools such as AWS, PySpark, Python, and Dataiku.

A Review of Professional Skills and Functions for a Data Engineer

1. Data Engineer Duties

  • Pipeline Maintenance: Maintain ETL and streaming pipelines while extending them by implementing new features.
  • System Monitoring: Set up monitoring and alarm systems to track the health of pipelines and ensure that stability and data quality standards are met.
  • Infrastructure Development: Maintain and develop data infrastructure.
  • Architecture Advising: Discuss and advise on architectural and implementation decisions, as well as best practices.
  • Integration Support: Support the engineering team in integrating the products with the data services and databases.
  • Cross-functional Collaboration: Collaborate and align with data engineers, data analysts, and software developers to create value, occasionally working cross-functionally with Customer Success, Customer Support, Marketing, and Sales teams.
  • Solution Design: Design and implement technical solutions to obtain, process, and store data.
  • Process Automation: Automate and optimize internal ETL processes in MS Azure.
  • Skill Development: Get access to the comprehensive training portfolio, including MS Certifications, Coursera, and other professional courses.
  • Community Learning: Learn from the global P&G IT community to advance skills.
  • Knowledge Sharing: Share best practices with the local IT community by delivering training sessions and other learning opportunities.

2. Data Engineer Details

  • Cloud Pipelines: Develop data engineering cloud pipelines, using knowledge of Janssen Commercial sales and marketing Omnichannel data
  • Domain Expertise: Serve as a domain specialist in Data Engineering technologies and bring innovative ideas to develop, test, and measure the impact and performance of key business initiatives
  • Architecture Optimization: Understand the existing pipelines, data science models, and architecture related to Janssen’s Omnichannel brands, and recommend ways to optimize the data architecture and improve performance
  • DevOps Transition: Build long-term practices and standards that will allow the team to easily transition production models to ML DevOps, so that model performance can be measured and data accuracy is guaranteed
  • Feature Engineering: Partner with data scientists on feature engineering to build data models that can provide raw, curated, and processed data to downstream teams
  • Collaborative Solutions: Collaborate with other data engineers, ML specialists, and partners from multiple therapeutic areas to implement findings as they arise and develop solutions that break the status quo
  • Data Sourcing: Proactively identify new data sources that will improve decision-making and increase model accuracy and consistency
  • Scalable Engineering: Collaborate with data engineers and data scientists to build scalable data engineering and data science solutions using AWS (S3, EC2, EMR, Amazon Redshift), PySpark, Python, and Dataiku
  • Cloud Architecture: Assist in developing architectural models for cloud-based solutions using AWS to support data science and ML products
  • Model Innovation: Foster innovation by improving ML model performance, efficiency, and infrastructure through experimentation and testing, leading to creative solutions

3. Data Engineer Responsibilities

  • Solution Design: Collaborate with team members to conceptualize, design, and deliver enterprise and departmental data solutions to support business intelligence, data warehousing, reporting, and machine learning requirements
  • Scalable Implementation: Implement solutions that are reliable and scalable to meet the service levels associated with mission-critical solutions
  • DevOps Practice: Participate in and enhance DevOps practice to ensure highly available solutions and quick issue resolution
  • Pipeline Translation: Translate business requirements into the data pipelines and data stores that support them
  • Tool Assessment: Perform assessments (proofs of concept) of the latest tools and technologies
  • Migration Strategy: Work with Data and Solution Architects to define and implement migration strategies from legacy systems to cloud architecture and technologies
  • Team Feedback: Provide team feedback to optimize the delivery of solutions
  • Data Requirements: Collect, evaluate, and document business data requirements, and define data standards and guidelines for the business
  • System Integration: Develop and manage connections to enterprise systems, edge devices, and cloud systems (using existing APIs or creating new ones)
  • Data Automation: Build algorithms and programs to automate data collection, aggregation, segregation, and transformation
  • Data Preparation: Prepare data for prescriptive and predictive modelling to help support the development of analytics tools and programs
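
As a rough illustration of the Data Automation responsibility above, the following minimal Python sketch transforms, segregates, and aggregates a small batch of records. The sample records, field names, and function names are hypothetical and chosen only for illustration; a production pipeline would typically rely on an orchestration or processing framework rather than plain functions.

```python
from collections import defaultdict

# Hypothetical raw readings, as they might arrive from several sources.
RAW_RECORDS = [
    {"source": "api", "device": "sensor-a", "value": "21.5"},
    {"source": "api", "device": "sensor-a", "value": "22.5"},
    {"source": "edge", "device": "sensor-b", "value": "n/a"},
    {"source": "edge", "device": "sensor-b", "value": "10"},
]


def transform(record):
    """Return a cleaned copy with a numeric value, or None if it cannot be parsed."""
    try:
        value = float(record["value"])
    except (KeyError, TypeError, ValueError):
        return None
    return {"device": record["device"], "value": value}


def segregate(records):
    """Split records into (clean, rejected) lists based on whether they transform."""
    clean, rejected = [], []
    for record in records:
        cleaned = transform(record)
        if cleaned is None:
            rejected.append(record)
        else:
            clean.append(cleaned)
    return clean, rejected


def aggregate(clean_records):
    """Compute the mean value per device."""
    totals = defaultdict(lambda: [0.0, 0])  # device -> [running sum, count]
    for record in clean_records:
        bucket = totals[record["device"]]
        bucket[0] += record["value"]
        bucket[1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


clean, rejected = segregate(RAW_RECORDS)
means = aggregate(clean)
```

Keeping the rejected records in their own list, rather than discarding them, leaves them available for later inspection or reprocessing.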

4. Data Engineer Job Summary

  • Business Understanding: Develop a strong understanding of the business, strategic direction, overall organizational goals, and individual user needs
  • Proof Implementation: Work with developers to design and implement proof-of-concept solutions and create advanced applications to deliver on the desired customer experience
  • Team Collaboration: Drive effective teamwork, communication, collaboration, and adoption to achieve desired outcomes and realize benefits
  • Mentorship Practice: Act as a coach and mentor to drive automation, DevOps, and industry best practices
  • Analytical Curiosity: Demonstrate intellectual curiosity and a passion for data, visualization, and solving business problems, and stay ahead of current trends in analytics and visualization technologies
  • Development Oversight: Provide oversight and management of development activities and ensure standards are being adhered to
  • Technology Adaptation: Adapt to new technologies and languages to focus on areas of demand
  • System Provisioning: Provide the data and systems used by Data Analysts and Data Scientists
  • Requirement Review: Collaborate with the Product Owners and Business Partners to review product requirements and user stories, propose viable implementation options, and define specific software tasks
  • Technology Research: Research new and evolving technologies and develop prototypes
  • Agile Delivery: Work in an agile development team to iteratively deliver system updates and product releases
  • Problem Solving: Create elegant, well-tested solutions to complex problems

5. Data Engineer Functions

  • Team Leadership: Lead the local team of data engineers, data analysts, and software developers, provide them with technical guidance, and ensure that the team understands and follows global Tesla best practices and technical standards
  • Global Collaboration: Work with the global Data Analytics team on setting the direction of data infrastructure and contribute to the implementation together with the team in Giga Berlin
  • Stakeholder Engagement: Build close relationships with key business stakeholders to understand long-term business needs that will set the agenda for Data Analytics
  • Pipeline Development: Be a hands-on contributor in building data pipelines using Airflow for Vertica, covering the design, development, maintenance, and support of the Enterprise Data Warehouse & BI platform
  • Streaming Solutions: Create real-time data streaming and processing solutions using open-source technologies such as Kafka and Spark
  • Code Review: Establish a healthy peer review culture, and organize and facilitate design and code reviews
  • Technology Awareness: Keep up to date on relevant technologies and frameworks
  • Network Management: Manage IT/OT network jump hosts to facilitate operational data transfer to enterprise systems or vice versa
  • Project Communication: Communicate and present project value and work progress to stakeholders and senior leadership
  • Issue Resolution: Respond quickly to bug-fix and enhancement requests, take direction, and complete tasks on time with minimal supervision
  • Data Accuracy: Ensure that business-critical data is accurate
  • Solution Building: Build solutions where the problem is often ill-defined