Data Engineer Job at VDart Inc, Remote

  • VDart Inc
  • Remote

Job Description

Title: Data Engineer

Location: Remote

Duration: 6 Months

Work Description:

We are in the process of migrating off CRMA Data Manager by rewriting queries and implementing the required data transformations in AWS. This platform modernization effort includes working through a backlog of datasets that must be migrated to AWS and transformed to meet current and future reporting needs.

Business Knowledge:

Limited business knowledge is needed.

Technical Skills:

Must-Have Technical Skills:

AWS Data Services (Hands-on)

  • S3: Data lake design, partitioning strategies, lifecycle management
  • IAM: Roles & policies, least-privilege access, cross-account access
  • Glue / EMR: Crawlers, Data Catalog, ETL job development
  • Athena: Querying data lakes with performance and cost optimization
  • Lake Formation: Basic governance and permission management

Compute & Processing

  • Apache Spark (PySpark): Batch processing, performance tuning, joins, partitioning
  • Python: Production-grade coding (packaging, testing, logging, type hints)
  • SQL: Advanced querying (window functions, query optimization, data modeling support)

Orchestration & Scheduling

  • Airflow / MWAA / AWS Step Functions: DAG design, retry mechanisms, SLA management, backfills

Data Warehousing & Modeling

  • Redshift / Snowflake (on AWS): Fundamentals and performance considerations
  • Dimensional Modeling: Star/Snowflake schema design

ETL/ELT Patterns:

  • CDC (Change Data Capture)
  • SCD (Slowly Changing Dimensions)
  • Idempotent data pipelines

Data Reliability & Observability

  • Data quality frameworks: Great Expectations / Deequ (or equivalent)
  • Data reconciliation & validation
  • Monitoring & observability: CloudWatch logs, metrics, alerts

DevOps & Delivery

  • Version Control: Git, branching strategies, code reviews
  • CI/CD: Data pipeline automation (e.g., GitLab CI/CD)
  • Infrastructure-as-Code: OpenTofu / CloudFormation for AWS resource deployment

Security & Compliance

  • Encryption: At rest & in transit (KMS)
  • Secrets management: AWS Secrets Manager / SSM
  • Networking fundamentals: VPC, private subnets, endpoints (data access control)

Role Expectations (Hands-on Experience Required):

  • Designed, developed, and maintained production-grade ETL pipelines using AWS Glue (PySpark)
  • Built scalable data ingestion pipelines from S3, databases, and streaming sources into S3 data lakes
  • Implemented complex transformations and joins in PySpark, optimizing performance (partitioning, broadcast joins, caching)
  • Developed incremental and idempotent pipelines, including handling CDC and SCD
  • Automated schema discovery using Glue Crawlers and Data Catalog
  • Tuned Glue Spark jobs for performance, concurrency, and cost efficiency
  • Integrated pipelines with orchestration tools like Airflow (MWAA) or Step Functions
  • Collaborated with data teams to load curated data into Redshift / Snowflake / Iceberg for analytics
  • Implemented data quality checks using built-in validations or tools like Great Expectations / Deequ
  • Applied AWS security best practices (IAM roles, KMS encryption, secure data access)
  • Contributed to CI/CD pipelines for Glue job deployment using Git and IaC tools
  • Monitored pipelines using CloudWatch, ensuring reliability and quick incident resolution
  • Worked closely with stakeholders to define data contracts, SLAs, and business expectations

Key Skills: Data Engineer, AWS Glue, IAM, ETL, Athena, PySpark

Job Tags

Full time
