Role and responsibilities
· Understands the process flow and its impact on the project module's outcome.
· Works on coding assignments for specific technologies based on the project requirements and the documentation available.
· Debugs basic software components and identifies code defects.
· Focuses on building depth in project-specific technologies.
· Expected to develop domain knowledge along with technical skills.
· Communicates effectively with team members, project managers and clients, as required.
· A proven high-performer and team-player, with the ability to take the lead on projects.
· Design and create S3 buckets and folder structures (raw, cleansed_data, output, script, temp-dir, spark-ui); a setup sketch follows this list
· Develop AWS Lambda functions (Python/Boto3) to download the daily Bhav Copy via REST API and ingest it into S3 (an illustrative handler follows this list)
· Author and maintain AWS Glue Spark jobs (a job sketch follows this list) to:
– partition data by scrip, year and month
– convert CSV to Parquet with Snappy compression
· Configure and run AWS Glue Crawlers to populate the Glue Data Catalog (see the crawler and Athena sketch after this list)
· Write and optimize AWS Athena SQL queries to generate business-ready datasets
· Monitor, troubleshoot and tune data workflows for cost and performance
· Document architecture, code and operational runbooks
· Collaborate with analytics and downstream teams to understand requirements and meet SLAs
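
To make the S3 layout above concrete, a minimal setup sketch follows; the bucket name and region are placeholders, not project values.

    # Sketch only: bucket name and region are placeholders, not project values.
    import boto3

    REGION = "ap-south-1"
    BUCKET = "example-bhavcopy-datalake"   # hypothetical bucket name

    s3 = boto3.client("s3", region_name=REGION)
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )

    # S3 has no real folders; zero-byte objects whose keys end in "/" make the
    # expected prefixes visible in the console.
    for prefix in ("raw/", "cleansed_data/", "output/", "script/", "temp-dir/", "spark-ui/"):
        s3.put_object(Bucket=BUCKET, Key=prefix)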
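A minimal sketch of the Lambda ingestion step described above; the Bhav Copy endpoint URL, bucket name and key pattern are assumptions for illustration, and the real API, authentication and file naming will differ.

    # Sketch only: the endpoint, bucket and key pattern below are hypothetical.
    import datetime
    import urllib.request

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-bhavcopy-datalake"                        # hypothetical bucket
    BHAVCOPY_URL = "https://example.com/bhavcopy?date={date}"   # hypothetical endpoint

    def lambda_handler(event, context):
        # Allow the trigger event to pass a trade date; default to today.
        trade_date = event.get("date", datetime.date.today().strftime("%d%m%Y"))

        # Download the daily Bhav Copy CSV and land it unchanged under raw/.
        with urllib.request.urlopen(BHAVCOPY_URL.format(date=trade_date), timeout=30) as resp:
            body = resp.read()

        key = f"raw/bhavcopy_{trade_date}.csv"
        s3.put_object(Bucket=BUCKET, Key=key, Body=body)
        return {"status": "ok", "s3_key": key}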
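A minimal sketch of the Glue Spark job described above, covering the scrip/year/month partitioning and the CSV-to-Parquet (Snappy) conversion; the column names (SYMBOL, TIMESTAMP) and S3 paths are assumptions and should be checked against the actual Bhav Copy schema.

    # Sketch only: column names and S3 paths are assumptions for illustration.
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    raw_path = "s3://example-bhavcopy-datalake/raw/"              # hypothetical paths
    cleansed_path = "s3://example-bhavcopy-datalake/cleansed_data/"

    df = spark.read.option("header", "true").csv(raw_path)

    # Derive the partition columns required by the role: scrip, year and month.
    df = (
        df.withColumn("scrip", F.col("SYMBOL"))
          .withColumn("trade_date", F.to_date(F.col("TIMESTAMP"), "dd-MMM-yyyy"))
          .withColumn("year", F.year("trade_date"))
          .withColumn("month", F.month("trade_date"))
    )

    # Write Parquet with Snappy compression, partitioned for Athena pruning.
    (
        df.write.mode("overwrite")
          .option("compression", "snappy")
          .partitionBy("scrip", "year", "month")
          .parquet(cleansed_path)
    )

    job.commit()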
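A minimal sketch of the crawler and Athena steps above; the crawler, database, IAM role, table and column names are placeholders, and the table name Athena sees will depend on how the crawler registers it.

    # Sketch only: names, role ARN and result location are placeholders.
    import boto3

    glue = boto3.client("glue")
    athena = boto3.client("athena")

    # Crawl the cleansed Parquet data so the scrip/year/month partitions land
    # in the Glue Data Catalog.
    glue.create_crawler(
        Name="bhavcopy-cleansed-crawler",                       # hypothetical name
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
        DatabaseName="bhavcopy_db",
        Targets={"S3Targets": [{"Path": "s3://example-bhavcopy-datalake/cleansed_data/"}]},
    )
    glue.start_crawler(Name="bhavcopy-cleansed-crawler")

    # Example Athena query producing a business-ready aggregate; filtering on the
    # partition columns keeps the data scanned (and the cost) down. The table and
    # "close" column names are assumptions based on the crawled folder and schema.
    query = """
        SELECT scrip, year, month, avg(CAST(close AS double)) AS avg_close
        FROM cleansed_data
        WHERE scrip = 'INFY' AND year = 2023
        GROUP BY scrip, year, month
    """
    athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "bhavcopy_db"},
        ResultConfiguration={"OutputLocation": "s3://example-bhavcopy-datalake/output/"},
    )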
Technical skills requirements
The candidate must demonstrate proficiency in the following:
· 3+ years’ hands-on experience with AWS data services (S3, Lambda, Glue, Athena)
· PostgreSQL basics
· Proficient in SQL and data partitioning strategies
· Experience with Parquet file formats and compression techniques (Snappy)
· Ability to configure Glue Crawlers and manage the AWS Glue Data Catalog
· Understanding of serverless architecture and best practices in security, encryption and cost control
· Good documentation, communication and problem-solving skills
Nice-to-have skills
· Experience with SQL databases
· Experience in Python (or Ruby) scripting to integrate with AWS services
· Familiarity with RESTful API consumption and JSON processing
· Background in financial markets or working with large-scale time-series data
· Knowledge of CI/CD pipelines for data workflows