Backend AI & Data Pipeline Engineer at Seeka Technology
Job Description
About the role
We are looking for a Backend AI & Data Pipeline Engineer to own the end-to-end data processing infrastructure that powers Yuzee's intelligent course and job matching platform. You will design and maintain scalable, event-driven pipelines that process tens of thousands of daily records, generate semantic embeddings, and feed a growing knowledge graph used for personalised career pathway recommendations.
What you'll do
- Design and maintain three distinct processing pipelines — scheduled job ingestion, event-driven course processing, and a periodic knowledge graph builder — each with independent trigger logic and cost controls
- Generate and manage semantic embeddings via Amazon Bedrock (Titan v2), index them in MongoDB Atlas Vector Search, and calibrate similarity thresholds to ensure match accuracy
- Build and maintain a knowledge graph linking jobs, courses, skills, and industries using FP-Growth association rules and archetype-to-SOC code mapping
- Build and improve a two-stage discovery and matching API on AWS Lambda — vector retrieval first, then deep eligibility scoring with LLM re-ranking
- Right-size Fargate Spot tasks and design resumable processing loops that tolerate Spot interruption, keeping infrastructure costs under control as data volume scales
- Maintain and improve daily job scrapers across multiple sources and build institution data scrapers with robust HTML cleaning pipelines
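The two-stage discovery and matching flow described above can be sketched in miniature. This is an illustrative Python sketch only, not Yuzee's actual code: `retrieve`, `rerank`, the similarity threshold, and the toy embeddings are all hypothetical, and the stage-two `eligibility_fn` callback stands in for the LLM re-ranking step.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, candidates, k=3, threshold=0.3):
    """Stage 1: cheap vector retrieval. Keep the top-k candidates
    whose similarity clears a calibrated threshold."""
    scored = [(cosine(query_vec, c["embedding"]), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= threshold]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return scored[:k]

def rerank(shortlist, eligibility_fn):
    """Stage 2: expensive per-candidate eligibility scoring.
    In production this stage would call an LLM; here a plain
    scoring callback stands in."""
    return sorted(
        ((eligibility_fn(c), c) for _, c in shortlist),
        key=lambda sc: sc[0],
        reverse=True,
    )
```

The design point is that the cheap vector pass narrows thousands of candidates to a shortlist, so the expensive per-candidate stage only runs on a handful of records.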
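A resumable, interruption-tolerant processing loop of the kind mentioned above might look like the following minimal sketch. `ResumableProcessor` and its local JSON checkpoint file are illustrative assumptions; a production version would checkpoint to durable storage such as S3 or a database rather than the task's local filesystem.

```python
import json
import os
import signal

class ResumableProcessor:
    """Processes records in order, persisting a cursor after each
    batch so a Spot interruption loses at most one batch of work."""

    def __init__(self, checkpoint_path, batch_size=100):
        self.checkpoint_path = checkpoint_path
        self.batch_size = batch_size
        self.interrupted = False
        # Fargate Spot sends SIGTERM roughly two minutes before
        # reclaiming the task: finish the current batch, then stop.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.interrupted = True

    def _load_cursor(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                return json.load(f)["cursor"]
        return 0

    def _save_cursor(self, cursor):
        with open(self.checkpoint_path, "w") as f:
            json.dump({"cursor": cursor}, f)

    def run(self, records, process_fn):
        cursor = self._load_cursor()  # resume where the last run stopped
        while cursor < len(records) and not self.interrupted:
            batch = records[cursor : cursor + self.batch_size]
            for rec in batch:
                process_fn(rec)
            cursor += len(batch)
            self._save_cursor(cursor)  # durable progress marker
        return cursor
```

On restart, a replacement task reads the cursor and continues without reprocessing completed batches, which is what makes Spot pricing safe for long-running ingestion jobs.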
What we're looking for
- 1+ years of backend engineering experience focused on data pipelines, ML infrastructure, or search systems
- Hands-on experience with AWS serverless and container services — Lambda, ECS Fargate, EventBridge, and Step Functions
- Strong Python skills — Pandas, async processing, bulk database operations, and text cleaning
- Familiarity with vector databases and semantic similarity search; MongoDB Atlas Vector Search experience is a strong plus
- Cost-conscious infrastructure mindset — you think in per-record compute costs, free tiers, Spot resilience, and right-sizing
- Ability to document and communicate complex architecture clearly to both technical and non-technical stakeholders
Nice to have
- Experience with knowledge graphs or association rule mining (FP-Growth, Apriori)
- Experience using LLMs for re-ranking or eligibility assessment on top of vector retrieval results
- Background in edtech, jobtech, or recommendation/matching systems
- A degree, or equivalent proven experience
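The support/confidence idea behind the FP-Growth association mining named above can be illustrated with a much simpler pair-only miner. This toy sketch is not FP-Growth itself; a real pipeline would use a proper FP-Growth implementation (e.g. from mlxtend) to mine larger itemsets efficiently, and the skill names below are invented examples.

```python
from collections import Counter
from itertools import combinations

def mine_pair_rules(transactions, min_support=2, min_confidence=0.6):
    """Mine A -> B rules over item pairs, keeping rules whose pair
    occurs at least `min_support` times and whose confidence
    (pair count / antecedent count) meets `min_confidence`."""
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))       # dedupe; sort for stable pair keys
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        if together < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = together / item_counts[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return sorted(rules, key=lambda r: r[2], reverse=True)
```

In a knowledge-graph context, each transaction would be the skill set attached to one job or course, and high-confidence rules become candidate edges between skills.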
Benefits
- You can work from home for the whole internship period
- A reference letter can be requested upon completion of the internship
- Some flexibility in working hours beyond the usual 9am to 6pm (e.g. 8am to 5pm, or 7:30am to 4:30pm)
- The possibility of retention for part-time or full-time work after the internship, based on your performance, even if you are not based in Malaysia
