Data Engineer (Full-time, Remote) at Kata.ai

Kata.ai is Hiring

Job Info:
  • Company: Kata.ai
  • Position: Data Engineer (Full-time, Remote)
  • Location: South Jakarta, Indonesia
  • Source: SmartRecruiters
  • Published: April 08, 2026
  • Category: Data
  • Type: Full-Time


Job Description

Design, build, and maintain scalable data pipelines, streaming infrastructure, and AI/ML data workflows that power data-driven products and enterprise AI solutions. By ensuring reliable, timely, and high-quality data is available across the organization, you enable AI Engineers, Product teams, and enterprise clients to make accurate, insight-driven decisions and deliver intelligent customer experiences through Kata's AI and voice platforms.

Qualifications & Education:

  • Bachelor's degree in Computer Science, Information Systems, Data Engineering, Statistics, or related field
  • Relevant certifications (GCP Professional Data Engineer, Databricks, Airflow/Astronomer, etc.) are a plus

Technical Skills:

  • Streaming: Apache Kafka — topic design, consumer groups, partitioning strategy, and real-time event processing
  • Batch Orchestration: Apache Airflow — DAG design, scheduling, dependency management, and failure handling
  • Distributed Processing: Apache Spark — batch and micro-batch transformations, DataFrame API, optimization
  • Data Warehousing: Google BigQuery (primary); Apache Hive for large-scale batch analytics
  • NoSQL / Wide-Column: Apache Cassandra — data modeling for high-write, time-series, and event-driven workloads
  • Languages: Python (required); SQL (required); Scala is a plus
  • Cloud: GCP — BigQuery, Dataflow, Cloud Storage, Pub/Sub, Vertex AI Pipelines; Azure is a plus
  • Containerization: Docker; basic Kubernetes for deploying data services
  • CI/CD: GitLab CI, GitHub Actions, or equivalent for pipeline deployment automation
  • Data Quality: Great Expectations, dbt tests, or custom validation frameworks
  • Monitoring: Prometheus, Grafana, or GCP Monitoring for pipeline observability; alerting on SLA breaches
  • Version Control: Git with feature branching and pull request workflow
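As a small illustration of the partitioning-strategy work listed above, the sketch below mimics a key-hash partitioner in pure Python. Note the assumptions: Kafka's default partitioner actually uses murmur2 on the key bytes (md5 is used here only to keep the sketch stdlib-only and deterministic), and the partition count and event keys are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 6  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an event key to a partition, as a key-hash partitioner would.

    Kafka's default partitioner hashes the key with murmur2; md5 stands
    in here so the example runs with the standard library alone.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# All events carrying the same key land on the same partition, which is
# what preserves per-key ordering for consumer groups downstream.
keys = ["user-42", "user-7", "user-42", "user-99"]
assignments = [partition_for(k) for k in keys]
assert assignments[0] == assignments[2]  # same key -> same partition
```

The practical point for topic design: the key you choose fixes both ordering guarantees and load distribution, since hot keys concentrate traffic on a single partition.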

Experience:

Associate Level (1–2 years)

  • 1–2 years of professional experience in data engineering, software engineering with data focus, or a related technical role
  • Hands-on experience building or maintaining data pipelines in a production environment
  • Practical exposure to at least one streaming or batch processing technology (Kafka, Spark, or Airflow)
  • Familiarity with SQL and relational or columnar databases (BigQuery, PostgreSQL, Hive, or equivalent)
  • Exposure to cloud data services on GCP or Azure
  • Experience working in Agile/Scrum teams with sprint-based delivery


Mid Level (3–5 years)

  • 3–5 years of professional experience in data engineering, with at least 2 years building and operating production-grade pipelines
  • Proven hands-on experience with Apache Kafka for real-time event streaming — including topic design, consumer group management, and at-least-once/exactly-once delivery patterns
  • Demonstrated experience designing and maintaining batch workflows using Apache Airflow and large-scale data transformations with Apache Spark
  • Experience working with BigQuery and/or Hive for large-scale analytics workloads, including query optimization and partitioning strategies
  • Hands-on experience with Cassandra or similar NoSQL wide-column stores for high-write or time-series data use cases
  • Experience supporting AI/ML data pipelines — feature engineering, training dataset preparation, or model inference data feeds
  • Experience with data quality frameworks and implementing data observability practices in production environments
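Where a managed framework such as Great Expectations or dbt tests is not available, the "custom validation frameworks" mentioned above can start as small as a row-level check. This stdlib-only sketch (the field names and thresholds are hypothetical) quarantines bad rows rather than failing the whole pipeline:

```python
def validate_rows(rows, required, non_negative):
    """Minimal row-level data-quality check.

    Splits input into (valid_rows, errors): a row fails if any required
    field is missing/None or any numeric field in `non_negative` is < 0.
    """
    valid, errors = [], []
    for i, row in enumerate(rows):
        missing = [f for f in required if row.get(f) is None]
        negative = [
            f for f in non_negative
            if isinstance(row.get(f), (int, float)) and row[f] < 0
        ]
        if missing or negative:
            errors.append({"row": i, "missing": missing, "negative": negative})
        else:
            valid.append(row)
    return valid, errors


# Hypothetical event records: one clean, one missing a key, one out of range.
rows = [
    {"event_id": "e1", "latency_ms": 120},
    {"event_id": None, "latency_ms": 80},
    {"event_id": "e3", "latency_ms": -5},
]
valid, errors = validate_rows(rows, required=["event_id"], non_negative=["latency_ms"])
assert len(valid) == 1 and len(errors) == 2
```

Routing `errors` to a dead-letter table while `valid` flows onward is one common observability pattern: the pipeline keeps its SLA and the failures stay queryable.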

We value flexible working hours for our employees.

Most importantly, we provide a learning experience in the Conversational AI industry.
