InterviewStack.io

Applied Scientist 3

Oracle

BENGALURU, KARNATAKA, India

2 weeks ago


Benefits

Remote Work

Job Type

full time

Description

  • Design and build data-centric GenAI methods for synthetic data generation, multimodal data curation, data augmentation, filtering, deduplication, and quality assessment.

  • Develop and evaluate synthetic data pipelines for text, speech, vision, and multimodal GenAI use cases, including controllable generation, provenance tracking, safety checks, and domain adaptation.

  • Build evaluation frameworks that connect data quality to downstream GenAI model performance, including benchmark design, ablation studies, error analysis, and model-feedback loops.

  • Research and implement modern generative AI techniques, including LLM/VLM-based data generation, fine-tuning, instruction tuning, preference optimization, and model-based data labeling.

  • Build scalable data and ML pipelines for acquisition, cleaning, transformation, metadata extraction, embedding generation, labeling, training, and evaluation.

  • Develop production-quality code for batch and real-time ML workflows, including model inference, feature processing, data validation, monitoring, and operational automation.

  • Translate research papers and emerging GenAI techniques into practical systems that improve data quality, model quality, and customer-facing AI outcomes.

  • Partner with modeling, product, infrastructure, and domain teams to define GenAI data requirements, quality bars, evaluation criteria, and delivery plans.

  • Operate across the full lifecycle: research, prototyping, experimentation, productionization, testing, CI/CD, monitoring, runbooks, and production support.

  • Ph.D. degree, Master's degree, or equivalent experience in computer science, artificial intelligence, machine learning, operations research, statistics, or a related technical field.

  • 5+ years of experience with a Master's degree, or 3+ years with a Ph.D., applying machine learning to real-world problems.

  • Strong Python programming skills and experience building production-quality ML, GenAI, or data systems.

  • Hands-on experience with PyTorch and modern deep learning stacks; experience with Hugging Face, LLMs, VLMs, diffusion models, or multimodal models is strongly preferred.

  • Experience with data-centric AI or GenAI methods such as synthetic data generation, data quality measurement, dataset curation, weak supervision, model-based labeling, active learning, deduplication, or data augmentation.

  • Experience designing experiments and interpreting results through statistical analysis, ablation studies, benchmark evaluation, and error analysis.

  • Strong understanding of model training, inference, evaluation, and production monitoring.

  • Ability to read research papers, identify practical value, and implement useful techniques in real systems.

  • Strong written and verbal communication skills, including technical proposals, design documents, experiment reports, and stakeholder presentations.

  • Experience building scalable data or ML pipelines using distributed compute, cloud storage, batch processing, or workflow orchestration.

Career Level - IC3

This job is found at InterviewStack.io

Skills

generative ai, data pipelines, llm, monitoring, prototyping, ci/cd, machine learning, statistics, python, pytorch, deep learning, llms, data quality, statistical analysis, model training, fine tuning, experimentation, ml pipelines, batch processing

About Oracle

Oracle offers integrated suites of applications plus secure, autonomous infrastructure in the Oracle Cloud.

software, cloud computing