Data Scientist

fa-ewjt-saasfaprod1

Bengaluru, Karnataka, India
Posted 1 month ago

Job Type

Full-time

Description

  • Design and implement entity resolution and record linkage pipelines across multiple data sources
  • Build and evaluate matching algorithms using classical ML, statistical scoring, and fuzzy string-matching techniques
  • Develop attribute fusion logic to construct canonical golden records from conflicting multi-source data
  • Analyze data quality issues, document findings, and propose remediation strategies

Data Source Evaluation

  • Assess new external data sources (open and commercial) for coverage, quality, and applicability to Customer Master use cases
  • Apply existing evaluation criteria and contribute additional quality metrics where relevant
  • Produce structured evaluation reports with recommendations for adoption or rejection

Analytics & Reporting

  • Profile source datasets and track match quality metrics (precision, recall, F1, coverage)
  • Build dashboards and analytical summaries to communicate pipeline performance to stakeholders
  • Document data lineage, matching logic, and provenance for audit and reproducibility
  • Python - Pandas, NumPy, scikit-learn, rapidfuzz / jellyfish
  • SQL - Complex queries, window functions, aggregations; Hadoop/Hive or Presto/Trino
  • Classical ML & Statistics - Supervised/unsupervised models, probabilistic scoring, clustering, feature engineering
  • String matching & NLP - Fuzzy matching (Jaro-Winkler, Levenshtein, TF-IDF), text normalization, tokenization
  • Entity Resolution - Record linkage concepts: blocking, scoring, deduplication, cluster evaluation
  • Data Quality Assessment - Completeness, consistency, coverage metrics; source profiling
  • Data Analysis - Exploratory analysis, hypothesis testing, statistical reasoning

This job is found at InterviewStack.io

Skills

algorithms, analytics, dashboards, python, pandas, numpy, scikit-learn, sql, hadoop, hive, presto, trino, statistics, nlp, data analysis, data quality, feature engineering, hypothesis testing, data lineage