Top 50 Machine Learning Interview Questions in India 2026

June 12, 2026

Top 50 Machine Learning Interview Questions 2026 — With Detailed Answers

By Cambridge Infotech  |  Updated June 2026  |  20 min read  |  Verified by industry hiring professionals

Quick Answer

The most-asked ML interview questions in India cover: bias-variance tradeoff, overfitting/underfitting, cross-validation, precision vs recall, gradient descent, regularisation, feature engineering, handling imbalanced data, and model evaluation metrics. Fresher interviews focus on theory + Python (Pandas, Scikit-learn). Senior interviews add system design, MLOps, and business case studies.

Whether you are preparing for your first data science job at TCS or Infosys, or targeting a senior ML role at Flipkart or Google India, this guide covers the machine learning interview questions that actually get asked in Indian companies in 2026 — not generic US-centric lists.

Each question includes the answer hiring managers expect, the level it applies to, and common mistakes candidates make. Use this guide alongside the Machine Learning course at Cambridge Infotech to build both theory and practical skills.

Typical ML Interview Structure in Indian Companies (2026)

Round | Stage | Focus | Duration
Round 1 | Online Screening | MCQs: Python, ML, SQL | 20–40 mins
Round 2 | Technical Round 1 | ML Theory & Stats | 45–60 mins
Round 3 | Coding + Case Study | Python + problem solving | 60–90 mins
Round 4 | HR + Managerial | Salary, culture, career | 30–45 mins

Q1–10: Machine Learning Fundamentals

Q1. What is the bias-variance tradeoff?

Fresher | Very common

Expected answer:

Bias is the error from wrong assumptions in the learning algorithm — a high-bias model is too simple and underfits the data (e.g., linear regression on non-linear data). Variance is the sensitivity to small fluctuations in the training set — a high-variance model overfits and performs poorly on new data.

The tradeoff: reducing bias typically increases variance and vice versa. The goal is to find the sweet spot with low bias and low variance — achieved through techniques like cross-validation, regularisation, ensemble methods, and appropriate model complexity.

Common mistake: Saying “just use a complex model to reduce bias” without addressing the overfitting risk that comes with it.

Q2. What is overfitting? How do you prevent it?

Fresher | Always asked

Overfitting occurs when a model learns the training data too well — including noise — and fails to generalise to new, unseen data. Signs: training accuracy is very high but test/validation accuracy is significantly lower.

Prevention methods:

  • Regularisation: L1 (Lasso) can shrink some coefficients to exactly zero; L2 (Ridge) shrinks all coefficients toward zero
  • Cross-validation — k-fold CV gives a more reliable performance estimate
  • Early stopping — stop training when validation loss starts increasing
  • Dropout — randomly deactivates neurons during neural network training
  • More training data — the most reliable fix when available
  • Reduce model complexity — fewer layers, shallower trees, fewer features
Common mistake: Only listing one technique. Interviewers want to see you understand multiple approaches and when to use each.

Q3. What is the difference between supervised, unsupervised, and reinforcement learning?

Fresher

Type | Training data | Goal | Examples
Supervised | Labelled (input + output) | Predict output for new inputs | Linear regression, SVM, decision trees
Unsupervised | Unlabelled (input only) | Find hidden patterns/structure | K-means, PCA, DBSCAN
Reinforcement | Reward/penalty signals | Maximise cumulative reward | AlphaGo, game AI, robotics

Q4. Explain cross-validation and why it is used.

Fresher | Frequently asked

Cross-validation is a resampling technique used to evaluate how well an ML model generalises to independent data. The most common method is k-fold cross-validation: the dataset is split into k equal subsets (folds). The model is trained on k-1 folds and tested on the remaining fold. This process repeats k times, each time using a different fold as the test set. The final performance metric is the average across all k iterations.

Why it matters: A single train-test split can give misleadingly good or bad results depending on which samples end up in the test set. Averaging over k folds gives a more stable, lower-variance estimate of model performance, which is especially important with small datasets.
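
To make the fold mechanics concrete, here is a minimal pure-Python sketch of how k-fold index splits could be generated. The function name k_fold_indices is illustrative, and the folds are contiguous with no shuffling, which is a simplifying assumption; in practice you would normally shuffle first or use scikit-learn's KFold.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (no shuffling)."""
    # Spread the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in the test set exactly once, so the averaged score uses every row for evaluation.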

Q5. What is gradient descent? Explain its variants.

Mid-level | Always asked

Gradient descent is an optimisation algorithm used to minimise the loss function of an ML model by iteratively moving in the direction of steepest descent (negative gradient). At each step: θ = θ − α × ∇J(θ) where θ are model parameters, α is the learning rate, and ∇J(θ) is the gradient of the cost function.

Variant | Batch size | Speed | Use when
Batch GD | All data | Slow | Small datasets
Stochastic GD (SGD) | 1 sample | Noisy but fast | Online learning
Mini-batch GD | 32–512 samples | Balanced | Most DL training (standard)
Follow-up you must be ready for: “What happens if the learning rate is too high/too low?” Too high → overshoots minimum, loss diverges. Too low → very slow convergence or gets stuck in local minima.
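
The update rule and the learning-rate follow-up can be illustrated with a small sketch. It assumes a toy cost J(θ) = (θ − 3)², chosen only because its minimum (θ = 3) is known; the function names are illustrative.

```python
def gradient_descent(grad, theta0, alpha, steps):
    """Apply theta = theta - alpha * grad(theta) for a fixed number of steps."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta


def grad_J(theta):
    # Toy cost J(theta) = (theta - 3)^2, so the gradient is 2 * (theta - 3).
    return 2.0 * (theta - 3.0)


good = gradient_descent(grad_J, theta0=0.0, alpha=0.1, steps=100)      # converges near 3
slow = gradient_descent(grad_J, theta0=0.0, alpha=0.001, steps=100)    # still far from 3
diverged = gradient_descent(grad_J, theta0=0.0, alpha=1.1, steps=100)  # overshoots and blows up

print(good, slow, diverged)
```

With α = 1.1 each step multiplies the distance to the minimum by −1.2, which is why the iterate oscillates and grows instead of settling.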

Q6–10: Quick-reference fundamentals

Q6. What is regularisation? Difference between L1 and L2?

L1 (Lasso): Adds absolute value of coefficients to loss. Drives some weights to exactly zero → automatic feature selection. Good for sparse models. L2 (Ridge): Adds squared value of coefficients. Shrinks weights toward zero but rarely to exactly zero → keeps all features but reduces their impact. Use L2 when all features matter; L1 when you suspect many are irrelevant.
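
A one-coefficient sketch shows why L1 produces exact zeros and L2 does not. It assumes the simplest possible setting: a single weight w whose unpenalised least-squares value is a, with penalties λ|w| (L1) and ½λw² (L2). The function names are illustrative, not library APIs.

```python
def lasso_1d(a, lam):
    """Minimiser of 0.5*(w - a)^2 + lam*|w| (soft-thresholding)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # small coefficients are set to exactly zero


def ridge_1d(a, lam):
    """Minimiser of 0.5*(w - a)^2 + 0.5*lam*w^2 (proportional shrinkage)."""
    return a / (1.0 + lam)


for a in [2.0, 0.3, -0.1]:
    print(a, "-> L1:", lasso_1d(a, 0.5), " L2:", round(ridge_1d(a, 0.5), 4))
```

The weak coefficients (0.3 and −0.1) are zeroed by L1 but only scaled down by L2, which matches the feature-selection behaviour described above.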

Q7. What is the difference between classification and regression?

Classification: Predicts discrete category labels (spam/not spam, cat/dog). Metrics: accuracy, precision, recall, F1, AUC-ROC. Regression: Predicts continuous numerical values (house price, temperature). Metrics: MAE, MSE, RMSE, R². Key distinction: output type determines which algorithms and metrics apply.

Q8. What is a confusion matrix? How do you read it?

A confusion matrix shows actual vs predicted classes. For binary classification: TP (correctly predicted positive), TN (correctly predicted negative), FP (incorrectly predicted positive — Type I error), FN (incorrectly predicted negative — Type II error). From it you derive: Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = 2×(P×R)/(P+R).
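
As a quick check of these formulas, the sketch below counts the four cells by hand and derives the metrics. The labels are made-up toy data with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```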

Q9. When would you use precision vs recall as your primary metric?

Precision when false positives are costly — e.g., spam detection (marking a genuine email as spam is bad). Recall when false negatives are costly — e.g., cancer screening (missing a cancer case is far worse than a false alarm). In most real-world cases, use F1-score (harmonic mean) or AUC-ROC for a balanced view.

Q10. What is the curse of dimensionality?

As the number of features increases, the volume of the feature space grows exponentially, so the data become increasingly sparse and distance-based algorithms (KNN, SVM with RBF kernel) perform poorly. Fix: feature selection (remove irrelevant features), dimensionality reduction (PCA; t-SNE is mainly suited to visualisation rather than producing model inputs), or regularisation. In Indian interviews, this is often followed by “how would you apply PCA to this problem?”

Q11–20: ML Algorithms

Q11. How does a decision tree work? What are its pros and cons?

Fresher

A decision tree splits data recursively based on feature values to maximise information gain (or minimise Gini impurity). At each node it asks a yes/no question about a feature; at leaf nodes it assigns a class label or value.
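
The splitting criterion can be sketched in a few lines. This example computes Gini impurity and compares two hypothetical candidate splits of the same parent node; the labels and splits are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split, weighting each child by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = ["yes"] * 5 + ["no"] * 5
split_a = (["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4)          # fairly pure children
split_b = (["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3)  # mixed children

print("parent :", gini(parent))
print("split A:", weighted_gini(*split_a))
print("split B:", weighted_gini(*split_b))
```

The tree would prefer split A because it gives the lower weighted impurity, meaning the larger reduction from the parent.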

✓ Pros

  • Easy to interpret and visualise
  • No feature scaling needed
  • Handles both numerical and categorical data

✗ Cons

  • Prone to overfitting (deep trees)
  • Unstable — small data changes = different tree
  • Biased toward features with more levels

Q12. What is Random Forest? Why is it better than a single decision tree?

Fresher | Very common

Random Forest is an ensemble method that builds multiple decision trees on different random subsets of the data (bagging) and random subsets of features at each split. Predictions are made by majority vote (classification) or averaging (regression).

Why it is better: Individual trees overfit; averaging their predictions reduces variance without increasing bias. The randomness in feature selection also de-correlates the trees, making the ensemble more robust. In practice, Random Forest consistently outperforms single decision trees on most tabular datasets.

Q13–20: Algorithm quick answers

Q13. What is gradient boosting? How does XGBoost differ?

Gradient boosting builds trees sequentially, with each tree fitted to correct the errors of the previous ones. Unlike Random Forest, whose trees are independent and can be built in parallel, boosting is sequential and slower to train but often more accurate. XGBoost adds: second-order derivatives for better optimisation, built-in regularisation (L1+L2), column and row subsampling, and parallelised split finding within each tree, which typically makes it considerably faster than a plain gradient boosting implementation.

Q14. Explain SVM (Support Vector Machine) in simple terms.

SVM finds the hyperplane that best separates classes by maximising the margin — the distance between the hyperplane and the nearest data points from each class (support vectors). For non-linearly separable data, the kernel trick (RBF, polynomial) projects data into higher dimensions where separation is possible. SVM works well on high-dimensional data and small datasets but is slow on very large datasets.

Q15. What is logistic regression? Why is it called “regression” if it is a classifier?

Logistic regression predicts the probability that an input belongs to a class using the sigmoid function: σ(z) = 1/(1+e⁻ᶻ). It is called “regression” because it models a linear relationship between features and the log-odds (logit) of the outcome — the regression happens in log-odds space. A threshold (typically 0.5) converts probability output to a binary class prediction.
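
The link between the sigmoid and log-odds can be checked directly. In this sketch the weights, bias, and input are arbitrary illustrative numbers, not fitted values.

```python
import math


def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the sigmoid: probability -> log-odds."""
    return math.log(p / (1.0 - p))


def predict_proba(x, weights, bias):
    # The model is linear in log-odds space: z = bias + w . x
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


p = predict_proba(x=[1.0, 2.0], weights=[0.8, -0.4], bias=-0.2)
label = 1 if p >= 0.5 else 0
print(f"probability={p:.4f} -> class {label}")
print("round trip:", logit(sigmoid(1.7)))
```

Applying logit to the output recovers the linear score, which is the sense in which the "regression" happens in log-odds space.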

Q16. What is K-Nearest Neighbours (KNN)? What are its limitations?

KNN classifies a new data point based on the majority class among its k nearest neighbours (by Euclidean or other distance). Limitations: computationally expensive at prediction time (computes distances to all training points), sensitive to irrelevant features and feature scale (requires normalisation), and suffers severely from the curse of dimensionality with high-dimensional data.

Q17. Explain k-means clustering. How do you choose k?

K-means assigns n data points to k clusters by minimising within-cluster variance. Steps: (1) initialise k centroids randomly, (2) assign each point to the nearest centroid, (3) recompute centroids, (4) repeat until centroids stabilise. Choosing k: use the Elbow Method — plot inertia (within-cluster sum of squares) vs k and look for the “elbow” where improvement slows. Alternatively, use the Silhouette Score to measure cluster cohesion.
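
The four steps and the elbow idea can be sketched in pure Python. One deliberate deviation, made so the example is reproducible: centroids are initialised from the first k points rather than randomly. The data are toy 2D points.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100):
    """Return (labels, centroids, inertia) for a basic k-means run."""
    # Assumption for reproducibility: first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # Step 4: stop once assignments are stable.
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids, inertia_k2 = k_means(points, k=2)
_, _, inertia_k1 = k_means(points, k=1)

print("labels:", labels)
print("centroids:", centroids)
print("inertia k=1:", round(inertia_k1, 3), " k=2:", round(inertia_k2, 3))
```

The large drop in inertia from k=1 to k=2 is the kind of "elbow" the method looks for on this two-cluster toy data.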

Q18. What is PCA (Principal Component Analysis)? When do you use it?

PCA reduces dimensionality by projecting data onto the directions (principal components) of maximum variance, ordered by explained variance. It decorrelates features and retains the most important structure. Use when: features are highly correlated, you need to visualise high-dimensional data (reduce to 2D/3D), or you want to speed up training by reducing feature count. Important: PCA makes the model less interpretable — don’t use it if explainability matters.

Q19. What is a neural network? Explain forward and backward propagation.

Forward propagation: Input data passes through weighted connections and activation functions layer by layer until the output layer produces a prediction. Loss is calculated by comparing prediction to actual label. Backward propagation: The gradient of the loss is computed with respect to each weight using the chain rule, flowing backward through the network. Weights are updated using gradient descent to reduce the loss.

Q20. What is the vanishing gradient problem?

In deep networks, gradients become exponentially smaller as they propagate backward through layers with sigmoid or tanh activations — early layers learn very slowly or not at all. Solutions: Use ReLU activation (gradients do not saturate for positive values), use batch normalisation, use residual connections (ResNets), or initialise weights with Xavier/He initialisation.
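
A rough numerical illustration of the shrinkage: by the chain rule, the gradient reaching an early layer includes a product of per-layer activation derivatives. The sketch below makes the simplifying assumption that every layer has the same pre-activation value and weights of 1, so only the activation derivative matters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never larger than 0.25


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def chained_gradient(local_grad, z, n_layers):
    """Product of n_layers identical local derivatives (weights assumed to be 1)."""
    return local_grad(z) ** n_layers


for depth in (5, 10, 20):
    print(
        depth,
        "sigmoid:", chained_gradient(sigmoid_grad, 0.0, depth),
        "relu:", chained_gradient(relu_grad, 1.0, depth),
    )
```

Even at the sigmoid's steepest point the factor is 0.25 per layer, so twenty layers shrink the signal by roughly twelve orders of magnitude, while the ReLU factor stays at 1 for positive inputs.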

Q21–28: Model Evaluation Metrics

Q21. What is AUC-ROC? How do you interpret it?

The ROC (Receiver Operating Characteristic) curve plots True Positive Rate vs False Positive Rate at all classification thresholds. AUC (Area Under the Curve) = probability that the model ranks a random positive example higher than a random negative example. AUC = 1.0 is perfect; AUC = 0.5 is random guessing. AUC is threshold-independent. On heavily imbalanced datasets it can look optimistic, so pair it with the precision-recall curve.
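
The ranking interpretation of AUC can be computed directly by comparing every positive-negative pair. This is an illustrative O(n²) sketch on toy scores, with ties counted as half a win; libraries use faster methods.

```python
def auc_by_ranking(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = auc_by_ranking(y_true, scores)
print(f"AUC = {auc:.3f}")
```

Here 8 of the 9 pairs are ordered correctly; the only miss is the positive scored 0.4 against the negative scored 0.7.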

Q22. What is the difference between MAE, MSE, and RMSE?

MAE (Mean Absolute Error): Average of absolute differences. Robust to outliers. Easy to interpret (same units as target). MSE (Mean Squared Error): Average of squared differences. Penalises large errors heavily. Not in same units as target. RMSE (Root MSE): Square root of MSE — same units as target, still penalises large errors. Use RMSE when large errors are especially bad; MAE when all errors should be treated equally.
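
A small worked example shows how one large error affects each metric differently. The values are toy numbers in which the last prediction is off by 10 and the others by 1.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


y_true = [10, 12, 11, 13]
y_pred = [11, 11, 12, 23]  # last prediction is a large miss

print("MAE :", mae(y_true, y_pred))
print("MSE :", mse(y_true, y_pred))
print("RMSE:", round(rmse(y_true, y_pred), 3))
```

RMSE comes out well above MAE here because squaring lets the single large miss dominate the average.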

Q23. How do you handle imbalanced datasets? (Very common in Indian interviews)

Resampling techniques: Oversampling the minority class (SMOTE, the Synthetic Minority Oversampling Technique) or undersampling the majority class. Algorithm-level: Use class weights in the algorithm (class_weight='balanced' in Scikit-learn). Metric choice: Use F1, AUC-ROC, or the PR curve instead of accuracy (accuracy is misleading for imbalanced data: a model that always predicts the majority class can reach 95% accuracy and still be useless). Real answer expected: Mention all three approaches and state when you would use each.

Q24. What is R² (R-squared) in regression? What are its limitations?

R² measures the proportion of variance in the target explained by the model (1 = perfect, 0 = no better than predicting the mean; it can be negative on held-out data when the model is worse than the mean). Limitations: On training data it never decreases when you add more features, even irrelevant ones, so use Adjusted R² for comparing models with different numbers of features. R² alone does not indicate whether predictions are accurate in absolute terms: a model with R²=0.9 could still be far off in real units.
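
The two quantities can be computed by hand as follows. The data are toy values, and the adjusted formula used is the standard 1 − (1 − R²)(n − 1)/(n − p − 1) for n samples and p features.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Penalise R² for the number of features used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


y_true = [3, 5, 7, 9, 11]
y_pred = [2.5, 5.5, 7, 9.5, 10.5]

r2 = r_squared(y_true, y_pred)
print("R²:", r2)
print("Adjusted R² (1 feature): ", round(adjusted_r_squared(r2, 5, 1), 4))
print("Adjusted R² (3 features):", round(adjusted_r_squared(r2, 5, 3), 4))
```

For the same R², the adjusted value drops as the feature count rises, which is what makes it usable for comparing models of different sizes.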

Q25. Explain the difference between Type I and Type II errors.

Type I error (False Positive): Model predicts positive when it is actually negative. Example: flagging a genuine email as spam. Cost = unnecessary action. Type II error (False Negative): Model predicts negative when it is actually positive. Example: missing a fraud transaction. Cost = missed detection. Which matters more depends on business context — the interviewer often asks you to give an example from a real domain like healthcare or finance.

Q26. What is Silhouette Score? When do you use it?

Silhouette Score measures how similar a data point is to its own cluster compared to other clusters — ranges from -1 (misclassified) to +1 (well-clustered). Used to evaluate the quality of unsupervised clustering (K-means, DBSCAN) when there are no ground truth labels. Score > 0.5 = reasonable clustering; > 0.7 = strong clustering.

Q27. What is data leakage? How do you detect and prevent it?

Data leakage occurs when information from outside the training set is used to create the model — giving unrealistically high performance during training/validation but poor performance in production. Common causes: using future data in time-series models, applying scaler/imputer fit on the full dataset before splitting, or including the target variable’s proxy in features. Prevention: always fit preprocessing only on training data, use pipelines in Scikit-learn, use time-aware train-test splits for temporal data.

Q28. What is the log loss (cross-entropy loss)? Why is it used in classification?

Log loss penalises confident wrong predictions heavily. Formula: -[y·log(p) + (1-y)·log(1-p)]. If the model is 90% confident about the wrong class, it suffers much more than if it were 60% confident. This makes log loss better at training probabilistic classifiers than accuracy (which only cares if the prediction is right or wrong, not how confident it was).
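
Plugging numbers into the formula makes the penalty visible. In this sketch the true label is 1 and p is the predicted probability of class 1; the small eps clamp is an added safeguard against log(0), not part of the formula.

```python
import math


def log_loss_single(y, p, eps=1e-15):
    """Binary cross-entropy for one example: -[y*log(p) + (1-y)*log(1-p)]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


confident_wrong = log_loss_single(1, 0.1)  # 90% confident in the wrong class
mild_wrong = log_loss_single(1, 0.4)       # 60% confident in the wrong class
confident_right = log_loss_single(1, 0.9)  # 90% confident in the right class

print(round(confident_wrong, 4), round(mild_wrong, 4), round(confident_right, 4))
```

Accuracy would score the first two cases identically (both wrong), while log loss charges the confident mistake more than twice as much.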

Q29–35: Feature Engineering

Q29. How do you handle missing values in a dataset?

Deletion: Drop rows (if <5% missing) or columns (if >50% missing and not critical). Imputation: Mean/median for numerical (median for skewed data), mode for categorical. Advanced: KNN imputation (uses similar rows), iterative imputation (predicts missing values from other features), or keeping a “missing” indicator column (sometimes missingness itself is informative). Always impute after train-test split to prevent leakage.

Q30. What is feature scaling? When is it necessary?

Normalisation (MinMaxScaler): Scales features to [0,1]. Use when you know the distribution is not Gaussian. Standardisation (StandardScaler): Scales to zero mean, unit variance. Use when distribution is approximately Gaussian. When necessary: Gradient descent-based models (linear regression, neural networks), distance-based models (KNN, SVM, KMeans). Not necessary: Tree-based models (decision trees, Random Forest, XGBoost) — they are scale-invariant.

Q31. How do you handle categorical variables in ML?

Label encoding: Assigns integer to each category. Only use for ordinal features (low/medium/high) or tree models — linear models will assume false ordering. One-hot encoding: Creates binary columns for each category. Use for nominal features with linear models. Can cause high dimensionality with many categories. Target encoding: Replace category with mean of target — powerful but risks overfitting, use with cross-validation. Frequency encoding: Replace with count/frequency of category occurrence.

Q32. What is feature importance? How do you compute it in Random Forest vs XGBoost?

Random Forest: Mean decrease in impurity (Gini), the average reduction in node impurity when a feature is used for splitting across all trees. XGBoost: gain (contribution to reducing loss), cover (number of samples affected), and weight (how often the feature appears in splits), selected through the importance_type setting. Both expose a .feature_importances_ attribute. Also use SHAP values for more reliable, model-agnostic feature importance.

Q33. How do you handle outliers in your data?

Detection: IQR method (values below Q1-1.5×IQR or above Q3+1.5×IQR), Z-score (>3 std devs), visual (box plots, scatter plots). Handling: Remove (if data entry error), cap (winsorisation — replace with 95th/5th percentile), transform (log/sqrt transformation to reduce effect), or use robust algorithms (Random Forest, tree models are less sensitive to outliers than linear models).
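
The IQR rule can be sketched with the standard library. Two assumptions to note: quartiles come from statistics.quantiles with its default method (other quartile conventions give slightly different fences), and the capping step clips to the IQR fences rather than to the 5th/95th percentiles mentioned above.

```python
import statistics


def iqr_bounds(values, factor=1.5):
    """Return (lower, upper) fences: Q1 - factor*IQR and Q3 + factor*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


values = [12, 13, 14, 15, 15, 16, 17, 18, 95]
lower, upper = iqr_bounds(values)

outliers = [v for v in values if v < lower or v > upper]
capped = [min(max(v, lower), upper) for v in values]  # clip to the fences

print("fences:", lower, upper)
print("outliers:", outliers)
print("capped:", capped)
```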

Q34. What is the difference between feature selection and feature extraction?

Feature selection: Selects a subset of the original features to keep. Preserves interpretability. Methods: filter (correlation, chi-square), wrapper (RFE, forward selection), embedded (L1 regularisation). Feature extraction: Creates new features from original ones. Reduces dimensionality but loses interpretability. Methods: PCA, t-SNE, autoencoders, LDA.

Q35. What is SMOTE and when do you use it?

SMOTE (Synthetic Minority Oversampling Technique) generates synthetic minority class samples by interpolating between existing minority samples and their k-nearest minority neighbours. Use it when: class imbalance is severe (>10:1 ratio), simple oversampling leads to overfitting, and undersampling would lose too much data. Always apply SMOTE only on the training set — never before train-test split.

Q36–42: Practical Python & Coding Questions

Q36. Write Python code to train a Random Forest classifier on a dataset and print its accuracy.

This is a common screening question. The expected code demonstrates a clean scikit-learn pipeline:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load data
df = pd.read_csv('data.csv')

# Basic preprocessing
X = df.drop('target', axis=1)
y = df['target']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train model
rf = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    class_weight='balanced',   # handles class imbalance
    random_state=42
)
rf.fit(X_train, y_train)

# Evaluate
y_pred = rf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred))

Bonus points: Mentioning stratify=y in train_test_split (maintains class ratio), using class_weight='balanced', and printing classification_report rather than just accuracy.

Q37. How do you detect and remove duplicate rows and missing values using Pandas?

import pandas as pd

df = pd.read_csv('data.csv')

# Check missing values
print(df.isnull().sum())
print(f"Missing %: {df.isnull().mean() * 100}")

# Drop duplicates
df = df.drop_duplicates()

# Fill numerical columns with median
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Fill categorical columns with mode
cat_cols = df.select_dtypes(include='object').columns
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

print(f"Cleaned shape: {df.shape}")

Q38–42: Short practical answers

Q38. How do you check if features are correlated using Python?

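df.corr(numeric_only=True) to get the correlation matrix (in pandas 2.x, non-numeric columns otherwise raise an error). sns.heatmap(df.corr(numeric_only=True), annot=True) to visualise. Features with |correlation| > 0.85 are highly correlated; consider dropping one of each pair to reduce multicollinearity in linear models.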

Q39. How do you apply feature scaling in a Scikit-learn pipeline?

Use Pipeline([('scaler', StandardScaler()), ('model', LogisticRegression())]). This ensures the scaler is fit only on training data during cross-validation — preventing data leakage. Always use pipelines in production code.

Q40. What is hyperparameter tuning? How do you do it in Python?

GridSearchCV: Exhaustive search over specified parameter values. Reliable but slow. RandomizedSearchCV: Samples parameter combinations randomly. Faster for large search spaces. Optuna / Bayesian Optimisation: Uses past results to focus search on promising regions. Best for expensive-to-train models.

Q41. SQL: Write a query to find the top 5 data scientists by salary in each city.

SELECT city, name, salary FROM (
  SELECT city, name, salary,
    RANK() OVER (PARTITION BY city ORDER BY salary DESC) AS rnk
  FROM employees WHERE role = 'Data Scientist'
) t WHERE rnk <= 5;

Q42. How do you save and load a trained ML model in Python?

Use joblib.dump(model, 'model.pkl') to save and joblib.load('model.pkl') to load. Prefer joblib over pickle for large NumPy arrays (more efficient). For deployment, use MLflow model registry or save as ONNX format for cross-platform serving.

Q43–50: Advanced & Case Study Questions

Q43. A model has 99% accuracy on training but 60% on test. What is happening and what do you do?

Diagnosis: Severe overfitting. The model has memorised training data. Steps: (1) Check whether data leakage is causing artificially high training accuracy, (2) Reduce model complexity (fewer layers, shallower trees, lower max_depth), (3) Apply regularisation (L1/L2 for linear models, max_depth/min_samples_split for trees, dropout for neural nets), (4) Get more training data if possible, (5) Use cross-validation to get a more honest performance estimate, (6) Apply feature selection to reduce noise features.

Q44. A bank asks you to build a fraud detection model. What is your approach? (Case study)

Expected answer structure:

  (1) Understand the problem: What is the fraud rate? What are the consequences of FP vs FN? Real-time or batch predictions?
  (2) Data exploration: transaction features, customer history, temporal patterns, class imbalance (fraud is typically ~0.1–2%).
  (3) Handle imbalance: SMOTE and/or class_weight; use the PR curve, not just AUC-ROC.
  (4) Feature engineering: velocity features (transactions in the last hour/day), location anomalies, deviation from spending patterns.
  (5) Model: XGBoost plus SHAP for interpretability, since explainable decisions are increasingly expected by regulators such as the RBI.
  (6) Threshold tuning: tune the decision threshold based on the business cost of FP vs FN.
  (7) Monitoring: concept drift detection as fraud patterns evolve.

Q45. What is SHAP? Why is it important for explainability in India’s regulated industries?

SHAP (SHapley Additive exPlanations) computes the contribution of each feature to a single prediction by averaging over all possible feature orderings — grounded in game theory. Unlike feature importance (global average), SHAP gives per-prediction explanations. In India’s BFSI sector, RBI and SEBI increasingly require models to explain individual loan decisions, credit scores, and trading flags — SHAP values provide the audit trail needed for regulatory compliance.

Q46. What is MLOps? Why does it matter in production?

MLOps is the practice of automating and monitoring the ML lifecycle in production: model versioning (MLflow), CI/CD pipelines for model deployment, data and model drift detection, A/B testing, and rollback mechanisms. Many ML projects never reach production (a figure of around 90% is often quoted, though it is hard to verify); MLOps aims to close that gap. Key tools: MLflow (experiment tracking), DVC (data versioning), Airflow (pipeline orchestration), Docker (containerisation), Evidently AI (drift monitoring).

Q47. What is concept drift? How do you detect and handle it?

Concept drift occurs when the statistical relationship between input features and the target variable changes over time — a fraud model trained in 2024 may underperform in 2026 because fraud patterns evolved. Detection: Monitor prediction accuracy/distribution over time, use statistical tests (PSI — Population Stability Index, KS test for distribution shift). Handling: Periodic retraining on recent data, online learning (updating model incrementally), or ensemble of models trained at different time windows.
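
As an illustration of the PSI check, the sketch below bins a reference sample and a recent sample and sums (actual − expected) × ln(actual / expected) over the bins. The bin edges, the eps floor for empty bins, and the data are illustrative choices, and values outside the edges are simply not counted. A commonly quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but thresholds vary by team.

```python
import math


def psi(expected, actual, bin_edges, eps=1e-6):
    """Population Stability Index between two samples over fixed bins."""

    def proportions(values):
        counts = [0] * (len(bin_edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                in_bin = bin_edges[i] <= v < bin_edges[i + 1]
                if in_bin or (is_last and v == bin_edges[-1]):
                    counts[i] += 1
                    break
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
reference = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]    # e.g. training-time scores
recent = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.85, 0.95]     # scores shifted upward

print("PSI, same data:", psi(reference, reference, edges))
print("PSI, shifted  :", round(psi(reference, recent, edges), 3))
```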

Q48. What is a recommendation system? Explain collaborative filtering vs content-based.

Collaborative filtering: Recommends items based on what similar users liked (“users like you also liked X”). Works without item content knowledge. Problem: cold start (new users/items have no history). Content-based: Recommends items similar to what a user has liked before, based on item features. Handles new items well because it only needs their features, but it still needs some history for a new user and tends to stay within the kind of items the user has already seen. Hybrid: Netflix, Amazon, Swiggy use both, leaning on content-based signals when interaction history is thin and on collaborative filtering for established users.

Q49. How would you A/B test an ML model change?

Setup: Define the metric you are optimising (CTR, revenue, conversion rate). Randomly split traffic (50/50 control vs treatment). Ensure sample size is sufficient for statistical significance (use power analysis). Run: Serve model A to control group and model B to treatment group simultaneously. Analyse: Use hypothesis testing (t-test for continuous metrics, chi-square for proportions). Check p-value < 0.05 before concluding. Also check for novelty effects and segment-level impacts before full rollout.
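
For a conversion-rate metric, the analysis step might look like the sketch below. It uses a two-proportion z-test, which for a 2×2 comparison is equivalent to the chi-square test mentioned above (z² equals the chi-square statistic without continuity correction). The conversion counts are invented.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical results: model A (control) vs model B (treatment).
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("significant at 0.05:", p_value < 0.05)
```

A significant result here only says the rates differ; the novelty-effect and segment checks described above are still needed before rollout.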

Q50. Tell me about an ML project you built. Walk me through your approach. (Most important question)

Structure your answer with STAR: Situation — what problem did it solve? Task — what was your specific contribution? Action — describe data collection, EDA, feature engineering, model selection, evaluation (mention specific metrics and what you chose). Result — quantify the outcome (accuracy improved from X to Y, or reduced false positives by Z%). Most common mistake: Describing what the model does theoretically instead of what YOU specifically did. Interviewers want to see your decision-making process, not just the outcome.

Interview Tips for ML Roles in India (2026)

✓ What Indian interviewers value

  • Practical experience over theory — deploy one project before interviewing
  • Clean Python code — practise on HackerRank India, StrataScratch
  • Business context — connect every ML answer to a business metric
  • SQL proficiency — tested in almost every Indian DS interview
  • Honest about limitations: no interviewer believes in a perfect project

✗ Common interview mistakes

  • Memorising definitions without understanding — interviewers test understanding, not memory
  • No GitHub profile or deployed projects to show
  • Saying “I would use neural networks for everything”
  • Not knowing which metric to use for which problem type
  • Not practising coding on paper/whiteboard beforehand

Crack Your ML Interview with Real Project Experience

Cambridge Infotech’s Machine Learning course in Bangalore includes mock interviews, real project work, and placement drives. Students practice these exact questions with trainers who conduct industry hiring interviews.


FAQ

What are the most common ML interview questions in India?

Bias-variance tradeoff, overfitting prevention, cross-validation, precision vs recall, gradient descent, regularisation (L1/L2), feature engineering, handling imbalanced datasets, and model evaluation metrics. SQL window functions and Python coding (Pandas + Scikit-learn) are tested in almost every interview regardless of company.

How many rounds are in a data science interview at Indian companies?

Typically 3–4 rounds: online screening (Python/SQL MCQs) → technical round 1 (ML theory and statistics) → technical round 2 (coding + case study) → HR/managerial. Product companies like Flipkart and Razorpay often add a system design or data architecture round for senior roles.

Where can I practise ML interview questions online?

StrataScratch (real interview questions from Flipkart, Amazon, and other companies), LeetCode (SQL and Python), Kaggle Learn (practical notebooks), and GeeksforGeeks ML section (theory questions).
