Q1. What is linear regression, and how does it find the best-fit line?

Linear regression models the relationship between input variables and a continuous target by fitting a straight line.

The model minimizes the sum of squared residuals between predicted and actual values; this fitting criterion is known as Ordinary Least Squares (OLS).

The slope and intercept are the learned parameters. Linear regression is one of the most fundamental mathematical models in data science.
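
As a minimal sketch, assuming NumPy is available (the data points below are invented for illustration):

```python
# Closed-form OLS fit: np.polyfit with deg=1 returns [slope, intercept],
# the pair that minimizes the sum of squared residuals.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])  # roughly y = 2x

slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # approximately 1.94 and 0.30
```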

Q2. What is gradient descent, and why is it used in optimization?

Gradient descent is an iterative algorithm that minimizes a cost function by repeatedly stepping in the direction of the negative gradient.

At each step, model parameters are updated to reduce error. The learning rate controls the step size, affecting convergence speed.

Gradient descent is widely used to train ML models such as regression and neural networks, and it is essential for optimizing large, complex cost functions.
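
A tiny sketch of the update rule, minimizing f(w) = (w − 3)², whose gradient is 2(w − 3); the starting point and learning rate are arbitrary choices:

```python
w = 0.0    # initial parameter guess
lr = 0.1   # learning rate: controls the step size

for step in range(100):
    grad = 2 * (w - 3)  # gradient of the cost at the current w
    w -= lr * grad      # step in the direction of the negative gradient

print(w)  # converges toward the minimum at w = 3
```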

Q3. How does logistic regression classify data?

Logistic regression converts linear combinations of inputs into probabilities using the sigmoid function.

The output ranges from 0 to 1, making it ideal for classification. A threshold (usually 0.5) assigns class labels.

The model is trained by maximizing the likelihood (equivalently, minimizing cross-entropy loss). It is simple yet effective for many binary classification tasks.
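
A minimal sketch of the decision rule, assuming NumPy; the weights here are hypothetical, not the output of any real training run:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -0.4])  # hypothetical learned weights
b = 0.1                    # hypothetical learned intercept
x = np.array([2.0, 1.0])   # one input example

p = sigmoid(w @ x + b)     # probability of the positive class
label = int(p >= 0.5)      # threshold at 0.5
print(p, label)            # about 0.79, so class 1
```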

Q4. What is a decision tree, and how does it split data?

Decision trees split data based on impurity measures such as Gini impurity or entropy. At each node, the split that produces the most homogeneous child groups is chosen.

The process continues until a stopping criterion is reached. Trees are easy to interpret because they mimic human decision logic. They form the basis of advanced models like Random Forest and XGBoost.
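
A rough illustration of how a split is scored with Gini impurity (the labels are toy values):

```python
# Gini impurity: 1 - sum of squared class proportions; 0 means pure.
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

parent = [0, 0, 0, 1, 1, 1]
left, right = [0, 0, 0], [1, 1, 1]  # a perfectly separating split

# Weighted impurity after the split; lower is better.
after = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(gini(parent), after)  # 0.5 before the split, 0.0 after
```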

Q5. Compare mean, median, and mode.

| Measure | Meaning | Best Used For |
|---|---|---|
| Mean | Average value | Normal/clean data |
| Median | Middle value | Skewed data |
| Mode | Most frequent value | Categorical data |

These measures of central tendency help summarize data during preprocessing.
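
A quick comparison using Python's statistics module on a small invented sample with one large outlier:

```python
import statistics as st

data = [1, 2, 2, 3, 4, 100]
print(st.mean(data))    # about 18.67: pulled up by the outlier
print(st.median(data))  # 2.5: robust to the outlier
print(st.mode(data))    # 2: the most frequent value
```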

Q6. Compare Euclidean distance and Manhattan distance.

| Distance Type | Formula | When to Use |
|---|---|---|
| Euclidean | √((x₂−x₁)² + (y₂−y₁)²) | Continuous geometric space |
| Manhattan | \|x₂−x₁\| + \|y₂−y₁\| | Grid-like paths, sparse or high-dimensional data |

These distances are essential for algorithms like KNN and clustering.
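
Both formulas written out for two made-up 2-D points:

```python
import math

p, q = (1.0, 2.0), (4.0, 6.0)

euclidean = math.hypot(q[0] - p[0], q[1] - p[1])  # sqrt(3^2 + 4^2) = 5.0
manhattan = abs(q[0] - p[0]) + abs(q[1] - p[1])   # |3| + |4| = 7.0
print(euclidean, manhattan)
```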

Q7. Compare correlation vs covariance.

| Metric | Meaning | Range | Use Case |
|---|---|---|---|
| Covariance | How two variables vary together | −∞ to +∞ | Scale-dependent analysis |
| Correlation | Normalized covariance | −1 to +1 | Relationship strength |

Correlation is standardized, making comparisons easier across datasets.
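
A short NumPy sketch on invented data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

print(np.cov(x, y)[0, 1])       # covariance: units depend on the data's scale
print(np.corrcoef(x, y)[0, 1])  # correlation: always within [-1, +1]
```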

Q8. Compare classification and regression algorithms.

| Algorithm Type | Output | Examples | Evaluation Metrics |
|---|---|---|---|
| Classification | Categories | Logistic Regression, SVM | Accuracy, F1 |
| Regression | Numeric values | Linear Regression | RMSE, MAE |

Choosing the right type depends on whether the target variable is categorical or numeric.
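
To make the metric difference concrete, a sketch with toy predictions:

```python
import numpy as np

# Classification: compare predicted labels using accuracy.
y_true_cls = np.array([1, 0, 1, 1])
y_pred_cls = np.array([1, 0, 0, 1])
accuracy = (y_true_cls == y_pred_cls).mean()

# Regression: compare predicted values using RMSE.
y_true_reg = np.array([2.0, 3.5, 5.0])
y_pred_reg = np.array([2.5, 3.0, 5.5])
rmse = np.sqrt(((y_true_reg - y_pred_reg) ** 2).mean())

print(accuracy, rmse)  # 0.75 and 0.5
```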

Q9. What is overfitting, and how can it be prevented?

Overfitting occurs when a model learns noise instead of patterns. It performs well on training data but poorly on unseen data.

Prevention methods include regularization, pruning (for trees), early stopping, and cross-validation. Increasing training data also helps. Overfitting is a key issue in algorithm design.
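
As one example, a hand-rolled k-fold cross-validation loop (NumPy only; the data and model are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
y = 2 * x + rng.normal(0, 0.1, 30)  # noisy linear data

k = 5
idx = rng.permutation(len(x))
folds = np.array_split(idx, k)

scores = []
for i in range(k):
    val = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    coeffs = np.polyfit(x[train], y[train], deg=1)  # fit on training folds
    pred = np.polyval(coeffs, x[val])               # score on the held-out fold
    scores.append(np.sqrt(((pred - y[val]) ** 2).mean()))

print(np.mean(scores))  # average validation RMSE across folds
```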

Q10. What is regularization in machine learning models?

Regularization adds penalties on large weight values, reducing model complexity. L1 regularization encourages sparsity, while L2 shrinks weights smoothly toward zero.

It prevents overfitting and improves generalization. Regularization is widely used in linear and logistic regression. It stabilizes models when dealing with high-dimensional data.
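
The penalty terms themselves are simple to write down; a sketch with hypothetical weights and strength:

```python
import numpy as np

w = np.array([0.5, -1.2, 3.0])  # hypothetical model weights
lam = 0.1                       # regularization strength (a tuning choice)

l1_penalty = lam * np.sum(np.abs(w))  # lasso term: encourages sparse weights
l2_penalty = lam * np.sum(w ** 2)     # ridge term: shrinks weights smoothly

# total_loss = data_loss + l1_penalty (L1) or data_loss + l2_penalty (L2)
print(l1_penalty, l2_penalty)  # 0.47 and about 1.07
```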

Q11. What is the bias-variance trade-off?

Bias is error from overly simple model assumptions, and variance is error from excessive sensitivity to the training data. The trade-off aims to balance both for optimal performance.

High bias leads to underfitting; high variance leads to overfitting. Techniques like cross-validation help manage this trade-off. Achieving balance is essential for reliable modeling.
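
A sketch of the trade-off using polynomial fits of increasing degree on noisy synthetic data (the degrees and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 15)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)  # noise-free ground truth

for deg in (1, 3, 9):
    c = np.polyfit(x_train, y_train, deg)
    train_err = np.sqrt(((np.polyval(c, x_train) - y_train) ** 2).mean())
    test_err = np.sqrt(((np.polyval(c, x_test) - y_test) ** 2).mean())
    print(deg, round(train_err, 3), round(test_err, 3))

# Typically degree 1 underfits (high bias: both errors high), degree 9
# overfits (high variance: low train error, worse test error), degree 3 balances.
```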

Q12. What is a cost function in machine learning?

A cost function measures the error between predicted and actual values. Algorithms optimize this function to improve performance.

Common examples include MSE for regression and cross-entropy for classification. The cost function directly guides gradient descent updates. Choosing an appropriate cost function is crucial for successful training.
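
Both cost functions written out in NumPy:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error for regression.
    return ((y_true - y_pred) ** 2).mean()

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Cross-entropy for binary classification; clip to avoid log(0).
    p = np.clip(p_pred, eps, 1 - eps)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)).mean()

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.5])))               # 0.25
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))  # about 0.16
```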

Q13. What is normalization, and why is it important?

Normalization scales numeric features to a similar range. It improves algorithm performance, especially for gradient descent and distance-based models.

Common techniques include Min-Max scaling and Standardization. Normalization helps speed up convergence and stabilizes training. It is a key preprocessing step in data pipelines.
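
Both techniques in a few lines of NumPy (the feature values are invented):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

min_max = (x - x.min()) / (x.max() - x.min())  # Min-Max: maps into [0, 1]
standard = (x - x.mean()) / x.std()            # Standardization: mean 0, std 1

print(min_max)   # [0.   0.25 0.5  0.75 1.  ]
print(standard)
```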

Q14. What is PCA (Principal Component Analysis)?

PCA reduces dimensionality by projecting data onto components that maximize variance. It helps remove noise and simplify datasets.

PCA is useful when dealing with many correlated variables. It improves model performance and visualization. PCA is widely used in preprocessing for ML.
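
A minimal PCA sketch via SVD of mean-centered data, on synthetic data with one nearly redundant column:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=100)  # correlated third feature

Xc = X - X.mean(axis=0)               # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S ** 2 / np.sum(S ** 2)   # variance share per component
X_reduced = Xc @ Vt[:2].T             # project onto the top two components

print(explained)        # most variance sits in the first two components
print(X_reduced.shape)  # (100, 2)
```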

Q15. What is the purpose of K-Means clustering?

K-Means groups data into K clusters by minimizing distance to cluster centroids. It is an unsupervised algorithm used for segmentation.

The algorithm iterates between assigning points and updating centroids. It works best for spherical, evenly sized clusters. K-Means is simple and widely used in practical analytics.
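
A bare-bones version of that assign/update loop on two invented blobs (no empty-cluster handling, so this is a sketch rather than a robust implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob near (0, 0)
               rng.normal(5, 0.5, (50, 2))])  # blob near (5, 5)

k = 2
centroids = X[rng.choice(len(X), k, replace=False)]
for _ in range(10):
    # Assignment step: each point joins its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its points.
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centroids)  # should land near (0, 0) and (5, 5)
```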

Q16. What is entropy in decision trees?

Entropy measures the impurity or randomness in a dataset; lower entropy means a purer split. Decision trees choose splits that maximize information gain, i.e. the reduction in entropy, which creates meaningful partitions during training. Entropy is a core concept in classification trees.
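
Entropy and information gain for a toy split:

```python
import math

def entropy(labels):
    # Shannon entropy in bits: -sum(p * log2(p)) over class proportions.
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

parent = [0, 0, 1, 1]
left, right = [0, 0], [1, 1]  # a perfectly separating split

gain = entropy(parent) - (len(left) / 4 * entropy(left)
                          + len(right) / 4 * entropy(right))
print(entropy(parent), gain)  # 1.0 bit before the split; gain = 1.0
```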

Q17. What is the sigmoid function, and where is it used?

The sigmoid maps real numbers into a 0–1 range. It is used to convert outputs to probabilities in logistic regression. Its smooth gradient makes it suitable for optimization.

However, it saturates for large positive or negative inputs, which shrinks gradients and slows learning. Despite this limitation, it remains fundamental in binary classification.
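
The saturation effect is easy to see numerically: the sigmoid's gradient is s(z)(1 − s(z)), which peaks at z = 0 and vanishes for large |z|:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in (-10.0, 0.0, 10.0):
    s = sigmoid(z)
    print(z, s, s * (1 - s))  # gradient is 0.25 at z = 0, near 0 elsewhere
```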

Q18. What is the difference between training data and test data?

Training data teaches the model, while test data evaluates performance. The test set must remain unseen to prevent bias.

Splitting ensures that results reflect real-world generalization. Typical splits include 70/30 or 80/20. Maintaining separation prevents data leakage.
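
A hand-rolled 80/20 split; shuffling first avoids ordering bias (the arrays are placeholders):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.arange(100).reshape(50, 2)  # placeholder features
y = np.arange(50)                  # placeholder targets

idx = rng.permutation(len(X))      # shuffled indices
cut = int(0.8 * len(X))
X_train, X_test = X[idx[:cut]], X[idx[cut:]]
y_train, y_test = y[idx[:cut]], y[idx[cut:]]

print(len(X_train), len(X_test))   # 40 train, 10 test
```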

Q19. What is KNN (K-Nearest Neighbors) used for?

KNN classifies or predicts based on the closest neighbors in feature space.

It is non-parametric and easy to understand. Distance metrics determine similarity. KNN works best when features are scaled properly. It’s commonly used for baseline models.
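
A from-scratch KNN vote for a single query point, using Euclidean distance on tiny invented data:

```python
import numpy as np
from collections import Counter

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
query = np.array([1.1, 0.9])

k = 3
dists = np.linalg.norm(X_train - query, axis=1)  # distance to every point
nearest = np.argsort(dists)[:k]                  # indices of the k closest
label = Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote
print(label)  # 0: two of the three nearest neighbors are class 0
```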

Q20. What is a confusion matrix?

A confusion matrix summarizes classification performance by showing correct and incorrect predictions.

It includes True Positive, False Positive, True Negative, and False Negative counts. Metrics like precision and recall are derived from it. It helps diagnose model weaknesses. Analysts rely on it for model evaluation.
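
Counting the four cells by hand on toy predictions, then deriving precision and recall:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives: 3
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives: 1
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives: 1
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives: 1

precision = tp / (tp + fp)  # 0.75: how many predicted positives were right
recall = tp / (tp + fn)     # 0.75: how many actual positives were found
print(precision, recall)
```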
