Big Data and ML Tools | Model Evaluation | Machine Learning Workflow | Data Preparation and Cleaning | Einstein Discovery |
---|---|---|---|---|
What is Spark? A scalable, in-memory compute engine for general-purpose data processing. | What is an ROC curve? A plot of the true positive rate against the false positive rate for a binary classifier as its decision threshold varies. | What is classification? Predicting qualitative (categorical) outputs. | | |
What is scikit-learn? A widely used Python library for building machine learning pipelines. | What is overfitting? A model that performs well on training data but poorly on test data. | What is regression? Predicting quantitative (numeric) outputs. | | |
What is deep learning? A family of machine learning methods, built on multi-layer neural networks, that is especially effective on image, text, and audio data. | What is a model A/B test? Comparing two models by serving each to a share of live production traffic. | What is a holdout set? A portion of labeled data that is not used during model training or tuning. | | |
What is underfitting? A model that performs poorly even on its training data. | | | | |
What is regularization? A common technique that penalizes model complexity to reduce overfitting and, in some forms, improve interpretability. | What is 5-fold (or n-fold) cross-validation? Split the data into 5 segments, train on 4 and validate on the remaining one, and rotate until each segment has been held out for validation exactly once. | | | |
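To make the ROC curve clue concrete, here is a minimal sketch in plain Python that sweeps the decision threshold over a classifier's scores and records one (false positive rate, true positive rate) point per threshold, then measures the area under the curve (AUC) with the trapezoidal rule. The helpers `roc_points` and `auc` are hypothetical names for illustration; in practice scikit-learn's `sklearn.metrics.roc_curve` and `roc_auc_score` cover this. The sketch assumes labels are 0/1 and that both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from strictest to loosest.

    Assumes labels are 0/1 and both classes occur (otherwise a rate is undefined).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Predict positive when score >= threshold; tied scores move together in one step.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example: three positives and three negatives with made-up scores.
pts = roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
print(pts)
print(auc(pts))
```

For this toy input the curve ends at (1.0, 1.0) and the AUC works out to 8/9 (about 0.889): of the nine positive/negative pairs, eight rank the positive example higher.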
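The holdout set clue describes data that is set aside before training or tuning begins. A minimal sketch, assuming the labeled rows fit in a Python list and a random 80/20 split is appropriate (no stratification or time ordering), might look like this. `holdout_split` is a hypothetical helper; scikit-learn's `sklearn.model_selection.train_test_split` is the usual library route.

```python
import random


def holdout_split(rows, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of the labeled rows and set aside a fraction as the holdout set."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_holdout = round(len(shuffled) * holdout_fraction)
    holdout = shuffled[:n_holdout]  # never used for training or tuning
    train = shuffled[n_holdout:]
    return train, holdout


train, holdout = holdout_split(list(range(100)))
print(len(train), len(holdout))
```

Comparing a model's score on `train` with its score on `holdout` is one simple way to spot the overfitting described in the table: a large gap suggests the model has memorized the training data.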
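The 5-fold cross-validation clue can be sketched as index bookkeeping: split the sample indices into k contiguous folds and yield each fold once as the validation set, with the rest as training data. This assumes the data has already been shuffled; `k_fold_indices` is a hypothetical helper, comparable in spirit to scikit-learn's `sklearn.model_selection.KFold`.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; each index lands in exactly one validation fold.

    Assumes the samples were shuffled beforehand, so contiguous folds are random.
    """
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


for train_idx, val_idx in k_fold_indices(12, k=5):
    print(len(train_idx), val_idx)
```

A model is trained once per fold and scored on that fold's validation indices; averaging the k scores gives a less noisy estimate than a single holdout split.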
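One way to see how regularization "penalizes model complexity" is L2 (ridge) regularization on a one-feature linear model with no intercept. Minimizing `sum((y - w*x)**2) + lam * w**2` gives the closed form `w = sum(x*y) / (sum(x*x) + lam)`, so a larger penalty `lam` shrinks the learned weight toward zero. This is a minimal sketch under those simplifying assumptions; `ridge_slope` is a hypothetical helper, and scikit-learn's `sklearn.linear_model.Ridge` (penalty parameter `alpha`) handles the general multi-feature case.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form L2-regularized slope for y ~ w * x (single feature, no intercept).

    Minimizes sum((y - w*x)**2) + lam * w**2; setting the derivative to zero
    gives w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs, ys = [1, 2, 3], [2, 4, 6]
for lam in (0.0, 1.0, 14.0):
    # The unpenalized fit (lam = 0) recovers the true slope of 2; larger lam shrinks it.
    print(lam, ridge_slope(xs, ys, lam))
```

With `lam = 0` this reduces to ordinary least squares (slope 2.0 here); with `lam = 14` the penalty equals `sum(x*x)` and the slope is halved to 1.0.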