Python Fundamentals Research Design Statistics Supervised Learning
100
Float
"What will be the output of the following command?
print(type(1/3))
100
1) Calling people on landlines
Give an example of a survey technique that is vulnerable to selection bias.
100
1) Number of children people have
2) Number of cars people own
What is an example of a distribution that is commonly skewed left? How is this distribution represented in a histogram?
100
Machine Learning is the training of a model from data that generalizes a decision against a performance measure.
What is machine learning?
200
Blue
colors = [‘red’, ‘white’, ‘blue’, ‘green’]
What will be the value of colors[2]?
200
In regression, you are measuring the distance between the predicted and actual values. For classification, you can't measure the distance because the outcomes are categorical. Instead, you are measuring the percentage of values that were predicted correctly.
What is the difference between evaluating a regression and classification model? Why is there a difference?
200
Mean > median
If a distribution is skewed right, which of the following describes the relationship between the distribution's mean and median and why? 1) Mean < median, 2) Mean > median, 3) Mean = median
200
When the x-variable is perfectly X times y, and X does not exist without y. E.g., y = 2x, when x equals number of sandwiches and y equals number of slices of bread.
Under what circumstances would the intercept in a linear regression equal zero and contain a non-zero coefficient? What's an example?
300
df[(df["first_name"].notnull()) & (df["nationality"] == "UK") & (df["age") < 30)]
Select all rows in df where nationality is UK, age is less than 30, and name is NOT NULL.

name nationality age
Jason USA 42
Molly USA 52
NaN France 36
NaN UK 24
NaN UK 70
300
If you "train" the model on the first 2/3 of your dataset vs. the second 2/3, your scores will be different because you are using different data to test and train. K-folds ensures that we use ALL of the data in both a testing and training capacity.
What is the "problem" with splitting data into training and testing sets that k-folds cross validation helps alleviate?
300
Absolutely nothing!
If 10 is added to every value of a data set, what happens to the standard deviation?
300
Normal distribution - histogram. No collinearity - correlation matrix. Residuals have equal variance - residual plot. Observations are independent - rigorous data collection.
What are the assumptions required for a linear regression model? How do we check for them?
400
18, 18
x = 12
y = 18
x = y
y = x
print(x,y)

What will be the output?
400
If you have very large errors, they will be exacerbated by RMSE or mean squared error because you are squaring the values
Under what circumstances would the root mean squared error (RMSE) or the mean squared error be higher than the mean absolute error?
400
Data is likely non-linear and we should not be using OLS
What conclusions can we draw from this residual plot?
https://onlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/05model_check/residual_fitted_alpha_plot.gif
400
R2 will INCREASE.
What is the effect on R2 in an OLS model if outliers are removed?
500
for element in x:
i = 0
w = element + i
print (w)
Write a function that loops through the following list and prints the cumulative sum for each element.

x = [10, 25, 70, 100, 500]
500
Thus, for a male, the odds of being admitted are 5.44 times larger than the odds for a female being admitted.
How would we interpret the following logarithmic regression equation?

https://stats.idre.ucla.edu/stata/faq/how-do-i-interpret-odds-ratios-in-logistic-regression/
500
The price is predicted to increase 1767.292 when the foreign variable goes up by one, decrease by 294.1955 when mpg goes up by one, and is predicted to be 11905.42 when both mpg and foreign are zero.
How would we interpret the coefficients of the following table?

http://dss.princeton.edu/online_help/analysis/interpreting_regression.htm
500
100%. Every point would be the "nearest neighbor" of itself.
If we used our training data as testing data in KNN, what would be our accuracy rate and why?






Katie's Super Awesome Data Science Jeopardy

Press F11 for full screen mode



Limited time offer: Membership 25% off


Clone | Edit | Download / Play Offline