500 Data Science Question 61-65

Question 61. During analysis, how do you treat missing values?

Answer: The extent of the missing values is determined after identifying the variables with missing values. If any patterns are identified, the analyst has to concentrate on them, as they could lead to interesting and meaningful business insights. If no patterns are identified, then the missing values can be substituted with mean or median values (imputation), or they can simply be ignored. There are various factors to consider when answering this question:

  • Understand the problem statement, understand the data, and then give the answer. One option is assigning a default value, which can be the mean, minimum or maximum value. Getting into the data is important.
  • If it is a categorical variable, the missing value is assigned a default value.
  • If you have a distribution of data coming in, for a normal distribution give the mean value.
  • Whether we should even treat missing values is another important point to consider. If 80% of the values for a variable are missing, then you can answer that you would drop the variable instead of treating the missing values.
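The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: missing entries are represented as None, the helper names missing_fraction and impute are hypothetical, and the 0.8 cut-off simply mirrors the 80% figure mentioned in the answer.

```python
from statistics import mean, median, mode

def missing_fraction(values):
    """Share of entries that are missing (None is the assumed missing marker)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)

def impute(values, strategy="mean", drop_threshold=0.8):
    """Fill missing entries, or return None to signal 'drop this variable'.

    strategy: "mean" or "median" for numeric data, "mode" for categorical data.
    drop_threshold: assumed cut-off taken from the 80% rule of thumb above.
    """
    if missing_fraction(values) >= drop_threshold:
        return None
    observed = [v for v in values if v is not None]
    if not observed:
        return None  # nothing to estimate a fill value from
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 40, 30, None]
colors = ["red", None, "blue", "red"]
mostly_missing = [None, None, None, None, 1]

filled_ages = impute(ages, "mean")          # numeric variable: mean imputation
filled_colors = impute(colors, "mode")      # categorical variable: most frequent value
dropped = impute(mostly_missing, "mean")    # 80% missing: signal to drop
```

Whether mean, median or mode is appropriate depends on the variable and on any pattern in the missingness, so the strategy is left as a parameter rather than fixed.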


Question 62. Explain the Box-Cox transformation in regression models.

Answer: For some reason or other, the response variable for a regression analysis might not satisfy one or more assumptions of an ordinary least squares regression. The residuals could either curve as the prediction increases or follow a skewed distribution. In such scenarios, it is necessary to transform the response variable so that the data meets the required assumptions. A Box-Cox transformation is a statistical technique for transforming non-normal dependent variables into an approximately normal shape. Many statistical techniques assume normality, so if the given data is not normal, applying a Box-Cox transformation means that you can run a broader number of tests.
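As a minimal sketch of the transformation itself, the standard Box-Cox formula is (y^lambda - 1) / lambda for lambda != 0 and log(y) for lambda = 0, defined for strictly positive y. The function name box_cox below is a hypothetical helper, and lambda is passed in by hand; in practice it is usually estimated from the data (for example, scipy.stats.boxcox can choose it by maximum likelihood).

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox transform with a fixed lambda to positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The lambda = 0 case is defined as the natural logarithm
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

skewed = [1, 4, 9]
sqrt_like = box_cox(skewed, 0.5)   # lambda = 0.5 behaves like a square-root transform
log_like = box_cox(skewed, 0)      # lambda = 0 is the log transform
```

After transforming the response, the regression is refit and the residuals are checked again against the model assumptions.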




Question 63. Can you use machine learning for time series analysis?

Answer: Yes, it can be used, but it depends on the application.
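One common way to do this, offered here as an assumption about the kind of application meant, is to reframe the series as a supervised learning problem: the previous few observations become the features and the next observation becomes the target. The helper name make_lagged_dataset and the window size are hypothetical.

```python
def make_lagged_dataset(series, n_lags):
    """Turn a sequence into (features, target) pairs using a sliding window.

    Each feature row holds the n_lags values preceding the target value.
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return rows, targets

series = [1, 2, 3, 4, 5]
rows, targets = make_lagged_dataset(series, n_lags=2)
# rows and targets can now be passed to any regression model
```

When evaluating such a model, the split into training and test data should respect time order rather than being random.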


Question 64. Write a function that takes in two sorted lists and outputs a sorted list that is their union.

Answer: The first solution that may come to mind is to merge the two lists and sort them afterwards.

Python code-

def return_union(list_a, list_b):
    return sorted(list_a + list_b)

R code-

return_union <- function(list_a, list_b)
{
    list_c <- list(c(unlist(list_a), unlist(list_b)))
    return(list(list_c[[1]][order(list_c[[1]])]))
}

Generally, the tricky part of the question is that you may not use any sorting or ordering function. In that case you will have to write your own logic to answer the question and impress your interviewer.

Python code-

def return_union(list_a, list_b):
    len1 = len(list_a)
    len2 = len(list_b)
    final_sorted_list = []
    j = 0  # index into list_b
    k = 0  # index into list_a
    for i in range(len1 + len2):
        if k == len1:
            # list_a is exhausted: append the rest of list_b
            final_sorted_list.extend(list_b[j:])
            break
        elif j == len2:
            # list_b is exhausted: append the rest of list_a
            final_sorted_list.extend(list_a[k:])
            break
        elif list_a[k] < list_b[j]:
            final_sorted_list.append(list_a[k])
            k += 1
        else:
            final_sorted_list.append(list_b[j])
            j += 1
    # Return only after the loop has finished
    return final_sorted_list
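If the interviewer rules out sorting but allows other standard-library helpers, heapq.merge performs the same kind of linear merge of already-sorted inputs and can serve as a cross-check for the hand-written version. Like the function above, it keeps duplicate values; whether "union" should drop them is worth clarifying with the interviewer. The example lists are made up for illustration.

```python
import heapq

list_a = [1, 3, 5]
list_b = [2, 3, 6]

# heapq.merge returns a lazy iterator over the merged, sorted values
merged = list(heapq.merge(list_a, list_b))
```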


A similar function can be written in R by following the same steps.

return_union <- function(list_a, list_b)
{
    # Initializing length variables
    len_a <- length(list_a)
    len_b <- length(list_b)
    len <- len_a + len_b

    # Initializing counter variables (j indexes list_a, k indexes list_b)
    j <- 1
    k <- 1

    # Creating an empty list which has length equal to sum of both the lists
    list_c <- vector("list", len)

    # Here goes our for loop (seq_len also handles the case len == 0)
    for (i in seq_len(len))
    {
        if (j > len_a)
        {
            # list_a is exhausted: copy the rest of list_b
            list_c[i:len] <- list_b[k:len_b]
            break
        }
        else if (k > len_b)
        {
            # list_b is exhausted: copy the rest of list_a
            list_c[i:len] <- list_a[j:len_a]
            break
        }
        else if (list_a[[j]] <= list_b[[k]])
        {
            list_c[[i]] <- list_a[[j]]
            j <- j + 1
        }
        else
        {
            list_c[[i]] <- list_b[[k]]
            k <- k + 1
        }
    }

    return(list(unlist(list_c)))
}


Question 65. What is the difference between Bayesian Estimate and Maximum Likelihood Estimation (MLE)?

Answer: In Bayesian estimation we have some knowledge about the data/problem (the prior). There may be several values of the parameters that explain the data, and hence we can look for multiple parameters, such as 5 gammas and 5 lambdas, that do this. As a result of Bayesian estimation, we get multiple models for making multiple predictions, i.e. one for each pair of parameters but with the same prior. So, if a new example needs to be predicted, then computing the weighted sum of these predictions serves the purpose.

Maximum likelihood does not take the prior into consideration (it ignores the prior), so it is like being a Bayesian while using some kind of flat prior.
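A coin-flip example makes the contrast concrete. The sketch below assumes a Beta(a, b) prior on the probability of heads, which is a choice made here for illustration and not something the answer specifies; the function names are hypothetical. With a flat Beta(1, 1) prior the posterior mode coincides with the MLE, while the posterior mean is the prediction obtained by weighting every possible parameter value by its posterior probability.

```python
def mle(heads, flips):
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / flips

def posterior_mode(heads, flips, a, b):
    """Mode of the Beta(a + heads, b + tails) posterior (needs both > 1)."""
    return (a + heads - 1) / (a + b + flips - 2)

def posterior_mean(heads, flips, a, b):
    """Posterior predictive P(next flip is heads): a weighted average
    of predictions over all parameter values."""
    return (a + heads) / (a + b + flips)

heads, flips = 7, 10
flat_mode = posterior_mode(heads, flips, 1, 1)      # flat prior: same as MLE
informed_mode = posterior_mode(heads, flips, 5, 5)  # prior belief in a fair coin
predictive = posterior_mean(heads, flips, 1, 1)
```

With only 10 flips, the Beta(5, 5) prior pulls the estimate toward 0.5; as more data arrives, the Bayesian estimates and the MLE move closer together.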
