Fractal Analytics Solution
Technical Round
- Question 1
There are 3 ants on an equilateral triangle and each can move in either direction. What is the probability that they will never intersect, and what is the probability that any two of them will intersect?
- Answer
Assume one ant starts at each corner, each ant independently picks one of the two directions along the edges with equal probability, and all ants walk at the same speed. There are 2^3 = 8 equally likely combinations of directions. The ants never meet only if all three walk clockwise or all three walk counterclockwise, which is 2 of the 8 combinations, so the probability that they never intersect is 2/8 = 1/4. In every other combination, two neighboring ants walk toward each other along the same edge, so the probability that at least two of them intersect is 1 - 1/4 = 3/4.
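Under the usual assumptions for this puzzle (one ant per corner, each choosing clockwise or counterclockwise with equal probability, all at the same speed), the eight cases can be enumerated directly. This is a minimal sketch; the labels "CW" and "CCW" are illustrative names.

```python
from itertools import product

# Each ant independently picks clockwise ("CW") or counterclockwise ("CCW").
outcomes = list(product(["CW", "CCW"], repeat=3))

# No two ants meet only when all three walk in the same direction.
no_collision = [o for o in outcomes if len(set(o)) == 1]

p_no_collision = len(no_collision) / len(outcomes)
p_collision = 1 - p_no_collision

print(p_no_collision)  # 0.25
print(p_collision)     # 0.75
```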
- Question 2
Suppose my daughter told me that in 2016 her age is the same as the last two digits of her birth year. Then her grandfather came into the room and said the same thing: his age is also the same as the last two digits of his birth year. Find the ages of the daughter and her grandfather.
- Answer
Let x be the last two digits of the birth year, and assume the person has already had their birthday in 2016, so that age = 2016 - birth year.
Daughter (born in the 2000s): her birth year is 2000 + x, so 2016 - (2000 + x) = x, which gives 2x = 16 and x = 8. She was born in 2008 and is 8 years old.
Grandfather (born in the 1900s): his birth year is 1900 + x, so 2016 - (1900 + x) = x, which gives 2x = 116 and x = 58. He was born in 1958 and is 58 years old.
Therefore, the daughter is 8 and her grandfather is 58.
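A brute-force check makes the puzzle concrete. The sketch below assumes both people have already had their birthday in 2016 (so age is simply 2016 minus the birth year) and searches birth years from 1900 onward; the variable names are illustrative.

```python
YEAR = 2016

# Find every birth year whose last two digits equal the age reached in YEAR.
matches = [
    (birth_year, YEAR - birth_year)
    for birth_year in range(1900, YEAR + 1)
    if YEAR - birth_year == birth_year % 100
]

print(matches)  # [(1958, 58), (2008, 8)]
```

The two matches correspond to the grandfather (born 1958, age 58) and the daughter (born 2008, age 8).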
- Question 3
There is a 100-story building. Find the highest floor from which you can throw an egg down onto the concrete without it cracking.
- Answer
This problem is commonly referred to as the “Egg Drop Problem.” The idea is to find the highest floor from which you can drop an egg without cracking it.
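If there is no limit on the number of eggs, one approach is binary search. Start by dropping an egg from the 50th floor. If it doesn’t crack, move up to the 75th floor. If it does crack, move down to the 25th floor. Repeat this process, halving the search range each time, until you find the highest floor that doesn’t crack the egg.
This minimizes the number of drops needed in the worst case. The number of drops required for n floors can be calculated as follows:
T(n) = T(floor(n/2)) + 1
T(n) = floor(log2(n)) + 1
This means that, in the worst case, you will need floor(log2(100)) + 1 = 7 drops to find the solution.
Note that binary search may crack an egg on several of those drops. In the classic version of the puzzle you are given only two eggs, and the answer changes: the minimum worst-case number of drops is 14. Drop the first egg from floors 14, 27, 39, 50, ... (shrinking the step by one each time), and once it cracks, use the second egg to test the remaining floors below it one by one.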
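The drop counts can be checked with a short simulation. This sketch covers two cases under stated assumptions: unlimited eggs with binary search, and the classic two-egg variant, where the smallest k with k + (k - 1) + ... + 1 >= floors is the worst-case number of drops. The function names are hypothetical.

```python
def drops_binary_search(floors, safe_floor):
    """Count the drops binary search makes when eggs survive up to safe_floor."""
    low, high = 0, floors  # answer lies in [low, high]; 0 means no floor is safe
    drops = 0
    while low < high:
        mid = (low + high + 1) // 2  # next floor to test
        drops += 1
        if mid <= safe_floor:  # egg survives, so the answer is mid or higher
            low = mid
        else:                  # egg cracks, so the answer is below mid
            high = mid - 1
    return drops


def min_drops_two_eggs(floors):
    """Smallest k such that k + (k - 1) + ... + 1 >= floors."""
    k = 0
    while k * (k + 1) // 2 < floors:
        k += 1
    return k


# Worst case over every possible answer (0 through 100).
worst_unlimited = max(drops_binary_search(100, f) for f in range(101))

print(worst_unlimited)          # 7
print(min_drops_two_eggs(100))  # 14
```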
- Question 4
You have two jugs, one of 5 liters and the other of 3 liters. How do you measure exactly 4 liters of water?
- Answer
Fill the 3-liter jug completely.
Pour the 3-liter jug into the 5-liter jug.
Fill the 3-liter jug again.
Pour from the 3-liter jug into the 5-liter jug until the 5-liter jug is full.
The 5-liter jug had room for only 2 more liters, so 1 liter is left in the 3-liter jug.
Empty the 5-liter jug and pour the 1 liter from the 3-liter jug into it.
Fill the 3-liter jug again and pour it into the 5-liter jug.
The 5-liter jug now holds 1 + 3 = 4 liters.
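The steps above can be traced in a few lines of code. This is only a sketch to verify the sequence; the helper `pour` is a hypothetical name, not part of any library.

```python
def pour(src, dst, dst_capacity):
    """Pour from src into dst until src is empty or dst is full."""
    moved = min(src, dst_capacity - dst)
    return src - moved, dst + moved


three, five = 0, 0                  # both jugs start empty
three = 3                           # fill the 3-liter jug
three, five = pour(three, five, 5)  # 5-liter jug now holds 3
three = 3                           # fill the 3-liter jug again
three, five = pour(three, five, 5)  # 5-liter jug is full, 1 liter stays behind
five = 0                            # empty the 5-liter jug
three, five = pour(three, five, 5)  # move the 1 liter across
three = 3                           # fill the 3-liter jug once more
three, five = pour(three, five, 5)  # 1 + 3 = 4 liters in the 5-liter jug

print(three, five)  # 0 4
```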
- Question 5
How do you reduce the number of variables in logistic regression and random forest?
- Answer
Reducing the number of variables in a logistic regression or random forest model can improve the model’s performance, speed up computation, and reduce overfitting. Here are a few techniques to reduce the number of variables in these models:
Feature selection: Select a subset of the most relevant features based on their importance or correlation with the target variable. You can use techniques such as recursive feature elimination, feature importance, or correlation analysis.
Feature engineering: Combine or create new features from existing ones to capture non-linear relationships. This can also reduce the number of variables by creating a more informative representation of the data.
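Regularization: For logistic regression, an L1 (lasso) penalty drives the coefficients of less important features to exactly zero, which effectively removes those variables from the model. An L2 (ridge) penalty only shrinks coefficients toward zero, so it helps against overfitting but does not by itself reduce the number of variables. A random forest has no coefficients to penalize, so use its feature importances instead.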
Principal Component Analysis (PCA): PCA is a dimensionality reduction technique that transforms the data into a new feature space with fewer variables. The new features are combinations of the original features, and they capture the most important information in the data.
Variable clustering: Group similar variables together and represent them as a single variable, reducing the number of variables in the model.
Choosing the right technique depends on the nature of the data, the size of the dataset, and the complexity of the model. It’s important to carefully evaluate the performance of the model after reducing the number of variables, as too many reductions may lead to loss of important information and poor model performance.
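As one concrete illustration, here is a minimal sketch of a correlation-based filter that drops near-duplicate features, in the spirit of the correlation analysis and variable clustering ideas above. The feature names, the toy values, and the 0.95 threshold are assumptions made for the example, not recommendations.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # near-duplicate of height_cm
    "age": [45, 23, 36, 52, 31],
}

print(drop_correlated(features))  # ['height_cm', 'age']
```

The filter is greedy and order-dependent: whichever of two correlated features comes first is the one that survives.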
- Question 6
How will you decide the number of clusters in k-means?
- Answer
There is no definitive way to determine the optimal number of clusters in a k-means clustering algorithm. However, there are several methods that can be used to make an informed estimate, including:
Elbow Method: Plot the within-cluster sum of squares (WCSS) against the number of clusters. The WCSS measures the sum of squared distances between each data point and its nearest cluster center. The elbow point on the plot is the point where the WCSS decreases at a slower rate. The number of clusters corresponding to the elbow point is typically used as the number of clusters for the k-means algorithm.
Silhouette Score: The silhouette score measures the similarity of each data point to its own cluster compared to other clusters. The optimal number of clusters is the one that maximizes the silhouette score.
Gap Statistic: The gap statistic compares the log of the observed WCSS with its expected value for reference data that has no cluster structure (for example, uniformly generated data). A common choice is the number of clusters with the largest gap or, more precisely, the smallest k whose gap is within one standard error of the gap at k + 1.
Bayesian Information Criterion (BIC) or Akaike Information Criterion (AIC): BIC and AIC are model selection criteria that penalize models with more parameters. They are most natural for likelihood-based clustering such as Gaussian mixture models, and can be applied to k-means by treating it as such a model. The number of clusters with the lowest BIC or AIC score is chosen.
It’s important to keep in mind that these methods are just guidelines and may not always provide the optimal solution. In practice, the number of clusters is often determined based on the underlying business problem, the nature of the data, and expert knowledge. It may also require multiple iterations with different numbers of clusters to find the best solution.
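To make the elbow method concrete, the sketch below runs a plain k-means (Lloyd's algorithm) on toy one-dimensional data with three obvious groups and records the WCSS for k = 1 to 5. The data, the deterministic initialization, and the function name `kmeans_1d_wcss` are assumptions made for illustration; a real analysis would normally use a library implementation with several random restarts.

```python
def kmeans_1d_wcss(points, k, iterations=20):
    """Run Lloyd's algorithm on 1-D data and return the final WCSS."""
    data = sorted(points)
    n = len(data)
    # Deterministic start: k centers spread evenly across the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (unchanged if empty).
        centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
    # WCSS: squared distance from each point to its nearest center.
    return sum(min((x - c) ** 2 for c in centers) for x in data)


points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
wcss = {k: kmeans_1d_wcss(points, k) for k in range(1, 6)}

for k, value in wcss.items():
    print(k, round(value, 2))
```

On this data the WCSS falls steeply up to k = 3 and barely changes afterwards, which is the elbow.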
- Question 7
How will you handle the class imbalance problem? What are the various approaches?
- Answer
Resampling: Resampling changes the distribution of classes by either oversampling the minority class or undersampling the majority class. This can improve the performance of machine learning algorithms on imbalanced data.
Synthetic Data Generation: Synthetic data generation involves generating new samples for the minority class to balance the distribution of classes. This can be done using techniques such as Synthetic Minority Over-sampling Technique (SMOTE).
Cost-Sensitive Learning: Cost-sensitive learning involves adjusting the loss function used in the learning algorithm to give more importance to the minority class. This helps to overcome the bias towards the majority class.
Ensemble Methods: Ensemble methods such as bagging and boosting can be used to improve the performance of machine learning algorithms on imbalanced data. For example, using a combination of oversampled and undersampled datasets in a bagging or boosting ensemble can help improve performance.
Modifying the Classifier: Some machine learning algorithms, such as decision trees, can be adapted to class imbalance by weighting the classes differently or by moving the decision threshold away from the default of 0.5 when classifying samples.
It is important to choose the best approach based on the specific problem and data set. It may also be necessary to use a combination of methods to effectively handle class imbalance.
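Two of these ideas, random oversampling and class weights for cost-sensitive learning, fit in a few lines. This is a sketch on toy labels; the function names are hypothetical, and the weight formula n_samples / (n_classes * class_count) is one common "balanced" heuristic rather than the only option.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority samples at random until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = rng.choices(pool, k=target - count)  # sampling with replacement
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


labels = [0] * 8 + [1] * 2   # 8 majority samples, 2 minority samples
samples = list(range(10))    # stand-in feature values

new_samples, new_labels = random_oversample(samples, labels)
weights = balanced_class_weights(labels)

print(Counter(new_labels))  # Counter({0: 8, 1: 8})
print(weights)              # {0: 0.625, 1: 2.5}
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the evaluation data.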
- Question 8
Why do we normalize data before performing K-means clustering?
- Answer
K-means assigns points to clusters by Euclidean distance, so a feature measured on a large scale (for example, income in dollars) contributes far more to the distance than a feature on a small scale (for example, age in years). Normalizing the data before performing K-means clustering brings the features to similar ranges, so that all features get comparable importance and the clustering results are not distorted by a single large-magnitude feature.
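A small sketch shows the effect on made-up income and age values: before scaling, the nearest neighbor of the first person is decided almost entirely by income; after z-score standardization, age counts as well and the nearest neighbor changes. The data and helper names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to zero mean and unit standard deviation (z-scores)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_to_first(points):
    """Index of the point closest to points[0]."""
    return min(
        range(1, len(points)), key=lambda i: euclidean(points[0], points[i])
    )


income = [30000, 32000, 36000, 90000]  # large-magnitude feature
age = [25, 60, 26, 58]                 # small-magnitude feature

raw_points = list(zip(income, age))
scaled_points = list(zip(standardize(income), standardize(age)))

print(nearest_to_first(raw_points))     # 1: similar income, very different age
print(nearest_to_first(scaled_points))  # 2: similar in both income and age
```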
- Question 9
Why do we remove outliers before performing K-means clustering? How does it affect the number of K-means clusters?
- Answer
Removing outliers before performing K-means clustering can have a significant impact on the quality of the clustering results. K-means minimizes squared distances and each cluster center is the mean of its points, so a few extreme values can pull a center far away from the bulk of its cluster. This can result in clusters that are biased towards the outliers or the creation of additional clusters to account for the outliers.
In terms of the number of clusters, the presence of outliers can cause the algorithm to form more clusters to account for the outliers, resulting in an over-segmentation of the data. Removing the outliers can help to prevent this issue and lead to more meaningful clusters that better reflect the underlying structure of the data.
However, it’s important to keep in mind that the removal of outliers should be done with care, as it may also remove important information about the data. Therefore, it’s important to consider the context of the data and the goals of the clustering analysis when deciding whether or not to remove outliers.
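The pull of a single outlier on a cluster center can be seen on toy data. The sketch below uses the common 1.5 x IQR rule to filter values; that rule, the factor, and the function name are assumptions for illustration, and other outlier criteria are equally valid.

```python
from statistics import mean, quantiles


def remove_outliers_iqr(values, factor=1.5):
    """Keep values within [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]


cluster = [10, 11, 12, 13, 14, 100]  # one extreme value among similar points
cleaned = remove_outliers_iqr(cluster)

print(mean(cluster))  # about 26.67: the center is dragged toward the outlier
print(mean(cleaned))  # 12: the center sits in the middle of the real group
```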
- Question 10
Explain the basic idea of your projects?
- Answer
Explain the key points of your project.
- Question 11
Cloud platform experience and exposure
- Answer
Having experience and exposure to cloud platforms can be beneficial for individuals and organizations in various ways:
Scalability: Cloud platforms offer the ability to scale computing resources on demand, allowing organizations to handle increased traffic or processing needs without having to make large upfront investments in hardware.
Cost savings: Cloud platforms can reduce capital expenditures, as organizations only pay for what they use and do not have to invest in expensive hardware and maintenance.
Flexibility: Cloud platforms offer a wide range of computing resources and services, allowing organizations to quickly spin up new resources or change their infrastructure as needed.
Reliability: Cloud platforms are designed with high availability and disaster recovery in mind, ensuring that services and data remain available even in the event of a failure.
Innovation: Cloud platforms provide access to cutting-edge technologies, allowing organizations to take advantage of new developments and integrate them into their operations more quickly.
Overall, having experience and exposure to cloud platforms can help individuals and organizations to be more agile, efficient, and innovative in their operations.