Unit - 1
Well-defined learning problems
Q1) Explain well-defined learning problems
A1)
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Features in a Learning Problem
- The class of tasks (T)
- The measure of performance to be improved (P)
- The source of experience (E)
Examples of Well-Defined Learning Problems
Checkers Learning Problem
- Task (T): Playing Checkers
- Performance Measure (P): Percent of games won against opponents.
- Training Experience (E): Playing practice games against itself.
Handwriting Recognition Learning Problem
- Task (T): Recognizing and classifying handwritten words within images.
- Performance Measure (P): Percent of words correctly classified.
- Training Experience (E): A dataset of handwritten words with given classifications.
Robot Driving Learning Problem
- Task (T): Driving on public four-lane highways using vision cameras.
- Performance Measure (P): Average distance traveled before an error (as judged by a human observer).
- Training Experience (E): A sequence of images and steering commands recorded while observing a human driver.
Q2) Give a stepwise explanation of the Find-S algorithm
A2)
The steps of the Find-S algorithm:
- The process starts by initializing h with the most specific hypothesis in H; in practice, h ends up being set to the first positive example in the dataset.
- We then consider each example in turn. If the example is negative, we move on to the next example; if it is positive, we use it to update the hypothesis in the following steps.
- For each attribute in the example, we check whether its value is equal to the corresponding value in the hypothesis.
- If the values match, no change is made.
- If the values do not match, the hypothesis value is changed to “?”.
- We repeat this until we reach the last positive example in the dataset; a small runnable sketch is given below.
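The sketch below implements these steps for symbolic attributes; the attribute values and labels in the toy dataset are illustrative assumptions, not taken from any particular dataset.

```python
def find_s(examples):
    """Find-S: return the most specific hypothesis consistent with
    all positive examples. Each example is (attribute_tuple, label)."""
    hypothesis = None
    for attributes, label in examples:
        if not label:            # Find-S ignores negative examples
            continue
        if hypothesis is None:   # the first positive example becomes the hypothesis
            hypothesis = list(attributes)
            continue
        # generalize: keep matching values, replace mismatches with '?'
        hypothesis = [h if h == a else '?' for h, a in zip(hypothesis, attributes)]
    return hypothesis

# toy run (hypothetical data)
data = [
    (("sunny", "warm", "normal", "strong"), True),
    (("sunny", "warm", "high", "strong"), True),
    (("rainy", "cold", "high", "strong"), False),
]
print(find_s(data))   # -> ['sunny', 'warm', '?', 'strong']
```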
Q3) Explain the Candidate Elimination Algorithm in detail.
A3)
The Candidate Elimination Algorithm (a simplified sketch in code follows the steps) -
- Initialize G to the set of maximally general hypotheses in H.
- Initialize S to the set of maximally specific hypotheses in H.
- For each training example d:
  - If d is a positive example:
    - Remove from G any hypothesis that does not include d.
    - For each hypothesis s in S that does not include d:
      - Remove s from S.
      - Add to S all minimal generalizations h of s such that h includes d, and some member of G is more general than h.
    - Remove from S any hypothesis that is more general than another hypothesis in S.
  - If d is a negative example:
    - Remove from S any hypothesis that includes d.
    - For each hypothesis g in G that includes d:
      - Remove g from G.
      - Add to G all minimal specializations h of g such that h does not include d, and some member of S is more specific than h.
    - Remove from G any hypothesis that is less general than another hypothesis in G.
- If G or S ever becomes empty, the data is not consistent with H.
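The sketch below implements the steps above for conjunctive hypotheses over discrete attributes, where '?' means "any value" and '0' marks the empty (reject-all) constraint; the helper names, the attribute domains, and the toy example are assumptions chosen for illustration.

```python
def consistent(h, x):
    """h covers x: every constraint is '?' or equals the attribute value."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general(h1, h2):
    """h1 is more general than or equal to h2."""
    return all(a == '?' or a == b or b == '0' for a, b in zip(h1, h2))

def min_generalizations(h, x):
    """The single minimal generalization of a conjunctive h that covers x."""
    return [tuple(xv if hv == '0' else (hv if hv == xv else '?')
                  for hv, xv in zip(h, x))]

def min_specializations(g, domains, x):
    """Minimal specializations of g that exclude the negative instance x."""
    specs = []
    for i, gv in enumerate(g):
        if gv == '?':
            for val in domains[i]:
                if val != x[i]:
                    specs.append(g[:i] + (val,) + g[i + 1:])
    return specs

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {tuple('0' for _ in range(n))}   # maximally specific boundary
    G = {tuple('?' for _ in range(n))}   # maximally general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if consistent(g, x)}
            new_S = set()
            for s in S:
                if consistent(s, x):
                    new_S.add(s)
                else:
                    for h in min_generalizations(s, x):
                        if any(more_general(g, h) for g in G):
                            new_S.add(h)
            # drop any member of S more general than another member of S
            S = {s for s in new_S
                 if not any(s != s2 and more_general(s, s2) for s2 in new_S)}
        else:
            S = {s for s in S if not consistent(s, x)}
            new_G = set()
            for g in G:
                if not consistent(g, x):
                    new_G.add(g)
                else:
                    for h in min_specializations(g, domains, x):
                        if any(more_general(h, s) for s in S):
                            new_G.add(h)
            # drop any member of G less general than another member of G
            G = {g for g in new_G
                 if not any(g != g2 and more_general(g2, g) for g2 in new_G)}
    return S, G

# toy usage: two attributes with small domains (hypothetical data)
domains = [("a", "b"), ("x", "y")]
examples = [(("a", "x"), True), (("b", "y"), False), (("a", "y"), True)]
print(candidate_elimination(examples, domains))   # S and G both converge to {('a', '?')}
```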
Q4) Differentiate between classification, regression, clustering, and rule extraction
A4)
Classification
- In classification, data is labeled i.e., it is assigned a class, for example, spam/non-spam or fraud/non-fraud.
- The decision being modeled is to assign labels to new unlabelled pieces of data.
- This can be thought of as a discrimination problem, modeling the differences or similarities between groups.
Regression
- In regression, data is labeled with a real value rather than a categorical label.
- The decision being modeled is what value to predict for new, unseen data.
Clustering
- In clustering, data is not labeled but can be divided into groups based on similarity and other measures of natural structure in the data.
- For example, organizing pictures by faces without names, where the human user has to assign names to groups, like iPhoto on the Mac.
Rule Extraction
- In rule extraction, data is used as the basis for the extraction of propositional rules.
- These rules discover statistically supportable relationships between attributes in the data.
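The short sketch below contrasts the first three settings on synthetic data, assuming scikit-learn and NumPy are installed; the specific models (logistic regression, linear regression, k-means) are illustrative choices, not the only possibilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                     # two numeric features

# Classification: labels are categories (e.g. 0 = non-spam, 1 = spam)
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
print(clf.predict(X[:3]))                         # predicted class labels

# Regression: labels are real values
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
reg = LinearRegression().fit(X, y_reg)
print(reg.predict(X[:3]))                         # predicted real values

# Clustering: no labels at all; groups are found by similarity
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_[:3])                             # assigned cluster indices
```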
Q5) Explain the disadvantages and advantages of different ML algorithms
A5)
Advantages of Supervised Machine Learning Algorithms
- Classes represent the features on the ground.
- Training data is reusable unless features change.
Disadvantages of Supervised Machine Learning Algorithms
- Classes may not match spectral classes.
- Varying consistency in classes.
- Cost and time are involved in selecting training data.
Advantages of Unsupervised Machine Learning Algorithms
- No previous knowledge of the image area is required.
- The opportunity for human error is minimized.
- It produces unique spectral classes.
- Relatively easy and fast to carry out.
Disadvantages of Unsupervised Machine Learning Algorithms
- The spectral classes do not necessarily represent the features on the ground.
- It does not consider spatial relationships in the data.
- It can take time to interpret the spectral classes.
Advantages of Semi-supervised Machine Learning Algorithms
- It is easy to understand.
- It reduces the amount of annotated data used.
- It is a stable algorithm.
- It is simple.
- It has high efficiency.
Disadvantages of Semi-supervised Machine Learning Algorithms
- Iteration results are not stable.
- It does not apply to network-level data.
- It has low accuracy.
Advantages of Reinforcement Machine Learning Algorithms
- Reinforcement Learning is used to solve complex problems that cannot be solved by conventional techniques.
- This technique is preferred to achieve long-term results which are very difficult to achieve.
- This learning model is very similar to the learning of human beings. Hence, it is close to achieving perfection.
Disadvantages of Reinforcement Machine Learning Algorithms
- Too much reinforcement learning can lead to an overload of states which can diminish the results.
- This algorithm is not preferable for solving simple problems.
- This algorithm needs a lot of data and a lot of computation.
- The curse of dimensionality limits reinforcement learning for real physical systems.
Q6) Elaborate on the steps involved in building a Machine Learning System
A6)
Following are the various steps involved in building a Machine Learning System; a minimal code sketch mapping these choices onto a toy example is given after the list.
- Choose the training experience, i.e. the data that will be used to train the machine learning model. We represent the data in the form of features or attributes.
- Choose the target function to be learned, i.e., the function for which a model will be generated.
- Choose how to represent the target function, also known as the hypothesis. Richer representations are more difficult to learn.
- Choose a learning algorithm to infer the target function from the training examples.
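As a rough illustration of these four choices, the sketch below uses a hand-made numeric dataset, a linear form for the target function, and least squares as the learning algorithm; all of these are assumptions chosen for brevity, not the only possible design.

```python
import numpy as np

# 1. Training experience E: feature vectors with known target values (toy data)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 1.0]])
y = np.array([5.0, 4.0, 9.0, 6.0])

# 2./3. Target function and its representation: V(x) = w0 + w1*x1 + w2*x2
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# 4. Learning algorithm: least squares to infer the weights of V
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(w)   # learned weights approximating the target function
```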
Q7) Elaborate on the steps involved in the LIST-THEN-ELIMINATE Algorithm.
A7)
The LIST-THEN-ELIMINATE Algorithm:
Following are the steps of the LIST-THEN-ELIMINATE algorithm (a small sketch in code follows the steps):
- Version Space ← a list containing every hypothesis in H
- For each training example, <z, c(z)> remove from Version Space any hypothesis for which h(z) ≠ c(z)
- Output the list of hypotheses in Version Space
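The sketch below applies LIST-THEN-ELIMINATE to conjunctive hypotheses over four boolean features; enumerating every hypothesis is only feasible because the hypothesis space here is tiny (82 hypotheses), and the training labels are hypothetical.

```python
from itertools import product

def list_then_eliminate(examples, n_features=4):
    """LIST-THEN-ELIMINATE over conjunctive hypotheses on boolean features.
    A hypothesis is a tuple of True/False/'?' per feature, plus the
    reject-all hypothesis represented by None."""
    def predict(h, z):
        if h is None:                       # reject-all hypothesis
            return False
        return all(c == '?' or c == v for c, v in zip(h, z))

    # the version space starts as a list of every hypothesis in H
    version_space = [None] + list(product([True, False, '?'], repeat=n_features))
    # remove any hypothesis h with h(z) != c(z) for some training example
    for z, c_z in examples:
        version_space = [h for h in version_space if predict(h, z) == c_z]
    return version_space

# toy usage with hypothetical labels
examples = [((True, True, False, False), True),
            ((True, False, False, True), False)]
print(len(list_then_eliminate(examples)))   # 12 hypotheses remain consistent
```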
Q8) Elaborate on what learning is
A8)
There are several definitions of "learning." One of the simplest definitions is:
“The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing something.”
Just as there are various definitions of learning, there are various categories of learning methods.
As humans, we learn a lot of things throughout our lives. Some of this learning is based on our experience and some of it is based on memorization. On that basis, we can divide learning methods into five parts:
- Rote learning (memorization): Memorizing things without knowing the concept/logic behind them.
- Passive learning (instructions): Learning from a teacher/expert.
- Analogy (experience): Learning new things from our experience.
- Inductive learning (experience): Based on experience, formulating a generalized concept.
- Deductive learning: Deriving new facts from past facts.
Inductive learning is based on formulating a generalized concept after observing examples of the concept. For example, if a kid is asked to write an answer to 2*8=x, they can either use the rote learning method to memorize the answer or use inductive learning (i.e. thinking how 2*1=2, 2*2=4, and so on) to formulate a concept to calculate the results. In this way, the kid will be able to solve similar types of questions using the same concept.
Similarly, we can make our machines learn from past data and identify whether an object falls into a specific category of interest.
Q9) Write a short note on Inductive Learning Hypothesis
A9)
Inductive Learning Hypothesis
The ultimate goal of concept learning is to find a hypothesis h that is identical to the target concept c over the entire data set X, while the only information available about c is its value over the training examples. Our algorithm can therefore only guarantee that the hypothesis fits the training data. In other words:
"Any hypothesis found approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples."
For example, suppose whether a person goes to a movie is based on four binary features, each with two values (true or false):
- Has money
- Has free time
- It’s a holiday
- Has pending work
With the training data, we have two data objects as positive samples and one as negative:
- x1: <true, true, false, false> : +ve
- x2: <true, false, false, true> : +ve
- x3: <true, false, false, true> : -ve
Hypothesis Notations
Each data object can itself be read as a hypothesis. A hypothesis such as <true, true, false, false> is very specific because it covers only one sample. More generally, we can use the following notations in a hypothesis:
- ∅ (represents a hypothesis that rejects all)
- < ? , ? , ? , ? > (accepts all)
- <true, false, ? , ? > (accepts some)
The hypothesis ∅ will reject all the data samples. The hypothesis < ? , ? , ? , ? > will accept all the data samples. The ? notation indicates that the value of this specific feature does not affect the result.
The total number of possible hypotheses is (3 * 3 * 3 * 3) + 1 = 82: each feature can be constrained to true or false or left as ?, and there is one additional hypothesis that rejects all (∅).
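This count can be checked with a few lines of Python; the True/False/'?' encoding and the None placeholder for the reject-all hypothesis are just one convenient representation.

```python
from itertools import product

# every conjunctive hypothesis over four boolean features, plus reject-all
hypotheses = [None] + list(product([True, False, '?'], repeat=4))
print(len(hypotheses))   # 82 = 3**4 + 1
```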
Q10) Write a short note on General to Specific concept
A10) Many machine learning algorithms rely on the concept of the general-to-specific ordering of hypotheses.
- h1 = < true, true, ?, ? >
- h2 = < true, ? , ? , ? >
Any instance classified as positive by h1 will also be classified as positive by h2, so we can say that h2 is more general than h1. Using this concept, we can find a general hypothesis that can be defined over the entire dataset X.
To find a single hypothesis defined over X, we can use the more-general-than partial ordering. One way to do this is to start with the most specific hypothesis in H and generalize it each time it fails to classify an observed positive training example as positive.
1. The first step in the Find-S algorithm is to start with the most specific hypothesis, which can be denoted by h <- <∅, ∅, ∅, ∅>.
2. Pick up the next training sample and apply Step 3 to it.
3. Observe the data sample. If the sample is negative, the hypothesis remains unchanged and we pick the next training sample by repeating Step 2; otherwise, we proceed to Step 4.
4. If the sample is positive and our current hypothesis is too specific because it does not cover the sample, we update the hypothesis. This is done by the pairwise conjunction (logical AND operation) of the current hypothesis and the training sample.
If the next training sample is <true, true, false, false> and the current hypothesis is <∅, ∅, ∅, ∅>, then we can directly replace our existing hypothesis with the new one.
If the next positive training sample is <true, true, false, true> and the current hypothesis is <true, true, false, false>, then we can perform a pairwise conjunction. From the current hypothesis and the next training sample, we find a new hypothesis by putting ? in every position where the two values disagree:
<true, true, false, true> ∧ <true, true, false, false> = <true, true, false, ?>
Now, we can replace our existing hypothesis with the new one: h <- <true, true, false, ?>
5. Repeat Step 2 as long as there are more training samples.
6. Once there are no more training samples, the current hypothesis is the one we were looking for. We can use this final hypothesis to classify new instances.
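A small check of the generalization step used in the walkthrough above; the tuples encode the four boolean features in the same order as the movie example.

```python
def generalize(h, x):
    """Positions where the hypothesis and the positive sample disagree become '?'."""
    return tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))

h = (True, True, False, False)   # current hypothesis
x = (True, True, False, True)    # next positive training sample
print(generalize(h, x))          # -> (True, True, False, '?')
```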