Unit - 4
Computational Learning Theory
Computational learning theory, also known as statistical learning theory, is a mathematical framework for quantifying learning tasks and learning algorithms. It is one of the many sub-fields of machine learning.
“One can extend statistical learning theory by taking computational complexity of the learner into account. This field is called computational learning theory or COLT”
4.1.1 Sample Complexity for Finite Hypothesis Spaces
Sample complexity is the growth in the number of training examples required as a function of the problem size of a learning problem.
- It considers only consistent learners, i.e., learners that maintain a training error of zero.
- It derives a bound on the number of training examples required by any consistent learner.
- Motivating fact: every consistent learner outputs a hypothesis belonging to the version space.
- Hence it suffices to bound the number of examples needed to ensure that the version space contains no unacceptable hypotheses (the resulting bound is sketched below).
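As a concrete illustration (not part of the original notes), the standard bound for consistent learners over a finite hypothesis space H states that m ≥ (1/ε)(ln|H| + ln(1/δ)) training examples suffice to ensure, with probability at least 1 − δ, that every hypothesis in the version space has true error at most ε. A minimal Python sketch of this calculation, with an illustrative hypothesis space size, might look like:

```python
import math

def sample_complexity_finite_H(H_size, epsilon, delta):
    """Number of examples m such that any consistent learner over a finite
    hypothesis space H outputs, with probability at least (1 - delta),
    a hypothesis with true error at most epsilon.
    Bound: m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((1.0 / epsilon) * (math.log(H_size) + math.log(1.0 / delta)))

# Illustrative example: conjunctions of up to 10 Boolean literals (|H| = 3^10),
# with epsilon = 0.1 and delta = 0.05
print(sample_complexity_finite_H(3 ** 10, epsilon=0.1, delta=0.05))
```

Note how the bound grows only logarithmically in |H| and in 1/δ, but linearly in 1/ε.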
4.1.2 Sample Complexity for Infinite Hypothesis Spaces
Motivation: classification can be viewed as distinguishing subsets of a set of instances.
• Consider a set of instances S and a hypothesis space H.
• Can H distinguish all subsets of S?
• The number of concepts over S is 2^|S|.
• (Def) S is shattered by H if every concept over S is included in H.
• That is, for any bi-partition (S1, S2) of S, there exists some h in H such that h(s) = 0 for every s ∈ S1 and h(s) = 1 for every s ∈ S2.
• Corollary: |H| ≥ 2^|S| if H can shatter S (a small shattering check is sketched below).
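To make the definition concrete, the following Python sketch (an illustration, not from the original notes) checks whether the hypothesis space of one-dimensional threshold classifiers shatters a given point set by enumerating all dichotomies; the function names are illustrative:

```python
from itertools import product

def thresholds(points):
    """Hypothesis space H: 1-D threshold classifiers h_t(x) = 1 iff x >= t.
    Candidate thresholds sit below, between, and above the points."""
    pts = sorted(points)
    cands = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    return [lambda x, t=t: 1 if x >= t else 0 for t in cands]

def shatters(points):
    """S is shattered by H iff every labeling (dichotomy) of S is realized by some h in H."""
    H = thresholds(points)
    for labeling in product([0, 1], repeat=len(points)):
        if not any(all(h(x) == y for x, y in zip(points, labeling)) for h in H):
            return False
    return True

print(shatters([2.0]))        # True: a single point can be shattered
print(shatters([1.0, 3.0]))   # False: the labeling (1, 0) is not realizable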
4.1.3 The Mistake Bound Model of Learning
Algorithms commonly studied in the mistake bound model of learning include:
- a linear function over the feature space
- the Perceptron (with many variants)
- the general gradient descent view
The model emphasizes the importance of representation and allows us to evaluate the complexity of learning.
- Setting:
- Instance space: χ (dimensionality n)
- Target f: χ → {0, 1}, f ∈ C, the concept class (parameterized by n)
- Learning protocol:
- The learner is given x ∈ χ, randomly chosen.
- The learner predicts h(x) and is then given f(x) as feedback.
- Performance: the learner makes a mistake when h(x) ≠ f(x)
- M_A(f, S): the number of mistakes algorithm A makes on a sequence S of examples for the target function f.
- M_A(C) = max over f ∈ C and sequences S of M_A(f, S): the maximum possible number of mistakes made by A for any target function in C and any sequence of examples.
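Since the Perceptron is listed above as a canonical algorithm for this setting, a minimal online sketch that counts mistakes under the protocol just described might look like the following (an illustrative sketch using the standard Perceptron update rule; the function name and example data are not from the original notes):

```python
import numpy as np

def perceptron_mistakes(stream):
    """Online (mistake-bound) Perceptron: predicts each example as it arrives,
    updates only when it makes a mistake, and returns the total mistake count.
    `stream` is a sequence of (x, y) pairs with x an n-dimensional vector and
    y in {0, 1}."""
    mistakes = 0
    w, b = None, 0.0
    for x, y in stream:
        x = np.asarray(x, dtype=float)
        if w is None:
            w = np.zeros_like(x)
        pred = 1 if (w @ x + b) >= 0 else 0      # learner's prediction h(x)
        if pred != y:                            # feedback f(x) reveals a mistake
            mistakes += 1
            update = 1.0 if y == 1 else -1.0
            w += update * x                      # standard Perceptron update
            b += update
    return mistakes

# Target f: x1 AND x2 over Boolean inputs, presented in a fixed sequence
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)] * 5
print(perceptron_mistakes(data))
```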
KEY TAKEAWAYS
- Computational learning theory, also known as statistical learning theory, is a mathematical framework for quantifying learning tasks and learning algorithms.
- Sample complexity is the growth in the number of training examples required as a function of the problem size of a learning problem.
4.2 Instance-Based Learning
Instance-based learning methods simply store the training examples instead of learning an explicit description of the target function.
- Generalizing the examples is postponed until a new instance must be classified.
- When a new instance is encountered, its relationship to the stored examples is examined to assign a target function value for the new instance.
Instance-based learning includes
- Nearest neighbour
- Locally weighted regression and
- Case-based reasoning methods.
Instance-based methods are sometimes referred to as lazy learning methods because they delay processing until a new instance must be classified.
A key advantage of lazy learning is that it can estimate the target function locally and differently for each new instance to be classified, instead of estimating it only once for the entire instance space.
4.2.1 k-Nearest Neighbour Learning
The k-Nearest Neighbour learning algorithm assumes all instances correspond to points in the n-dimensional space R^n.
- The nearest neighbours of an instance are defined in terms of Euclidean distance.
- Let an instance x be described by the feature vector ⟨a_1(x), a_2(x), ..., a_n(x)⟩. The Euclidean distance between two instances x_i and x_j is
  d(x_i, x_j) = sqrt( Σ_{r=1..n} ( a_r(x_i) − a_r(x_j) )² )
- For a given query instance x_q, the estimate f̂(x_q) is calculated from the function values of the k nearest neighbours of x_q.
The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things lie near each other. The algorithm can be summarized in the following steps (a minimal implementation sketch follows the list).
1. Load the data.
2. Initialize K to your chosen number of neighbours.
3. For each example in the data:
   (I) Calculate the distance between the query example and the current example from the data.
   (II) Add the distance and the index of the example to an ordered collection.
4. Sort the ordered collection of distances and indices from smallest to largest (in ascending order) by the distances.
5. Pick the first K entries from the sorted collection.
6. Get the labels of the selected K entries.
7. If regression, return the mean of the K labels.
8. If classification, return the mode of the K labels.
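The steps above translate directly into code. A minimal sketch in Python (illustrative only; the name `knn_predict` and its parameters are not from the original notes) could be:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, regression=False):
    """k-nearest-neighbour prediction following the steps above: compute
    Euclidean distances, sort, take the k closest training examples, and
    return the mean (regression) or mode (classification) of their labels."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))   # step 3(I)
    nearest = np.argsort(dists)[:k]                           # steps 4-5
    labels = [y_train[i] for i in nearest]                    # step 6
    if regression:
        return sum(labels) / k                                # step 7: mean
    return Counter(labels).most_common(1)[0][0]               # step 8: mode

# Toy classification example
X = [[1, 1], [1, 2], [5, 5], [6, 5]]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, [1.5, 1.5], k=3))   # -> "A"
```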
When to apply KNN
- Instances map to points in Rn
- Less than 20 attributes per instance
- Lots of training data
Advantages
- Training is very fast
- Learn complex target functions
- Can handle noisy data
- Does not lose any information
Disadvantages
- Slow at query time
- Easily fooled by irrelevant attributes
4.2.2 Locally Weighted Regression
KNN forms a local approximation to f separately for each query point x_q. Locally weighted regression takes this a step further and constructs an explicit approximation f̂(x) over the region surrounding x_q.
Given a new query instance xq, the general approach in locally weighted regression is to construct an approximation f that fits the training examples in the neighborhood surrounding xq.
- This approximation is then used to calculate the value f(xq), which is output as the estimated target value for the query instance.
- Locally weighted regression uses nearby or distance-weighted training examples to form this local approximation to f.
- We might approximate the target function in the neighbourhood surrounding x_q using a linear function, a quadratic function, a multilayer neural network, or some other functional form (a worked sketch of the linear case follows).
- The phrase "locally weighted regression" is used because the approximation is local (the function is approximated based only on data near the query point), weighted (the contribution of each training example is weighted by its distance from the query point), and a regression (the term used widely in the statistical learning community for the problem of approximating real-valued functions).
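A minimal sketch of locally weighted linear regression, assuming a Gaussian kernel for the distance-based weights and a weighted least-squares fit (the function name `lwr_predict` and the bandwidth parameter `tau` are illustrative), might look like:

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=1.0):
    """Locally weighted linear regression: fit a linear model by weighted
    least squares, where each training example is weighted by a Gaussian
    kernel of its distance from the query point, then evaluate at x_query."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xq = np.asarray(x_query, dtype=float)
    # Gaussian kernel weights: nearby examples contribute the most
    w = np.exp(-((X - xq) ** 2).sum(axis=1) / (2 * tau ** 2))
    # Augment with a bias term and solve the weighted least-squares problem
    A = np.hstack([np.ones((len(X), 1)), X])
    W = np.diag(w)
    theta = np.linalg.pinv(A.T @ W @ A) @ A.T @ W @ y
    return np.array([1.0, *xq]) @ theta

# Toy example: noisy samples of y = x^2, queried near x = 1.0
X = [[x] for x in np.linspace(-3, 3, 30)]
y = [x[0] ** 2 + np.random.normal(0, 0.1) for x in X]
print(lwr_predict(X, y, [1.0], tau=0.5))   # roughly 1.0
```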
4.2.3 Radial Basis Function Networks
One approach to function approximation that is closely related to distance-weighted regression and also to artificial neural networks is learning with radial basis functions.
The learned hypothesis is a function of the form

f̂(x) = w_0 + Σ_{u=1..k} w_u K_u(d(x_u, x))

where each x_u is an instance from χ and the kernel function K_u(d(x_u, x)) is defined so that it decreases as the distance d(x_u, x) increases. Here k is a user-provided constant that specifies the number of kernel functions to be included. Even though f̂(x) is a global approximation to f(x), the contribution from each of the K_u(d(x_u, x)) terms is localized to a region near the point x_u. It is common to choose each kernel function K_u to be a Gaussian function centred at the point x_u with some variance σ_u²:

K_u(d(x_u, x)) = e^( −d²(x_u, x) / (2σ_u²) )
In a radial basis function network, each hidden unit produces an activation determined by a Gaussian function centred at some instance x_u, so its activation is close to zero unless the input x is near x_u. The output unit produces a linear combination of the hidden unit activations.
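A minimal sketch of such a network, assuming Gaussian hidden units placed at a fixed set of centres and a linear output layer fitted by least squares (the class name and the training procedure are illustrative choices, not the only way to train an RBF network), might look like:

```python
import numpy as np

class SimpleRBFNetwork:
    """Minimal radial basis function network: k Gaussian hidden units centred
    at chosen instances x_u, plus a linear output layer fitted by least squares."""

    def __init__(self, centers, sigma=1.0):
        self.centers = np.asarray(centers, dtype=float)   # the x_u
        self.sigma = sigma
        self.weights = None                               # w_0 ... w_k

    def _activations(self, X):
        X = np.asarray(X, dtype=float)
        # Gaussian kernel K_u(d(x_u, x)) = exp(-d^2 / (2 sigma^2))
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-d2 / (2 * self.sigma ** 2))
        return np.hstack([np.ones((len(X), 1)), K])       # prepend bias term w_0

    def fit(self, X, y):
        A = self._activations(X)
        self.weights, *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)
        return self

    def predict(self, X):
        return self._activations(X) @ self.weights

# Toy example: approximate sin(x) with 5 Gaussian units
X = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
y = np.sin(X).ravel()
net = SimpleRBFNetwork(centers=X[::8], sigma=1.0).fit(X, y)
print(np.round(net.predict(X[:3]), 2))
```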
4.2.4 Case-Based Learning
Case-based reasoning (CBR) classifiers exploit a database of problem solutions to solve new problems. A CBR system stores tuples or cases for problem solving as complex symbolic descriptions.
Working:
When a new case arises to classify, a CBR system will first check whether an identical training case exists. If one is found, the accompanying solution to that case is returned. If no identical case is found, the CBR system searches for training cases that have components similar to those of the new case; these training cases may be considered neighbours of the new case.
If cases are represented as graphs, this involves searching for subgraphs that are similar to subgraphs of the new case. The CBR system attempts to combine the solutions of the neighbouring training cases to propose a solution for the new case. If incompatibilities arise within the individual solutions, backtracking to search for other solutions may be necessary.
The CBR may employ background knowledge and problem-solving strategies to propose a feasible solution.
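A minimal sketch of the retrieval step described above, assuming a user-supplied similarity function over symbolic case descriptions (the names and the Jaccard similarity over symptom keywords are illustrative), might look like:

```python
def retrieve_case(case_base, new_case, similarity):
    """Minimal case-retrieval step of a CBR system (a sketch, not a full CBR
    cycle): return an identical stored case if one exists, otherwise the most
    similar stored case according to a user-supplied similarity function."""
    for case in case_base:
        if case["problem"] == new_case:          # identical case found
            return case
    # Otherwise fall back to the most similar neighbouring case
    return max(case_base, key=lambda c: similarity(c["problem"], new_case))

# Toy help-desk example with a set-overlap (Jaccard) similarity over symptoms
cases = [
    {"problem": {"no power", "fan silent"}, "solution": "replace power supply"},
    {"problem": {"screen flicker"},         "solution": "update display driver"},
]
jaccard = lambda a, b: len(a & b) / len(a | b)
print(retrieve_case(cases, {"no power"}, jaccard)["solution"])
```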
Applications of CBR include:
- Problem resolution for customer service help desks, where cases describe product-related diagnostic problems.
- It is also applied to areas such as engineering and law, where cases are either technical designs or legal rulings, respectively.
- Medical education, where patient case histories and treatments are used to help diagnose and treat new patients.
Issues in CBR
- Finding a good similarity metric (e.g., for matching subgraphs) and suitable methods for combining solutions.
- Selecting salient features for indexing training cases and the development of efficient indexing techniques.
As the number of stored cases grows, the CBR system becomes more intelligent, but a trade-off between accuracy and efficiency emerges: beyond a certain point, the system's efficiency suffers because the time required to search for and process relevant cases increases.
KEY TAKEAWAYS
- Instance-based learning methods store the training examples and postpone generalization until a new instance must be classified; k-nearest neighbour, locally weighted regression, and case-based reasoning are the main examples.
- Case-based reasoning solves new problems by retrieving stored cases that are identical or similar to the new case and combining their solutions.
References:
- Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
- Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
- Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
- Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, 2006.