# Linear classifier

In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class (or group) it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics. An object's characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector.

### Definition

If the input feature vector to the classifier is a real vector $\vec x$, then the output score is

$$y = f(\vec w \cdot \vec x) = f\left(\sum_j w_j x_j\right),$$

where $\vec w$ is a real vector of weights and $f$ is a function that converts the dot product of the two vectors into the desired output. (In other words, $\vec{w}$ is a one-form or linear functional mapping $\vec x$ onto $\mathbb{R}$.) The weight vector $\vec w$ is learned from a set of labeled training samples. Often $f$ is a simple function that maps all values above a certain threshold to the first class and all other values to the second class. A more complex $f$ might give the probability that an item belongs to a certain class.
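As a rough illustration of the definition, the following minimal Python sketch computes the dot product of a weight vector and a feature vector and then applies two possible choices of $f$: a hard threshold and a logistic function. The names `dot`, `threshold_f`, `logistic_f` and `classify` are hypothetical, the logistic function is only one common choice for a probability-valued $f$, and the weights are made up for illustration rather than learned from training samples.

```python
import math

def dot(w, x):
    """Dot product of two equal-length sequences of numbers."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))

def threshold_f(score, threshold=0.0):
    """Map scores above the threshold to class 1 and all others to class 0."""
    return 1 if score > threshold else 0

def logistic_f(score):
    """Squash a score into (0, 1) so it can be read as a class probability.

    The two branches avoid overflow in math.exp for large |score|.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)

def classify(w, x, f=threshold_f):
    """Output y = f(w . x) for weight vector w and feature vector x."""
    return f(dot(w, x))

# Illustrative weights (an assumption; in practice they are learned).
w = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]

print(classify(w, x))                # hard decision: class 0 or class 1
print(classify(w, x, f=logistic_f))  # score read as a probability
```

Swapping `f` changes only how the score is interpreted; the linear part, the dot product, stays the same.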

For a two-class classification problem, one can visualize the operation of a linear classifier as splitting a high-dimensional input space with a hyperplane: all points on one side of the hyperplane are classified as "yes", while the others are classified as "no".
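In symbols, if $f$ is taken to be a threshold at some value $\theta$ (a symbol assumed here for illustration; it is not used elsewhere in this article), the separating hyperplane and the decision rule can be sketched as:

$$H = \{\vec x : \vec w \cdot \vec x = \theta\}, \qquad \text{class}(\vec x) = \begin{cases} \text{yes} & \text{if } \vec w \cdot \vec x > \theta, \\ \text{no} & \text{otherwise.} \end{cases}$$

The weight vector $\vec w$ is normal to $H$, so the sign of $\vec w \cdot \vec x - \theta$ tells which side of the hyperplane $\vec x$ lies on.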

Linear classifiers are often used when the speed of classification is an issue, since they are often the fastest classifiers, especially when $\vec x$ is sparse, although decision trees can be faster. Linear classifiers also often work very well when the number of dimensions in $\vec x$ is large, as in document classification, where each element of $\vec x$ is typically the number of occurrences of a word in a document (see document-term matrix). In such cases, the classifier should be well-regularized.
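The speed advantage for sparse inputs comes from the fact that the dot product only needs to visit the nonzero entries of $\vec x$. The sketch below illustrates this for document classification with word counts stored in a dictionary; the function names, the tiny weight table and the "sports"/"politics" labels are all hypothetical, chosen only to make the example concrete.

```python
from collections import Counter

def bag_of_words(text):
    """Sparse feature vector for a document: word -> occurrence count."""
    return Counter(text.lower().split())

def sparse_score(weights, features):
    """Compute w . x while touching only the nonzero entries of x.

    Words without a weight contribute 0, so the cost grows with the number
    of distinct words in the document, not with the vocabulary size.
    """
    return sum(weights.get(word, 0.0) * count
               for word, count in features.items())

# Hypothetical weights: positive favors "sports", negative favors "politics".
weights = {"goal": 1.5, "match": 1.0, "election": -2.0, "vote": -1.5}

doc = "The match ended after a late goal and another goal"
x = bag_of_words(doc)
score = sparse_score(weights, x)
label = "sports" if score > 0 else "politics"
print(score, label)
```

A real document classifier would have one weight per vocabulary word, learned with regularization; the dictionary lookup pattern stays the same.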

### Generative models vs. discriminative models

There are two broad classes of methods for determining the parameters $\vec w$ of a linear classifier [1][2]. Methods of the first class model the conditional density functions $P(\vec x|{\rm class})$. Examples of such algorithms include: