Showing posts from April, 2011

Naive Bayes (and author detection)

I've been playing around with various classification algorithms lately, so I wrote a very simple discrete naive Bayes classifier in Python. There is no emphasis on sample correction, since simplicity was the goal here, but it still works quite well.

from operator import itemgetter
from collections import defaultdict

class BayesClassifier:

    def __init__(self):
        self.total_count = 0                  # Observations of individual attributes
        self.class_count = defaultdict(int)   # Observations of cls
        self.attrs_count = defaultdict(int)   # Observations of (cls, attrs)
        self.correction = 0.0001              # Prevent multiplication by 0.0

    def train(self, cls, attrs):
        ''' Add observation of 'attrs' as being an instance of 'cls' '''
        self.class_count[cls] += 1
        for attr in attrs:
            self.attrs_count[(cls, attr)] += 1
            self.total_count += 1

    def rate(self, cls, attrs):
        ''' Return probability rating of 'attrs' bei