Keen Learner and Exp... • 15h
Day 8 of learning AI/ML as a beginner.

Topic: Bag of Words (BOW)

Yesterday I told you guys about One Hot Encoding, which is one way to convert text into vectors but has some serious disadvantages. To address those disadvantages there's another technique known as Bag of Words (BOW).

Bag of Words is an NLP technique that treats a text as a collection of words and represents it numerically by counting how often each vocabulary word appears. In my notes the vocabulary is ordered so that the highest-frequency words come first. BOW ignores grammar and the order of the words.

There are two types of Bag of Words (BOW):
1. Binary BOW: it records only whether each vocabulary word is present (1) or absent (0).
2. Normal BOW: it counts how many times each word appears and stores that count.

Just like One Hot Encoding, Bag of Words has some advantages and disadvantages.

Its advantages are that it is simple and intuitive to use, and that it gives fixed-size inputs, i.e. it can convert a text of any length into a numerical vector of fixed length (the size of the vocabulary). This helps ML algorithms process text data efficiently and uniformly.

Its disadvantages include the sparse matrix problem (as the vocabulary grows, most entries in each vector are 0) and a risk of overfitting, i.e. the model just memorizes the data instead of learning the bigger picture. Because BOW doesn't care about the order of the words, two texts that use the same words in a different order get exactly the same vector, even when reordering completely changes the meaning. So no real semantic meaning is captured, and both texts are still treated as similar. It also has the out-of-vocabulary problem, i.e. any word outside the vocabulary gets ignored.

Here are my notes which will help you understand Bag of Words (BOW) in more detail.
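To make the idea concrete, here is a minimal sketch of both BOW types in plain Python. The function names (`build_vocabulary`, `bag_of_words`), the example sentences, and the simple lowercase-and-split tokenization are my own assumptions for illustration, not a standard library API. Real projects usually rely on a library vectorizer instead.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect every word in the corpus, most frequent first.

    Ties are broken alphabetically so the order is deterministic.
    (Hypothetical helper; tokenization is just lowercase + split.)
    """
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return sorted(counts, key=lambda w: (-counts[w], w))


def bag_of_words(text, vocabulary, binary=False):
    """Turn a text into a fixed-length vector, one slot per vocabulary word.

    binary=False -> Normal BOW (word counts)
    binary=True  -> Binary BOW (1 if present, else 0)
    Words that are not in the vocabulary are silently ignored.
    """
    counts = Counter(text.lower().split())
    if binary:
        return [1 if counts[word] > 0 else 0 for word in vocabulary]
    return [counts[word] for word in vocabulary]


docs = ["the dog bit the man", "the man bit the dog", "the cat sat"]
vocab = build_vocabulary(docs)

count_vec = bag_of_words(docs[0], vocab)
binary_vec = bag_of_words(docs[0], vocab, binary=True)
reordered_vec = bag_of_words(docs[1], vocab)

# "bird" is not in the vocabulary, so it leaves no trace in the vector.
oov_vec = bag_of_words("the bird sat", vocab)

print(vocab)
print(count_vec, binary_vec)
print(count_vec == reordered_vec)
print(oov_vec)
```

Notice that "the dog bit the man" and "the man bit the dog" end up with identical vectors, which is the word-order disadvantage in action, and that every vector has the same length as the vocabulary no matter how long the text is.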