Keen Learner and Exp... • 21d
Day 8 of learning AI/ML as a beginner. Topic: Bag of Words (BOW) Yesterday I told you guys about One Hot Encoding which is one way to convert text into vector however with serious disadvantages and to cater to those disadvantages there's another one know as Bag of words (BOW). Bag of words is an NLP technique used to convert text into collection of words and represent it numerically by counting the frequency of word (highest frequency words come first in vocabulary) it ignores grammar and order of the words. There are two types of Bag of Words (BOW): 1. Binary BOW: it converts words into binary form (1 and 0). 2. Normal BOW: This will count the frequency and update the count. Just like One Hot Encoder, Bag of Words also have some advantages and disadvantages. It's advantages are that it is simple and intuitive to use and it has fixed size inputs i.e. it can convert a text of any length into a numerical vector of fixed length (using vocabulary) this help ML algorithms to process text data efficiently and uniformly. It's disadvantages include the problem of sparse matrix and overfitting i.e. the computer is just memorizing the data and not learning the bigger picture. As BOW don't care about the order of the words it changes it according to the vocabulary which can completely change the meaning of the text and also it means that no real semantic meaning is captured as it will still considered both the text meaning as similar. And it also have the problem of out of vocabular i.e. the word outside the vocabulary will get ignored. Here are my notes which will help you understand Bag of Words (BOW) in more details.
Keen Learner and Exp... • 18d
Day 11 of learning AI/ML as a beginner. Topic: TF-IDF (Term Frequency - Inverse Document Frequency). Yesterday I have talked about N-grams and how they are useful in Bag of Words (BOW) however it has some serious drawbacks and for that reason I am
See MoreKeen Learner and Exp... • 19d
Day 10 of learning AI/ML as a beginner. Topic: N-Grams in Bag of Words (BOW). Yesterday I have talked about an amazing text to vector converter in machine learning i.e. Bag of Words (BOW). N-Gram is just a part of BOW. In BOW the program sees sente
See MoreKeen Learner and Exp... • 16d
Day 13 of learning AI/ML as a beginner. Topic: Word Embedding. I have discussed about one hot encoding, Bag of words and TF-IDF in my recent posts. These are the count or frequency tools that are a part of word embedding but before moving forward l
See MoreKeen Learner and Exp... • 20d
Day 9 of learning AI/ML as a beginner. Topic: Bag of Words practical. Yesterday I shared the theory about bag of words and now I am sharing about the practical I did I know there's still a lot to learn and I am not very much satisfied with the topi
See MoreKeen Learner and Exp... • 22d
Day 7 of learning AI/ML as a beginner. Topic: One Hot Encoding and Future roadmap. Now that I have learnt how to clean up the text input a little its time for converting that data into vectors (I am so glad that I have learned it despite getting cr
See MoreDownload the medial app to read full posts, comements and news.