
Sanskar

Keen Learner and Exp... • 23d

Day 2 of learning AI/ML as a beginner. Topic: text preprocessing (tokenization) in NLP.

I have moved further and decided to learn Natural Language Processing (NLP), which is used especially for translation and chatbots, where it helps programs generate human-like responses in human-readable language. I have also created a roadmap for learning NLP, which I will follow so that I learn it in a more structured manner.

I have already started with the theory of text preprocessing, specifically tokenization. Tokenization is the process of breaking text down into smaller units called tokens. These tokens can be sentences or words, depending on the level of tokenization applied.

Tokenization has four main technical terms:
1. Corpus - the whole body of text being processed, for example a paragraph.
2. Documents - the individual pieces of text within the corpus, for example sentences.
3. Vocabulary - the set of unique words used in the corpus.
4. Words - the individual words that make up the text.

Tokenizers typically rely on punctuation (along with whitespace) to decide where one token ends and the next begins.

I have only scratched the surface of NLP and will most probably apply this practically in my Python code. I warmly welcome all questions, suggestions, recommendations and "constructive" criticism (the kind that contains the problem and its likely solution; I will research the rest). Here are also the notes I made while learning this.
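To make the four terms concrete, here is a minimal sketch using only Python's built-in `re` module. It is not how a dedicated NLP library tokenizes text. The sample text, the helper names `split_sentences` and `split_words`, and the simple punctuation rules are all assumptions made for illustration, and real text (abbreviations like "Dr.", decimals, quotes) needs more careful handling.

```python
import re

# Hypothetical sample corpus: one short paragraph.
corpus = (
    "Tokenization breaks text into tokens. "
    "Tokens can be sentences or words! "
    "Is punctuation used to find the boundaries?"
)

def split_sentences(text):
    """Split a corpus into documents (sentences) after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# Documents: the sentences that make up the corpus.
documents = split_sentences(corpus)

# Words: every token from every document, punctuation included.
words = [token for doc in documents for token in split_words(doc)]

# Vocabulary: the unique words, lowercased, with punctuation dropped.
vocabulary = sorted({w.lower() for w in words if w.isalnum()})

print(documents)
print(words)
print(vocabulary)
```

Lowercasing before building the vocabulary is a design choice in this sketch, so that "Tokens" and "tokens" count as the same word. Whether that is right depends on the task.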

