
Sanskar

Keen Learner and Exp... • 21d

Day 4 of learning AI/ML as a beginner. Topic: text preprocessing, stemming using NLTK.

I have learned about tokenization, and now I am learning about text preprocessing in ML. Text preprocessing means cleaning up raw text (for example, text entered by a user) so that it can be used in Natural Language Processing (NLP) and Machine Learning (ML) models.

Stemming is the process of stripping affixes (mostly suffixes) from a word to reduce it to its stem, or root form. For example, "eating" has the suffix "ing", and its root word is "eat". We use stemming to group words with similar meanings and to reduce the size of the vocabulary (the set of unique words in a document or corpus).

Stemming can be done with several stemmer classes in the Natural Language Toolkit (NLTK), including:

1. PorterStemmer: one of the oldest and most popular stemmers, used to remove common suffixes. However, its accuracy drops on more complex words: it sometimes mangles a word and produces a stem that is not a real word (for example, "history" becomes "histori").
2. RegexpStemmer: a very simple rule-based stemmer. It uses a regular expression that you supply to identify the prefixes and suffixes in a word and removes them to find the root word. This makes it flexible, but it is only as good as the pattern it is given, so it also makes mistakes.
3. SnowballStemmer, a.k.a. the Porter2 stemmer: as the name suggests, this is an improved version of PorterStemmer. It is more consistent and accurate than PorterStemmer and also supports multiple languages.

I welcome all questions and suggestions that will help me understand these concepts more clearly and develop a deeper understanding. Also, here's my code and its result.
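The three stemmers described above can be compared side by side. This is a minimal sketch, assuming NLTK is installed; it is not the code from the original post (which was shared as an image), and the word list and the regular expression passed to RegexpStemmer are illustrative choices.

```python
# Requires the third-party NLTK package (pip install nltk).
# These three stemmers are rule based and need no extra corpus downloads.
from nltk.stem import PorterStemmer, RegexpStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")  # Snowball also supports other languages
# Illustrative pattern (an assumption): strip a trailing "ing", "s", "e" or "able".
# min=4 leaves words shorter than four characters untouched.
regexp = RegexpStemmer("ing$|s$|e$|able$", min=4)

# Illustrative word list, not taken from the original post.
words = ["eating", "eats", "eat", "history", "fairly", "sportingly"]


def stem_all(stemmer, tokens):
    """Return the stem of every token, in order."""
    return [stemmer.stem(token) for token in tokens]


# Print each stemmer's output for the same words so they can be compared.
for name, stemmer in [("Porter", porter), ("Regexp", regexp), ("Snowball", snowball)]:
    print(f"{name:<9}", stem_all(stemmer, words))

# Stemming shrinks the vocabulary: several surface forms map to one stem.
forms = ["eating", "eats", "eat"]
print(len(set(forms)), "unique words ->",
      len(set(stem_all(porter, forms))), "unique stem(s)")
```

On this list, PorterStemmer turns "history" into "histori" and "fairly" into "fairli", while SnowballStemmer reduces "fairly" to "fair". RegexpStemmer only removes what its pattern matches, so its results depend entirely on the expression you pass in.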


More like this

Recommendations from Medial


Sanskar

Keen Learner and Exp... • 22d

Day 3 of learning AI/ML as a beginner. Topic: NLP (Tokenization). Tokenization is breaking a paragraph (corpus) or sentence (document) into smaller units called tokens. To perform tokenization we use the nltk (Natural Language Toolkit) python li


Sanskar

Keen Learner and Exp... • 16d

Day 9 of learning AI/ML as a beginner. Topic: Bag of Words practical. Yesterday I shared the theory about bag of words and now I am sharing about the practical I did I know there's still a lot to learn and I am not very much satisfied with the topi


Sanskar

Keen Learner and Exp... • 20d

Day 5 of learning AI/ML as a beginner. Topic: lemmatization and stopwords. Lemmatization is similar to stemming; however, in lemmatization a word is reduced to its base form, also known as its lemma. This is a dictionary-based process. This is accurate then


Sanskar

Keen Learner and Exp... • 19d

Day 6 of learning AI/ML as a beginner. Topic: POS tagging and named entity recognition. POS (Part of Speech) tagging is the process of labeling each word in a sentence (document) with its role. Named entity recognition is the process where the system ide


Comet

#freelancer • 1y

Text Generation. What It Is: Text generation involves using AI models to create humanlike text based on input prompts. How It Works: Models like GPT-3 use Transformer architectures. They're pre-trained on vast text datasets to learn grammar, conte


punna bhagath

MERN STACK • 9m

Which CV is best, a LaTeX or a Word file? And what is the perfect editor for LaTeX and Word files?


Sanskar

Keen Learner and Exp... • 23d

Day 2 of learning AI/ML as a beginner. Topic: text preprocessing (tokenization) in NLP. I have moved further and decided to learn about Natural Language Processing (NLP), which is used especially for translations, chatbots, and help them to generate hum


T.K.ANJANAA SREE

Hey I am on Medial • 1y

Why should create a pdf viewer app integrated with AI So everytime we come across a new sentence or new word we come out and search for it.. so using AI selecting the text and double taking leads to result of meaning of the sentence Why shouldn't we


gray man

I'm just a normal gu... • 4m

The Delhi High Court has overturned an arbitration tribunal's decision in the continuing legal battle stemming from the unsuccessful merger attempt between hotel giant OYO and competitor Zostel Hospitality. OYO stated that the High Court sided with

