Sanskar

Keen Learner and Exp... • 2d

Day 5 of learning AI/ML as a beginner. Topic: lemmatization and stop words.

Lemmatization is similar to stemming, but instead of just chopping off endings it reduces a word to its dictionary base form, known as the lemma. Because it is a dictionary-based process, it is more accurate than stemming, at the cost of speed (it is slower than stemming). Lemmatization also takes a part of speech (POS) into account, where "v" stands for verb, "n" for noun, "a" for adjective, and "r" for adverb. It works best when you pass the POS that actually fits the word. There is also an automatic tagging feature, but I have not learned it yet, so no comments on it for now.

Then there are stop words: the very commonly used words of a language (in English, for example, is, am, are, was, were, the, etc.). Stop words are usually removed to reduce noise in the text, to speed up processing, and to bring out the important words in a document (sentence).

I used lemmatization and stop word removal together to clean a corpus (paragraph) and pull out the main words from every document. I used sent_tokenize to break the corpus into documents, i.e. sentences, and then broke those sentences further into word tokens. The remaining words are then joined back into new sentences. I also used PorterStemmer and SnowballStemmer to compare the results and to practice what I have learnt over the last few days. Here's my code and its result.
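The steps described above can be sketched roughly as follows. This is not the code from the post: it is a minimal, standard-library-only illustration in which a tiny hand-written lookup table (LEMMAS) and a short stop word set (STOP_WORDS) stand in for the WordNet dictionary and the NLTK stop word list, and a naive regex splitter stands in for sent_tokenize. All names and table entries here are assumptions made for the example.

```python
import re

# Toy stand-in for a real lemma dictionary such as WordNet:
# (word, pos) -> lemma. Entries are hypothetical, chosen for illustration.
LEMMAS = {
    ("running", "v"): "run",
    ("ran", "v"): "run",
    ("better", "a"): "good",
    ("mice", "n"): "mouse",
    ("leaves", "n"): "leaf",
    ("leaves", "v"): "leave",
}

# Small illustrative stop word set; real lists are much longer.
STOP_WORDS = {"is", "am", "are", "was", "were", "the", "a", "an", "in", "of", "and", "to"}


def lemmatize(word, pos="n"):
    """Look the word up for the given POS; fall back to the lowercased word."""
    return LEMMAS.get((word.lower(), pos), word.lower())


def split_sentences(corpus):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", corpus.strip()) if s]


def clean_sentence(sentence, pos="n"):
    """Tokenize, drop stop words, lemmatize what is left, and rejoin."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    kept = [lemmatize(t, pos) for t in tokens if t.lower() not in STOP_WORDS]
    return " ".join(kept)


corpus = "The mice were running in the garden. The leaves are falling."
cleaned = [clean_sentence(s, pos="n") for s in split_sentences(corpus)]
print(cleaned)  # ['mouse running garden', 'leaf falling']

# The POS argument changes the result for the same surface word.
print(lemmatize("running", "n"), lemmatize("running", "v"))  # running run
print(lemmatize("leaves", "n"), lemmatize("leaves", "v"))    # leaf leave
```

Note how "running" is left unchanged when treated as a noun but becomes "run" when treated as a verb, which is why passing a suitable POS matters. In NLTK the corresponding pieces would be WordNetLemmatizer, stopwords.words("english"), sent_tokenize and word_tokenize, each of which needs its data package downloaded first.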

