
Sanskar

Keen Learner and Exp... • 22d

Day 3 of learning AI/ML as a beginner. Topic: NLP (Tokenization)

Tokenization is breaking a paragraph (corpus) or a sentence (document) into smaller units called tokens. To perform tokenization I used nltk (Natural Language Toolkit), a Python library. nltk is not part of Python's standard library, so it has to be installed locally first. I installed it with pip and then imported from nltk everything I needed: sent_tokenize, word_tokenize, wordpunct_tokenize and TreebankWordTokenizer.

sent_tokenize: breaks a corpus (paragraph) into documents (sentences).
word_tokenize: breaks a document into words; punctuation marks such as "." and "!" come out as separate tokens.
wordpunct_tokenize: also breaks a document into words, but it splits off all punctuation as separate tokens, including apostrophes, so "don't" becomes "don", "'", "t".
TreebankWordTokenizer: does not treat every "." as a separate token; it splits a "." off only when it comes after the very last word of the text.

Here's my code and its result. I warmly welcome all suggestions and questions about this, as they will help me deepen my knowledge and improve my learning process.
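To make the difference between the tokenizers concrete, here is a minimal sketch (not the code from the screenshot) that runs two of them on the same made-up sentence. It assumes NLTK is installed with pip. wordpunct_tokenize and TreebankWordTokenizer are rule-based, so they work without downloading any extra NLTK data.

```python
# Assumes NLTK is installed (pip install nltk). Both tokenizers used here are
# regex/rule based, so no extra NLTK data download is needed for them.
from nltk.tokenize import TreebankWordTokenizer, wordpunct_tokenize

# Illustrative sample text: two sentences, one contraction.
text = "Hello world. I don't like it."

# wordpunct_tokenize splits on the regex \w+|[^\w\s]+, so every run of
# punctuation (including the apostrophe in "don't") becomes its own token.
wordpunct_tokens = wordpunct_tokenize(text)
print(wordpunct_tokens)
# ['Hello', 'world', '.', 'I', 'don', "'", 't', 'like', 'it', '.']

# TreebankWordTokenizer expects one sentence at a time: it only splits off the
# final full stop ("world." keeps its period) and splits "don't" into "do" + "n't".
treebank_tokens = TreebankWordTokenizer().tokenize(text)
print(treebank_tokens)
# ['Hello', 'world.', 'I', 'do', "n't", 'like', 'it', '.']
```

sent_tokenize and word_tokenize, by contrast, rely on NLTK's pretrained Punkt sentence model, which has to be downloaded once with nltk.download (the resource is named "punkt" in older NLTK versions and "punkt_tab" in recent ones). Because word_tokenize splits the text into sentences first, it would also separate the full stop after "world", unlike TreebankWordTokenizer.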

