Sanskar

Keen Learner and Exp... • 1d

Day 3 of learning AI/ML as a beginner. Topic: NLP (Tokenization)

Tokenization is the process of breaking a paragraph (corpus) or a sentence (document) into smaller units called tokens. To perform tokenization we use nltk (Natural Language Toolkit), a Python library. nltk is not part of Python's standard library, so it has to be installed locally first. I installed it with pip and then imported from nltk everything I needed for tokenization: sent_tokenize, word_tokenize, wordpunct_tokenize and TreebankWordTokenizer.

sent_tokenize: breaks a corpus (paragraph) into documents (sentences).
word_tokenize: breaks a document into words. Punctuation such as "." and "!" becomes a separate token, and contractions are split (e.g. "isn't" becomes "is" and "n't").
wordpunct_tokenize: similar to word_tokenize, but it separates every run of punctuation characters into its own token, apostrophes included ("'" "." "!" etc.), so "isn't" becomes "isn", "'" and "t".
TreebankWordTokenizer: does not treat every "." as a new token. It splits off "." only when it follows the very last word of the text; full stops in the middle of the text stay attached to their words.

Here's my code and its result. I warmly welcome all suggestions and questions about this, as they will help me deepen my knowledge and improve my learning process.

Recommendations from Medial

Sanskar

Keen Learner and Exp... • 7h

Day 4 of learning AI/ML as a beginner. Topic: text preprocessing (stemming) using NLTK. I have learned about tokenization and now I am learning about text preprocessing in ML. Text preprocessing is the cleaning up of raw text (raw text is the one entered

Sanskar

Keen Learner and Exp... • 2d

Day 2 of learning AI/ML as a beginner. Topic: text preprocessing (tokenization) in NLP. I have moved on and decided to learn about Natural Language Processing (NLP), which is used especially for translations and chatbots, and helps them generate hum

Hiral Jain

Content writer • 7m

Ever wondered how a country's budget is prepared? Budget-1 Fact: the word "budget" does not appear in the Indian Constitution. According to Article 112 of the Indian Constitution, it is known as the “Annual Financial Statement”. In our case though the budget

Vrishank

Startups/VC/tech • 1y

Choosing a name for a startup is one of the most crucial initial tasks. Indian startups have some impressive names; let's explore the meaning behind a few of them: 1) Zepto - The name "Zepto" is derived from 'zeptosecond', which is the shortest uni

Niket Raj Dwivedi

Medial • 12m

Medial's first pitch deck was a two-page Word document. Here's everything it had: 1) About the founders (3-4 lines each, with LinkedIn links). 2) About the product and "Why Now?". 3) Our long-term vision. 4) How we'd utilise the capital.

Ms Dhoni

Hii I am Dhoni saroj... • 7m

1. Concept 💡 The Transparent Kitchen features a glass-enclosed kitchen where customers can watch their meals being prepared in real-time. 2. Transparency 🔍 This transparency builds trust by ensuring high food quality, hygiene, and cleanliness durin

Sanskar

Keen Learner and Exp... • 23d

Day 22 of learning Python as a beginner. Topic: speech_recognition, webbrowser, pyttsx3. speech_recognition: this is a popular library used to convert audio to text. It helps in capturing audio from a microphone or from audio files. I am using Google's web

Rishabh Raj Pathak

Polymath • 12d

She dumped me last night. Not because I don't listen. Not because I'm always on my phone. Not even because I forgot our anniversary (twice). But because, in her exact words: "You only pay attention to the parts of what I say that you think are

Samanth Shetty

Building Nestsure • 2m

🧵 Post 1️⃣3️⃣: Previously we discussed incorporation laws, and in posts 3 & 4 we covered why and how to incorporate in the USA and UAE. Now I'll talk about why and how we can incorporate our company in Singapore. Global incorporation: Sing

Harsh Dwivedi

Medial • 9m

Top News of the Week: 1. Funding: - Quick commerce major Zepto reaffirmed its position as the most heavily backed startup of the year, bagging a mega cheque of $350 Mn from investors like Motilal Oswal, Sachin Tendulkar, Mankind Pharma Family Offic
