Keen Learner and Exp... • 1d
Day 3 of learning AI/ML as a beginner. Topic: NLP (Tokenization)

Tokenization means breaking a paragraph (the corpus) or a sentence (the document) into smaller units called tokens. To perform tokenization I used nltk (Natural Language Toolkit), a Python library. nltk is not a built-in library, so it has to be installed first. I used pip to install nltk, and then from nltk I imported everything I needed for tokenization: sent_tokenize, word_tokenize, wordpunct_tokenize and TreebankWordTokenizer.

sent_tokenize: breaks a corpus (paragraph) into documents (sentences).
word_tokenize: breaks a document into words.
wordpunct_tokenize: does the same job as word_tokenize, but it also splits on every punctuation mark ("'", ".", "!" etc.), so "don't" becomes "don", "'", "t".
TreebankWordTokenizer: does not treat every "." as a separate token. It splits the "." off only when it comes after the very last word of the text.

And here's my code and its result. I warmly welcome all suggestions and questions, as they will help me deepen my knowledge and improve my learning process.
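For readers who want to try this, here is a minimal sketch of the four tokenizers side by side. It is not the code from the post (that was shared as an image); the sample text and variable names are made up for illustration. It assumes nltk is installed with pip. sent_tokenize and word_tokenize also need the Punkt data, which is assumed to be downloaded once beforehand (newer NLTK releases look for "punkt_tab", older ones for "punkt").

```python
# Requires: pip install nltk
from nltk.tokenize import (
    TreebankWordTokenizer,
    sent_tokenize,
    word_tokenize,
    wordpunct_tokenize,
)

# Hypothetical sample text, chosen to show the differences.
text = "Hello there! I don't like rain. It is 3.5 km away."

# wordpunct_tokenize is regex based: every run of punctuation becomes
# its own token, so "don't" is split into "don", "'", "t".
punct_tokens = wordpunct_tokenize(text)
print(punct_tokens)

# TreebankWordTokenizer expects one sentence at a time. Given the whole
# text, it splits off only the final "." and keeps "rain." as one token.
treebank_tokens = TreebankWordTokenizer().tokenize(text)
print(treebank_tokens)

# sent_tokenize and word_tokenize depend on the Punkt data. If it is
# missing, NLTK raises LookupError, so we report that instead of crashing.
try:
    sentences = sent_tokenize(text)          # paragraph -> sentences
    print(sentences)
    for sentence in sentences:
        print(word_tokenize(sentence))       # sentence -> word tokens
except LookupError:
    print("Punkt data missing: run nltk.download('punkt_tab') "
          "(or 'punkt' on older NLTK versions) once, then retry.")
```

Comparing the first two printed lists shows why the choice of tokenizer matters: the same text gives different tokens for contractions, decimals and sentence-ending periods.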
Hii I am Dhoni saroj... • 7m
1. Concept 💡 The Transparent Kitchen features a glass-enclosed kitchen where customers can watch their meals being prepared in real-time. 2. Transparency 🔍 This transparency builds trust by ensuring high food quality, hygiene, and cleanliness durin
Building Nestsure • 2m
🧵 Post 1️⃣3️⃣ : Previously, in posts 3 & 4, we talked about incorporation laws and about why and how to incorporate in the USA and UAE. Now I'll cover why and how we can incorporate our company in Singapore. Global incorporation : Sing