Sanskar

Keen Learner and Exp... • 6m

Day 3 of learning AI/ML as a beginner. Topic: NLP (Tokenization).

Tokenization is breaking a paragraph (corpus) or a sentence (document) into smaller units called tokens. To perform tokenization we use nltk (Natural Language Toolkit), a Python library. nltk is not a built-in library, so it needs to be installed locally first. I used pip to install nltk and then imported from nltk everything I needed to perform tokenization: sent_tokenize, word_tokenize, wordpunct_tokenize and TreebankWordTokenizer.

sent_tokenize: this breaks a corpus (paragraph) into documents (sentences).

word_tokenize: this breaks a document into words, with punctuation marks as separate tokens.

wordpunct_tokenize: this also breaks a document into words, but it splits at every punctuation mark ("'", ".", "!" etc.), so even an apostrophe inside a word becomes its own token.

TreebankWordTokenizer: this does not treat every "." as a separate token. A "." becomes its own token only when it comes right after the very last word of the text; anywhere else it stays attached to the word before it.

And here's my code and its result. I warmly welcome all suggestions and questions regarding this, as they will help me deepen my knowledge and improve my learning process.
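The code and its output are not reproduced in the text above, so the following is only a minimal sketch of how the four tokenizers named in the post might be compared. It is not the original code. The sample string `corpus` is made up for illustration, and the sketch assumes nltk is already installed with pip. The name of the Punkt resource to download (`punkt_tab` or `punkt`) depends on the installed NLTK version.

```python
from nltk.tokenize import (
    TreebankWordTokenizer,
    sent_tokenize,
    word_tokenize,
    wordpunct_tokenize,
)

# Hypothetical sample text, chosen to contain an apostrophe and two full stops.
corpus = "Hello there. Don't panic."

# wordpunct_tokenize splits at every run of punctuation, so the apostrophe
# in "Don't" and both full stops become separate tokens.
wordpunct_tokens = wordpunct_tokenize(corpus)
print(wordpunct_tokens)

# TreebankWordTokenizer treats its input as one sentence: the full stop after
# "there" stays attached to the word, and only the final one is split off.
treebank_tokens = TreebankWordTokenizer().tokenize(corpus)
print(treebank_tokens)

# sent_tokenize and word_tokenize rely on the Punkt models, which are a
# separate download from the nltk package itself.
try:
    sentences = sent_tokenize(corpus)  # corpus -> documents (sentences)
    print(sentences)
    print([word_tokenize(sentence) for sentence in sentences])  # document -> words
except LookupError:
    print("Punkt models missing: run nltk.download('punkt_tab') "
          "(or nltk.download('punkt') on older NLTK versions) first.")
```

Printing the token lists side by side makes the differences in punctuation handling easy to see on any text you try.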

Replies (10)

Rahul Agarwal

Founder | Agentic AI... • 6m

Great breakdown of tokenization and your hands-on approach to learning NLP. Keep experimenting and exploring different tools; it’s a solid way to build deep understanding in AI/ML.

Rahul Agarwal

Founder | Agentic AI... • 6m

Great breakdown of tokenization concepts and your learning process. It’s impressive to see you diving deep into NLP fundamentals. Keep exploring and sharing your progress!

nerd end

is required • 6m

Hey, I am a security engineer, but my brother wants to get into AI/ML. Can you tell me the courses you have been following?

Abhishek

Hey I am on Medial • 6m

Sanskar, what is your source of learning?

Roy

Business politics • 6m

🤔😐
