
Sanskar

Keen Learner and Exp... • 18h

Day 2 of learning AI/ML as a beginner. Topic: text preprocessing (tokenization) in NLP.

I have moved on and decided to learn Natural Language Processing (NLP), which is used in applications such as translation and chatbots and helps them generate human-like responses in human-readable language. I have also created a roadmap for learning NLP, which I will follow so that I learn it in a more structured manner.

I have started with the theory of text preprocessing, specifically tokenization. Tokenization is the process of breaking text down into smaller units called tokens. These tokens can be sentences or words, depending on the level of tokenization applied.

Tokenization has four main technical terms:
1. Corpus - the whole body of text being processed, for example a paragraph or a collection of paragraphs.
2. Documents - the individual units inside the corpus, for example the sentences of a paragraph.
3. Vocabulary - the set of unique words used in the corpus.
4. Words - the individual words that make up the text.

Tokenizers typically rely on whitespace and punctuation to decide where one token ends and the next begins.

I have only scratched the surface of NLP and will most probably apply this practically in my Python code. I warmly welcome all questions, suggestions, recommendations and "constructive" criticism (the kind that contains the problem and its likely solution; I will research the rest). Here are also the notes I made while learning this.
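The four terms above can be tried out in a few lines of Python. This is only a minimal sketch using the standard `re` module, not a production tokenizer: the helper names `sentence_tokenize` and `word_tokenize` and the sample text are hypothetical and chosen for illustration, and the sentence splitter is naive (an abbreviation such as "Dr." would be split incorrectly).

```python
import re


def sentence_tokenize(text):
    """Split text into sentences (hypothetical helper, naive rule).

    A sentence is assumed to end at '.', '!' or '?' followed by whitespace.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Split a sentence into word tokens and punctuation tokens.

    Runs of letters/digits become one token; each punctuation mark
    becomes its own token.
    """
    return re.findall(r"\w+|[^\w\s]", sentence)


# Corpus: the whole body of text (sample text made up for this example).
corpus = "Tokenization breaks text into tokens. Tokens can be sentences or words!"

# Documents: here, the sentences of the corpus.
documents = sentence_tokenize(corpus)

# Words: every word token, lowercased, with punctuation tokens dropped.
words = [
    token.lower()
    for doc in documents
    for token in word_tokenize(doc)
    if token.isalnum()
]

# Vocabulary: the unique words in the corpus.
vocabulary = sorted(set(words))

print(documents)
print(words)
print(vocabulary)
```

In this sample, "tokens" appears in both sentences, so the list of words is one entry longer than the vocabulary. Libraries such as NLTK and spaCy provide far more robust tokenizers; the regular expressions here only show the idea of splitting on whitespace and punctuation.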

