Introduction 🌟

Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans through natural language 💬. This repository explores basic NLP tasks using the NLTK (Natural Language Toolkit) library in Python 🐍.

📚 The code examples demonstrate the following NLP techniques with NLTK:

Segmentation: Splitting text into sentences.
Punctuation Removal: Stripping punctuation before further processing.
Tokenization: Breaking sentences into words.
Removal of Stop Words: Removing common words that don't carry much meaning.
Stemming and Lemmatization: Reducing words to their root forms.
Part of Speech Tagging: Tagging each word with its part of speech.
Named Entity Recognition: Identifying named entities such as persons, organizations, and locations 🌍

Table of Contents 📜
Segmentation✂️
Punctuation Removal✨
Tokenization 🧙‍♂️
Removal of Stop Words🔇
Stemming and Lemmatization🌱
Part of Speech Tagging🏷️
Named Entity Recognition🌟
Examples🌠
Getting Started🚀

Segmentation ✂️

In NLP, breaking text into sentences and words is a common initial step. NLTK provides tools to facilitate this.

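import nltk
nltk.download('punkt')  # sentence tokenizer data; newer NLTK releases may request 'punkt_tab' instead
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Millions of people across the UK and beyond have celebrated..."
sentences = sent_tokenize(text)      # split the text into sentences
words = word_tokenize(sentences[2])  # tokenize the third sentence (the full text must have at least three)
print(sentences)
print(words)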
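
To see why a trained sentence tokenizer is worth the download, here is a minimal sketch of a naive splitter that uses only Python's standard `re` module. The function name `naive_sentence_split` and the sample text are illustrative assumptions, not part of this repository. It splits after every `.`, `!`, or `?` that is followed by whitespace, so it wrongly breaks after the abbreviation "Dr.".

```python
import re

def naive_sentence_split(text):
    """Split after '.', '!' or '?' when followed by whitespace (hypothetical helper)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

sample = "Dr. Smith arrived. He was late."
print(naive_sentence_split(sample))
# ['Dr.', 'Smith arrived.', 'He was late.']
```

The Punkt model behind sent_tokenize is trained to recognize many abbreviations of this kind, which is why it is usually preferred over a hand-written rule.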

Punctuation Removal✨

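Many text processing tasks work on words alone, so punctuation is often stripped first. The regular expression below replaces every character that is not an ASCII letter or digit with a space.

import re

# Replace anything other than a-z, A-Z, or 0-9 with a space
text = re.sub(r"[^a-zA-Z0-9]", " ", sentences[2])
print(text)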
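
The pattern above has a few side effects that are easy to miss, so here is a small sketch that makes them visible. The helper `strip_punctuation`, its `keep_apostrophes` flag, and the sample sentence are assumptions made for illustration. It applies the same substitution, then collapses the runs of spaces that the substitution leaves behind.

```python
import re

def strip_punctuation(text, keep_apostrophes=False):
    """Replace non-alphanumeric characters with spaces, then collapse whitespace."""
    pattern = r"[^a-zA-Z0-9']" if keep_apostrophes else r"[^a-zA-Z0-9]"
    cleaned = re.sub(pattern, " ", text)
    return " ".join(cleaned.split())

sample = "Hello, world! It's 2024."
print(strip_punctuation(sample))                         # Hello world It s 2024
print(strip_punctuation(sample, keep_apostrophes=True))  # Hello world It's 2024
```

Note that contractions are split unless apostrophes are kept, and that the character class is ASCII-only, so accented letters are removed as well ("café" becomes "caf").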

Tokenization🧙‍♂️

Tokenization involves splitting text into individual words.

from nltk.tokenize import word_tokenize

words = word_tokenize(text)
print(words)

Removal of Stop Words🔇

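Stop words are very common words (such as "the", "is", and "of") that are often removed because they carry little meaning on their own.

nltk.download('stopwords')
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))  # build the set once instead of reloading the list for every word
words = [w for w in words if w not in stop_words]
print(words)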
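
NLTK's English stop word list is all lowercase, so a capitalized token such as "The" passes a plain membership test. The sketch below shows the difference a case-insensitive check makes. The tiny `STOP_WORDS` set and the `remove_stop_words` helper are assumptions chosen so the example runs without any NLTK data; they are not NLTK's actual list or API.

```python
# A tiny hand-picked stop word set, used here only so the example runs
# without downloading NLTK data; it is not NLTK's actual list.
STOP_WORDS = {"the", "is", "on", "a", "an", "of"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in stop_words."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["The", "cat", "is", "on", "the", "mat"]
print([t for t in tokens if t not in STOP_WORDS])  # ['The', 'cat', 'mat']
print(remove_stop_words(tokens))                   # ['cat', 'mat']
```

With NLTK installed, the same helper could be called with set(stopwords.words("english")) as the stop_words argument.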

Stemming and Lemmatization🌱

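Stemming and lemmatization both reduce words to a base form. Stemming strips suffixes by rule and may produce strings that are not real words, while lemmatization looks words up in a dictionary (WordNet) and returns valid base forms.

nltk.download('wordnet')  # dictionary data used by the lemmatizer
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmed = [PorterStemmer().stem(w) for w in words]              # rule-based suffix stripping
lemmatized = [WordNetLemmatizer().lemmatize(w) for w in words]  # dictionary lookup, treats words as nouns by default
print(stemmed, lemmatized)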

Part of Speech Tagging🏷️

Part of speech tagging labels each word with its grammatical category, such as noun, verb, or adjective.

nltk.download('averaged_perceptron_tagger')
pos_tags = nltk.pos_tag(words)
print(pos_tags)
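
By default nltk.pos_tag returns Penn Treebank tags (for example NNS, VBD, JJ), while WordNetLemmatizer.lemmatize accepts an optional pos argument that uses WordNet's single-letter codes. The sketch below shows one possible way to bridge the two. The helper `penn_to_wordnet` and the hand-written tagged pairs are assumptions for illustration, not code from this repository.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter (hypothetical helper)."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # noun, also used as the fallback

# Hand-written (word, tag) pairs in the shape that nltk.pos_tag returns.
tagged = [("dogs", "NNS"), ("were", "VBD"), ("running", "VBG"), ("quickly", "RB")]
print([(word, penn_to_wordnet(tag)) for word, tag in tagged])
# [('dogs', 'n'), ('were', 'v'), ('running', 'v'), ('quickly', 'r')]
```

The result could then be passed along as WordNetLemmatizer().lemmatize(word, pos=penn_to_wordnet(tag)), so that verbs and adjectives are not lemmatized as nouns.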

Named Entity Recognition🌟

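Named Entity Recognition (NER) identifies named entities, such as people, organizations, and locations, within text. NLTK's ne_chunk takes POS-tagged tokens and returns a tree whose subtrees mark the entities.

nltk.download('maxent_ne_chunker')  # the chunker model that ne_chunk loads
nltk.download('words')
from nltk import ne_chunk

ner_tree = ne_chunk(pos_tags)  # pos_tags comes from the tagging step above
print(ner_tree)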
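
Printing the tree is useful for inspection, but downstream code usually wants the entities as plain data. Here is a minimal sketch of how that might look, assuming NLTK is installed. The helper `extract_entities` is hypothetical, and the tree is built by hand for illustration; it is not actual ne_chunk output.

```python
from nltk.tree import Tree

def extract_entities(tree):
    """Return (entity text, label) pairs from a chunk tree's top-level subtrees."""
    entities = []
    for node in tree:
        # Chunks are subtrees; plain tokens are (word, tag) tuples.
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities

# Hand-built tree for illustration only; not actual ne_chunk output.
example = Tree("S", [
    Tree("PERSON", [("Ada", "NNP"), ("Lovelace", "NNP")]),
    ("visited", "VBD"),
    Tree("GPE", [("London", "NNP")]),
])
print(extract_entities(example))
# [('Ada Lovelace', 'PERSON'), ('London', 'GPE')]
```

The same helper could be applied to the tree returned by ne_chunk, since that is also an nltk.tree.Tree.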

Examples🌠

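Here is an example that runs the full pipeline, from raw text to Named Entity Recognition, in one step:

from nltk import pos_tag  # word_tokenize and ne_chunk were imported in earlier sections

text = "Twitter CEO Elon Musk arrived at the Staples Center..."
ner_tree = ne_chunk(pos_tag(word_tokenize(text)))
print(ner_tree)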

Feel free to explore and expand upon these exercises to deepen your understanding of NLP concepts and the NLTK library 📚✨

Happy learning!

Getting Started🚀

To run the code examples in this repository, make sure you have Python and NLTK installed. You can install NLTK using the following command:

pip install nltk

Thank you! 🙌

If you appreciated this, feel free to follow!🌟🔮

ThomasHeim11/NLP-Beginner-Guide