What is Natural Language Processing? - Beginners Guide to NLP
Natural Language Processing (NLP) is a branch of artificial intelligence (AI) concerned with giving computers the ability to understand and generate natural language in the form of text or speech. It aims to make interaction between humans and computers more fluent and natural. NLP combines computational linguistics, the rule-based modeling of human language, with statistical and deep learning models. Together, these tools and technologies enable machines to process, understand, and generate human language. This article gives a general overview of natural language processing; we will dive deeper into NLP and its various domains in future articles.
Applications of NLP
- Predictive Text: features like autocorrect and autocomplete
- Search Results: suggestions that appear as you type a query
- Smart Assistants: Siri, Alexa, etc.
- Email Filters: spam filtering
- Text Summary and Paraphrasing
- Language Translation
- Sentiment Analysis
- And many more
Components of NLP
NLP consists of two components, as follows:
- Natural Language Understanding (NLU): The part of NLP in which computers make sense of human language. It involves mapping input given in natural language to useful representations and analyzing them.
- Natural Language Generation (NLG): The part of NLP that produces meaningful phrases and sentences in natural language from some internal representation. It involves tasks such as text planning, sentence planning, and text realization. Once NLU has been performed, NLG is generally considered the easier of the two tasks.
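To make the split between the two components concrete, here is a minimal sketch in Python. It is only a toy under stated assumptions: the regular expression, the dictionary used as the "internal representation", and the function names `understand` and `generate` are all made up for illustration, and real NLU and NLG systems rely on trained models rather than a single hand-written rule.

```python
import re

# Hypothetical pattern for one kind of request (an assumption for this sketch).
WEATHER_PATTERN = re.compile(r"weather in (?P<city>[A-Za-z ]+)", re.IGNORECASE)


def understand(utterance):
    """Toy NLU: map raw text to a small dictionary representation."""
    match = WEATHER_PATTERN.search(utterance)
    if match is None:
        return {"intent": "unknown"}
    return {"intent": "get_weather", "city": match.group("city").strip()}


def generate(representation):
    """Toy NLG: realize a sentence from the representation using a template."""
    if representation.get("intent") != "get_weather":
        return "Sorry, I did not understand that."
    return "Here is the weather forecast for {}.".format(representation["city"])


meaning = understand("What is the weather in Paris?")
reply = generate(meaning)
print(meaning)
print(reply)
```

The NLU step turns text into a structure a program can work with, and the NLG step turns such a structure back into a sentence.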
Steps in NLP
There are five main steps involved in NLP:
- Lexical Analysis: It involves recognizing and analyzing the structure of words. The lexicon of a language is its collection of words and phrases. Lexical analysis divides the whole text into sections, paragraphs, sentences, and words.
- Syntactic Analysis (Parsing): It analyzes the words in a sentence for grammar and arranges them in a way that shows the relationships between them. For instance, an English syntactic analyzer rejects a word sequence such as "Hospital the to goes patient the", because the words cannot be arranged into a valid grammatical structure.
- Semantic Analysis: This step derives the precise meaning, or dictionary meaning, of the text. The text is checked for meaningfulness, usually by mapping its syntactic structures to their meanings. For example, a semantic analyzer rejects the phrase "hot ice cream".
- Discourse Integration: The meaning of a sentence often depends on the sentence that precedes it, and it can in turn shape the meaning of the sentence that follows. For instance, in "It is going to rain today. Carry your umbrella.", the two sentences complete each other.
- Pragmatic Analysis: In this step, what was said or written is re-interpreted to reflect what was actually meant. It involves deriving those aspects of language that require real-world knowledge. For example, the sentence "I can kill you right now" may mean different things in different situations.
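The first of these steps, lexical analysis, is the easiest one to try out yourself. The following is a minimal sketch in Python that splits a text into sentences and then into words. The helper names `split_sentences` and `split_words` are made up for this example, and the regular expressions are deliberately naive: they would, for instance, wrongly split after an abbreviation such as "Dr.".

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_words(sentence):
    """Naive splitter: keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


text = "It is going to rain today. Carry your umbrella."
for sentence in split_sentences(text):
    print(sentence, "->", split_words(sentence))
```

The later steps (syntactic, semantic, discourse, and pragmatic analysis) need grammars, models, or world knowledge and do not fit into a few lines like this.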
Important NLP Terminologies
Understanding and generating natural language is an extremely complex task, so it is common to use different techniques for different tasks and challenges. Here we collect some commonly used terms, techniques, and algorithms in NLP.
- Morphology: the study of how words are constructed from primitive meaningful units
- Morpheme: a primitive meaningful unit of a language
- Phonology: the study of how sounds are organized systematically in a language
- Syntax: the arrangement of words to form a sentence, and the structural role each word plays in it
- Semantics: the meaning of words and how they combine into meaningful phrases and sentences
- Pragmatics: deals with making sense of a sentence in different situations
- Discourse: deals with how the meaning of a sentence is affected by the sentences preceding or following it
- World Knowledge: general knowledge about the world
- Bag of Words: A technique for modeling text, used for feature extraction from text data. It represents a text by the occurrences of its words, ignoring grammar and word order. For instance, the sentence "Mary also likes to watch football games." can be represented as the bag of words {"Mary","also","likes","to","watch","football","games"}, in which every word occurs once.
- Tokenization: The process of splitting text, such as a sentence, into a list of tokens. It often also removes certain characters, such as punctuation.
- Stop Words Removal: Getting rid of very common words, such as articles, prepositions, conjunctions, and pronouns like "a", "and", "to", etc.
- Stemming: Reducing a word to its root form by stripping affixes, i.e. suffixes and prefixes. For instance, "faster" becomes "fast".
- Lemmatization: Reducing a word to its base (dictionary) form. For instance, "best" becomes "good", "went" becomes "go", etc.
- Topic Modeling: Uncovering the hidden thematic structure in a document or a collection of documents.
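Several of the techniques in this list can be sketched in a few lines of plain Python. The example below chains tokenization, stop words removal, stemming, and bag of words. Treat it as an illustration under stated assumptions rather than a reference implementation: the stop word list and the suffix list are tiny made-up samples, and `naive_stem` is a toy suffix stripper, not a real stemming algorithm such as Porter's.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "and", "to", "of", "in", "is", "it"}

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "er", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)


sentence = "Mary also likes to watch football games."
tokens = tokenize(sentence)
content_tokens = remove_stop_words(tokens)
stems = [naive_stem(token) for token in content_tokens]
bag = bag_of_words(content_tokens)

print(tokens)
print(content_tokens)
print(stems)
print(bag)
```

Lemmatization and topic modeling are left out on purpose, since lemmatization needs a dictionary of word forms and topic modeling needs a statistical model trained on many documents.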
We will be diving deeper into NLP and its terminology in future articles, so stay in touch if you want to explore the domain further.
Popular Libraries Available for NLP
You must be eager to start coding in this domain, so here are some NLP libraries that you can start exploring right now. We will be diving deeper into these libraries as well in future articles. You can learn more about each library by following its link.