POS tagging is the process of labeling each word in a text with its part of speech. A part of speech is a class of words defined by how a word functions in a sentence. Parts of speech are also known as word classes or lexical categories. Common parts of speech in English are noun, verb, adjective, adverb, pronoun and conjunction.
POS tags are often used as features in NLP tasks. In this post, we are going to use Python’s NLTK to create POS tags from text. NLTK has a POS tagger that takes a list of word tokens and returns a POS tag for each one. Even though the input is a list of individual words, the tagger analyzes them as a sequence, so it can use context to tell a word used as a noun from the same word used as a verb, which improves its accuracy.
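import nltk
# One-time setup if the tokenizer and tagger data are missing (resource
# names vary by NLTK version; newer releases use 'punkt_tab' and
# 'averaged_perceptron_tagger_eng'):
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
# triple quotes represent multi-line text
text = """Most important thing in this life is love
and I love to eat chocolates."""
# split the text into word tokens, then tag each token
tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)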
[('Most', 'RBS'),
('important', 'JJ'),
('thing', 'NN'),
('in', 'IN'),
('this', 'DT'),
('life', 'NN'),
('is', 'VBZ'),
('love', 'JJ'),
('and', 'CC'),
('I', 'PRP'),
('love', 'VBP'),
('to', 'TO'),
('eat', 'VB'),
('chocolates', 'NNS'),
('.', '.')]
This function returns a list of tuples. Each tuple contains a word and its tag. The tag names might be overwhelming at first, but we’ll take a look at them in detail. But first, notice how the word “love” is assigned a different tag each time, based on the role it plays in the sentence. (The tagger is not perfect: the first “love” is really a noun, so NN would be the expected tag.)
Now let’s look at which tags are present in NLTK and what they mean. By the way, if you want to check the tags yourself, you can use the following function:
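Because the result is a plain list of (word, tag) tuples, it is easy to post-process. Here is a minimal sketch that finds the words that received more than one tag. The list is copied from the output above so the snippet runs without NLTK data, and the names tags_by_word and ambiguous are only illustrative.

```python
from collections import defaultdict

# Tagged output copied from the example above, so this runs without NLTK data.
pos_tags = [('Most', 'RBS'), ('important', 'JJ'), ('thing', 'NN'),
            ('in', 'IN'), ('this', 'DT'), ('life', 'NN'), ('is', 'VBZ'),
            ('love', 'JJ'), ('and', 'CC'), ('I', 'PRP'), ('love', 'VBP'),
            ('to', 'TO'), ('eat', 'VB'), ('chocolates', 'NNS'), ('.', '.')]

# Collect every tag each word received, in order of appearance.
tags_by_word = defaultdict(list)
for word, tag in pos_tags:
    tags_by_word[word.lower()].append(tag)

# Keep only the words that were tagged in more than one way.
ambiguous = {w: t for w, t in tags_by_word.items() if len(set(t)) > 1}
print(ambiguous)  # {'love': ['JJ', 'VBP']}
```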
# prints all the available POS tags and their description
nltk.help.upenn_tagset()
The following are the main POS categories. There are about 12 families in this table, and they are the ones you’ll need most of the time. Note that naming conventions and POS tags vary slightly between libraries.
Tag | Family | Full form | Description / Example |
---|---|---|---|
NN | Noun | noun, common, singular or mass | A word (other than a pronoun) used to identify any of a class of people, places, or things. Example: common-carrier, cabbage, knuckle-duster, Casino, Afghan |
NNP | Noun | noun, proper, singular | A singular word directly associated with an entity and primarily used to refer to that entity. Example: San Francisco, Sarah, The Library of Congress |
NNPS | Noun | noun, proper, plural | A plural word directly associated with an entity and primarily used to refer to that entity. Example: Oreos, Johns, Americans, Museums of Mongols |
NNS | Noun | noun, common, plural | A plural word used to name general items rather than specific ones. Example: Cities, Jeans, Stadiums |
VB | Verb | verb, base form | A word used to describe an action, state, or occurrence. Example: Ask, Assemble, Build |
VBD | Verb | verb, past tense | A verb in its past tense form. Example: Dipped, Pleaded, Swiped |
VBG | Verb | verb, present participle or gerund | A word formed from a verb that can be used as an adjective or noun, or to form a continuous verb tense. Example: Telegraphing, Stirring, Focusing |
VBN | Verb | verb, past participle | A word formed from a verb that can be used as an adjective, or to form a perfect tense or the passive voice. Example: Multihulled, Dilapidated, Aerosolized |
VBP | Verb | verb, present tense, not 3rd person singular | A verb in present tense form that is not third person singular. Example: Predominate, Wrap, Resort, Sue, Twist |
VBZ | Verb | verb, present tense, 3rd person singular | A verb in present tense form that is third person singular. Example: Bases, Reconstructs, Marks, Mixes, Displeases |
CC | Conjunction | conjunction, coordinating | A word that joins two elements of equal grammatical rank and syntactic importance. Example: &, and, but, therefore, yet |
CD | Numeric & Dates | numeral, cardinal | Used to represent numeric values. Example: Mid-1890, Twenty, 271,124, Ten million |
DT | Determiner | determiner | A modifying word that determines the kind of reference a noun or noun group has. Example: All, An, Both, Half, No, Some, Each |
FW | Foreign word | foreign word | A word that does not belong to the language of the text (English). Example: Gemeinschaft, Objets, Corporis |
IN | Preposition / Conjunction | preposition or conjunction, subordinating | A preposition, or a word that joins a dependent (subordinate) clause to an independent clause. Example: Among, Uppon, Whether, Within |
JJ | Adjective | adjective or numeral, ordinal | A word naming an attribute of a noun. Example: Third, Ill-mannered, Pre-war, Regrettable, Oiled, Calamitous |
JJR | Adjective | adjective, comparative | A word used to compare one noun to another noun. Example: Cheaper, Cleaner, Cuter, Calmer |
JJS | Adjective | adjective, superlative | A word used to compare three or more objects. Example: Cheapest, Cleanest, Cutest, Calmest |
PRP | Pronoun | pronoun, personal | A word that takes the place of specific nouns that name people, places and things. Example: Hers, Herself, It, Itself, Ourselves |
PRP$ | Pronoun | pronoun, possessive | A pronoun used to express possession or “belonging”. Example: Her, His, Mine, My, Our, Ours |
RB | Adverb | adverb | A word that describes or gives more information about a verb, an adjective, or another adverb. Example: Unabatingly, Amazingly, Swiftly, Adventurously |
RBR | Adverb | adverb, comparative | A word that compares two actions. Example: Greater, Heavier, Higher, Lonelier |
RBS | Adverb | adverb, superlative | A word used to compare three or more actions. Example: Greatest, Heaviest, Highest, Loneliest |
UH | Interjection | interjection | A word that expresses the emotion or feeling of the author. Example: Goodbye, Dammit, Heck, Whodunnit |
MD | Modal auxiliary | modal auxiliary | An auxiliary (helping) verb used to express ability, possibility, permission or obligation. Example: Can, Cannot, Could, Couldn’t |
RP | Particle | particle | A word that has a grammatical function but does not fit into the main parts of speech. Example: Aboard, By, Off, Upon |
SYM | Symbol | symbol | Special characters. Example: *, +, ., <, =, >, @, A[fj] |
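Within the multi-tag families in the table above, the tags share a prefix (NN, VB, JJ, RB, PRP), so a fine-grained tag can be collapsed to its family with a simple prefix check. The following is only a sketch under that assumption. tag_family and FAMILY_PREFIXES are hypothetical names, not part of NLTK, and the tagged list is copied from the earlier output so the snippet runs without NLTK data.

```python
from collections import Counter

# Prefixes shared by the multi-tag families in the table above.
FAMILY_PREFIXES = {"NN": "Noun", "VB": "Verb", "JJ": "Adjective",
                   "RB": "Adverb", "PRP": "Pronoun"}

def tag_family(tag):
    """Return the coarse family for a Penn Treebank tag (hypothetical helper)."""
    for prefix, family in FAMILY_PREFIXES.items():
        if tag.startswith(prefix):
            return family
    return tag  # single-tag families (CC, DT, IN, ...) are left unchanged

# Tagged output copied from the example earlier in the post.
pos_tags = [('Most', 'RBS'), ('important', 'JJ'), ('thing', 'NN'),
            ('in', 'IN'), ('this', 'DT'), ('life', 'NN'), ('is', 'VBZ'),
            ('love', 'JJ'), ('and', 'CC'), ('I', 'PRP'), ('love', 'VBP'),
            ('to', 'TO'), ('eat', 'VB'), ('chocolates', 'NNS'), ('.', '.')]

# Count how often each family occurs; counts like these can serve as features.
family_counts = Counter(tag_family(tag) for _, tag in pos_tags)
print(family_counts["Noun"], family_counts["Verb"], family_counts["Adjective"])  # 3 3 2
```

If you only need coarse categories, nltk.pos_tag also accepts a tagset='universal' argument that returns a smaller tagset directly, provided the universal_tagset resource is installed.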
Some other, less common categories:
Tag | Family | Full form | Description / Example |
---|---|---|---|
WDT | WH-words | WH-determiner | When used as determiners, what, which, or whose can be used to ask questions. Example: Which, Whichever, What, That |
WP | WH-words | WH-pronoun | The pronouns who, whose, which, and what can be the subject or object of a verb. Example: That, What, Whatever, Whatsoever |
WP$ | WH-words | WH-pronoun, possessive | A WH-pronoun that expresses possession. Example: Whose |
WRB | WH-words | WH-adverb | WH-words used as adverbs. Example: How, However, Whence, Whenever, Where |
PDT | Pre-determiner | pre-determiner | A word that is sometimes used before a determiner to give more information about a noun in a noun phrase. Example: All, Both, Half, Many, Quite |
EX | Existential there | existential there | The word “there” used only to fill a syntactic position, with no meaning of its own. Example: There |
TO | To | “to” as preposition or infinitive marker | “To” can be used both as a preposition and as an infinitive marker. Example: To |
LS | List item marker | list item marker | Represents the item number of a list. Example: A, A., First, 1, 1. |
POS | Genitive marker | genitive marker | The suffix ’s on nouns is a marker of genitive case in English. Example: ’s |
Additionally, there are tags for the dollar sign, opening and closing quotation marks, opening and closing parentheses, comma, dash, sentence terminator and colon. You can find them in the Penn Treebank POS tag list.
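These punctuation tags are often dropped before POS tags are used as features. Here is a minimal sketch, assuming that word-class tags always start with a letter while the punctuation tags listed above do not. Symbols tagged SYM would be kept by this check. The list is again copied from the earlier output, and words_only is just an illustrative name.

```python
# Tagged output copied from the example earlier in the post.
pos_tags = [('Most', 'RBS'), ('important', 'JJ'), ('thing', 'NN'),
            ('in', 'IN'), ('this', 'DT'), ('life', 'NN'), ('is', 'VBZ'),
            ('love', 'JJ'), ('and', 'CC'), ('I', 'PRP'), ('love', 'VBP'),
            ('to', 'TO'), ('eat', 'VB'), ('chocolates', 'NNS'), ('.', '.')]

# Assumption: word-class tags start with a letter (NN, VBZ, PRP$, ...),
# while punctuation tags ('.', ',', ':', '$', ...) do not.
words_only = [(word, tag) for word, tag in pos_tags if tag[0].isalpha()]

print(len(pos_tags), len(words_only))  # 15 14
```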