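# !pip install spacy
# !python -m spacy download en
import spacy
# On spaCy 3+, the 'en' shortcut no longer exists; load 'en_core_web_sm' instead.
nlp = spacy.load('en')
sentence = "Ashok killed the snake with a stick"
for token in nlp(sentence):
    print(token, token.pos_)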
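5. Named Entity Disambiguation
Named entity disambiguation is the process of identifying which entity a mention in a sentence refers to. For example, take the sentence:
"Apple earned a revenue of 200 Billion USD in 2016"
The goal of named entity disambiguation is to recognize that Apple here is the name of a company, not a fruit.
Named entity disambiguation generally requires a knowledge base of entities, so that the entities mentioned in a sentence can be linked to entries in that knowledge base.
This paper uses Deep Semantic Relatedness, a technique based on deep neural networks, for named entity disambiguation, and reports good results. It relies on a knowledge base.
This paper, by contrast, builds on word embeddings and uses Local Neural Attention for named entity disambiguation.
6. Named Entity Recognition
Named entity recognition identifies the entities in a sentence and assigns each one to a category such as person, organization, or date. For example, take the sentence:
"Ram of Apple Inc. travelled to Sydney on 5th October 2017"
The result returned is: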
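To make the idea of linking a mention to a knowledge base concrete, here is a minimal sketch. It is not the method of either paper mentioned above: the knowledge base, its context words, and the `disambiguate` function are all hypothetical, and the scoring is a plain word-overlap count rather than a neural model.

```python
# Toy knowledge base (hypothetical): each candidate entity for a surface form
# is described by a small set of context words.
KNOWLEDGE_BASE = {
    "apple": {
        "Apple Inc. (company)": {"revenue", "earned", "iphone", "company", "usd", "billion"},
        "apple (fruit)": {"eat", "tree", "juice", "fruit", "pie", "sweet"},
    },
}


def disambiguate(mention, sentence):
    """Return the candidate entity whose context words overlap most with the sentence."""
    candidates = KNOWLEDGE_BASE.get(mention.lower(), {})
    if not candidates:
        return None  # the mention is not in the knowledge base
    # Lowercase the sentence words and strip surrounding punctuation.
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    # Choose the candidate sharing the most context words with the sentence.
    return max(candidates, key=lambda name: len(candidates[name] & words))


result = disambiguate("Apple", "Apple earned a revenue of 200 Billion USD in 2016")
print(result)
```

Real systems replace the overlap count with learned similarity scores and a much larger knowledge base, but the structure is the same: generate candidates, then rank them against the context.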
Ram
of
Apple ORG
Inc. ORG
travelled
to
Sydney GPE
on
5th DATE
October DATE
2017 DATE
import spacy
nlp = spacy.load('en')
sentence = "Ram of Apple Inc. travelled to Sydney on 5th October 2017"
for token in nlp(sentence):
    print(token, token.ent_type_)
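The per-token output labels each word separately, so "Apple" and "Inc." appear as two ORG tokens. The sketch below shows one way to merge adjacent tokens with the same label into whole entities. The `tagged` list is typed in by hand to mirror the output above, and `merge_entities` is a hypothetical helper, not part of spaCy.

```python
from itertools import groupby

# (token, entity label) pairs copied from the per-token output; "" means no entity.
tagged = [
    ("Ram", ""), ("of", ""), ("Apple", "ORG"), ("Inc.", "ORG"),
    ("travelled", ""), ("to", ""), ("Sydney", "GPE"), ("on", ""),
    ("5th", "DATE"), ("October", "DATE"), ("2017", "DATE"),
]


def merge_entities(pairs):
    """Join runs of adjacent tokens that share the same non-empty label."""
    spans = []
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        if label:  # skip runs of tokens that are not part of any entity
            spans.append((" ".join(tok for tok, _ in group), label))
    return spans


entities = merge_entities(tagged)
print(entities)
# [('Apple Inc.', 'ORG'), ('Sydney', 'GPE'), ('5th October 2017', 'DATE')]
```

This simple grouping would wrongly fuse two different entities of the same type that sit next to each other. When working with spaCy directly, iterating over `doc.ents` and reading `ent.text` and `ent.label_` gives the entity spans without this caveat.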
Dataset 1: Multi-Domain sentiment dataset version 2.0
Dataset 2: Twitter Sentiment analysis Dataset
Competition: A good competition for checking how well your models perform on sentiment analysis of Rotten Tomatoes movie reviews.
# gensim.summarization was removed in gensim 4.0, so this needs gensim < 4.0.
from gensim.summarization import summarize
sentence = "Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Automatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a subset of data which contains the information of the entire set. Such techniques are widely used in industry today. Search engines are an example; others include summarization of documents, image collections and videos. Document summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images. For surveillance videos, one might want to extract the important events from the uneventful context. There are two general approaches to automatic summarization: extraction and abstraction. Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary. In contrast, abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might express. Such a summary might include verbal innovations. Research to date has focused primarily on extractive methods, which are appropriate for image collection summarization and video summarization."
print(summarize(sentence))
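For readers without an older gensim install, here is a minimal, dependency-free sketch of the extractive approach described in the sample text: score each sentence and keep the highest-scoring ones in their original order. It uses a simple word-frequency score, which is an assumption made for illustration and not the algorithm gensim implements. The `extractive_summary` function and the `STOPWORDS` set are hypothetical names.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (illustrative only).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "that", "with", "for", "on", "as", "by", "it"}


def extractive_summary(text, num_sentences=2):
    """Keep the num_sentences sentences with the highest average word frequency."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def tokens(sentence):
        return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

    # Word frequencies over the whole text.
    freq = Counter(w for s in sentences for w in tokens(s))

    def score(sentence):
        toks = tokens(sentence)
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    # Rank sentence indices by score, then restore document order for the output.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


text = "Cats chase mice. Cats sleep all day. The weather was mild."
summary = extractive_summary(text, num_sentences=1)
print(summary)
```

The naive sentence splitter breaks on abbreviations such as "i.e.", so a proper sentence tokenizer would be a sensible replacement on real text.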