NLTK and stopwords fail #lookuperror

15 views, January 24, 2023

Anonymous, January 25, 2023


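I'm trying to start a sentiment analysis project, and I plan to use the stop words method. I did some research and found that NLTK ships a stop word list, but when I run the command I get an error.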

To see which words NLTK uses, I run the following (as shown in section 4.1 of http://www.nltk.org/book/ch02.html):

from nltk.corpus import stopwords
stopwords.words('english')

But when I press Enter, I get:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
 in ()
----> 1 stopwords.words('english')
C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
 66
 67     def __getattr__(self, attr):
---> 68         self.__load()
 69         # This looks circular, but its not, since __load() changes our
 70         # __class__ to something new:
C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
 54             except LookupError, e:
 55                 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56                 except LookupError: raise e
 57
 58         # Load the corpus.
LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
- 'C:\\Users\\Meru/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\lib\\nltk_data'
- 'C:\\Users\\Meru\\AppData\\Roaming\\nltk_data'
**********************************************************************
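
The error message itself suggests what is going on: the stopwords corpus is not installed in any of the searched directories. The following is a minimal sketch of checking for the corpus and fetching it when it is missing, assuming Python 3, a reasonably recent NLTK, and network access. `ensure_stopwords` is a hypothetical helper name used for illustration, not part of NLTK.

```python
# Sketch for Python 3; `ensure_stopwords` is a hypothetical helper, not an NLTK function.
def ensure_stopwords(download_if_missing=True):
    """Return True if NLTK can find the 'stopwords' corpus, else False."""
    try:
        import nltk
    except ImportError:
        return False  # NLTK itself is not installed

    try:
        # Same lookup that raises LookupError when the corpus is absent.
        nltk.data.find("corpora/stopwords")
        return True
    except LookupError:
        if not download_if_missing:
            return False
        # Fetch only the stopwords package; nltk.download() returns True on success.
        return bool(nltk.download("stopwords", quiet=True))
```

Calling `ensure_stopwords()` once before `stopwords.words('english')` should be enough in most setups. Running `nltk.download()` with no arguments opens the interactive downloader that the error message mentions.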

Because of this problem, code like the following does not run either (it fails with the same error):

>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print([i for i in sentence.split() if i not in stop])

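Do you know what the problem might be? I need to work with Spanish words; would you recommend a different approach? I was also thinking of using the Goslate package together with English datasets.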

Thanks for reading!

P.S. I am using Anaconda.
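
On the Spanish question: the NLTK stopwords corpus also includes a Spanish list under the file id `'spanish'`, so translating the text to English first may not be necessary for this step. Below is a rough sketch of filtering a Spanish sentence, assuming Python 3. `remove_stopwords` and `spanish_stopwords` are hypothetical helper names, and the short fallback list is only an illustration for the case where the corpus is not installed, not the real NLTK list.

```python
def remove_stopwords(sentence, stop_words):
    """Return the whitespace-separated tokens of `sentence` not found in `stop_words`."""
    # A set gives constant-time membership tests; comparison is case-insensitive.
    stop_set = {word.lower() for word in stop_words}
    return [tok for tok in sentence.split() if tok.lower() not in stop_set]


def spanish_stopwords():
    """Load NLTK's Spanish stop words, or a tiny illustrative list if unavailable."""
    try:
        from nltk.corpus import stopwords
        return stopwords.words("spanish")
    except (ImportError, LookupError):
        # Hypothetical fallback for illustration only; far from complete.
        return ["de", "la", "que", "el", "en", "y", "a", "los", "es", "una"]


if __name__ == "__main__":
    print(remove_stopwords("esta es una frase de ejemplo", spanish_stopwords()))
```

Splitting on whitespace mirrors the `sentence.split()` call in the snippet above; punctuation attached to words would need a proper tokenizer.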
