<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Installing NLTK Data — NLTK 3.0 documentation</title>
<link rel="stylesheet" href="_static/agogo.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '3.0',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="NLTK 3.0 documentation" href="index.html" />
<link rel="next" title="Contribute to NLTK" href="contribute.html" />
<link rel="prev" title="Installing NLTK" href="install.html" />
</head>
<body>
<div class="header-wrapper">
<div class="header">
<div class="headertitle"><a
href="index.html">NLTK 3.0 documentation</a></div>
<div class="rel">
<a href="install.html" title="Installing NLTK"
accesskey="P">previous</a> |
<a href="contribute.html" title="Contribute to NLTK"
accesskey="N">next</a> |
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |
<a href="genindex.html" title="General Index"
accesskey="I">index</a>
</div>
</div>
</div>
<div class="content-wrapper">
<div class="content">
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="installing-nltk-data">
<h1>Installing NLTK Data<a class="headerlink" href="#installing-nltk-data" title="Permalink to this headline">¶</a></h1>
<p>NLTK comes with many corpora, toy grammars, trained models, etc. A complete list is posted at: <a class="reference external" href="http://nltk.org/nltk_data/">http://nltk.org/nltk_data/</a></p>
<p>To install the data, first install NLTK (see <a class="reference external" href="http://nltk.org/install.html">http://nltk.org/install.html</a>), then use NLTK’s data downloader as described below.</p>
<p>Apart from individual data packages, you can download the entire collection (using “all”), or just the data required for the examples and exercises in the book (using “book”), or just the corpora and no grammars or trained models (using “all-corpora”).</p>
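<p>The collection identifiers above can also be passed straight to <tt class="docutils literal"><span class="pre">nltk.download</span></tt>, which skips the interactive window. The following is a minimal sketch rather than an official recipe: <tt class="docutils literal"><span class="pre">COLLECTIONS</span></tt> and <tt class="docutils literal"><span class="pre">download_collection</span></tt> are hypothetical names introduced here for illustration, and the call needs a working network connection.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
# Collection identifiers named in the text above.
COLLECTIONS = {
    "all": "the entire collection",
    "book": "data required for the book's examples and exercises",
    "all-corpora": "the corpora only, without grammars or trained models",
}


def download_collection(name, download_dir=None):
    """Download one of the collections listed in COLLECTIONS.

    download_collection is a hypothetical helper, not part of NLTK.
    """
    if name not in COLLECTIONS:
        raise ValueError("unknown collection: %r" % (name,))
    import nltk  # imported lazily so the check above works without NLTK
    # download_dir=None lets the downloader choose its default location.
    return nltk.download(name, download_dir=download_dir)
```
</pre></div>
</div>
<p>For example, <tt class="docutils literal"><span class="pre">download_collection("book")</span></tt> would fetch only the data used by the book.</p>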
<div class="section" id="interactive-installer">
<h2>Interactive installer<a class="headerlink" href="#interactive-installer" title="Permalink to this headline">¶</a></h2>
<p><em>For central installation on a multi-user machine, do the following from an administrator account.</em></p>
<p>Run the Python interpreter and type the commands:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">nltk</span>
<span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">download</span><span class="p">()</span>
</pre></div>
</div>
<p>A new window should open, showing the NLTK Downloader. Click on the File menu and select Change Download Directory. For central installation, set this to <tt class="docutils literal"><span class="pre">C:\nltk_data</span></tt> (Windows), or <tt class="docutils literal"><span class="pre">/usr/share/nltk_data</span></tt> (Mac, Unix). Next, select the packages or collections you want to download.</p>
<p>If you did not install the data to one of the above central locations, you will need to set the <tt class="docutils literal"><span class="pre">NLTK_DATA</span></tt> environment variable to specify the location of the data. (On a Windows machine, right-click “My Computer”, then select <tt class="docutils literal"><span class="pre">Properties</span> <span class="pre">></span> <span class="pre">Advanced</span> <span class="pre">></span> <span class="pre">Environment</span> <span class="pre">Variables</span> <span class="pre">></span> <span class="pre">User</span> <span class="pre">Variables</span> <span class="pre">></span> <span class="pre">New...</span></tt>.)</p>
<p>Test that the data has been installed as follows (this assumes you downloaded the Brown Corpus):</p>
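<p>If changing the system settings is inconvenient, the data location can also be set for a single Python process. The sketch below assumes a custom directory of your choosing; <tt class="docutils literal"><span class="pre">configure_nltk_data</span></tt> and the example path are hypothetical. It sets <tt class="docutils literal"><span class="pre">NLTK_DATA</span></tt> for the current process and, in case NLTK was already imported, also appends the directory to the <tt class="docutils literal"><span class="pre">nltk.data.path</span></tt> search list.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
import os


def configure_nltk_data(custom_dir):
    """Make custom_dir visible to NLTK for the current process only.

    configure_nltk_data is a hypothetical helper, not part of NLTK.
    Returns the NLTK search path, or None if NLTK is not installed.
    """
    # NLTK reads this variable when it is first imported.
    os.environ["NLTK_DATA"] = custom_dir
    try:
        import nltk
    except ImportError:
        return None
    # Covers the case where NLTK was imported before the variable was set.
    if custom_dir not in nltk.data.path:
        nltk.data.path.append(custom_dir)
    return nltk.data.path


if __name__ == "__main__":
    # The directory below is only an example location.
    print(configure_nltk_data(os.path.expanduser("~/my_nltk_data")))
```
</pre></div>
</div>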
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">nltk.corpus</span> <span class="kn">import</span> <span class="n">brown</span>
<span class="gp">>>> </span><span class="n">brown</span><span class="o">.</span><span class="n">words</span><span class="p">()</span>
<span class="go">['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]</span>
</pre></div>
</div>
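<p>The same check can be scripted without printing any corpus content. This is a sketch under the assumption that the Brown Corpus is stored under the resource name <tt class="docutils literal"><span class="pre">corpora/brown</span></tt>; <tt class="docutils literal"><span class="pre">brown_is_installed</span></tt> is a hypothetical helper. It relies on <tt class="docutils literal"><span class="pre">nltk.data.find</span></tt>, which raises <tt class="docutils literal"><span class="pre">LookupError</span></tt> when a resource is missing from every directory on the search path.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
def brown_is_installed():
    """Return True if NLTK can locate the Brown Corpus, else False.

    brown_is_installed is a hypothetical helper, not part of NLTK.
    """
    try:
        import nltk
        # Searches every directory listed in nltk.data.path.
        nltk.data.find("corpora/brown")
    except (ImportError, LookupError):
        # NLTK itself is missing, or the corpus was not found.
        return False
    return True


if __name__ == "__main__":
    print("Brown Corpus available:", brown_is_installed())
```
</pre></div>
</div>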
<div class="section" id="installing-via-a-proxy-web-server">
<h3>Installing via a proxy web server<a class="headerlink" href="#installing-via-a-proxy-web-server" title="Permalink to this headline">¶</a></h3>
<p>If your web connection uses a proxy server, specify the proxy address as follows. For an authenticating proxy, also specify a username and password. If the proxy is set to <tt class="docutils literal"><span class="pre">None</span></tt>, <tt class="docutils literal"><span class="pre">nltk.set_proxy</span></tt> will attempt to detect the system proxy.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">set_proxy</span><span class="p">(</span><span class="s">'http://proxy.example.com:3128'</span><span class="p">,</span> <span class="s">'USERNAME'</span><span class="p">,</span> <span class="s">'PASSWORD'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">nltk</span><span class="o">.</span><span class="n">download</span><span class="p">()</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="command-line-installation">
<h2>Command line installation<a class="headerlink" href="#command-line-installation" title="Permalink to this headline">¶</a></h2>
<p>The downloader searches for an existing <tt class="docutils literal"><span class="pre">nltk_data</span></tt> directory in which to install NLTK data. If none exists, it attempts to create one in a central location (when run from an administrator account) or otherwise in the user’s filespace. If necessary, run the download command from an administrator account, or using sudo. The default system location is <tt class="docutils literal"><span class="pre">C:\nltk_data</span></tt> on Windows and <tt class="docutils literal"><span class="pre">/usr/share/nltk_data</span></tt> on Mac and Unix. You can use the <tt class="docutils literal"><span class="pre">-d</span></tt> flag to specify a different location (but if you do this, be sure to set the <tt class="docutils literal"><span class="pre">NLTK_DATA</span></tt> environment variable accordingly).</p>
<p>Python 2.5-2.7: Run the command <tt class="docutils literal"><span class="pre">python</span> <span class="pre">-m</span> <span class="pre">nltk.downloader</span> <span class="pre">all</span></tt>. To ensure central installation, run the command <tt class="docutils literal"><span class="pre">sudo</span> <span class="pre">python</span> <span class="pre">-m</span> <span class="pre">nltk.downloader</span> <span class="pre">-d</span> <span class="pre">/usr/share/nltk_data</span> <span class="pre">all</span></tt>.</p>
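<p>The command above can also be assembled and launched from a script, for instance as part of a provisioning step. The following is a sketch only: <tt class="docutils literal"><span class="pre">downloader_command</span></tt> and <tt class="docutils literal"><span class="pre">run_downloader</span></tt> are hypothetical helpers, and the sketch does not add <tt class="docutils literal"><span class="pre">sudo</span></tt>, so the script itself must run with sufficient privileges when the target is a central location.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
import subprocess
import sys


def downloader_command(package="all", target_dir=None):
    """Build the argument list for `python -m nltk.downloader`.

    downloader_command is a hypothetical helper, not part of NLTK.
    """
    # sys.executable is the interpreter running this script.
    cmd = [sys.executable, "-m", "nltk.downloader"]
    if target_dir is not None:
        # -d selects the download directory, as described above.
        cmd.extend(["-d", target_dir])
    cmd.append(package)
    return cmd


def run_downloader(package="all", target_dir=None):
    """Run the downloader in a child process and return its exit status."""
    return subprocess.call(downloader_command(package, target_dir))


if __name__ == "__main__":
    # Show the command for a central installation without running it.
    print(" ".join(downloader_command("all", "/usr/share/nltk_data")))
```
</pre></div>
</div>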
<p>Windows: Use the “Run...” option on the Start menu. Windows Vista users need to first turn on this option, using <tt class="docutils literal"><span class="pre">Start</span> <span class="pre">-></span> <span class="pre">Properties</span> <span class="pre">-></span> <span class="pre">Customize</span></tt> to check the box to activate the “Run...” option.</p>
<p>Test the installation: Check that the user environment and privileges are set correctly by logging in to a user account,
starting the Python interpreter, and accessing the Brown Corpus (see the previous section).</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="sidebar">
<h3>Table Of Contents</h3>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="news.html">NLTK News</a></li>
<li class="toctree-l1"><a class="reference internal" href="install.html">Installing NLTK</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="">Installing NLTK Data</a></li>
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contribute to NLTK</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/wiki/FAQ">FAQ</a></li>
<li class="toctree-l1"><a class="reference external" href="https://github.com/nltk/nltk/wiki">Wiki</a></li>
<li class="toctree-l1"><a class="reference internal" href="api/nltk.html">API</a></li>
<li class="toctree-l1"><a class="reference external" href="http://www.nltk.org/howto">HOWTO</a></li>
</ul>
<h3 style="margin-top: 1.5em;">Search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
<p class="searchtip" style="font-size: 90%">
Enter search terms or a module, class or function name.
</p>
</div>
<div class="clearer"></div>
</div>
</div>
<div class="footer-wrapper">
<div class="footer">
<div class="left">
<a href="install.html" title="Installing NLTK"
>previous</a> |
<a href="contribute.html" title="Contribute to NLTK"
>next</a> |
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |
<a href="genindex.html" title="General Index"
>index</a>
<br/>
<a href="_sources/data.txt"
rel="nofollow">Show Source</a>
</div>
<div class="right">
<div class="footer">
© Copyright 2013, NLTK Project.
Last updated on Jul 22, 2014.
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.2.2.
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</body>
</html>