estwaver
ESTWAVER(3) Hyper Estraier ESTWAVER(3)
NAME
estwaver - command line interface of web crawler
SYNOPSIS
estwaver init [-apn|-acc] [-xs|-xl|-xh] [-sv|-si|-sa] rootdir
estwaver crawl [-restart|-revisit|-revcont] rootdir
estwaver unittest rootdir
estwaver fetch [-proxy host port] [-tout num] [-il lang] url
DESCRIPTION
estwaver is an aggregation of sub commands. The name of a sub command
is specified by the first argument. Other arguments are parsed
according to each sub command. The argument rootdir specifies the
crawler root directory, which contains the configuration file and
related files.
estwaver init [-apn|-acc] [-xs|-xl|-xh] [-sv|-si|-sa] rootdir
Create the crawler root directory.
If -apn is specified, N-gram analysis is also applied to European
text.
If -acc is specified, character category analysis is performed
instead of N-gram analysis.
If -xs is specified, the index is tuned to register fewer than
50000 documents.
If -xl is specified, the index is tuned to register more than
300000 documents.
If -xh is specified, the index is tuned to register more than
1000000 documents.
If -sv is specified, scores are stored as void.
If -si is specified, scores are stored as 32-bit integers.
If -sa is specified, scores are stored as-is and marked not to
be tuned at search time.
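
The option groups above are mutually exclusive within each bracket of the synopsis. As a minimal sketch, assuming a script that drives estwaver from Python, a helper could assemble the argument list and reject unknown option names before the command is run. The function name build_init_argv and its parameters are hypothetical and not part of Hyper Estraier; only the option names come from this page.

```python
# Hypothetical helper: assembles an argument list for "estwaver init".
# The option names are taken from the synopsis; everything else here is an
# illustrative assumption, not an API provided by Hyper Estraier.

ANALYZERS = {"apn", "acc"}    # -apn | -acc
SIZES = {"xs", "xl", "xh"}    # -xs | -xl | -xh
SCORES = {"sv", "si", "sa"}   # -sv | -si | -sa


def build_init_argv(rootdir, analyzer=None, size=None, score=None):
    """Return the argv list for `estwaver init` with at most one option per group."""
    argv = ["estwaver", "init"]
    for value, allowed in ((analyzer, ANALYZERS), (size, SIZES), (score, SCORES)):
        if value is None:
            continue  # group omitted: estwaver uses its default
        if value not in allowed:
            raise ValueError(f"unknown option: -{value}")
        argv.append(f"-{value}")
    argv.append(rootdir)
    return argv
```

The returned list can be handed to subprocess.run(argv), which avoids shell quoting problems when rootdir contains spaces.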
estwaver crawl [-restart|-revisit|-revcont] rootdir
Start crawling.
If -restart is specified, crawling is restarted from the seed
documents.
If -revisit is specified, collected documents are revisited.
If -revcont is specified, collected documents are revisited and
then crawling is continued.
estwaver unittest rootdir
Perform unit tests.
estwaver fetch [-proxy host port] [-tout num] [-il lang] url
Fetch a document.
url specifies the URL of a document.
-proxy specifies the host name and the port number of the proxy
server.
-tout specifies timeout in seconds.
-il specifies the preferred language. By default, it is
English.
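
Unlike the init options, the fetch options each take values, and -proxy takes two. A minimal sketch of assembling such a command line from Python follows. The function build_fetch_argv is hypothetical, and the language value "ja" in the usage note is an assumed example: this page does not state the format that -il expects.

```python
# Hypothetical helper: assembles an argument list for "estwaver fetch".
# Only the option names (-proxy, -tout, -il) come from the synopsis.

def build_fetch_argv(url, proxy=None, timeout=None, lang=None):
    """Return the argv list for `estwaver fetch`.

    proxy   -- optional (host, port) pair for -proxy
    timeout -- optional timeout in seconds for -tout
    lang    -- optional preferred language for -il (format is an assumption)
    """
    argv = ["estwaver", "fetch"]
    if proxy is not None:
        host, port = proxy
        argv += ["-proxy", host, str(port)]  # -proxy takes two values
    if timeout is not None:
        argv += ["-tout", str(timeout)]
    if lang is not None:
        argv += ["-il", lang]
    argv.append(url)  # the URL is always the last argument
    return argv
```

For example, build_fetch_argv("http://example.com/", proxy=("proxy.example.com", 8080), timeout=10, lang="ja") yields a list suitable for subprocess.run.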
All sub commands return 0 if the operation succeeds, otherwise 1.
A running crawler closes the database and finishes when it catches
the signal 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), or 15 (SIGTERM).
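
Because the crawler closes its database on these signals, a supervising script should send one of them and wait, rather than killing the process outright. The following is a sketch under that assumption for a POSIX system. The function stop_gracefully and the 60-second default are hypothetical choices, not values taken from Hyper Estraier.

```python
import signal
import subprocess


def stop_gracefully(proc, timeout=60.0):
    """Send SIGTERM to a child process and wait for it to exit.

    Returns the exit status reported by Popen.wait(). If the process has
    not exited after `timeout` seconds it is killed with SIGKILL as a last
    resort; in that case the crawler cannot close its database itself.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()
```

A typical use would be proc = subprocess.Popen(["estwaver", "crawl", rootdir]) followed later by status = stop_gracefully(proc), where rootdir is the crawler root directory. The timeout should be generous, since closing a large database may take some time.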
When crawling finishes, the crawler root directory contains a directory
named _index. It is an index that can be used by estcmd and other tools.
SEE ALSO
estconfig(1), estcmd(1), estmaster(1), estcall(1), estraier(3),
estnode(3)
Man Page 2007-03-06 ESTWAVER(3)