C G H I P S T W Z
C
checkBytesWritten(StateProvider) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Check bytes written.
close() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
com.powerset.heritrix.writer - package com.powerset.heritrix.writer
Provides HBase writer for heritrix.
CONTENT_COLUMN_FAMILY - Static variable in class com.powerset.heritrix.writer.HBaseWriter
The Constant CONTENT_COLUMN_FAMILY.
CONTENT_COLUMN_NAME - Static variable in class com.powerset.heritrix.writer.HBaseWriter
The Constant CONTENT_COLUMN_NAME.
CONTENT_MAX_SIZE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Maximum allowable content size.
createCrawlTable(HBaseConfiguration, String) - Method in class com.powerset.heritrix.writer.HBaseWriter
Creates the crawl table in HBase.
CURI_COLUMN_FAMILY - Static variable in class com.powerset.heritrix.writer.HBaseWriter
The Constant CURI_COLUMN_FAMILY.
G
getByteArrayFromInputStream(ReplayInputStream, int) - Method in class com.powerset.heritrix.writer.HBaseWriter
Read the ReplayInputStream and return its content as a byte array.
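Reading a stream into a byte array with an upper bound can be sketched as below. This is a minimal illustration, not the method's actual body: it uses a plain java.io.InputStream as a stand-in for ReplayInputStream, and it assumes the int argument is a byte limit, which this index does not confirm. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of com.powerset.heritrix.writer.
final class StreamBytesSketch {
    private StreamBytesSketch() {}

    /** Reads at most maxBytes from the stream and returns them as a byte array. */
    static byte[] readUpTo(InputStream in, int maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = maxBytes; // assumption: the int argument caps the number of bytes read
        try {
            while (remaining > 0) {
                int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
                if (n == -1) {
                    break; // end of stream reached before the cap
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Capping the read keeps a single oversized response from exhausting memory before the record is written.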
getClient() - Method in class com.powerset.heritrix.writer.HBaseWriter
Gets the HTable client.
getHostAddress(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Return IP address of given URI suitable for recording (as in a classic ARC 5-field header line).
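For context, a classic ARC version 1 URL-record header is a single space-separated line of five fields: URL, IP address, archive date, content type, and length. The sketch below assembles such a line from already-known values. It is illustrative only and is not the code of getHostAddress; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of com.powerset.heritrix.writer.
final class ArcHeaderSketch {
    private ArcHeaderSketch() {}

    /**
     * Joins the five ARC v1 URL-record fields with single spaces.
     * archiveDate is expected as a 14-digit timestamp (YYYYMMDDhhmmss).
     */
    static String headerLine(String url, String ipAddress, String archiveDate,
                             String contentType, long length) {
        return String.join(" ", url, ipAddress, archiveDate, contentType, Long.toString(length));
    }
}
```

The IP address is the second field, which is why a recordable form of the host address is needed at write time.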
getMaxActive() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the max active.
getMaxWait() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the max wait.
getPool() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the pool.
getTable() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the table.
getTotalBytesWritten() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the total bytes written.
getZKClientPort() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the ZooKeeper client port.
getZKQuorum() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the ZooKeeper quorum.
H
HBaseWriter - Class in com.powerset.heritrix.writer
Write crawled content as records to an HBase table.
HBaseWriter(String, int, String) - Constructor for class com.powerset.heritrix.writer.HBaseWriter
Instantiates a new HBaseWriter for the WriterPool to use in heritrix2.
HBaseWriterPool - Class in com.powerset.heritrix.writer
A pool of HBaseWriters.
HBaseWriterPool(String, int, String, int, int) - Constructor for class com.powerset.heritrix.writer.HBaseWriterPool
Constructor.
HBaseWriterProcessor - Class in com.powerset.heritrix.writer
A heritrix2 processor that writes to Hadoop HBase.
HBaseWriterProcessor() - Constructor for class com.powerset.heritrix.writer.HBaseWriterProcessor
Instantiates a new HBaseWriterProcessor.
I
initialTasks(StateProvider) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
innerProcess(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
innerProcessResult(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
P
POOL_MAX_ACTIVE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Maximum active files in pool.
POOL_MAX_WAIT - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Maximum time to wait on pool element (milliseconds).
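One way to picture how a maximum-active limit and a maximum wait interact is a counting semaphore with a timed acquire. The sketch below is an illustration under that assumption only; this index does not show how HBaseWriterPool is implemented, and the class and method names here are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not part of com.powerset.heritrix.writer.
final class BoundedPoolSketch {
    private final Semaphore permits;
    private final long maxWaitMillis;

    BoundedPoolSketch(int maxActive, long maxWaitMillis) {
        this.permits = new Semaphore(maxActive); // at most maxActive borrowers at once
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns true if a slot became available within maxWaitMillis, false otherwise. */
    boolean borrow() {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Frees a slot so that a waiting borrower can proceed. */
    void giveBack() {
        permits.release();
    }
}
```

With a limit of 1 and a short wait, a second borrow() fails until giveBack() is called, which is the behavior the two settings jointly control.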
PROCESS_ONLY_NEW_RECORDS - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
If set to true, then only process URLs that are new rowkey records.
processContent(Put, ReplayInputStream, int) - Method in class com.powerset.heritrix.writer.HBaseWriter
A stub method that allows extension or overriding for custom content parsing, data manipulation, and populating new columns.
S
SERVER_CACHE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
The Constant SERVER_CACHE.
setPool(WriterPool) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Sets the pool.
setTotalBytesWritten(long) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Sets the total bytes written.
setupPool() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Setup pool.
shouldProcess(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
shouldWrite(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Whether the given ProcessorURI should be written to archive files.
T
TABLE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Name of the HBase table to crawl into.
TOTAL_BYTES_TO_WRITE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Total file bytes to write to disk.
W
write(ProcessorURI, String, RecordingOutputStream, RecordingInputStream) - Method in class com.powerset.heritrix.writer.HBaseWriter
Write the crawled output to the configured HBase table.
write(ProcessorURI, long, InputStream, String) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Write.
WRITE_ONLY_NEW_RECORDS - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
If set to true, then only write URLs that are new rowkey records.
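The effect of a write-only-new-records flag can be sketched with an in-memory set standing in for the row keys already present in the table. This is an illustration of the described behavior, not the processor's code: a real check would query HBase, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not part of com.powerset.heritrix.writer.
final class NewRecordFilterSketch {
    // Stand-in for the row keys that already exist in the crawl table.
    private final Set<String> existingRowKeys = new HashSet<>();
    private final boolean writeOnlyNewRecords;

    NewRecordFilterSketch(boolean writeOnlyNewRecords) {
        this.writeOnlyNewRecords = writeOnlyNewRecords;
    }

    /** Returns true if the record was written, false if it was skipped as a duplicate. */
    boolean write(String rowKey) {
        if (writeOnlyNewRecords && existingRowKeys.contains(rowKey)) {
            return false; // row key already present: skip the write
        }
        existingRowKeys.add(rowKey); // first write, or an overwrite when the flag is off
        return true;
    }
}
```

With the flag off, a repeated row key is simply written again; with it on, only the first occurrence is written.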
Z
ZKCLIENTPORT - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
The port that clients should connect on to contact their ZooKeeper quorum hosts.
ZKQUORUM - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Comma-separated list of hostnames in the ZooKeeper quorum.
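The two ZooKeeper settings combine into the host:port pairs a client connects to. A minimal sketch of that combination, assuming whitespace around the commas should be tolerated; the class and method names are hypothetical and not part of com.powerset.heritrix.writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of com.powerset.heritrix.writer.
final class ZkQuorumSketch {
    private ZkQuorumSketch() {}

    /** Splits a comma-separated quorum string and appends the client port to each host. */
    static List<String> toConnectList(String quorum, int clientPort) {
        List<String> result = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) { // skip empty entries such as the gap in "a,,b"
                result.add(trimmed + ":" + clientPort);
            }
        }
        return result;
    }
}
```

For example, a quorum of "zk1, zk2,zk3" with client port 2181 yields zk1:2181, zk2:2181 and zk3:2181.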
Copyright © 2007-2009. All Rights Reserved.