C G H I P S T W Z

C

checkBytesWritten(StateProvider) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Check bytes written.
close() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
 
com.powerset.heritrix.writer - package com.powerset.heritrix.writer
Provides an HBase writer for Heritrix.
CONTENT_COLUMN_FAMILY - Static variable in class com.powerset.heritrix.writer.HBaseWriter
The Constant CONTENT_COLUMN_FAMILY.
CONTENT_COLUMN_NAME - Static variable in class com.powerset.heritrix.writer.HBaseWriter
The Constant CONTENT_COLUMN_NAME.
CONTENT_MAX_SIZE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Maximum allowable content size.
createCrawlTable(HBaseConfiguration, String) - Method in class com.powerset.heritrix.writer.HBaseWriter
Creates the crawl table in HBase.
CURI_COLUMN_FAMILY - Static variable in class com.powerset.heritrix.writer.HBaseWriter
The Constant CURI_COLUMN_FAMILY.

G

getByteArrayFromInputStream(ReplayInputStream, int) - Method in class com.powerset.heritrix.writer.HBaseWriter
Reads the given ReplayInputStream into a byte array.
getClient() - Method in class com.powerset.heritrix.writer.HBaseWriter
Gets the HTable client.
getHostAddress(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Return IP address of given URI suitable for recording (as in a classic ARC 5-field header line).
getMaxActive() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the max active.
getMaxWait() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the max wait.
getPool() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the pool.
getTable() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the table.
getTotalBytesWritten() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the total bytes written.
getZKClientPort() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the ZooKeeper client port.
getZKQuorum() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Gets the ZooKeeper quorum.
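
The entry for getByteArrayFromInputStream(ReplayInputStream, int) suggests a stream being drained into a byte array. A minimal sketch of that idea follows, using only the JDK. It assumes the int argument is an upper bound on the number of bytes read, which this index does not confirm. StreamBytesSketch and readUpTo are hypothetical names, and a plain java.io.InputStream stands in for ReplayInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class StreamBytesSketch {
    // Hypothetical helper, not the HBaseWriter method itself.
    // Reads at most maxBytes from the stream (assumed meaning of the int argument).
    public static byte[] readUpTo(InputStream in, int maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = maxBytes;
        try {
            while (remaining > 0) {
                int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // end of stream reached before the cap
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5};
        byte[] capped = readUpTo(new ByteArrayInputStream(data), 3);
        System.out.println(capped.length); // prints 3
    }
}
```

Capping the read keeps a single oversized response from being buffered whole in memory, which is one plausible reason for a size argument on such a method.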

H

HBaseWriter - Class in com.powerset.heritrix.writer
Write crawled content as records to an HBase table.
HBaseWriter(String, int, String) - Constructor for class com.powerset.heritrix.writer.HBaseWriter
Instantiates a new HBaseWriter for the WriterPool to use in heritrix2.
HBaseWriterPool - Class in com.powerset.heritrix.writer
A pool of HBaseWriters.
HBaseWriterPool(String, int, String, int, int) - Constructor for class com.powerset.heritrix.writer.HBaseWriterPool
Constructor.
HBaseWriterProcessor - Class in com.powerset.heritrix.writer
A heritrix2 processor that writes to Hadoop HBase.
HBaseWriterProcessor() - Constructor for class com.powerset.heritrix.writer.HBaseWriterProcessor
Instantiates a new HBaseWriterProcessor.

I

initialTasks(StateProvider) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
 
innerProcess(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
 
innerProcessResult(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
 

P

POOL_MAX_ACTIVE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Maximum active files in pool.
POOL_MAX_WAIT - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Maximum time to wait on pool element (milliseconds).
PROCESS_ONLY_NEW_RECORDS - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
If set to true, only process URLs that are new rowkey records.
processContent(Put, ReplayInputStream, int) - Method in class com.powerset.heritrix.writer.HBaseWriter
Stub method provided so that subclasses can override it for custom content parsing, data manipulation, and populating new columns.
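
POOL_MAX_ACTIVE and POOL_MAX_WAIT describe a pool with a cap on active elements and a bounded wait for a free one. The sketch below illustrates those two settings with a JDK Semaphore. It is not the HBaseWriterPool implementation; BoundedPoolSketch, borrow, and giveBack are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class BoundedPoolSketch {
    private final Semaphore permits;   // one permit per allowed active element
    private final long maxWaitMillis;  // how long a borrower waits before giving up

    public BoundedPoolSketch(int maxActive, long maxWaitMillis) {
        this.permits = new Semaphore(maxActive);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Returns true if a slot was obtained within the wait limit.
    public boolean borrow() {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns a previously borrowed slot to the pool.
    public void giveBack() {
        permits.release();
    }

    public static void main(String[] args) {
        BoundedPoolSketch pool = new BoundedPoolSketch(1, 10);
        System.out.println(pool.borrow()); // true: the single slot is free
        System.out.println(pool.borrow()); // false: times out after about 10 ms
        pool.giveBack();
        System.out.println(pool.borrow()); // true: the slot was returned
    }
}
```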

S

SERVER_CACHE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
The Constant SERVER_CACHE.
setPool(WriterPool) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Sets the pool.
setTotalBytesWritten(long) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Sets the total bytes written.
setupPool() - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Sets up the pool.
shouldProcess(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
 
shouldWrite(ProcessorURI) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Whether the given ProcessorURI should be written to archive files.

T

TABLE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Name of the HBase table to crawl into.
TOTAL_BYTES_TO_WRITE - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Total file bytes to write to disk.

W

write(ProcessorURI, String, RecordingOutputStream, RecordingInputStream) - Method in class com.powerset.heritrix.writer.HBaseWriter
Write the crawled output to the configured HBase table.
write(ProcessorURI, long, InputStream, String) - Method in class com.powerset.heritrix.writer.HBaseWriterProcessor
Write.
WRITE_ONLY_NEW_RECORDS - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
If set to true, only write URLs that are new rowkey records.
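
The WRITE_ONLY_NEW_RECORDS description implies a check against existing row keys before a write. A rough sketch of that behavior follows, with an in-memory set standing in for the HBase table. NewRecordFilterSketch and writeIfAllowed are hypothetical names, and the real processor's lookup logic is not shown in this index.

```java
import java.util.HashSet;
import java.util.Set;

class NewRecordFilterSketch {
    // Stand-in for the row keys already present in the table (assumption).
    private final Set<String> storedRowKeys = new HashSet<>();
    private final boolean writeOnlyNewRecords;

    public NewRecordFilterSketch(boolean writeOnlyNewRecords) {
        this.writeOnlyNewRecords = writeOnlyNewRecords;
    }

    // Returns true if the record was written, false if it was skipped.
    public boolean writeIfAllowed(String rowKey) {
        if (writeOnlyNewRecords && storedRowKeys.contains(rowKey)) {
            return false; // row key already exists and only new records are wanted
        }
        storedRowKeys.add(rowKey);
        return true;
    }

    public static void main(String[] args) {
        NewRecordFilterSketch filter = new NewRecordFilterSketch(true);
        System.out.println(filter.writeIfAllowed("http://example.com/")); // true
        System.out.println(filter.writeIfAllowed("http://example.com/")); // false
    }
}
```

With the flag set to false, the same call overwrites silently and returns true each time.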

Z

ZKCLIENTPORT - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
The port on which clients connect to the ZooKeeper quorum hosts.
ZKQUORUM - Static variable in class com.powerset.heritrix.writer.HBaseWriterProcessor
Comma-separated list of hostnames in the ZooKeeper quorum.
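
ZKQUORUM holds a comma-separated host list and ZKCLIENTPORT a single port. The sketch below shows one way such a pair might be combined into host:port strings. ZkQuorumSketch and toHostPorts are hypothetical names, and the host names are made up. The port value 2181 is ZooKeeper's usual default client port, used here only as sample input.

```java
import java.util.ArrayList;
import java.util.List;

class ZkQuorumSketch {
    // Splits a comma-separated quorum string and appends the client port to each host.
    public static List<String> toHostPorts(String quorum, int clientPort) {
        List<String> result = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed + ":" + clientPort);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical host names for illustration only.
        System.out.println(toHostPorts("zk1, zk2,zk3", 2181));
    }
}
```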

Copyright © 2007-2009. All Rights Reserved.