C D G H I M O P R S U V W Z

C

CONTENT_COLUMN_FAMILY - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant CONTENT_COLUMN_FAMILY.
CONTENT_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant CONTENT_COLUMN_NAME.
CURI_COLUMN_FAMILY - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant CURI_COLUMN_FAMILY.

D

DEFAULT_MAX_FILE_SIZE_IN_BYTES - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant DEFAULT_MAX_FILE_SIZE_IN_BYTES.

G

getByteArrayFromInputStream(ReplayInputStream, int) - Method in class org.archive.io.hbase.HBaseWriter
Reads the ReplayInputStream into a byte array.
getClient() - Method in class org.archive.io.hbase.HBaseWriter
Gets the HTable client.
getContentColumnFamily() - Method in class org.archive.io.hbase.HBaseParameters
Gets the content column family.
getContentColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the content column name.
getCuriColumnFamily() - Method in class org.archive.io.hbase.HBaseParameters
Gets the curi column family.
getDefaultMaxFileSizeInBytes() - Method in class org.archive.io.hbase.HBaseParameters
Gets the default max file size in bytes.
getHbaseOptions() - Method in class org.archive.io.hbase.HBaseWriter
Gets the hbase options.
getHbaseParameters() - Method in class org.archive.modules.writer.HBaseWriterProcessor
Gets the hbase parameters.
getHbaseTableName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the hbase table name.
getIpColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the ip column name.
getIsSeedColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the is-seed column name.
getMetadata() - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
getPathFromSeedColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the path from seed column name.
getRecordIDGenerator() - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
getRequestColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the request column name.
getSerializer() - Method in class org.archive.io.hbase.HBaseParameters
Gets the serializer.
getUrlColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the url column name.
getViaColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the via column name.
getZkPort() - Method in class org.archive.io.hbase.HBaseParameters
Gets the zk port.
getZkQuorum() - Method in class org.archive.io.hbase.HBaseParameters
Gets the zk quorum.
getZookeeperClientPortKey() - Method in class org.archive.io.hbase.HBaseParameters
Gets the zookeeper client port key.

H

HBaseParameters - Class in org.archive.io.hbase
Configures the values of the column family/qualifier used for the crawl.
HBaseParameters() - Constructor for class org.archive.io.hbase.HBaseParameters
 
HBaseWriter - Class in org.archive.io.hbase
HBase implementation.
HBaseWriter(AtomicInteger, WriterPoolSettings, HBaseParameters) - Constructor for class org.archive.io.hbase.HBaseWriter
Instantiates a new HBaseWriter.
HBaseWriterPool - Class in org.archive.io.hbase
The Class HBaseWriterPool.
HBaseWriterPool(AtomicInteger, WriterPoolSettings, int, int, HBaseParameters) - Constructor for class org.archive.io.hbase.HBaseWriterPool
Instantiates a new HBaseWriterPool.
HBaseWriterProcessor - Class in org.archive.modules.writer
A Heritrix 3 processor that writes to Hadoop HBase.
HBaseWriterProcessor() - Constructor for class org.archive.modules.writer.HBaseWriterProcessor
 

I

initializeCrawlTable(Configuration, String) - Method in class org.archive.io.hbase.HBaseWriter
Creates the crawl table in HBase.
innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
IP_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant IP_COLUMN_NAME.
IS_SEED_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant IS_SEED_COLUMN_NAME.
isMd5Key() - Method in class org.archive.io.hbase.HBaseParameters
Checks whether the md5 key option is enabled.
isOnlyProcessNewRecords() - Method in class org.archive.io.hbase.HBaseParameters
Checks whether only new records are processed.
isOnlyWriteNewRecords() - Method in class org.archive.io.hbase.HBaseParameters
Checks whether only new records are written.

M

makeWriter() - Method in class org.archive.io.hbase.HBaseWriterPool
 

O

org.archive.io.hbase - package org.archive.io.hbase
Provides an HBase writer for Heritrix.
org.archive.modules.writer - package org.archive.modules.writer
 

P

PATH_FROM_SEED_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant PATH_FROM_SEED_COLUMN_NAME.
processContent(Put, ReplayInputStream, int) - Method in class org.archive.io.hbase.HBaseWriter
This is a stub method and is here to allow extension/overriding for custom content parsing, data manipulation and to populate new columns.

R

REQUEST_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant REQUEST_COLUMN_NAME.

S

serialize(byte[]) - Method in class org.archive.io.hbase.HBaseWriter
 
serialize(byte[]) - Method in interface org.archive.io.hbase.Serializer
Implement if you want to serialize bytes in a custom manner.
Serializer - Interface in org.archive.io.hbase
The Interface Serializer.
setContentColumnFamily(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the content column family.
setContentColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the content column name.
setCuriColumnFamily(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the curi column family.
setDefaultMaxFileSizeInBytes(long) - Method in class org.archive.io.hbase.HBaseParameters
Sets the default max file size in bytes.
setHbaseParameters(HBaseParameters) - Method in class org.archive.modules.writer.HBaseWriterProcessor
Sets the hbase parameters.
setHbaseTableName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the hbase table name.
setIpColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the ip column name.
setIsSeedColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the is-seed column name.
setMd5Key(boolean) - Method in class org.archive.io.hbase.HBaseParameters
Sets whether the md5 key option is enabled.
setOnlyProcessNewRecords(boolean) - Method in class org.archive.io.hbase.HBaseParameters
Sets whether only new records are processed.
setOnlyWriteNewRecords(boolean) - Method in class org.archive.io.hbase.HBaseParameters
Sets whether only new records are written.
setPathFromSeedColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the path from seed column name.
setRequestColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the request column name.
setSerializer(Serializer) - Method in class org.archive.io.hbase.HBaseParameters
Sets the serializer.
setupPool(AtomicInteger) - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
setUrlColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the url column name.
setViaColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the via column name.
setZkPort(int) - Method in class org.archive.io.hbase.HBaseParameters
Sets the zk port.
setZkQuorum(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the zk quorum.
shouldProcess(CrawlURI) - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
shouldWrite(CrawlURI) - Method in class org.archive.modules.writer.HBaseWriterProcessor
Whether the given CrawlURI should be written to archive files.
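
The Serializer entry above says only "Implement if you want to serialize bytes in a custom manner." As a rough illustration of what such an implementation might look like, the sketch below uses a hypothetical stand-in interface, ByteSerializer, rather than the real org.archive.io.hbase.Serializer. Only the serialize(byte[]) parameter list is taken from this index; the byte[] return type, the Base64 encoding choice, and the class names are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical stand-in for org.archive.io.hbase.Serializer. The index only
// shows serialize(byte[]); the byte[] return type is an assumption.
interface ByteSerializer {
    byte[] serialize(byte[] bytes);
}

// Example custom implementation (illustrative only): Base64-encodes the
// bytes before they would be stored.
class Base64Serializer implements ByteSerializer {
    @Override
    public byte[] serialize(byte[] bytes) {
        return Base64.getEncoder().encode(bytes);
    }
}

class SerializerSketch {
    public static void main(String[] args) {
        ByteSerializer serializer = new Base64Serializer();
        byte[] stored = serializer.serialize("abc".getBytes(StandardCharsets.UTF_8));
        // Print the encoded form of the input bytes.
        System.out.println(new String(stored, StandardCharsets.US_ASCII));
    }
}
```

If the real interface has this shape, an instance of such a class would presumably be handed to HBaseParameters through setSerializer(Serializer), which is listed in this index.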

U

URL_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant URL_COLUMN_NAME.

V

VIA_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant VIA_COLUMN_NAME.

W

write(CrawlURI, String, RecordingOutputStream, RecordingInputStream) - Method in class org.archive.io.hbase.HBaseWriter
Write the crawled output to the configured HBase table.
write(CrawlURI, long, InputStream) - Method in class org.archive.modules.writer.HBaseWriterProcessor
Write to HBase.

Z

ZK_PORT - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant ZK_PORT.
ZOOKEEPER_CLIENT_PORT - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant ZOOKEEPER_CLIENT_PORT.

Copyright © 2007-2012. All Rights Reserved.