java.lang.Object
  extended by org.archive.modules.Processor
      extended by com.powerset.heritrix.writer.HBaseWriterProcessor
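public class HBaseWriterProcessor
extends org.archive.modules.Processor
implements org.archive.state.Initializable, Closeable
A Heritrix 2 processor that writes crawl records to Hadoop HBase.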
| Field Summary | |
|---|---|
| static org.archive.state.Key<Integer> | CONTENT_MAX_SIZE - Maximum allowable content size. |
| static org.archive.state.Key<Integer> | POOL_MAX_ACTIVE - Maximum number of active files in the pool. |
| static org.archive.state.Key<Integer> | POOL_MAX_WAIT - Maximum time, in milliseconds, to wait for a pool element. |
| static org.archive.state.Key<Boolean> | PROCESS_ONLY_NEW_RECORDS - If true, process only URLs that are new row-key records. |
| static org.archive.state.Key<org.archive.modules.net.ServerCache> | SERVER_CACHE - The server cache (org.archive.modules.net.ServerCache) used by this processor. |
| static org.archive.state.Key<String> | TABLE - Name of the HBase table to crawl into. |
| static org.archive.state.Key<Long> | TOTAL_BYTES_TO_WRITE - Total number of file bytes to write to disk. |
| static org.archive.state.Key<Boolean> | WRITE_ONLY_NEW_RECORDS - If true, write only URLs that are new row-key records. |
| static org.archive.state.Key<Integer> | ZKCLIENTPORT - Port that clients use to contact the ZooKeeper quorum hosts. |
| static org.archive.state.Key<String> | ZKQUORUM - Comma-separated list of host names in the ZooKeeper quorum. |
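ZKQUORUM and ZKCLIENTPORT together describe how to reach ZooKeeper. The sketch below shows one way the two settings could be combined into a standard ZooKeeper connect string of comma-separated `host:port` entries. The class name `ZkConnectString` is hypothetical; this page does not say how HBaseWriterProcessor passes these values on (HBase's own configuration keeps them separate, as `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort`).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: combines ZKQUORUM-style and ZKCLIENTPORT-style values. */
final class ZkConnectString {
    private ZkConnectString() {}

    /**
     * @param quorum     comma-separated host names, e.g. "zk1,zk2,zk3"
     * @param clientPort port clients use to reach every quorum host
     * @return "host:port" entries joined by commas
     */
    static String build(String quorum, int clientPort) {
        if (clientPort <= 0 || clientPort > 65535) {
            throw new IllegalArgumentException("invalid port: " + clientPort);
        }
        List<String> entries = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String trimmed = host.trim();      // tolerate "zk1, zk2" style spacing
            if (!trimmed.isEmpty()) {          // skip blanks such as a stray comma
                entries.add(trimmed + ":" + clientPort);
            }
        }
        if (entries.isEmpty()) {
            throw new IllegalArgumentException("no hosts in quorum: \"" + quorum + "\"");
        }
        return String.join(",", entries);
    }
}
```

For example, `ZkConnectString.build("zk1, zk2,zk3", 2181)` yields `zk1:2181,zk2:2181,zk3:2181`.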
| Fields inherited from class org.archive.modules.Processor |
|---|
DECIDE_RULES, ENABLED |
| Constructor Summary | |
|---|---|
| HBaseWriterProcessor() | Instantiates a new HBaseWriterProcessor. |
| Method Summary | |
|---|---|
| protected org.archive.modules.ProcessResult | checkBytesWritten(org.archive.state.StateProvider context) - Checks the total number of bytes written so far. |
| void | close() |
| protected String | getHostAddress(org.archive.modules.ProcessorURI curi) - Returns the IP address of the given URI in a form suitable for recording (as in a classic ARC 5-field header line). |
| protected int | getMaxActive() - Returns the maximum number of active files in the pool (POOL_MAX_ACTIVE). |
| protected int | getMaxWait() - Returns the maximum time, in milliseconds, to wait for a pool element (POOL_MAX_WAIT). |
| protected org.archive.io.WriterPool | getPool() - Returns the writer pool. |
| protected String | getTable() - Returns the HBase table name (TABLE). |
| protected long | getTotalBytesWritten() - Returns the total number of bytes written. |
| protected int | getZKClientPort() - Returns the ZooKeeper client port (ZKCLIENTPORT). |
| protected String | getZKQuorum() - Returns the ZooKeeper quorum host list (ZKQUORUM). |
| void | initialTasks(org.archive.state.StateProvider context) |
| protected void | innerProcess(org.archive.modules.ProcessorURI puri) |
| protected org.archive.modules.ProcessResult | innerProcessResult(org.archive.modules.ProcessorURI puri) |
| protected void | setPool(org.archive.io.WriterPool pool) - Sets the writer pool. |
| protected void | setTotalBytesWritten(long b) - Sets the total number of bytes written. |
| protected void | setupPool() - Sets up the writer pool. |
| protected boolean | shouldProcess(org.archive.modules.ProcessorURI uri) |
| protected boolean | shouldWrite(org.archive.modules.ProcessorURI curi) - Returns whether the given ProcessorURI should be written to archive files. |
| protected org.archive.modules.ProcessResult | write(org.archive.modules.ProcessorURI curi, long recordLength, InputStream in, String ip) - Writes a record for the given URI. |
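write(curi, recordLength, in, ip) receives a record as a length plus an InputStream. A single InputStream.read call may return fewer bytes than requested, so a writer that needs the whole record in memory (for example, to store it as a single HBase cell) has to loop. The sketch below, with the hypothetical helper `RecordBuffer`, reads exactly recordLength bytes and fails on a premature end of stream; it is an assumption about how such a method could consume its input, not the processor's actual code.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: buffers exactly recordLength bytes of a record. */
final class RecordBuffer {
    private RecordBuffer() {}

    static byte[] readRecord(InputStream in, long recordLength) throws IOException {
        if (recordLength < 0 || recordLength > Integer.MAX_VALUE) {
            throw new IOException("unsupported record length: " + recordLength);
        }
        byte[] buf = new byte[(int) recordLength];
        int off = 0;
        while (off < buf.length) {
            // read() may return fewer bytes than asked for, so keep going
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " of "
                        + recordLength + " bytes");
            }
            off += n;
        }
        return buf; // any bytes past recordLength remain unread in the stream
    }
}
```

On Java 11 and later, `InputStream.readNBytes(int)` performs the same loop but returns a shorter array at end of stream instead of throwing.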
| Methods inherited from class org.archive.modules.Processor |
|---|
flattenVia, getRecordedSize, getURICount, hasRfc2617CredentialAvatar, innerRejectProcess, isSuccess, process, report |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
@Immutable public static final org.archive.state.Key<String> ZKQUORUM
@Immutable public static final org.archive.state.Key<Integer> ZKCLIENTPORT
@Immutable public static final org.archive.state.Key<String> TABLE
@Immutable public static final org.archive.state.Key<Boolean> WRITE_ONLY_NEW_RECORDS
@Immutable public static final org.archive.state.Key<Boolean> PROCESS_ONLY_NEW_RECORDS
@Immutable public static final org.archive.state.Key<Integer> POOL_MAX_ACTIVE
@Immutable public static final org.archive.state.Key<Integer> POOL_MAX_WAIT
@Immutable public static final org.archive.state.Key<org.archive.modules.net.ServerCache> SERVER_CACHE
@Immutable public static final org.archive.state.Key<Integer> CONTENT_MAX_SIZE
@Immutable @Expert public static final org.archive.state.Key<Long> TOTAL_BYTES_TO_WRITE
| Constructor Detail |
|---|
public HBaseWriterProcessor()
| Method Detail |
|---|
public void initialTasks(org.archive.state.StateProvider context)
initialTasks in interface org.archive.state.Initializable
protected org.archive.modules.ProcessResult innerProcessResult(org.archive.modules.ProcessorURI puri)
innerProcessResult in class org.archive.modules.Processor
protected String getHostAddress(org.archive.modules.ProcessorURI curi)
curi - ProcessorURI
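getHostAddress returns the IP address in a form suitable for a classic ARC 5-field header line. In version 1 of the ARC format, that line holds the URL, IP address, 14-digit archive date (YYYYMMDDhhmmss), content type, and record length, separated by single spaces. The hypothetical `ArcHeaderLine` helper below only illustrates where the address lands in such a line; HBaseWriterProcessor itself writes to HBase, and nothing here is taken from its source.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: formats a classic ARC v1 URL-record header line. */
final class ArcHeaderLine {
    // ARC archive dates are 14 digits in UTC; withZone converts other zones to UTC.
    private static final DateTimeFormatter ARC_DATE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    private ArcHeaderLine() {}

    /** Fields, in order: URL, IP address, archive date, content type, length. */
    static String format(String url, String ip, ZonedDateTime fetched,
                         String contentType, long length) {
        return String.join(" ", url, ip, ARC_DATE.format(fetched),
                contentType, Long.toString(length));
    }
}
```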
protected boolean shouldProcess(org.archive.modules.ProcessorURI uri)
shouldProcess in class org.archive.modules.Processor
protected boolean shouldWrite(org.archive.modules.ProcessorURI curi)
curi - ProcessorURI
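WRITE_ONLY_NEW_RECORDS (and, analogously, PROCESS_ONLY_NEW_RECORDS for shouldProcess) restricts work to URLs that are new row-key records. A minimal sketch of that rule follows, assuming the decision reduces to "is this row key already in the table?". The class `NewRecordFilter` is hypothetical, and an in-memory Set stands in for the HBase table, which the real processor would have to query.

```java
import java.util.Set;

/** Hypothetical sketch of the WRITE_ONLY_NEW_RECORDS decision. */
final class NewRecordFilter {
    private final Set<String> existingRowKeys;   // stand-in for the HBase table
    private final boolean writeOnlyNewRecords;   // value of the flag

    NewRecordFilter(Set<String> existingRowKeys, boolean writeOnlyNewRecords) {
        this.existingRowKeys = existingRowKeys;
        this.writeOnlyNewRecords = writeOnlyNewRecords;
    }

    boolean shouldWrite(String rowKey) {
        if (!writeOnlyNewRecords) {
            return true;                            // flag off: write every URI
        }
        return !existingRowKeys.contains(rowKey);   // flag on: skip known row keys
    }
}
```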
protected org.archive.modules.ProcessResult write(org.archive.modules.ProcessorURI curi,
long recordLength,
InputStream in,
String ip)
throws IOException
curi - the URI whose record is written
recordLength - the record length
in - the input stream to read the record from
ip - the IP address of the URI's host
IOException - if an I/O error occurs.
protected org.archive.modules.ProcessResult checkBytesWritten(org.archive.state.StateProvider context)
context - the context
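checkBytesWritten, getTotalBytesWritten, and setTotalBytesWritten suggest a running byte total compared against TOTAL_BYTES_TO_WRITE. The sketch below models such a budget under two loudly stated assumptions: a limit of zero or less means "unlimited", and the budget is spent once the total reaches the limit. The class `ByteBudget` is hypothetical; the real method returns an org.archive.modules.ProcessResult rather than a boolean. An AtomicLong keeps the total consistent when several threads record writes.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical byte budget in the spirit of TOTAL_BYTES_TO_WRITE. */
final class ByteBudget {
    private final long limit;                             // assumption: <= 0 means no limit
    private final AtomicLong written = new AtomicLong();  // thread-safe running total

    ByteBudget(long limit) {
        this.limit = limit;
    }

    /** Adds the size of a completed write and returns the new total. */
    long record(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative byte count: " + bytes);
        }
        return written.addAndGet(bytes);
    }

    long totalWritten() {
        return written.get();
    }

    /** True once the running total has reached a positive limit. */
    boolean exhausted() {
        return limit > 0 && written.get() >= limit;
    }
}
```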
protected void setupPool()
protected String getZKQuorum()
protected int getZKClientPort()
protected String getTable()
protected int getMaxActive()
protected int getMaxWait()
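getMaxActive and getMaxWait expose POOL_MAX_ACTIVE and POOL_MAX_WAIT: at most that many files may be in use at once, and a caller waits at most that many milliseconds for one to free up. The semantics can be illustrated with a fair java.util.concurrent.Semaphore, as in the hypothetical `BoundedWriterSlots` class below. This is an analogy only; org.archive.io.WriterPool is not necessarily built this way.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of max-active / max-wait pool semantics. */
final class BoundedWriterSlots {
    private final Semaphore permits;
    private final long maxWaitMillis;

    BoundedWriterSlots(int maxActive, long maxWaitMillis) {
        this.permits = new Semaphore(maxActive, true); // fair: waiters served in arrival order
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns true if a slot was obtained within maxWaitMillis. */
    boolean borrow() {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Releases a slot obtained by a successful borrow(). */
    void giveBack() {
        permits.release();
    }
}
```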
protected void setPool(org.archive.io.WriterPool pool)
pool - the new pool
protected org.archive.io.WriterPool getPool()
protected long getTotalBytesWritten()
protected void setTotalBytesWritten(long b)
b - the new total bytes written
protected void innerProcess(org.archive.modules.ProcessorURI puri)
innerProcess in class org.archive.modules.Processor
public void close()
close in interface Closeable