public class TikaLuceneContentExtractor extends Object
| Constructor and Description |
|---|
| `TikaLuceneContentExtractor(List<Parser> parsers, LuceneDocumentMetadata documentMetadata)`: Create a new Tika-based content extractor using the provided parser instances and document metadata. |
| `TikaLuceneContentExtractor(Parser parser)`: Create a new Tika-based content extractor using the provided parser instance. |
| `TikaLuceneContentExtractor(Parser parser, boolean validateMediaType)`: Create a new Tika-based content extractor using the provided parser instance and optional media type validation. |
| `TikaLuceneContentExtractor(Parser parser, boolean validateMediaType, LuceneDocumentMetadata documentMetadata)`: Create a new Tika-based content extractor using the provided parser instance, optional media type validation, and document metadata. |
| `TikaLuceneContentExtractor(Parser parser, LuceneDocumentMetadata documentMetadata)`: Create a new Tika-based content extractor using the provided parser instance and document metadata. |
| Modifier and Type | Method and Description |
|---|---|
| `org.apache.lucene.document.Document` | `extract(InputStream in)`: Extract the content and metadata from the input stream. |
| `org.apache.lucene.document.Document` | `extract(InputStream in, LuceneDocumentMetadata documentMetadata)`: Extract the content and metadata from the input stream. |
| `org.apache.lucene.document.Document` | `extractContent(InputStream in)`: Extract the content only from the input stream. |
| `org.apache.lucene.document.Document` | `extractMetadata(InputStream in)`: Extract the metadata only from the input stream. |
| `org.apache.lucene.document.Document` | `extractMetadata(InputStream in, LuceneDocumentMetadata documentMetadata)`: Extract the metadata only from the input stream. |
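
Every extraction method listed above takes an `InputStream`, so the caller has to open and close that stream. The following is a minimal sketch of one way to do this using only the JDK. The helper `extractFrom` and the stand-in extractor `readAsText` are hypothetical names invented for this illustration and are not part of this class. The sketch shows only the stream-handling pattern, not how the extractor itself works.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class ExtractCallSketch {

    // Hypothetical helper: opens a stream over the bytes, passes it to the
    // extraction function, and closes the stream even if extraction fails.
    public static <D> D extractFrom(byte[] data, Function<InputStream, D> extractor) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return extractor.apply(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for a real extractor (an assumption for this demo only):
    // reads the whole stream and decodes it as UTF-8 text.
    public static String readAsText(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "plain text body".getBytes(StandardCharsets.UTF_8);
        String text = extractFrom(data, ExtractCallSketch::readAsText);
        System.out.println(text); // prints "plain text body"
    }
}
```

With an actual `TikaLuceneContentExtractor` instance named, say, `extractor`, the same helper could presumably be called as `extractFrom(bytes, extractor::extract)`. This assumes that `extract(InputStream in)` declares no checked exceptions, which the signature shown on this page suggests.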
public TikaLuceneContentExtractor(Parser parser)
Parameters:
parser - parser instance

public TikaLuceneContentExtractor(Parser parser, boolean validateMediaType)
Parameters:
parser - parser instance
validateMediaType - enable or disable media type validation

public TikaLuceneContentExtractor(Parser parser, LuceneDocumentMetadata documentMetadata)
Parameters:
parser - parser instance
documentMetadata - document metadata

public TikaLuceneContentExtractor(Parser parser, boolean validateMediaType, LuceneDocumentMetadata documentMetadata)
Parameters:
parser - parser instance
validateMediaType - enable or disable media type validation
documentMetadata - document metadata

public TikaLuceneContentExtractor(List<Parser> parsers, LuceneDocumentMetadata documentMetadata)
Parameters:
parsers - parser instances
documentMetadata - document metadata

public org.apache.lucene.document.Document extract(InputStream in)
Parameters:
in - input stream to extract the content and metadata from

public org.apache.lucene.document.Document extract(InputStream in, LuceneDocumentMetadata documentMetadata)
Parameters:
in - input stream to extract the content and metadata from
documentMetadata - document metadata

public org.apache.lucene.document.Document extractContent(InputStream in)
Parameters:
in - input stream to extract the content from

public org.apache.lucene.document.Document extractMetadata(InputStream in)
Parameters:
in - input stream to extract the metadata from

public org.apache.lucene.document.Document extractMetadata(InputStream in, LuceneDocumentMetadata documentMetadata)
Parameters:
in - input stream to extract the metadata from
documentMetadata - document metadata