public class TikaLuceneContentExtractor extends Object
Constructor and Description |
---|
TikaLuceneContentExtractor(List<Parser> parsers,
LuceneDocumentMetadata documentMetadata)
Create new Tika-based content extractor using the provided list of parser instances and
document metadata.
|
TikaLuceneContentExtractor(Parser parser)
Create new Tika-based content extractor using the provided parser instance.
|
TikaLuceneContentExtractor(Parser parser,
boolean validateMediaType)
Create new Tika-based content extractor using the provided parser instance and
optional media type validation.
|
TikaLuceneContentExtractor(Parser parser,
boolean validateMediaType,
LuceneDocumentMetadata documentMetadata)
Create new Tika-based content extractor using the provided parser instance,
optional media type validation and document metadata.
|
TikaLuceneContentExtractor(Parser parser,
LuceneDocumentMetadata documentMetadata)
Create new Tika-based content extractor using the provided parser instance and
document metadata.
|
Modifier and Type | Method and Description |
---|---|
org.apache.lucene.document.Document |
extract(InputStream in)
Extract the content and metadata from the input stream.
|
org.apache.lucene.document.Document |
extract(InputStream in,
LuceneDocumentMetadata documentMetadata)
Extract the content and metadata from the input stream, using the provided document metadata.
|
org.apache.lucene.document.Document |
extractContent(InputStream in)
Extract the content only from the input stream.
|
org.apache.lucene.document.Document |
extractMetadata(InputStream in)
Extract the metadata only from the input stream.
|
org.apache.lucene.document.Document |
extractMetadata(InputStream in,
LuceneDocumentMetadata documentMetadata)
Extract the metadata only from the input stream, using the provided document metadata.
|

public TikaLuceneContentExtractor(Parser parser)
parser
- parser instance

public TikaLuceneContentExtractor(Parser parser, boolean validateMediaType)
parser
- parser instance
validateMediaType
- enable or disable media type validation

public TikaLuceneContentExtractor(Parser parser, LuceneDocumentMetadata documentMetadata)
parser
- parser instance
documentMetadata
- document metadata

public TikaLuceneContentExtractor(Parser parser, boolean validateMediaType, LuceneDocumentMetadata documentMetadata)
parser
- parser instance
validateMediaType
- enable or disable media type validation
documentMetadata
- document metadata

public TikaLuceneContentExtractor(List<Parser> parsers, LuceneDocumentMetadata documentMetadata)
parsers
- parser instances
documentMetadata
- document metadata

public org.apache.lucene.document.Document extract(InputStream in)
in
- input stream to extract the content and metadata from

public org.apache.lucene.document.Document extract(InputStream in, LuceneDocumentMetadata documentMetadata)
in
- input stream to extract the content and metadata from
documentMetadata
- document metadata

public org.apache.lucene.document.Document extractContent(InputStream in)
in
- input stream to extract the content from

public org.apache.lucene.document.Document extractMetadata(InputStream in)
in
- input stream to extract the metadata from

public org.apache.lucene.document.Document extractMetadata(InputStream in, LuceneDocumentMetadata documentMetadata)
in
- input stream to extract the metadata from
documentMetadata
- document metadata
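
The three extraction modes listed above (content and metadata, content only, metadata only) can be pictured with a small stand-alone sketch. It does not use or reproduce the CXF class. `ExtractSketch`, the field names `contents` and `length`, and the shared `extractAll` helper are all hypothetical. A `Map` stands in for the Lucene `Document`, and the stream is read as UTF-8 text in place of a Tika parser. It only illustrates how the three entry points could differ in what they put into the resulting document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the CXF implementation: a Map plays the role
// of the Lucene Document and the "parser" just decodes the stream as UTF-8.
class ExtractSketch {

    // Field names are assumptions made for this sketch only.
    static final String CONTENT_FIELD = "contents";
    static final String LENGTH_FIELD = "length";

    // Content and metadata together, mirroring extract(InputStream).
    static Map<String, String> extract(InputStream in) {
        return extractAll(in, true, true);
    }

    // Content only, mirroring extractContent(InputStream).
    static Map<String, String> extractContent(InputStream in) {
        return extractAll(in, true, false);
    }

    // Metadata only, mirroring extractMetadata(InputStream).
    static Map<String, String> extractMetadata(InputStream in) {
        return extractAll(in, false, true);
    }

    // Assumed shared helper: the two flags select which fields are added.
    private static Map<String, String> extractAll(InputStream in,
                                                  boolean withContent,
                                                  boolean withMetadata) {
        byte[] bytes;
        try {
            bytes = in.readAllBytes(); // requires Java 9 or later
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> doc = new LinkedHashMap<>();
        if (withContent) {
            doc.put(CONTENT_FIELD, new String(bytes, StandardCharsets.UTF_8));
        }
        if (withMetadata) {
            // The byte count serves as a toy metadata value.
            doc.put(LENGTH_FIELD, String.valueOf(bytes.length));
        }
        return doc;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(extract(new ByteArrayInputStream(data)));         // {contents=hello, length=5}
        System.out.println(extractContent(new ByteArrayInputStream(data)));  // {contents=hello}
        System.out.println(extractMetadata(new ByteArrayInputStream(data))); // {length=5}
    }
}
```

Each call consumes its input stream, so the sketch opens a fresh stream per call. The same care is likely needed when calling the real methods on a single `InputStream`.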