public class TikaLuceneContentExtractor extends Object
| Constructor and Description | 
|---|
TikaLuceneContentExtractor(List<org.apache.tika.parser.Parser> parsers,
                          LuceneDocumentMetadata documentMetadata)
Create a new Tika-based content extractor using the provided list of parser
 instances and document metadata. 
 | 
TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser)
Create a new Tika-based content extractor using the provided parser instance. 
 | 
TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                          boolean validateMediaType)
Create a new Tika-based content extractor using the provided parser instance and
 optional media type validation. 
 | 
TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                          boolean validateMediaType,
                          LuceneDocumentMetadata documentMetadata)
Create a new Tika-based content extractor using the provided parser instance,
 optional media type validation, and document metadata. 
 | 
TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                          LuceneDocumentMetadata documentMetadata)
Create a new Tika-based content extractor using the provided parser instance and
 document metadata. 
 | 
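The constructor summaries mention optional media type validation without saying what is checked. One plausible reading is that the detected media type of the input is compared against the types a parser declares as supported (Tika's `Parser` interface exposes `getSupportedTypes(ParseContext)` for this purpose). The sketch below models that reading only. `SketchParser` and `MediaTypeValidationSketch` are hypothetical stand-ins and not part of CXF or Tika, and the selection rule is an assumption rather than the library's confirmed behavior.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a parser; NOT org.apache.tika.parser.Parser.
interface SketchParser {
    // Media types this parser claims to handle, e.g. "application/pdf".
    Set<String> supportedTypes();
}

// Hypothetical model of "optional media type validation" (assumed semantics).
final class MediaTypeValidationSketch {
    private MediaTypeValidationSketch() { }

    // With validation on, return the first parser that supports detectedType.
    // With validation off, return the first parser without checking the type.
    // An empty result means no parser was selected.
    static Optional<SketchParser> select(List<SketchParser> parsers,
                                         String detectedType,
                                         boolean validateMediaType) {
        for (SketchParser parser : parsers) {
            if (!validateMediaType
                    || parser.supportedTypes().contains(detectedType)) {
                return Optional.of(parser);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SketchParser pdf = () -> Set.of("application/pdf");
        SketchParser odf = () -> Set.of("application/vnd.oasis.opendocument.text");
        List<SketchParser> parsers = List.of(pdf, odf);

        // Validation on: an unsupported type selects no parser.
        System.out.println(select(parsers, "text/plain", true).isPresent());
        // Validation off: the first parser is used regardless of type.
        System.out.println(select(parsers, "text/plain", false).isPresent());
    }
}
```

Under this assumed model, passing `validateMediaType` as `false` trades safety for permissiveness: the parser is attempted even on input it does not declare support for.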
| Modifier and Type | Method and Description | 
|---|---|
org.apache.lucene.document.Document | 
extract(InputStream in)
Extract the content and metadata from the input stream. 
 | 
org.apache.lucene.document.Document | 
extract(InputStream in,
       LuceneDocumentMetadata documentMetadata)
Extract the content and metadata from the input stream. 
 | 
org.apache.lucene.document.Document | 
extractContent(InputStream in)
Extract the content only from the input stream. 
 | 
org.apache.lucene.document.Document | 
extractMetadata(InputStream in)
Extract the metadata only from the input stream. 
 | 
org.apache.lucene.document.Document | 
extractMetadata(InputStream in,
               LuceneDocumentMetadata documentMetadata)
Extract the metadata only from the input stream. 
 | 
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser)
parser - parser instance

public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                                  boolean validateMediaType)
parser - parser instance
validateMediaType - enable or disable media type validation

public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                                  LuceneDocumentMetadata documentMetadata)
parser - parser instance
documentMetadata - document metadata

public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                                  boolean validateMediaType,
                                  LuceneDocumentMetadata documentMetadata)
parser - parser instance
validateMediaType - enable or disable media type validation
documentMetadata - document metadata

public TikaLuceneContentExtractor(List<org.apache.tika.parser.Parser> parsers,
                                  LuceneDocumentMetadata documentMetadata)
parsers - parser instances
documentMetadata - document metadata

public org.apache.lucene.document.Document extract(InputStream in)
in - input stream to extract the content and metadata from

public org.apache.lucene.document.Document extract(InputStream in,
                                                   LuceneDocumentMetadata documentMetadata)
in - input stream to extract the content and metadata from
documentMetadata - document metadata

public org.apache.lucene.document.Document extractContent(InputStream in)
in - input stream to extract the content from

public org.apache.lucene.document.Document extractMetadata(InputStream in)
in - input stream to extract the metadata from

public org.apache.lucene.document.Document extractMetadata(InputStream in,
                                                           LuceneDocumentMetadata documentMetadata)
in - input stream to extract the metadata from
documentMetadata - document metadata

Apache CXF