public class TikaContentExtractor extends Object
| Modifier and Type | Class | Description |
|---|---|---|
| `static class` | `TikaContentExtractor.TikaContent` | Extracted content, metadata and media type container. |
| Constructor | Description |
|---|---|
| `TikaContentExtractor()` | Create a new Tika-based content extractor using AutoDetectParser. |
| `TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers)` | Create a new Tika-based content extractor using the provided parser instances. |
| `TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers, org.apache.tika.detect.Detector detector)` | Create a new Tika-based content extractor using the provided parser instances and detector. |
| `TikaContentExtractor(org.apache.tika.parser.Parser parser)` | Create a new Tika-based content extractor using the provided parser instance. |
| `TikaContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType)` | Create a new Tika-based content extractor using the provided parser instance and optional media type validation. |
| Modifier and Type | Method | Description |
|---|---|---|
| `TikaContentExtractor.TikaContent` | `extract(InputStream in)` | Extract the content and metadata from the input stream. |
| `TikaContentExtractor.TikaContent` | `extract(InputStream in, ContentHandler handler)` | Extract the content and metadata from the input stream. |
| `TikaContentExtractor.TikaContent` | `extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mt)` | Extract the content and metadata from the input stream with a media type hint. |
| `TikaContentExtractor.TikaContent` | `extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mtHint, org.apache.tika.parser.ParseContext context)` | Extract the content and metadata from the input stream with a media type hint. |
| `TikaContentExtractor.TikaContent` | `extract(InputStream in, ContentHandler handler, org.apache.tika.parser.ParseContext context)` | Extract the content and metadata from the input stream. |
| `TikaContentExtractor.TikaContent` | `extract(InputStream in, javax.ws.rs.core.MediaType mt)` | Extract the content and metadata from the input stream with a media type hint. |
| `TikaContentExtractor.TikaContent` | `extractMetadata(InputStream in)` | Extract only the metadata from the input stream. |
| `SearchBean` | `extractMetadataToSearchBean(InputStream in)` | Extract only the metadata from the input stream, returned as a SearchBean. |
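The simplest overload, `extract(InputStream in)`, can be exercised with a sketch like the one below. It assumes the CXF search extension and Apache Tika are on the classpath, that the class lives in the `org.apache.cxf.jaxrs.ext.search.tika` package, and that `TikaContent` exposes a `getContent()` accessor; none of these is stated in the summary above. The class name `TikaExtractExample` and the helper `extractText` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Assumed package; the summary above does not name it.
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor;
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor.TikaContent;

public class TikaExtractExample {

    /** Returns the extracted text, or an empty string if nothing was extracted. */
    static String extractText(InputStream in) {
        // The no-argument constructor uses Tika's AutoDetectParser.
        TikaContentExtractor extractor = new TikaContentExtractor();
        TikaContent content = extractor.extract(in);
        // Assumption: a null result (or null text) signals a failed extraction,
        // and getContent() returns the extracted text.
        if (content == null || content.getContent() == null) {
            return "";
        }
        return content.getContent();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "Hello from Tika".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            // Trim because text handlers may append trailing whitespace.
            System.out.println(extractText(in).trim());
        }
    }
}
```

The overloads that take a `javax.ws.rs.core.MediaType` or a custom `ContentHandler` would be called the same way, with the extra arguments supplied after the stream.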
public TikaContentExtractor()

public TikaContentExtractor(org.apache.tika.parser.Parser parser)
parser - parser instance

public TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers)
parsers - parser instances

public TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers, org.apache.tika.detect.Detector detector)
parsers - parser instances

public TikaContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType)
parser - parser instance
validateMediaType - enable or disable media type validation

public TikaContentExtractor.TikaContent extract(InputStream in)
in - input stream to extract the content and metadata from

public TikaContentExtractor.TikaContent extract(InputStream in, javax.ws.rs.core.MediaType mt)
in - input stream to extract the content and metadata from
mt - JAX-RS MediaType of the stream content

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler)
in - input stream to extract the content and metadata from
handler - custom ContentHandler

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mt)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
mt - JAX-RS MediaType of the stream content

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, org.apache.tika.parser.ParseContext context)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
context - custom context

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mtHint, org.apache.tika.parser.ParseContext context)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
mtHint - JAX-RS MediaType of the stream content
context - custom context

public TikaContentExtractor.TikaContent extractMetadata(InputStream in)
in - input stream to extract the metadata from

public SearchBean extractMetadataToSearchBean(InputStream in)
in - input stream to extract the metadata from

Apache CXF