public class TikaContentExtractor extends Object
| Modifier and Type | Class | Description |
|---|---|---|
| static class | TikaContentExtractor.TikaContent | Extracted content, metadata and media type container. |
| Constructor | Description |
|---|---|
| TikaContentExtractor() | Create a new Tika-based content extractor using AutoDetectParser. |
| TikaContentExtractor(List<Parser> parsers) | Create a new Tika-based content extractor using the provided parser instances. |
| TikaContentExtractor(List<Parser> parsers, Detector detector) | Create a new Tika-based content extractor using the provided parser instances and detector. |
| TikaContentExtractor(Parser parser) | Create a new Tika-based content extractor using the provided parser instance. |
| TikaContentExtractor(Parser parser, boolean validateMediaType) | Create a new Tika-based content extractor using the provided parser instance and optional media type validation. |
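
The constructors above can be exercised roughly as follows. This is a minimal sketch, not a definitive setup: it assumes the class lives in the `org.apache.cxf.jaxrs.ext.search.tika` package and that the CXF search extension and Tika core are on the classpath. `ExtractorSetup` and its method names are hypothetical, and `AutoDetectParser` and `DefaultDetector` are chosen only as convenient stock Tika implementations.

```java
import java.util.Collections;
import java.util.List;

// Assumption: package name is not stated on this page.
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor;
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;

// Hypothetical helper class, for illustration only.
public class ExtractorSetup {

    // No-argument constructor: documented as using AutoDetectParser.
    public static TikaContentExtractor defaults() {
        return new TikaContentExtractor();
    }

    // Single parser with media type validation switched on.
    public static TikaContentExtractor validating(Parser parser) {
        return new TikaContentExtractor(parser, true);
    }

    // Explicit parser list plus an explicit detector.
    public static TikaContentExtractor withDetector() {
        Parser parser = new AutoDetectParser();
        List<Parser> parsers = Collections.singletonList(parser);
        return new TikaContentExtractor(parsers, new DefaultDetector());
    }
}
```

Passing a narrower parser list than `AutoDetectParser` is presumably the way to restrict which formats are accepted, though the summary above does not spell out that behavior.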
| Modifier and Type | Method | Description |
|---|---|---|
| TikaContentExtractor.TikaContent | extract(InputStream in) | Extract the content and metadata from the input stream. |
| TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler) | Extract the content and metadata from the input stream. |
| TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mt) | Extract the content and metadata from the input stream with a media type hint. |
| TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mtHint, ParseContext context) | Extract the content and metadata from the input stream with a media type hint. |
| TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler, ParseContext context) | Extract the content and metadata from the input stream. |
| TikaContentExtractor.TikaContent | extract(InputStream in, javax.ws.rs.core.MediaType mt) | Extract the content and metadata from the input stream with a media type hint. |
| TikaContentExtractor.TikaContent | extractMetadata(InputStream in) | Extract the metadata only from the input stream. |
| SearchBean | extractMetadataToSearchBean(InputStream in) | Extract the metadata only from the input stream. |
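
A basic call to `extract(InputStream in)` might look like the sketch below. Several things here are assumptions rather than facts stated on this page: the package name `org.apache.cxf.jaxrs.ext.search.tika`, the `getContent()` accessor on `TikaContent` (inferred from its description as a content, metadata and media type container), and the possibility of a `null` result, which is guarded against defensively. `ExtractExample` and `extractText` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Assumption: package name is not stated on this page.
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor;
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor.TikaContent;

// Hypothetical example class, for illustration only.
public class ExtractExample {

    // Returns the extracted text, or null if the extractor produced no result.
    public static String extractText(byte[] data) throws IOException {
        TikaContentExtractor extractor = new TikaContentExtractor();
        try (InputStream in = new ByteArrayInputStream(data)) {
            TikaContent result = extractor.extract(in);
            // Defensive null check: the summary does not say whether
            // a failed extraction yields null or an empty container.
            // getContent() is an assumed accessor name.
            return result == null ? null : result.getContent();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello, Tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractText(data));
    }
}
```

The other overloads take the same stream plus an optional `ContentHandler`, media type hint, or `ParseContext`, so the same pattern should apply with extra arguments. How much text comes back depends on which Tika parser modules are on the classpath.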
public TikaContentExtractor()

public TikaContentExtractor(Parser parser)
parser - parser instance

public TikaContentExtractor(List<Parser> parsers)
parsers - parser instances

public TikaContentExtractor(List<Parser> parsers, Detector detector)
parsers - parser instances

public TikaContentExtractor(Parser parser, boolean validateMediaType)
parser - parser instance
validateMediaType - enable or disable media type validation

public TikaContentExtractor.TikaContent extract(InputStream in)
in - input stream to extract the content and metadata from

public TikaContentExtractor.TikaContent extract(InputStream in, javax.ws.rs.core.MediaType mt)
in - input stream to extract the content and metadata from
mt - JAX-RS MediaType of the stream content

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler)
in - input stream to extract the content and metadata from
handler - custom ContentHandler

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mt)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
mt - JAX-RS MediaType of the stream content

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, ParseContext context)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
context - custom context

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mtHint, ParseContext context)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
mtHint - JAX-RS MediaType of the stream content
context - custom context

public TikaContentExtractor.TikaContent extractMetadata(InputStream in)
in - input stream to extract the metadata from

public SearchBean extractMetadataToSearchBean(InputStream in)
in - input stream to extract the metadata from

Apache CXF