public class TikaContentExtractor extends Object
Modifier and Type | Class and Description |
---|---|
static class | TikaContentExtractor.TikaContent: Extracted content, metadata and media type container. |
Constructor and Description |
---|
TikaContentExtractor(): Create a new Tika-based content extractor using AutoDetectParser. |
TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers): Create a new Tika-based content extractor using the provided parser instances. |
TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers, org.apache.tika.detect.Detector detector): Create a new Tika-based content extractor using the provided parser instances and detector. |
TikaContentExtractor(org.apache.tika.parser.Parser parser): Create a new Tika-based content extractor using the provided parser instance. |
TikaContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType): Create a new Tika-based content extractor using the provided parser instance and optional media type validation. |
Modifier and Type | Method and Description |
---|---|
TikaContentExtractor.TikaContent | extract(InputStream in): Extract the content and metadata from the input stream. |
TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler): Extract the content and metadata from the input stream using a custom content handler. |
TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mt): Extract the content and metadata from the input stream with a media type hint. |
TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mtHint, org.apache.tika.parser.ParseContext context): Extract the content and metadata from the input stream with a media type hint and a custom parse context. |
TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler, org.apache.tika.parser.ParseContext context): Extract the content and metadata from the input stream using a custom parse context. |
TikaContentExtractor.TikaContent | extract(InputStream in, javax.ws.rs.core.MediaType mt): Extract the content and metadata from the input stream with a media type hint. |
TikaContentExtractor.TikaContent | extractMetadata(InputStream in): Extract only the metadata from the input stream. |
SearchBean | extractMetadataToSearchBean(InputStream in): Extract only the metadata from the input stream and return it as a SearchBean. |
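
The simplest extract overloads listed above can be exercised roughly as follows. This is a minimal sketch, not the canonical usage: the package name `org.apache.cxf.jaxrs.ext.search.tika`, the `getContent()` accessor on `TikaContent`, and the defensive `null` check are assumptions that do not appear in the listing above, and `ExtractExample` and `extractText` are hypothetical names. It also assumes the CXF search extension, the JAX-RS API and the Tika parsers are on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.ws.rs.core.MediaType;

// Assumed package; the listing above does not state it.
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor;
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor.TikaContent;

public class ExtractExample {

    // Hypothetical helper: returns the extracted text, or "" when nothing was extracted.
    static String extractText(byte[] document, MediaType hint) {
        // The no-argument constructor is documented as using AutoDetectParser.
        TikaContentExtractor extractor = new TikaContentExtractor();
        InputStream in = new ByteArrayInputStream(document);

        // Pick the overload with or without a media type hint.
        TikaContent result = (hint == null)
            ? extractor.extract(in)
            : extractor.extract(in, hint);

        // Assumption: extraction may yield no result or no text, so check both.
        if (result == null || result.getContent() == null) {
            return "";
        }
        return result.getContent();
    }

    public static void main(String[] args) {
        byte[] doc = "Hello Tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractText(doc, null));
        System.out.println(extractText(doc, MediaType.TEXT_PLAIN_TYPE));
    }
}
```

When only the metadata is needed, extractMetadata(InputStream in) or extractMetadataToSearchBean(InputStream in) can be called in the same way in place of extract.
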
public TikaContentExtractor()

public TikaContentExtractor(org.apache.tika.parser.Parser parser)
parser - parser instance

public TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers)
parsers - parser instances

public TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers, org.apache.tika.detect.Detector detector)
parsers - parser instances

public TikaContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType)
parser - parser instance
validateMediaType - enable or disable media type validation

public TikaContentExtractor.TikaContent extract(InputStream in)
in - input stream to extract the content and metadata from

public TikaContentExtractor.TikaContent extract(InputStream in, javax.ws.rs.core.MediaType mt)
in - input stream to extract the content and metadata from
mt - JAX-RS MediaType of the stream content

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler)
in - input stream to extract the content and metadata from
handler - custom ContentHandler

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mt)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
mt - JAX-RS MediaType of the stream content

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, org.apache.tika.parser.ParseContext context)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
context - custom context

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mtHint, org.apache.tika.parser.ParseContext context)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
mtHint - JAX-RS MediaType of the stream content
context - custom context

public TikaContentExtractor.TikaContent extractMetadata(InputStream in)
in - input stream to extract the metadata from

public SearchBean extractMetadataToSearchBean(InputStream in)
in - input stream to extract the metadata from