public class TikaContentExtractor extends Object
Modifier and Type | Class and Description |
---|---|
static class | TikaContentExtractor.TikaContent: Extracted content, metadata and media type container |
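Every extract method returns this container. The class below is a minimal sketch of reading its three parts. It assumes that CXF's JAX-RS search extension (cxf-rt-rs-extension-search) and Tika's TXTParser are on the classpath. The accessor names getContent(), getMediaType() and getMetadata() come from the CXF source, not from this page, so treat them as an assumption. TikaContentDemo and describe are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor;
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor.TikaContent;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.txt.TXTParser;

// Hypothetical demo class: formats the three parts of a TikaContent container.
public class TikaContentDemo {

    // Assumed accessors (from the CXF source, not listed on this page):
    // getContent(), getMediaType() and getMetadata().
    static String describe(TikaContent content) {
        if (content == null) {
            return "extraction failed";
        }
        String text = content.getContent();
        StringBuilder sb = new StringBuilder();
        sb.append("text: ").append(text == null ? "" : text.trim()).append('\n');
        // The media type is appended as-is, so a missing value prints as "null".
        sb.append("media type: ").append(content.getMediaType()).append('\n');
        Metadata metadata = content.getMetadata();
        for (String name : metadata.names()) {
            sb.append(name).append(" = ").append(metadata.get(name)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // TXTParser is an illustrative choice of parser for plain-text input.
        TikaContentExtractor extractor = new TikaContentExtractor(new TXTParser());
        byte[] doc = "Hello, Tika".getBytes(StandardCharsets.UTF_8);
        System.out.print(describe(extractor.extract(new ByteArrayInputStream(doc))));
    }
}
```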
Constructor and Description |
---|
TikaContentExtractor(List<Parser> parsers): Create a new Tika-based content extractor using the provided parser instances. |
TikaContentExtractor(List<Parser> parsers, Detector detector): Create a new Tika-based content extractor using the provided parser instances. |
TikaContentExtractor(Parser parser): Create a new Tika-based content extractor using the provided parser instance. |
TikaContentExtractor(Parser parser, boolean validateMediaType): Create a new Tika-based content extractor using the provided parser instance and optional media type validation. |
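The four constructors differ in how the parsers are supplied, and optionally a Detector or media type validation. The factory below is a minimal sketch of each overload. It assumes that the CXF search extension and the Tika parser modules providing PDFParser and TXTParser are on the classpath. The TikaExtractors class and its method names are hypothetical. DefaultDetector is one possible Detector chosen for illustration, not a documented default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor;
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.detect.Detector;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.parser.txt.TXTParser;

// Hypothetical factory showing each constructor overload.
public final class TikaExtractors {

    private TikaExtractors() {
    }

    // TikaContentExtractor(Parser): a single parser instance
    static TikaContentExtractor plainText() {
        return new TikaContentExtractor(new TXTParser());
    }

    // TikaContentExtractor(Parser, boolean): a single parser with media type validation enabled
    static TikaContentExtractor validatedPdf() {
        return new TikaContentExtractor(new PDFParser(), true);
    }

    // TikaContentExtractor(List<Parser>): several parser instances
    static TikaContentExtractor pdfOrText() {
        List<Parser> parsers = Arrays.asList(new PDFParser(), new TXTParser());
        return new TikaContentExtractor(parsers);
    }

    // TikaContentExtractor(List<Parser>, Detector): several parsers plus an explicit Detector
    static TikaContentExtractor pdfOrTextWithDetector() {
        List<Parser> parsers = Arrays.asList(new PDFParser(), new TXTParser());
        Detector detector = new DefaultDetector();
        return new TikaContentExtractor(parsers, detector);
    }

    public static void main(String[] args) {
        byte[] doc = "Hello, Tika".getBytes(StandardCharsets.UTF_8);
        // Guard against a null result before using the container
        System.out.println(plainText().extract(new ByteArrayInputStream(doc)) != null
            ? "extracted" : "no content");
    }
}
```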
Modifier and Type | Method and Description |
---|---|
TikaContentExtractor.TikaContent | extract(InputStream in): Extract the content and metadata from the input stream. |
TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler): Extract the content and metadata from the input stream using a custom ContentHandler. |
TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler, ParseContext context): Extract the content and metadata from the input stream using a custom ContentHandler and ParseContext. |
TikaContentExtractor.TikaContent | extractMetadata(InputStream in): Extract only the metadata from the input stream. |
SearchBean | extractMetadataToSearchBean(InputStream in): Extract only the metadata from the input stream and return it as a SearchBean. |
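The overloads differ in what the caller controls: the handler that receives the parsed output, and the ParseContext passed to the parser. The class below is a minimal sketch of each method. It assumes that CXF's JAX-RS search extension (cxf-rt-rs-extension-search) and Tika's TXTParser are on the classpath. TikaExtractVariants and its methods are hypothetical. The getContent() and getMetadata() accessors come from the CXF source, not from this page. ToXMLContentHandler is just one Tika handler, chosen here for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.cxf.jaxrs.ext.search.SearchBean;
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor;
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor.TikaContent;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.txt.TXTParser;
import org.apache.tika.sax.ToXMLContentHandler;

// Hypothetical demo of the extract/extractMetadata overloads.
public class TikaExtractVariants {

    private static final TikaContentExtractor EXTRACTOR =
        new TikaContentExtractor(new TXTParser());

    private static InputStream stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // extract(in, handler): ToXMLContentHandler keeps Tika's XHTML markup
    static String asXhtml(String text) {
        TikaContent content = EXTRACTOR.extract(stream(text), new ToXMLContentHandler());
        return content == null ? null : content.getContent();
    }

    // extract(in, handler, context): the caller supplies the ParseContext
    static String asXhtmlWithContext(String text, ParseContext context) {
        TikaContent content =
            EXTRACTOR.extract(stream(text), new ToXMLContentHandler(), context);
        return content == null ? null : content.getContent();
    }

    public static void main(String[] args) {
        System.out.println(asXhtml("Hello, Tika"));
        System.out.println(asXhtmlWithContext("Hello, Tika", new ParseContext()));

        // Metadata only; the extracted text is not needed here
        TikaContent meta = EXTRACTOR.extractMetadata(stream("Hello, Tika"));
        System.out.println(meta == null ? "no metadata" : meta.getMetadata());

        // The same metadata, returned as a SearchBean
        SearchBean bean = EXTRACTOR.extractMetadataToSearchBean(stream("Hello, Tika"));
        System.out.println(bean == null ? "no metadata" : "SearchBean created");
    }
}
```

Tika parsers look up per-call configuration in a ParseContext by class. For example, an extractor built around PDFParser could receive a PDFParserConfig registered with context.set(PDFParserConfig.class, config).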
public TikaContentExtractor(Parser parser)
parser - parser instance

public TikaContentExtractor(List<Parser> parsers)
parsers - parser instances

public TikaContentExtractor(List<Parser> parsers, Detector detector)
parsers - parser instances

public TikaContentExtractor(Parser parser, boolean validateMediaType)
parser - parser instance
validateMediaType - enable or disable media type validation

public TikaContentExtractor.TikaContent extract(InputStream in)
in - input stream to extract the content and metadata from

public TikaContentExtractor.TikaContent extractMetadata(InputStream in)
in - input stream to extract the metadata from

public SearchBean extractMetadataToSearchBean(InputStream in)
in - input stream to extract the metadata from

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler)
in - input stream to extract the content and metadata from
handler - custom ContentHandler

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, ParseContext context)
in - input stream to extract the content and metadata from
handler - custom ContentHandler
context - custom ParseContext

Apache CXF