TikaLuceneContentExtractor (Apache CXF JavaDoc 3.5.0 API)

java.lang.Object
- org.apache.cxf.jaxrs.ext.search.tika.TikaLuceneContentExtractor

public class TikaLuceneContentExtractor
extends Object

Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| `TikaLuceneContentExtractor(List<org.apache.tika.parser.Parser> parsers, LuceneDocumentMetadata documentMetadata)` | Create a new Tika-based content extractor using the provided parser instances and optional media type validation. |
| `TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser)` | Create a new Tika-based content extractor using the provided parser instance. |
| `TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType)` | Create a new Tika-based content extractor using the provided parser instance and optional media type validation. |
| `TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType, LuceneDocumentMetadata documentMetadata)` | Create a new Tika-based content extractor using the provided parser instance and optional media type validation. |
| `TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, LuceneDocumentMetadata documentMetadata)` | Create a new Tika-based content extractor using the provided parser instance and optional media type validation. |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `org.apache.lucene.document.Document` | `extract(InputStream in)` Extract the content and metadata from the input stream. |
| `org.apache.lucene.document.Document` | `extract(InputStream in, LuceneDocumentMetadata documentMetadata)` Extract the content and metadata from the input stream. |
| `org.apache.lucene.document.Document` | `extractContent(InputStream in)` Extract the content only from the input stream. |
| `org.apache.lucene.document.Document` | `extractMetadata(InputStream in)` Extract the metadata only from the input stream. |
| `org.apache.lucene.document.Document` | `extractMetadata(InputStream in, LuceneDocumentMetadata documentMetadata)` Extract the metadata only from the input stream. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - TikaLuceneContentExtractor
```
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser)
```
    Create a new Tika-based content extractor using the provided parser instance.
    
    Parameters:
    
    parser - parser instance
  - TikaLuceneContentExtractor
```
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                                  boolean validateMediaType)
```
    Create a new Tika-based content extractor using the provided parser instance and optional media type validation. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parser.
    
    Parameters:
    
    parser - parser instance
    
    validateMediaType - enable or disable media type validation
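    The JavaDoc does not specify which detector performs this validation. As a rough, stdlib-only illustration of the same "detect, then validate" idea, the sketch below substitutes the JDK's magic-byte sniffer `URLConnection.guessContentTypeFromStream` for a Tika detector and a plain `Set<String>` for the parser's supported types. `MediaTypeValidationSketch` and `isSupported` are hypothetical names, not part of CXF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.util.Set;

/**
 * Hypothetical illustration of "detect, then validate" media type checking.
 * Not the CXF implementation: the JDK sniffer stands in for a Tika detector.
 */
final class MediaTypeValidationSketch {
    private MediaTypeValidationSketch() { }

    /**
     * Detects the media type of a markable stream and reports whether it is one of
     * supportedTypes. The stream is reset afterwards, so a parser can still read it
     * from the first byte.
     */
    static boolean isSupported(InputStream in, Set<String> supportedTypes) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            // Marks the stream, peeks at the leading bytes, then resets it.
            String detected = URLConnection.guessContentTypeFromStream(in);
            // Input whose type cannot be detected (null) is treated as unsupported.
            return detected != null && supportedTypes.contains(detected);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    The JDK sniffer recognizes only a handful of formats (for example GIF, PNG, JPEG, HTML and XML), so it is no substitute for Tika's detection; it only makes the control flow concrete.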
  - TikaLuceneContentExtractor
```
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                                  LuceneDocumentMetadata documentMetadata)
```
    Create a new Tika-based content extractor using the provided parser instance and optional media type validation. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parser.
    
    Parameters:
    
    parser - parser instance
    
    documentMetadata - metadata describing the fields of the resulting Lucene document
  - TikaLuceneContentExtractor
```
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser,
                                  boolean validateMediaType,
                                  LuceneDocumentMetadata documentMetadata)
```
    Create a new Tika-based content extractor using the provided parser instance and optional media type validation. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parser.
    
    Parameters:
    
    parser - parser instance
    
    validateMediaType - enable or disable media type validation
    
    documentMetadata - metadata describing the fields of the resulting Lucene document
  - TikaLuceneContentExtractor
```
public TikaLuceneContentExtractor(List<org.apache.tika.parser.Parser> parsers,
                                  LuceneDocumentMetadata documentMetadata)
```
    Create a new Tika-based content extractor using the provided parser instances and optional media type validation. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parsers.
    
    Parameters:
    
    parsers - parser instances
    
    documentMetadata - metadata describing the fields of the resulting Lucene document
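    This constructor accepts several parsers, but the JavaDoc does not say how one is chosen for a given input. One plausible strategy, similar in spirit to Tika's `CompositeParser` (which dispatches by media type), is to pick the first parser whose declared media types include the detected type. The sketch below models only that selection step under this assumption; `Candidate` is a hypothetical stand-in for a parser (real Tika parsers report their types through `Parser.getSupportedTypes(ParseContext)`).

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical model of choosing a parser from a list by detected media type. */
final class ParserSelectionSketch {
    private ParserSelectionSketch() { }

    /** Hypothetical stand-in for a parser: a name plus the media types it declares. */
    static final class Candidate {
        final String name;
        final Set<String> supportedTypes;

        Candidate(String name, Set<String> supportedTypes) {
            this.name = name;
            this.supportedTypes = supportedTypes;
        }
    }

    /** Returns the first candidate, in list order, that declares the detected media type. */
    static Optional<Candidate> select(List<Candidate> candidates, String detectedType) {
        if (detectedType == null) {
            return Optional.empty(); // nothing was detected, so nothing can match
        }
        for (Candidate candidate : candidates) {
            if (candidate.supportedTypes.contains(detectedType)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // no candidate declares this media type
    }
}
```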
- Method Detail
  - extract
```
public org.apache.lucene.document.Document extract(InputStream in)
```
    Extract the content and metadata from the input stream. Depending on whether media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
    
    Parameters:
    
    in - input stream to extract the content and metadata from
    
    Returns:
    
    the extracted document or null if extraction is not possible or was unsuccessful
  - extract
```
public org.apache.lucene.document.Document extract(InputStream in,
                                                   LuceneDocumentMetadata documentMetadata)
```
    Extract the content and metadata from the input stream. Depending on whether media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
    
    Parameters:
    
    in - input stream to extract the content and metadata from
    
    documentMetadata - metadata describing the fields of the resulting Lucene document
    
    Returns:
    
    the extracted document or null if extraction is not possible or was unsuccessful
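    Both `extract` overloads signal an unsuccessful extraction by returning `null`, so callers that process many inputs need to skip those results. A minimal sketch of that caller-side pattern follows; the helper `extractAll` is hypothetical, and it closes each stream itself because the JavaDoc does not state whether the extractor does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical caller-side helper that drops null (unsuccessful) extraction results. */
final class NullSafeExtraction {
    private NullSafeExtraction() { }

    /**
     * Opens each source, runs the extractor on it and keeps only non-null results.
     * Each stream is closed here via try-with-resources.
     */
    static <D> List<D> extractAll(List<Supplier<InputStream>> sources,
                                  Function<InputStream, D> extractor) {
        List<D> documents = new ArrayList<>();
        for (Supplier<InputStream> source : sources) {
            try (InputStream in = source.get()) {
                D document = extractor.apply(in);
                if (document != null) { // null means extraction was not possible: skip it
                    documents.add(document);
                }
            } catch (IOException e) { // can only come from close() here
                throw new UncheckedIOException(e);
            }
        }
        return documents;
    }
}
```

    With a real extractor, `extractor::extract` can be passed as the `extractor` argument, because `extract(InputStream)` takes an `InputStream` and returns a `Document`.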
  - extractContent
```
public org.apache.lucene.document.Document extractContent(InputStream in)
```
    Extract the content only from the input stream. Depending on whether media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
    
    Parameters:
    
    in - input stream to extract the content from
    
    Returns:
    
    the extracted document or null if extraction is not possible or was unsuccessful
  - extractMetadata
```
public org.apache.lucene.document.Document extractMetadata(InputStream in)
```
    Extract the metadata only from the input stream. Depending on whether media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
    
    Parameters:
    
    in - input stream to extract the metadata from
    
    Returns:
    
    the extracted document or null if extraction is not possible or was unsuccessful
  - extractMetadata
```
public org.apache.lucene.document.Document extractMetadata(InputStream in,
                                                           LuceneDocumentMetadata documentMetadata)
```
    Extract the metadata only from the input stream. Depending on whether media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
    
    Parameters:
    
    in - input stream to extract the metadata from
    
    documentMetadata - metadata describing the fields of the resulting Lucene document
    
    Returns:
    
    the extracted document or null if extraction is not possible or was unsuccessful
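Judging by their descriptions, the three method families differ in which parts of the parsed input end up in the returned document: content and metadata (`extract`), content only (`extractContent`), or metadata only (`extractMetadata`). The toy model below makes that split concrete. It is not the CXF implementation: a `Map<String, String>` stands in for `org.apache.lucene.document.Document`, and the content field name `contents` is a hypothetical choice (the real field names are presumably configured through `LuceneDocumentMetadata`).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy, hypothetical model of the content/metadata split between the extraction methods. */
final class ExtractionVariantsSketch {
    private ExtractionVariantsSketch() { }

    enum Part { CONTENT, METADATA }

    // Hypothetical field name for the extracted body text.
    static final String CONTENT_FIELD = "contents";

    /** Builds a field map containing only the requested parts of an already-parsed input. */
    static Map<String, String> build(String text, Map<String, String> metadata, Set<Part> parts) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (parts.contains(Part.CONTENT)) {
            fields.put(CONTENT_FIELD, text);
        }
        if (parts.contains(Part.METADATA)) {
            fields.putAll(metadata);
        }
        return fields;
    }

    // extract(in)         ~ build(text, metadata, EnumSet.allOf(Part.class))
    // extractContent(in)  ~ build(text, metadata, EnumSet.of(Part.CONTENT))
    // extractMetadata(in) ~ build(text, metadata, EnumSet.of(Part.METADATA))
}
```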
