Sunday, August 24, 2014

Content Enrichment callout to properly map crawled data format

In combination with SharePoint BCS, SharePoint Enterprise Search can crawl external business systems as data sources. Starting with SharePoint 2013, BCS can also consume REST/OData services. In an OData service response, all data field values are by nature text. The values of SharePoint Search crawled properties are likewise raw text. In the Enterprise Search content processing pipeline, such a text-based value can be mapped to a managed property of a specific data type, e.g. text, integer, decimal, or date and time. For the mapping from text to a specific data type to succeed and be meaningful, the text must be parsable into that data type under the current localization / culture. If it is not, the mapping fails and the value of the managed property is null.
When crawling an external data source, you typically have no control over the data format. If the data format does not match the localization format, the mapping fails.
Example: OData returns date information in the format ‘YYYYMMDD’. The default localization datetime formats do not support this, so the mapped managed property of type Date and Time contains a null value.
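The failure can be illustrated outside SharePoint. The following Python sketch is only an analogy, not the actual search pipeline code. It assumes a hypothetical list of culture-style date formats (`CULTURE_FORMATS`) and a hypothetical helper `map_to_datetime` that returns `None` when no format matches, mirroring the null managed property.

```python
from datetime import datetime

# Hypothetical stand-in for the date formats a culture such as en-US accepts.
# The real SharePoint pipeline relies on .NET culture settings, not on this list.
CULTURE_FORMATS = ["%m/%d/%Y", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S"]


def map_to_datetime(raw):
    """Try each culture-style format; return None if the text is not parsable."""
    for fmt in CULTURE_FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    return None


print(map_to_datetime("08/24/2014"))  # matches the first format
print(map_to_datetime("20140824"))    # matches none of the formats
```

A value such as `08/24/2014` parses under the first format, while `20140824` matches none of them and ends up as `None`.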
In such a situation, you can use the SharePoint 2013 content enrichment capability to explicitly parse the crawled, non-localized data format.
The approach is as follows:
  1. Remove the mapping from the Managed Property of type Date and Time
  2. Create a new Managed Property of type text, and map it to the crawled property
  3. Create an implementation of IContentProcessingEnrichmentService, with the ProcessItem method parsing ‘YYYYMMDD’ into a Property<DateTime>
  4. Configure the SharePoint Search Application so that content enrichment takes the new text managed property as input and the Date and Time managed property as output
  5. Issue a full crawl on the business data content source.
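The parsing logic of step 3 can be sketched as follows. The real service is a .NET implementation of IContentProcessingEnrichmentService; this Python sketch only models what ProcessItem would do, using plain dictionaries instead of the SharePoint item and property types. The property names `ExternalDateText` and `ExternalDate` and the function `process_item` are hypothetical.

```python
from datetime import datetime

# Hypothetical managed property names, chosen for illustration only.
INPUT_PROPERTY = "ExternalDateText"   # text managed property (input)
OUTPUT_PROPERTY = "ExternalDate"      # Date and Time managed property (output)


def process_item(item_properties):
    """Model of ProcessItem: parse 'YYYYMMDD' text into a datetime output property.

    Returns a dict of output properties; it stays empty when the input is
    missing or not a valid 'YYYYMMDD' date, so no wrong date is ever emitted.
    """
    output = {}
    raw = (item_properties.get(INPUT_PROPERTY) or "").strip()
    # Require exactly eight digits before parsing, because strptime alone
    # would also accept shorter month or day fields.
    if len(raw) == 8 and raw.isdigit():
        try:
            output[OUTPUT_PROPERTY] = datetime.strptime(raw, "%Y%m%d")
        except ValueError:
            pass  # e.g. month 13: leave the output property unset
    return output


print(process_item({INPUT_PROPERTY: "20140824"}))
print(process_item({INPUT_PROPERTY: "24-08-2014"}))
```

Leaving the output empty for unparsable input is a deliberate choice in this sketch: the date managed property then simply stays unset for that item.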
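The result in the SharePoint index: after the full crawl, the managed property of type Date and Time contains the parsed date value.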
