Using Examine to index & search with ANY data source
During CodeGarden 2010 a few people were asking how to use Examine to index and search on data from any data source such as custom database tables, etc… Previously, the only way to do this was to override the Umbraco Examine indexing provider, remove the Umbraco functionality embedded in there, and then do a lot of coding yourself. …But now there’s some great news! As of now you can use all of the Examine goodness with it’s embedded Lucene.Net with any data source and you can do it VERY easily.
Some things you need to know about the new version:
- I haven’t made a release version of this yet as it still needs some more testing, though we are putting this into a production site next week.
- If you want to try this, currently you’ll need to get the latest source from Examine @ CodePlex
- If you are using a previous version of Examine, there’s a few breaking changes as some of the class structures have been moved, however you config file should still work as is… HOWEVER, you should update your config file to reflect the new one with the new class names
- There is now 3 DLLs, not just 2:
- Examine.DLL
- Still pretty much the same… contains the abstraction layer
- Examine.LuceneEngine.DLL
- The new DLL to use to work with data that is not Umbraco specific
- UmbracoExamine.DLL
- The DLL that the Umbraco providers are in
- Examine.DLL
Ok, now on to the good stuff. First, I’ve added a demo project to this post which you can download HERE. This project is a simple console app that contains a sample XML data file that has 5 records in it. Here’s what the app does:
- This re-indexes all data
- Searches the index for node id 1
- Ensures one record is found in the index
- Updates the dateUpdated time stamp for the data record
- Re-indexes the record with node id 1’
So assuming that you have some custom data like a custom database table, xml file, or whatever, there’s really only 3 things that you need to do to get Examine indexing your custom data:
- Create your own ISimpleDataService
- There is only 1 method to implement: IEnumerable<SimpleDataSet> GetAllData(string indexType)
- This is the method that Examine will call to re-index your data
- A SimpleDataSet is a simple object containing a Dictionary<string, string> and a IndexedNode object (which consists of a Node Id and a Node Type)
- For example, if you had a database row, your SimpleDataSet object for the row would be the dictionary of the rows values, it’s node id and type … easy.
- Use the ToExamineXml() extension method to re-index individual nodes/records
- Examine relies on data being in the same XML structure as Umbraco (which we might change in version 2 sometime in the future… like next year) so we need to transform simple data into the XML structure. We’ve made this quite easy for you; all you have to do is get the data from your custom data source into a Dictionary<string, string> object and use this extension method to pass the xml structure in to Examine’s ReIndexNode method.
- For example: ExamineManager.Instance.ReIndexNode(dataSet.ToExamineXml(dataSet["Id"], "CustomData"), "CustomData"); where dataSet is a Dictionary<string, string> .
- Update your Examine config to use the new SimpleDataIndexer index provider and the new LuceneSearcher search provider
If you’re not using Umbraco at all, then you’ll only need to have the 2 Examine DLLs which don’t reference the Umbraco DLLs whatsoever so everything is decoupled.
I’d recommend downloading the demo app and running it as it will show you everything you need to know on how to get Examine running with custom data. However, i know that people just like to see code in blog posts, so here’s the config for the demo app:
<?xml version="1.0" encoding="utf-8" ?> <configuration> <configSections> <section name="Examine" type="Examine.Config.ExamineSettings, Examine"/> <section name="ExamineLuceneIndexSets" type="Examine.LuceneEngine.Config.IndexSets, Examine.LuceneEngine"/> </configSections> <Examine> <ExamineIndexProviders> <providers> <!-- Define the indexer for our custom data. Since we're only indexing one type of data, there's only 1 indexType specified: 'CustomData', however if you have more than one type of index (i.e. Media, Content) then you just need to list them as a comma seperated list without spaces. The dataService is how Examine queries whatever data source you have, in this case it's a custom data service defined in this project. A custom data service only has to implement one method... very easy. --> <add name="CustomIndexer" type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine.LuceneEngine" dataService="ExamineDemo.CustomDataService, ExamineDemo" indexTypes="CustomData" runAsync="false"/> </providers> </ExamineIndexProviders> <ExamineSearchProviders defaultProvider="CustomSearcher"> <providers> <!-- A search provider that can query a lucene index, no other work is required here --> <add name="CustomSearcher" type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine.LuceneEngine" /> </providers> </ExamineSearchProviders> </Examine> <ExamineLuceneIndexSets> <!-- Create an index set to hold the data for our index --> <IndexSet SetName="CustomIndexSet" IndexPath="App_Data\CustomIndexSet"> <IndexUserFields> <add Name="name" /> <add Name="description" /> <add Name="dateUpdated" /> </IndexUserFields> </IndexSet> </ExamineLuceneIndexSets> </configuration>