An Examine fix for Umbraco index corruption
Examine 3.3.0 has been released with a long-awaited fix for a bug affecting Umbraco websites that use the SyncedFileSystemDirectoryFactory, which is the default setting for Umbraco CMS.
The bug typically means that indexes cannot be used and that log entries such as Lucene.Net.Index.CorruptIndexException: invalid deletion count: 2 vs docCount=1 are present.
Understanding the problem:
The SyncedFileSystemDirectoryFactory exists to avoid the performance cost of rebuilding indexes on startup when a site is moved to another worker in Azure. An index rebuild would otherwise occur because in Azure, Lucene files need to work off the local 'fast drive' (C:\%temp%), not the default shared network 'slow drive' (D:). Whenever a site is moved to, or spawned on, a new worker, the local 'fast drive' is empty, meaning no indexes exist. The SyncedFileSystemDirectoryFactory attempts to work around this by continually synchronizing a copy of the indexes from the 'fast drive' to the 'slow drive', so that when a site is moved to another worker it can sync (restore) from the 'slow drive' back to the 'fast drive' and avoid the index rebuild overhead.
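The sync/restore idea can be illustrated with a minimal sketch. This is not Examine's code or API: the function names are hypothetical, and a plain file copy stands in for whatever replication mechanism the real factory uses.

```python
import shutil
from pathlib import Path


def has_index(directory: Path) -> bool:
    """True if the directory exists and contains at least one entry."""
    return directory.is_dir() and any(directory.iterdir())


def sync_to_main(local_dir: Path, main_dir: Path) -> None:
    """After index changes: replace the main ('slow drive') copy with the
    current local ('fast drive') files. Hypothetical helper."""
    if main_dir.exists():
        shutil.rmtree(main_dir)
    shutil.copytree(local_dir, main_dir)


def restore_to_local(main_dir: Path, local_dir: Path) -> bool:
    """On startup: if the local drive has no index but main does, copy
    main -> local. Returns True when a restore happened, meaning the
    rebuild can be skipped. Hypothetical helper."""
    if has_index(local_dir) or not has_index(main_dir):
        return False
    shutil.copytree(main_dir, local_dir, dirs_exist_ok=True)
    return True
```

On a fresh worker the local directory is empty, so restore_to_local copies the last synced files back and returns True. Note that this sketch, like the original behaviour described here, trusts whatever is in the main directory.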
The problem with SyncedFileSystemDirectoryFactory is that the implementation doesn't take into account what happens if the index files in your main storage ('slow drive') become corrupted, which can happen for a number of reasons: misconfiguration, network latency, process termination, etc.
Understanding the solution:
The SyncedFileSystemDirectoryFactory has been updated to:
- Check the health of the main index if it exists ('slow drive').
- Check the health of the local index if it exists ('fast drive').
- If the main index is unhealthy or doesn't exist and the local index is healthy, it will synchronize the local index to the main index. This can occur only if a site hasn't moved to a new worker.
- If the main index is unhealthy and the local index doesn't exist or is unhealthy, then it will delete the main (corrupted) index.
- Once health checks are done, the index from main is always synced to local. If the main index was deleted due to corruption, this will mean that the local index is empty and an index rebuild will occur.
This change will attempt to keep any healthy index that is available (main vs local), but if nothing can be read, the indexes will be deleted and an index rebuild will occur.
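The decision flow above can be sketched as a small function. This is only an illustration of the rules as listed, not Examine's implementation: the Health enum, plan_startup, and the action strings are hypothetical names. Treating "main is missing after the checks" as a rebuild case is an assumption that extends the deleted-main case described above.

```python
from enum import Enum
from typing import List


class Health(Enum):
    MISSING = "missing"
    HEALTHY = "healthy"
    CORRUPT = "corrupt"


def plan_startup(main: Health, local: Health) -> List[str]:
    """Return the ordered actions for a given main/local index state."""
    actions = []
    if main is not Health.HEALTHY and local is Health.HEALTHY:
        # Main is corrupt or missing but local is good: keep the good copy.
        actions.append("sync local -> main")
        main = Health.HEALTHY
    elif main is Health.CORRUPT:
        # Local is missing or corrupt too: nothing readable is left.
        actions.append("delete main")
        main = Health.MISSING
    # Once the health checks are done, main is always synced to local.
    actions.append("sync main -> local")
    if main is Health.MISSING:
        # Assumption: no main index means local ends up empty, so rebuild.
        actions.append("rebuild indexes")
    return actions
```

For example, plan_startup(Health.CORRUPT, Health.HEALTHY) keeps the local copy and avoids a rebuild, while plan_startup(Health.CORRUPT, Health.MISSING) deletes the main index and ends with a rebuild.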
There's also a new option to fix a corrupted index, but this is not enabled by default since it can mean a loss of documents.
Understanding the rebuilding overhead:
The performance overhead of index rebuilding is due to the Umbraco database queries that need to be executed in order to populate the indexes. The only reason SyncedFileSystemDirectoryFactory exists is to prevent this overhead when hosting in Azure App Service (which is what Umbraco Cloud uses), and it can only be used on your Umbraco primary node. It does not prevent index rebuilding overhead for non-primary nodes when load balancing or scaling out, because the main network 'slow drive' is shared between all workers and an index can only be read from and written to by a single process.
This means that it's only useful if you are hosting in Azure App Service without any load balancing while keeping in mind that it does not always prevent index rebuilds (see above).
The index rebuilding overhead can be dramatic when load balancing or scaling out. For example, if you scale out by 5 nodes in a load balancing setup, those 5 nodes will be performing index rebuilds at around the same time, and your DB is going to get hammered by the queries needed to build all of those new indexes. The performance hit isn't the index building - it is the DB queries, which can lead to DB locks and the dreaded SQL Lock Timeout issue in the Umbraco back office. Plus, if search is critical to your front-end, then for a while after your site has started up there won't be any index, which means there won't be any search until the background processing is done.
... Many of these reasons are why ExamineX was created:
- Reliably and centrally persisted indexes means nothing is out of sync between nodes.
- No index rebuilding when your site is moved or scaled = no rebuilding overhead.
- Prevents SQL Lock Timeout issues caused by DB rebuilding queries.
- Very easy to set up and seamlessly changes your Lucene-based indexes to Azure/Elastic search indexes.
- Ideal when hosting Umbraco on Azure Web Apps (or Umbraco Cloud) and a perfect solution for load balancing and scaling.
- Automatically index Umbraco media file content without the need for additional indexes with support for PDFs, Microsoft Office documents and more.
- (coming soon) Automatically generate Umbraco media image descriptions, tags, locations, and more using AI allowing your editors to quickly find the media/images they need.