Coveo Database Connector: Refresh Schedule
Assuming that you have set up Coveo database connector correctly and that you are able to index data, the next challenge you will probably face is how to set up a refresh schedule. If you are dealing with a database source that’s frequently updated, you need to propagate these changes to your index so that your search results display up-to-date data. This can be challenging if you’re dealing with a large data set. Let’s look at some of the options for setting up a refresh scheduling strategy.
Setting a Refresh Schedule
Incremental Refresh will allow you to propagate new and modified items from your database source to your index. You can set up Incremental Refresh on short intervals to have up-to-date data in your index. As a safety net, you probably want to do a full Rebuild once in a while to handle deleted items and make sure that your index data is consistent.
In the example below, we have set up Incremental Refresh to run every 30 minutes and Rebuild to run once a week, every Sunday.
A prerequisite for this feature is to have a Date Type field in your database source. Also, the date field must be updated in the database each time the record is updated. To set up Incremental Refresh you must use this ‘field in WHERE’ clause when configuring your query. As you can see in the example below, the field name from our database is dateModified. In order to fetch only the last content from our database source, we use @LastRefresh. Coveo will then dynamically populate this variable data based on the time of last refresh.
<Accessor type="query" OrderByFieldName="dateCreated" OrderByFieldType="DateTime" IncrementalRefreshFieldName="dateModified"> <![CDATA[ Select id, title, dateModified, content, author FROM blog WHERE dateModified>=@LastRefresh order by dateModified; ]]> </Accessor>
Don’t Forget to Set a Maintenance Rebuild Schedule
Ok, we are indexing just modified data every 30 minutes and the time has come for a weekly cleanup and we want to schedule a full Rebuild. Coveo will use the same query and populate the value of @LastRefresh to pull all the content from our database source. But what happens if this is a large data set? Trying to fetch it could take a long time and possibly end up timing out your session?
In this case, we want to use pagination and fetch our database records in batches. To be able to do that, we must add an OFFSET clause when configuring our query. Coveo will set the values of @startRow and @endRow to run the query multiple times until it fetches all the records. Now a full index Rebuild can be performed.
<Accessor type="query" OrderByFieldName="dateCreated" OrderByFieldType="DateTime" IncrementalRefreshFieldName="dateModified"> <![CDATA[ Select id, title, dateModified, content, author FROM blog WHERE dateModified>=@LastRefresh order by dateModified /* ADD PAGINATION TO YOUR QUERY */ OFFSET @startRow ROWS FETCH NEXT (@endRow-@startRow) ROWS ONLY; ]]> </Accessor>
Configure Your Batch Page Size
If you are using Coveo On-Prem you are probably wondering how you can set your batch page sizes? The answer is you can add it in your index configuration as an additional parameter – QueryPageSize.
Working with Coveo database connector is fun. It allows you to index the database directly and integrate database content into a unified Coveo index. Setting up your refresh schedule can be tricky, but Coveo’s scheduling options are flexible enough to meet your requirements. I hope you’ll find this blog post useful when choosing the right scheduling strategy.
Happy Coveoing 🙂 Until next time. Zoran