A Job to check if Solr slaves are in sync with master

Scenario:

  • We use Solr for search and for rendering product pages on our Sitecore website.
  • We have multiple Solr slaves behind a load balancer, all replicating from the master.
  • These Solr slaves serve the product data for our searches.
  • The issue was that some of our 20-odd CD servers were showing the latest products while others were not.
  • On careful observation we found that one of the Solr slaves was out of sync. Whenever the load balancer routed a CD server's request to that slave, the results did not include the latest products.
  • We decided to create a scheduled job that raises an alert if the document count on any slave Solr server does not match the count on the master Solr server.

Solution:

  • The alert script is written in PowerShell; the full listing follows these notes.
  • The script stores the master Solr server name, the list of slave Solr server names, and the list of indexes to compare in variables.
  • A Solr query returns the number of documents for a particular index: $docCountFilter = "select?q=*%3A*&rows=1&fl=numfound&wt=json&indent=true".
  • It loops through the indexes and server names, retrieves the document count from the master and from each slave, and compares them.
  • If the counts do not match, the slave is out of sync and its name is added to the variable $outofsynchServers.
  • Finally, the script returns the list of those servers.
  • When you create the alert, you can schedule it to run periodically (say, every 5-10 minutes); the alert fires if the return value is not blank.
  • This exercise helps us monitor our Solr instances. Before the business or users notice that a server is not displaying the latest products, we find out, take the faulty Solr slave out of the load balancer pool, and resync it. Once it is fixed, we add that Solr slave back to the pool.
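To illustrate where the document count comes from: with wt=json, a Solr select query reports the total number of matching documents in response.numFound, independent of how many rows are returned. The sketch below parses a hand-written sample response (the count 1523 and the header values are made up for illustration) the same way the script does. PowerShell matches property names case-insensitively, which is why .response.numfound also works.

```powershell
# Hand-written sample of a Solr JSON response; the values are illustrative only.
$sampleJson = @'
{
  "responseHeader": { "status": 0, "QTime": 1 },
  "response": { "numFound": 1523, "start": 0, "docs": [] }
}
'@

$parsed = $sampleJson | ConvertFrom-Json

# Property lookup is case-insensitive, so both spellings read the same value.
$docCount      = $parsed.response.numFound
$docCountLower = $parsed.response.numfound
```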
    # Solr hosts and indexes to compare (placeholder names)
    $masterServerNames = "mastersolrIP"
    $slaveServerNames = @("slavesolr1IP","slavesolr2IP","slavesolr3IP")
    $solrIndexNames = @("catalog_index_web","sitecore_index_web")
    $outofsynchServers = [System.Collections.ArrayList]@()
    $docCountFilter = "select?q=*%3A*&rows=1&fl=numfound&wt=json&indent=true"

    foreach($solrIndex in $solrIndexNames){
        # Document count on the master for this index
        $solrMasterNameWithPort = "http://$($masterServerNames)/solr"
        $solrMasterDocCount = Invoke-WebRequest "$($solrMasterNameWithPort)/$solrIndex/$($docCountFilter)" | ConvertFrom-Json
        #write-host "$($solrMasterNameWithPort)/$solrIndex/$($docCountFilter)"
        #write-host "Solr Master $($($masterServerNames)) Catalog Web Doc Count: " $solrMasterDocCount.response.numfound

        foreach($slave in $slaveServerNames){
            # Document count on this slave for the same index
            $solrSlaveNameWithPort = "http://$($slave)/solr"
            $solrSlaveDocCount = Invoke-WebRequest "$($solrSlaveNameWithPort)/$solrIndex/$($docCountFilter)" | ConvertFrom-Json
            #write-host "$($solrSlaveNameWithPort)/$solrIndex/$($docCountFilter)"
            #write-host "Solr Slave $($slave) Catalog Web Doc Count: " $solrSlaveDocCount.response.numfound

            # A count mismatch marks the slave as out of sync
            If ($solrMasterDocCount.response.numfound -ne $solrSlaveDocCount.response.numfound)
            {
                [void]$outofsynchServers.Add($slave)
            }
        }
    }

    # A slave can mismatch on several indexes; report each name once
    $outofsynchServers | Select-Object -Unique
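
How the return value becomes an alert depends on the scheduler. As one possible sketch, assuming the monitoring tool treats a non-zero exit code as a failure (an assumption; the setup described above only checks for a non-blank return value), the comparison can be isolated in a small function and the result mapped to an exit code. Get-OutOfSyncServer and the sample counts are hypothetical and are not part of the original script.

```powershell
# Hypothetical helper: emits the name of every slave whose count differs from the master's.
function Get-OutOfSyncServer {
    param(
        [long]$MasterCount,
        [hashtable]$SlaveCounts   # slave name -> document count
    )
    foreach ($name in $SlaveCounts.Keys) {
        if ($SlaveCounts[$name] -ne $MasterCount) { $name }
    }
}

# Illustrative counts only; in the real job these come from the Solr queries.
$counts = @{ 'slavesolr1IP' = 1523; 'slavesolr2IP' = 1498; 'slavesolr3IP' = 1523 }

# @() keeps the result an array even when zero or one name comes back.
$outOfSync = @(Get-OutOfSyncServer -MasterCount 1523 -SlaveCounts $counts)

# Map the result to an exit code: 1 when any slave is out of sync, otherwise 0.
$exitCode = if ($outOfSync.Count -gt 0) { 1 } else { 0 }
if ($exitCode -ne 0) {
    Write-Output "Out of sync: $($outOfSync -join ', ')"
}
```

In a scheduled script, ending with `exit $exitCode` would pass the result to the scheduler.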
