Tips for optimizing FoxTrot performance (updated October 31st, 2019)

Performance of the FoxTrot Search solutions on large data sets, aka “How much is too much?”

There are no hard limits on how much data FoxTrot can handle. There is no magic number either, as things depend on a variety of factors such as:

  • CPU power
  • number of cores
  • SSD versus hard disk storage for the index
  • access speed to the source documents
  • amount of RAM available
  • document types
  • document sizes
  • proportion of indexable text in indexed documents
  • number of distinct documents
  • uniqueness of indexed terms
  • …and so on (a quick way to size up a candidate folder is sketched below)
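
To get a rough feel for several of these factors before committing, it can help to survey a candidate folder first. The following is a minimal sketch of our own, not a FoxTrot feature, assuming Python 3 is installed: it reports the file count, the total size, and which file types account for the most data.

```python
#!/usr/bin/env python3
"""Rough survey of a folder you are considering indexing.

Hypothetical helper, not part of FoxTrot: it reports file counts,
total size, and the heaviest file types, to help gauge the factors
listed above before committing to a large index.
"""
import os
import sys
from collections import Counter

def survey(root):
    count, total_bytes = 0, 0
    bytes_by_ext = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            count += 1
            total_bytes += size
            ext = os.path.splitext(name)[1].lower() or "(none)"
            bytes_by_ext[ext] += size
    print(f"{count} files, {total_bytes / 2**30:.1f} GiB in {root}")
    for ext, size in bytes_by_ext.most_common(10):
        print(f"  {ext:>10}  {size / 2**20:,.0f} MiB")

if __name__ == "__main__":
    survey(sys.argv[1] if len(sys.argv) > 1 else ".")
```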

The soft limits, however, can be expressed in two words: common sense.

We suggest trying out FoxTrot on one or several folders of data before throwing a 450 TB volume at it on the first go, as we were recently asked to (a prior support-request record was a 50 TB volume, so it seems that Moore’s law is accelerating!)

Seriously, it’s worth trying out an index with a somewhat reasonable amount of data, and optimizing your chances upstream that things will work out well.

  • We recommend against indexing all of your data in a monolithic fashion; instead, define several sets of data, e.g. “current projects”, “former projects” and “attachments”, then create an index for each set of data. Also, avoid indexing a system volume, or the entire user or library folder, at least not within the same index as your work documents.
  • Indexing a large volume of data may require a powerful computer: beneficial factors include storing the index on an SSD rather than a hard disk, a powerful CPU, reasonably abundant memory (check that swap space usage stays low, for instance with “sysctl vm.swapusage” in Terminal), and fast networking for the indexing of a NAS. A large number of cores is not necessary, but having at least four is useful.
  • Avoid indexing files containing a large amount of textual but non-linguistic data (numerical data, hexadecimal data, base64-encoded data, logs, XML, JSON, source code) unless they have specific indexing relevance. Such files often end up identified in the Resource Hogs list in Manage Indices; if there are many of them and/or they are large, they can have a considerable impact on FoxTrot’s performance. A rough heuristic for spotting such files ahead of time is sketched after this list.
  • Make sure that PDF documents are correctly interpreted, especially OCRed documents or scientific documents produced, for instance, by LaTeX systems. In such cases, you may find that a great number of words have been concatenated or improperly split; a quick sanity check is sketched after this list.
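
Regarding the resource hogs point above, here is a minimal sketch, of our own devising rather than FoxTrot’s actual detection logic, for estimating how “linguistic” a text file’s content is. The 0.55 threshold and the 1 MiB sample size are arbitrary assumptions to tune for your data.

```python
#!/usr/bin/env python3
"""Flag text files that are mostly non-linguistic data.

Rough heuristic, not FoxTrot's actual Resource Hogs logic: a file whose
characters are mostly non-alphabetic (digits, hex dumps, base64, markup)
is probably poor indexing material. The 0.55 threshold is an arbitrary
assumption; tune it for your data.
"""
import sys

ALPHA_THRESHOLD = 0.55  # assumed cut-off: below this, flag the file

def alpha_ratio(path, sample_bytes=1 << 20):
    """Share of alphabetic and space characters in the first ~1 MiB."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read(sample_bytes)
    if not text:
        return 1.0  # empty file: nothing to flag
    good = sum(1 for c in text if c.isalpha() or c.isspace())
    return good / len(text)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        ratio = alpha_ratio(path)
        verdict = "likely resource hog" if ratio < ALPHA_THRESHOLD else "looks fine"
        print(f"{ratio:.0%}  {verdict}  {path}")
```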
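
For the PDF interpretation point, one quick sanity check is to extract a document’s text and inspect word lengths: many abnormally long tokens suggest concatenated words, while a very low average suggests words were improperly split. This sketch assumes the third-party pypdf package (pip install pypdf); FoxTrot’s own PDF interpretation may of course differ.

```python
#!/usr/bin/env python3
"""Sanity-check the extractable text of a PDF.

Assumes the third-party pypdf package; FoxTrot's own PDF interpretation
may differ. Many very long tokens usually mean words were concatenated;
a very short average token usually means words were improperly split.
"""
import sys
from pypdf import PdfReader

def check(path, long_word=25):
    reader = PdfReader(path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    words = text.split()
    if not words:
        print(f"{path}: no extractable text (scan without OCR?)")
        return
    too_long = sum(1 for w in words if len(w) > long_word)
    avg = sum(len(w) for w in words) / len(words)
    print(f"{path}: {len(words)} words, average length {avg:.1f}, "
          f"{too_long} words longer than {long_word} characters")

if __name__ == "__main__":
    for path in sys.argv[1:]:
        check(path)
```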

Factors in obtaining maximum performance:

Some of these points will seem obvious, but all are worth mentioning:

  • a faster computer will produce better results than a slower computer
  • cores matter: a 4-core CPU or greater will offer additional opportunities for concurrent text extraction and indexing
  • Access speed for the drive on which the index is stored is important. Insofar as possible, try to store your indices on an SSD for the fastest search speed.
  • Access speed to the data being indexed: especially for large amounts of data, the speed at which files can be read from their storage medium for text extraction (and for the subsequent display of hits) matters more for initial indexing and hit display than for actual searching. A crude read-speed check is sketched after this list.
  • Keep the data set for each index in check. Typically, if after a few minutes of progressing through the documents, building an index on your specific setup estimates completion at over 4-6 hours in very large cases, this is a sign that you may be trying to fit too much into a single index. Split the data to be indexed across several indices, and try to keep slow-changing reference documents in different indices from those that need daily updating: you will make future index maintenance more manageable.
  • Before indexing an entire volume, which may contain useless data, system files, applications and other irrelevant source material, ask yourself whether the files being indexed might ever have search value in the future.
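
For the two access-speed points above, a crude sequential-read timing can help compare the volume holding the index with the volume holding the source documents. This is a minimal sketch, not a rigorous benchmark; note that the operating system’s file cache can make a recently read file look unrealistically fast, so point it at a large, cold file.

```python
#!/usr/bin/env python3
"""Crude sequential-read benchmark for a volume.

Minimal sketch: times how fast a large file can be read in 4 MiB chunks.
Beware the OS file cache; re-reading a recently read file will look
unrealistically fast, so use a large file that has not been read lately.
"""
import sys
import time

def read_speed(path, chunk=4 * 1024 * 1024):
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    elapsed = max(time.monotonic() - start, 1e-9)
    print(f"{total / 2**20:.0f} MiB in {elapsed:.1f} s "
          f"= {total / 2**20 / elapsed:.0f} MiB/s")

if __name__ == "__main__":
    read_speed(sys.argv[1])
```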