
The article below looks at when indexing data for fast searches is worth it, and argues that full scans are often the better engineering choice. The author backs this up with cases from the author's own work.

The article concludes that unless you know you will be searching hundreds of millions of records, you should start with simple scans and add indexing only if you cannot get acceptable performance. Even when the data is that large, if queries are rare and highly varied, it may still be better to do the work at query time rather than at ingestion time.


Sometimes you have a lot of data, and one approach to support quick searches is pre-processing it to build an index so a search can involve only looking at a small fraction of the total data. The threshold at which it’s worth switching to indexing, though, might be higher than you’d guess. Here are some cases I’ve worked on where full scans were better engineering choices:

Unless you know from the start that you’ll be searching hundreds of millions of records, consider starting with simple scans and only add indexing if you can’t get acceptable performance. And even then, if queries are rare and highly varied you may still do better to do the work at query time instead of ingestion time.
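To make the trade-off concrete, here is a minimal Python sketch of the two approaches. It is not taken from the article: the function names (`scan_search`, `build_index`, `index_search`) and the sample records are hypothetical, and matching on whole lowercase words is an assumption made to keep the example small.

```python
from collections import defaultdict


def scan_search(records, term):
    """Full scan: look at every record at query time; no ingestion work."""
    term = term.lower()
    return [i for i, text in enumerate(records) if term in text.lower().split()]


def build_index(records):
    """Inverted index: pay once at ingestion to map each word to record ids."""
    index = defaultdict(set)
    for i, text in enumerate(records):
        for word in text.lower().split():
            index[word].add(i)
    return index


def index_search(index, term):
    """Indexed lookup: touch only the entry for the queried word."""
    return sorted(index.get(term.lower(), ()))


# Hypothetical sample data, for illustration only.
records = [
    "full scans are simple",
    "an index speeds up repeated searches",
    "simple scans need no ingestion work",
]

index = build_index(records)
print(scan_search(records, "simple"))
print(index_search(index, "simple"))
```

Both calls return the same record ids. The difference is where the cost lands: the scan does all its work per query, while the index does its work up front and has to be kept in sync as records change. The index also only answers the kind of query it was built for (whole words here), which is one reason rare and highly varied queries can favor scanning.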

#reads #tech #database #indexes #jeff kaufman