Inverted index


Reading time: about 1 minute

An inverted index is a powerful data structure widely used in information retrieval systems and search engines to enable fast and efficient full-text searches.

It reverses the traditional document-to-word mapping by creating a word-to-document index, allowing for quick identification of documents containing specific terms. This approach significantly reduces search time and improves query performance, especially when dealing with large volumes of text data.

The inverted index works by breaking down documents into individual terms, removing stop words, and applying techniques like stemming to normalize the text.

It then creates a mapping where each unique term points to a list of documents or locations where that term appears. This structure enables rapid look-ups and facilitates complex queries, making it an essential component in modern search technologies and document retrieval systems.

DDL

Here’s how I store my inverted index in the database. Document size and count aren’t needed for searching with the inverted index, I use them for ranking purposes and it’s handy to have in the same table.

CREATE TABLE searchengine.inverted_index
(
  `term` String CODEC(ZSTD(9)),
  `document_id` FixedString(32),
  `doc_size` UInt64 CODEC(T64('bit'), ZSTD(9)),
  `count` UInt64 CODEC(T64('bit'), ZSTD(9))
)
ENGINE = ReplacingMergeTree
ORDER BY (term, document_id)

Citation

If you find this work useful, please cite it as:
@article{yaltirakli,
  title   = "Inverted index",
  author  = "Yaltirakli, Gokberk",
  journal = "gkbrk.com",
  year    = "2024",
  url     = "https://www.gkbrk.com/inverted-index"
}
Not using BibTeX? Click here for more citation styles.
IEEE Citation
Gokberk Yaltirakli, "Inverted index", December, 2024. [Online]. Available: https://www.gkbrk.com/inverted-index. [Accessed Dec. 27, 2024].
APA Style
Yaltirakli, G. (2024, December 27). Inverted index. https://www.gkbrk.com/inverted-index
Bluebook Style
Gokberk Yaltirakli, Inverted index, GKBRK.COM (Dec. 27, 2024), https://www.gkbrk.com/inverted-index

Comments

© 2024 Gokberk Yaltirakli