Gensim

Original author: Radim Řehůřek
Developer: RARE Technologies Ltd.
Initial release: 2009
Stable release: 4.4.0 / 16 October 2025
Written in: Python
Operating system: Linux, Windows, macOS
Type: Information retrieval
License: LGPL
Website: radimrehurek.com/gensim/
Repository: github.com/RaRe-Technologies/gensim

Gensim is an open-source library for unsupervised topic modeling, document indexing, similarity retrieval, and other natural language processing tasks, using modern statistical machine learning.

Gensim is implemented in Python, with Cython used for performance. It is designed to handle large text collections using data streaming and incremental online algorithms, which differentiates it from most other machine learning software packages, which target only in-memory processing.
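The streaming and incremental-update idea can be illustrated with a minimal plain-Python sketch. It deliberately does not use Gensim's own API: the class names `StreamedCorpus` and `OnlineDocumentFrequency` and the sample sentences are hypothetical, chosen only for illustration. The sketch shows a corpus that yields one document at a time and a statistic that is updated per document, so the full collection never needs to be held in memory at once.

```python
from collections import Counter
from typing import Iterable, Iterator, List


class StreamedCorpus:
    """Hypothetical re-iterable corpus that yields one tokenized document at a time."""

    def __init__(self, lines: Iterable[str]) -> None:
        # `lines` may be any re-iterable source; in practice this could be
        # an object whose __iter__ reads a large file line by line.
        self.lines = lines

    def __iter__(self) -> Iterator[List[str]]:
        for line in self.lines:
            # Only the current document is tokenized and held in memory.
            yield line.lower().split()


class OnlineDocumentFrequency:
    """Hypothetical statistic updated incrementally, one document per call."""

    def __init__(self) -> None:
        self.num_docs = 0
        self.doc_freq: Counter = Counter()

    def update(self, tokens: List[str]) -> None:
        self.num_docs += 1
        # Count each distinct token once per document.
        self.doc_freq.update(set(tokens))


# Illustrative sample data; a real collection would be streamed from disk.
documents = [
    "Human machine interface for lab computer applications",
    "A survey of user opinion of computer system response time",
    "The generation of random binary unordered trees",
]

corpus = StreamedCorpus(documents)
stats = OnlineDocumentFrequency()
for tokens in corpus:
    stats.update(tokens)
```

Because `StreamedCorpus` implements `__iter__` rather than being a one-shot generator, it can be traversed several times, which multi-pass algorithms require.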