semlix 3.0 release notes
semlix 3.0.0
This is a major release that rebrands the project from Whoosh to semlix and adds powerful semantic search capabilities while maintaining full backward compatibility with existing Whoosh code.
Major Changes
Project Rebrand: Complete rebrand from Whoosh to semlix. The name “semlix” stands for Semantic + Lexical + Index (highlighting the S, L, and I letters), reflecting the library’s hybrid search capabilities.
Semantic Search: Added comprehensive semantic search functionality that combines traditional lexical (keyword-based) search with modern vector-based semantic similarity search. This allows semlix to understand meaning and context beyond simple keyword matching.
Hybrid Search: New hybrid search system that intelligently combines lexical and semantic search results using multiple fusion algorithms (RRF, Linear, DBSF).
Backward Compatibility: Existing Whoosh code continues to work once its imports are changed from whoosh to semlix. Beyond that import change, the rebrand is transparent to existing users.
New Features
Semantic Search Components
semlix.semantic.HybridIndexWriter: Index writer that keeps the lexical (semlix) and semantic (vector) indexes in sync.
semlix.semantic.HybridSearcher: Searcher that performs hybrid search combining lexical and semantic results.
semlix.semantic.stores.VectorStore: Base interface for vector storage. Implementations include:
semlix.semantic.stores.NumpyVectorStore: Pure Python implementation using NumPy arrays.
semlix.semantic.stores.FaissVectorStore: High-performance implementation using Facebook’s FAISS library for large-scale deployments.
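To make the role of a vector store concrete, here is a minimal pure-Python sketch of brute-force cosine-similarity search, the kind of lookup a simple in-memory store performs at query time. It does not use the semlix API: the names `cosine_similarity` and `top_k` and the toy vectors are assumptions made for illustration, and a NumPy-backed store would presumably vectorize the same arithmetic.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], vectors, k=2)
```

Every query scans all stored vectors, so the cost grows linearly with the number of documents. That is the usual reason to move to an approximate index such as FAISS for large collections.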
Embedding Providers
semlix.semantic.SentenceTransformerProvider: Uses the sentence-transformers library for local embedding generation.
semlix.semantic.OpenAIProvider: Integration with OpenAI’s embedding API.
semlix.semantic.CohereProvider: Integration with Cohere’s embedding API.
semlix.semantic.HuggingFaceInferenceProvider: Uses the Hugging Face Inference API for embeddings.
Result Fusion
RRF (Reciprocal Rank Fusion): Default fusion method that combines results from multiple sources using reciprocal ranking.
Linear Fusion: Weighted linear combination of scores.
DBSF (Distributed Borda Score Fusion): Advanced fusion algorithm for distributed search scenarios.
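The first two fusion methods can be sketched in a few lines of standalone Python. This is an illustration of the general techniques, not semlix's implementation: the function names, the `alpha` weight, the min-max normalization step, and the constant `k=60` (a commonly used value for RRF) are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def linear_fuse(lexical, semantic, alpha=0.5):
    """Weighted sum of min-max normalized scores; alpha weights the semantic side."""
    def normalize(scores):
        if not scores:
            return {}
        low, high = min(scores.values()), max(scores.values())
        if high == low:
            return {doc: 1.0 for doc in scores}
        return {doc: (s - low) / (high - low) for doc, s in scores.items()}

    lex, sem = normalize(lexical), normalize(semantic)
    fused = {doc: (1 - alpha) * lex.get(doc, 0.0) + alpha * sem.get(doc, 0.0)
             for doc in set(lex) | set(sem)}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# RRF needs only rank positions, so raw scores never have to be comparable.
fused_rrf = rrf_fuse([["a", "b", "c"], ["b", "d", "a"]])

# Linear fusion uses the scores themselves, hence the normalization step.
fused_linear = linear_fuse({"a": 10.0, "b": 5.0, "c": 0.0},
                           {"b": 0.9, "d": 0.5, "a": 0.1})
```

RRF is a common default because lexical scores (for example BM25) and cosine similarities live on different scales, and rank positions sidestep that mismatch. Linear fusion gives finer control through the weight but depends on how the scores are normalized.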
API Changes
The whoosh_index parameter in semantic search classes has been renamed to index for consistency and clarity:
semlix.semantic.HybridIndexWriter: index parameter instead of whoosh_index
semlix.semantic.HybridSearcher: index parameter instead of whoosh_index
semlix.semantic.build_vector_store_from_index(): index parameter instead of whoosh_index
Internal variable names updated for consistency:
_whoosh_writer → _writer in semlix.semantic.HybridIndexWriter
_WhooshBase → _SemlixBase in semlix.compat (internal)
Default file extension for temporary indexes changed from .whoosh to .semlix in semlix.util.testing.TempDir.
Google App Engine namespace changed from "whooshlocks" to "semlixlocks" in semlix.filedb.gae.MemcacheLock.
Package Structure
Package renamed from whoosh to semlix:
All imports now use semlix instead of whoosh
Source code moved from src/whoosh/ to src/semlix/
All module paths updated accordingly
New semantic search modules:
semlix.semantic: Core semantic search functionality
semlix.semantic.stores: Vector store implementations
semlix.semantic.embeddings: Embedding provider implementations
Documentation
Complete documentation update reflecting the rebrand to semlix.
New semantic search documentation in Semantic Search covering:
Getting started with semantic search
Hybrid indexing and searching
Embedding providers
Vector stores
Result fusion algorithms
Migration guide
All code examples updated to use semlix imports and API.
Historical references to Whoosh maintained where appropriate to acknowledge the project’s origins.
Installation
Package name changed from whoosh to semlix on PyPI.
Basic installation:
pip install semlix
With semantic search capabilities:
pip install "semlix[semantic]"
Full semantic search with all providers and FAISS support:
pip install "semlix[semantic-full]"
Compatibility
Fully backward compatible: Existing Whoosh code works once its imports are changed from whoosh to semlix. No other modification is needed.
Index format compatibility: semlix 3.0 can read and write indexes created by Whoosh 2.x. The index format remains compatible.
API compatibility: All public APIs remain the same, with the exception of the semantic search classes, where the whoosh_index parameter was renamed to index.
Format names: Legacy format names (whoosh3, whoosh2) are maintained for compatibility with existing indexes.
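For the one API exception noted above, calling code that still passes whoosh_index could bridge the rename with a small wrapper on its own side. The sketch below is hypothetical: `accept_renamed_kwarg` and `open_hybrid` are illustrative names, not part of semlix, and `open_hybrid` merely stands in for any callable that now expects `index`.

```python
import functools
import warnings


def accept_renamed_kwarg(old, new):
    """Decorator that maps a deprecated keyword argument onto its new name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"pass either {old!r} or {new!r}, not both")
                warnings.warn(f"{old!r} is deprecated; use {new!r}",
                              DeprecationWarning, stacklevel=2)
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical stand-in for a constructor that now takes `index`.
@accept_renamed_kwarg("whoosh_index", "index")
def open_hybrid(index, batch_size=32):
    return {"index": index, "batch_size": batch_size}


current = open_hybrid(index="my-index")         # new keyword, no warning
legacy = open_hybrid(whoosh_index="my-index")   # old keyword, DeprecationWarning
```

A wrapper like this only buys time during a migration. Renaming the keyword at each call site is the simpler long-term fix.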
Project Information
Repository moved to: https://github.com/semlix/semlix
Maintained by: Alberto Ferrer (albertof@barrahome.org)
Based on: Whoosh (created by Matt Chaput)
License: Simplified BSD (two-clause) license
Migration Guide
For existing Whoosh users:
Update imports: Change all from whoosh and import whoosh to from semlix and import semlix.
Update package installation: Uninstall whoosh and install semlix:
pip uninstall whoosh
pip install semlix
No other code changes required: Apart from the import update, existing code continues to work. Your indexes, schemas, and queries work exactly as before.
Optional: Add semantic search: To add semantic search capabilities, see the Semantic Search documentation.
Example migration:
# Before (Whoosh)
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
# After (semlix)
from semlix.index import create_in
from semlix.fields import Schema, TEXT, ID
# Everything else works the same!
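For larger codebases, the import update could be automated. The following is a minimal sketch, assuming plain `import whoosh...` and `from whoosh... import ...` statements; the helper name `rewrite_imports` is made up for this example. It does not handle comma-separated imports such as `import os, whoosh`, dynamic imports, or later uses of the bare name `whoosh` in code, so review the result before committing it.

```python
import re

# Matches `whoosh` only where it is the root module of an import statement.
_IMPORT_RE = re.compile(r"^(\s*(?:from|import)\s+)whoosh(?=[.\s]|$)",
                        re.MULTILINE)


def rewrite_imports(source):
    """Return source with `whoosh` import roots replaced by `semlix`."""
    return _IMPORT_RE.sub(r"\1semlix", source)


before = (
    "from whoosh.index import create_in\n"
    "import whoosh\n"
    "import whooshy\n"          # unrelated module name, left untouched
    "label = 'whoosh'\n"        # string literal, left untouched
)
after = rewrite_imports(before)
```

The lookahead keeps the pattern from touching names that merely start with `whoosh`, and anchoring at the start of the line leaves string literals and comments alone.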
Internal Changes
Updated all internal references from “Whoosh” to “semlix” in:
Docstrings and comments
Error messages
Logging namespaces
Test data and examples
Maintained historical references where appropriate (e.g., URLs, email addresses in examples, format names).
Updated project metadata in setup.py and configuration files.
Dependencies
Core: No new dependencies. semlix remains a pure Python library with minimal dependencies.
Semantic search: Optional dependencies for semantic search features:
numpy: Required for semantic search (included in semlix[semantic])
sentence-transformers: For local embedding generation
openai: For OpenAI embeddings
cohere: For Cohere embeddings
huggingface_hub: For Hugging Face Inference API
faiss-cpu or faiss-gpu: For high-performance vector storage
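Because these dependencies are optional, it can help to check which ones are present before choosing a provider or vector store. The sketch below uses only the standard library. The helper name `available_extras` is an assumption, and the mapping lists the import names these distributions normally install (for example, sentence-transformers imports as `sentence_transformers` and both FAISS builds import as `faiss`).

```python
import importlib.util

# Distribution name -> importable module name for the optional packages.
OPTIONAL_MODULES = {
    "numpy": "numpy",
    "sentence-transformers": "sentence_transformers",
    "openai": "openai",
    "cohere": "cohere",
    "huggingface_hub": "huggingface_hub",
    "faiss-cpu / faiss-gpu": "faiss",
}


def available_extras(modules=OPTIONAL_MODULES):
    """Map each package to True if its module can be located, without importing it."""
    return {name: importlib.util.find_spec(module) is not None
            for name, module in modules.items()}


status = available_extras()
for name, present in status.items():
    print(f"{name}: {'installed' if present else 'missing'}")
```

Using `find_spec` avoids importing heavy packages just to test for their presence.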
Performance
Semantic search performance depends on the chosen vector store:
NumpyVectorStore: Good for small to medium indexes (< 1M documents)
FaissVectorStore: Optimized for large-scale indexes with millions of documents
Hybrid search adds minimal overhead to lexical search while providing significant improvements in search quality for conceptual queries.
Embedding generation can be batched for efficiency using the batch_size parameter in semlix.semantic.HybridIndexWriter.
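As a rough illustration of why batching helps, the sketch below groups texts into fixed-size batches so the embedding function is called once per batch rather than once per document. It is not how HybridIndexWriter is implemented: `batched`, `embed_all`, and the stand-in `fake_embed` are hypothetical names used only for this example.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def embed_all(texts, embed_batch, batch_size=32):
    """Embed texts batch by batch; embed_batch is any callable taking a list of texts."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# Stand-in embedder that records the size of each batch it receives.
calls = []


def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


vectors = embed_all(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2)
```

With a remote embedding API, fewer and larger calls typically reduce per-request overhead. With a local model, larger batches make better use of the hardware, up to its memory limits.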
Future Plans
Continued development of semantic search features
Performance optimizations for large-scale deployments
Additional embedding provider integrations
Enhanced fusion algorithms
Improved documentation and examples