semlix 3.0 release notes

semlix 3.0.0

This major release rebrands the project from Whoosh to semlix and adds semantic search capabilities, while keeping existing Whoosh code working once its imports are switched from whoosh to semlix.

Major Changes

Project Rebrand: Complete rebrand from Whoosh to semlix. The name “semlix” stands for Semantic + Lexical + Index (highlighting the S, L, and I letters), reflecting the library’s hybrid search capabilities.
Semantic Search: Added semantic search functionality that combines traditional lexical (keyword-based) search with vector-based semantic similarity search. This lets semlix find documents that are related in meaning to a query even when they share few or no keywords with it.
Hybrid Search: New hybrid search system that merges the lexical and semantic result lists using one of several fusion algorithms (RRF, Linear, DBSF).
Backward Compatibility: Existing Whoosh code continues to work once its imports are switched from whoosh to semlix; no other changes are needed.

New Features

Semantic Search Components

semlix.semantic.HybridIndexWriter: Index writer that keeps the lexical semlix index and the semantic vector index in sync.
semlix.semantic.HybridSearcher: Searcher that performs hybrid search combining lexical and semantic results.
semlix.semantic.stores.VectorStore: Base interface for vector storage. Implementations include:
- semlix.semantic.stores.NumpyVectorStore: Lightweight implementation backed by NumPy arrays, with no dependencies beyond NumPy.
- semlix.semantic.stores.FaissVectorStore: High-performance implementation using Facebook’s FAISS library for large-scale deployments.
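To make the role of a vector store concrete, here is a minimal brute-force sketch in plain Python. The class name InMemoryVectorStore and its add/search methods are hypothetical and do not reflect the actual semlix.semantic.stores.VectorStore interface; the sketch only illustrates the general idea of storing unit-length embeddings and ranking them by cosine similarity to a query.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class InMemoryVectorStore:
    """Hypothetical brute-force store (not the semlix VectorStore API)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        # Store a unit-length copy; re-adding an existing doc_id overwrites it.
        self._vectors[doc_id] = _normalize(vector)

    def search(self, query: Sequence[float], k: int = 10) -> List[Tuple[str, float]]:
        # Score every stored vector against the query, then keep the k best.
        q = _normalize(query)
        scored = (
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in self._vectors.items()
        )
        return heapq.nlargest(k, scored, key=lambda item: item[1])


store = InMemoryVectorStore()
store.add("x", [1.0, 0.0])
store.add("y", [0.0, 1.0])
store.add("z", [1.0, 1.0])
print(store.search([1.0, 0.1], k=2))  # "x" ranks first, then "z"
```

Exhaustive scoring costs time proportional to the number of stored vectors for every query, which is the usual reason to move large collections to an index library such as FAISS.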

Embedding Providers

semlix.semantic.SentenceTransformerProvider: Uses the sentence-transformers library to generate embeddings locally.
semlix.semantic.OpenAIProvider: Integration with OpenAI’s embedding API.
semlix.semantic.CohereProvider: Integration with Cohere’s embedding API.
semlix.semantic.HuggingFaceInferenceProvider: Uses the Hugging Face Inference API for embeddings.

Result Fusion

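RRF (Reciprocal Rank Fusion): Default fusion method. Each document's fused score is the sum of 1/(k + rank) over every result list it appears in, where k is a smoothing constant, so only rank positions matter and raw lexical and semantic scores never have to be compared directly.
Linear Fusion: Weighted linear combination of the lexical and semantic scores.
DBSF (Distribution-Based Score Fusion): Normalizes each result list's scores according to their statistical distribution (mean and standard deviation) before summing them.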
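The three fusion strategies can be sketched in a few lines of plain Python. This is an illustrative sketch, not the semlix implementation: the function names, the RRF constant k=60 (the value commonly used in the literature), the min-max normalization inside linear fusion, and the mean ± 3 standard deviations scaling in DBSF are assumptions based on common descriptions of these techniques.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def _sorted(fused: Dict[str, float]) -> Ranked:
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def rrf(rankings: Sequence[Sequence[str]], k: int = 60) -> Ranked:
    """Reciprocal Rank Fusion over lists of doc IDs ordered best-first."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return _sorted(fused)


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def linear(score_maps: Sequence[Dict[str, float]], weights: Sequence[float]) -> Ranked:
    """Weighted sum of min-max normalized scores (normalization is an assumption)."""
    fused: Dict[str, float] = {}
    for scores, weight in zip(score_maps, weights):
        if not scores:
            continue
        for doc_id, s in _min_max(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * s
    return _sorted(fused)


def dbsf(score_maps: Sequence[Dict[str, float]]) -> Ranked:
    """Distribution-Based Score Fusion: scale by mean +/- 3 std, then sum."""
    fused: Dict[str, float] = {}
    for scores in score_maps:
        if not scores:
            continue
        values = list(scores.values())
        mean, std = statistics.fmean(values), statistics.pstdev(values)
        lower, upper = mean - 3 * std, mean + 3 * std
        for doc_id, s in scores.items():
            norm = 0.5 if upper == lower else (s - lower) / (upper - lower)
            fused[doc_id] = fused.get(doc_id, 0.0) + norm
    return _sorted(fused)


lexical = {"a": 12.0, "b": 8.0, "c": 4.0}   # e.g. BM25 scores, best first
semantic = {"c": 0.9, "a": 0.8, "d": 0.5}   # e.g. cosine similarities, best first
print(rrf([list(lexical), list(semantic)]))
print(linear([lexical, semantic], weights=[0.5, 0.5]))
print(dbsf([lexical, semantic]))
```

On this toy input all three methods produce the order a, c, b, d; on other inputs they can disagree, because RRF ignores score magnitudes while Linear and DBSF depend on how each list's scores are distributed.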

API Changes

The whoosh_index parameter in semantic search classes has been renamed to index for consistency and clarity:
- semlix.semantic.HybridIndexWriter: index parameter instead of whoosh_index
- semlix.semantic.HybridSearcher: index parameter instead of whoosh_index
- semlix.semantic.build_vector_store_from_index(): index parameter instead of whoosh_index
Internal variable names updated for consistency:
- _whoosh_writer → _writer in semlix.semantic.HybridIndexWriter
- _WhooshBase → _SemlixBase in semlix.compat (internal)
Default file extension for temporary indexes changed from .whoosh to .semlix in semlix.util.testing.TempDir.
Google App Engine namespace changed from "whooshlocks" to "semlixlocks" in semlix.filedb.gae.MemcacheLock.

Package Structure

Package renamed from whoosh to semlix:
- All imports now use semlix instead of whoosh
- Source code moved from src/whoosh/ to src/semlix/
- All module paths updated accordingly
New semantic search modules:
- semlix.semantic: Core semantic search functionality
- semlix.semantic.stores: Vector store implementations
- semlix.semantic.embeddings: Embedding provider implementations

Documentation

Complete documentation update reflecting the rebrand to semlix.
New semantic search documentation in the Semantic Search section, covering:
- Getting started with semantic search
- Hybrid indexing and searching
- Embedding providers
- Vector stores
- Result fusion algorithms
- Migration guide
All code examples updated to use semlix imports and API.
Historical references to Whoosh maintained where appropriate to acknowledge the project’s origins.

Installation

Package name changed from whoosh to semlix on PyPI.
Basic installation:
```
pip install semlix
```
With semantic search capabilities (quote the argument so shells such as zsh do not treat the brackets as a glob pattern):
```
pip install "semlix[semantic]"
```
Full semantic search with all providers and FAISS support:
```
pip install "semlix[semantic-full]"
```

Compatibility

Fully backward compatible: Existing Whoosh code works once its imports are changed from whoosh to semlix; no other modifications are needed.
Index format compatibility: semlix 3.0 can read and write indexes created by Whoosh 2.x. The index format remains compatible.
API compatibility: All public APIs remain the same, except in the semantic search classes, where the whoosh_index parameter was renamed to index.
Format names: Legacy format names (whoosh3, whoosh2) are maintained for compatibility with existing indexes.

Project Information

Repository moved to: https://github.com/semlix/semlix
Maintained by: Alberto Ferrer (albertof@barrahome.org)
Based on: Whoosh (created by Matt Chaput)
License: Simplified BSD (two-clause) license

Migration Guide

For existing Whoosh users:

Update imports: Change all from whoosh and import whoosh to from semlix and import semlix.
Update package installation: Uninstall whoosh and install semlix:
```
pip uninstall whoosh
pip install semlix
```
No other code changes required: Once the imports are updated, your indexes, schemas, and queries work exactly as before.
Optional: Add semantic search: To add semantic search capabilities, see the Semantic Search documentation.

Example migration:

```python
# Before (Whoosh)
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID

# After (semlix)
from semlix.index import create_in
from semlix.fields import Schema, TEXT, ID

# Everything else works the same!
```
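For larger code bases, the import update can be automated. The following is a minimal sketch using a regular expression; rewrite_imports and rewrite_tree are hypothetical helpers, not part of semlix. They only rewrite statements that begin with import whoosh or from whoosh, so multi-module imports such as import os, whoosh and qualified references such as whoosh.fields.TEXT in the body of your code still need a manual pass (a full-text search for whoosh afterwards is a good check).

```python
import re
from pathlib import Path

# Matches "import whoosh", "import whoosh.fields as f", "from whoosh.index import ...".
# The lookahead keeps names such as "whoosh_extras" from matching.
_IMPORT_RE = re.compile(r"^([ \t]*)(from|import)([ \t]+)whoosh(?=[.\s,]|$)", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Return source with whoosh import statements pointed at semlix."""
    return _IMPORT_RE.sub(r"\1\2\3semlix", source)


def rewrite_tree(root: Path) -> None:
    """Rewrite every .py file under root in place, reporting changed files."""
    for path in root.rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        new_text = rewrite_imports(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            print(f"updated {path}")
```

Running rewrite_tree(Path("src")) on a clean version-control checkout makes it easy to review the resulting diff before committing.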

Internal Changes

Updated all internal references from “Whoosh” to “semlix” in:
- Docstrings and comments
- Error messages
- Logging namespaces
- Test data and examples
Maintained historical references where appropriate (e.g., URLs, email addresses in examples, format names).
Updated project metadata in setup.py and configuration files.

Dependencies

Core: No new dependencies. semlix remains a pure Python library with minimal dependencies.
Semantic search: Optional dependencies for semantic search features:
- numpy: Required for semantic search (included in semlix[semantic])
- sentence-transformers: For local embedding generation
- openai: For OpenAI embeddings
- cohere: For Cohere embeddings
- huggingface_hub: For Hugging Face Inference API
- faiss-cpu or faiss-gpu: For high-performance vector storage
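To see which of these optional packages are present in an environment, you can probe for them without importing them. This is a minimal sketch using the standard-library importlib.util.find_spec; the mapping from PyPI names to import names (for example, faiss-cpu and faiss-gpu both providing the faiss module) is an assumption written out by hand, not something semlix exposes.

```python
from importlib.util import find_spec
from typing import Dict

# PyPI distribution name -> top-level module it installs (hand-written, assumed mapping).
OPTIONAL_PACKAGES: Dict[str, str] = {
    "numpy": "numpy",
    "sentence-transformers": "sentence_transformers",
    "openai": "openai",
    "cohere": "cohere",
    "huggingface_hub": "huggingface_hub",
    "faiss-cpu / faiss-gpu": "faiss",
}


def optional_dependency_status() -> Dict[str, bool]:
    """Report which optional packages are importable, without importing them."""
    return {pkg: find_spec(module) is not None for pkg, module in OPTIONAL_PACKAGES.items()}


for package, installed in optional_dependency_status().items():
    print(f"{package:<24} {'installed' if installed else 'missing'}")
```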

Performance

Semantic search performance depends on the chosen vector store:
- NumpyVectorStore: Good for small to medium indexes (< 1M documents)
- FaissVectorStore: Optimized for large-scale indexes with millions of documents
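Hybrid search adds the cost of embedding each query (a network round trip for API-based providers) and a vector lookup on top of the lexical search; in exchange, it can substantially improve result quality for conceptual queries whose wording differs from that of the matching documents.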
Embedding generation can be batched for efficiency using the batch_size parameter in semlix.semantic.HybridIndexWriter.
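The batch_size parameter controls how many texts are sent to the embedding provider per call. The sketch below shows the general idea only: batched, embed_in_batches, and fake_provider are hypothetical names that are not part of semlix, and a real provider would return model embeddings rather than placeholder vectors.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 32,
) -> List[Vector]:
    """Call embed_batch once per batch instead of once per text."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(list(batch)))
    return vectors


calls: List[int] = []


def fake_provider(batch: List[str]) -> List[Vector]:
    # Stand-in for a real embedding call; records how many texts it received.
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


docs = [f"document {i}" for i in range(10)]
vectors = embed_in_batches(docs, fake_provider, batch_size=4)
print(len(vectors), calls)  # 10 [4, 4, 2]
```

Larger batches mean fewer provider calls (and fewer network round trips for API-based providers) at the cost of more memory per call; on Python 3.12 and later, itertools.batched offers the same slicing, yielding tuples.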

Future Plans

Continued development of semantic search features
Performance optimizations for large-scale deployments
Additional embedding provider integrations
Enhanced fusion algorithms
Improved documentation and examples