spelling module
See correcting errors in user queries.
This module contains helper functions for correcting typos in user queries.
Corrector objects
- class semlix.spelling.Corrector
Base class for spelling correction objects. Concrete subclasses should implement the _suggestions method.
- suggest(text, limit=5, maxdist=2, prefix=0)
- Parameters:
text – the text to check. This word will not be added to the suggestions, even if it appears in the word graph.
limit – only return up to this many suggestions. If there are not enough terms in the field within maxdist of the given word, the returned list will be shorter than this number.
maxdist – the largest edit distance from the given word to look at. Values higher than 2 are neither very effective nor efficient.
prefix – require suggestions to share a prefix of this length with the given word. This is often justifiable since most misspellings do not involve the first letter of the word. Using a prefix dramatically decreases the time it takes to generate the list of words.
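To make the interplay of limit, maxdist, and prefix concrete, here is a minimal standalone sketch that checks a word against a plain Python list of known words. It is not semlix code: the names suggest_from_words and osa_distance, the choice of optimal-string-alignment distance (each insertion, deletion, substitution, or adjacent transposition counts as one edit), and alphabetical tie-breaking are all illustrative assumptions.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insertions, deletions, substitutions, and
    transpositions of adjacent letters each count as one edit
    (optimal string alignment variant)."""
    prev2: list = []                  # row i-2 of the DP table
    prev = list(range(len(b) + 1))    # row i-1
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # transposition
        prev2, prev = prev, cur
    return prev[-1]


def suggest_from_words(text, words, limit=5, maxdist=2, prefix=0):
    """Hypothetical helper: suggest replacements for text from words."""
    scored = []
    for word in set(words):
        if word == text:
            continue  # the checked word itself is never suggested
        if prefix and word[:prefix] != text[:prefix]:
            continue  # cheap prefix filter before the costly distance
        dist = osa_distance(text, word)
        if dist <= maxdist:
            scored.append((dist, word))
    scored.sort()  # closest first; ties broken alphabetically
    return [word for _, word in scored[:limit]]
```

For example, suggest_from_words("teh", ["the", "then", "ten", "tea", "hat", "teh"], prefix=2) keeps only "tea" and "ten": "the" and "then" are rejected by the prefix check without computing any distance, which is where the speed-up from prefix comes from.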
- class semlix.spelling.ReaderCorrector(reader, fieldname, fieldobj)
Suggests corrections based on the content of a field in a reader.
Ranks suggestions by edit distance (closest first), then by term frequency (highest to lowest).
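The ranking rule can be illustrated with a dictionary standing in for a field's term frequencies. This is a hypothetical sketch, not ReaderCorrector's implementation: the helper name rank_by_distance_then_freq, the plain Levenshtein distance (no transpositions), and alphabetical tie-breaking after frequency are assumptions made for the example.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert/delete/substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def rank_by_distance_then_freq(text: str, term_freqs: Dict[str, int],
                               maxdist: int = 2, limit: int = 5) -> List[str]:
    """Hypothetical helper: rank field terms as suggestions for text."""
    scored = []
    for term, freq in term_freqs.items():
        if term == text:
            continue
        dist = levenshtein(text, term)
        if dist <= maxdist:
            # Sort key: smaller distance first, then higher frequency first.
            scored.append((dist, -freq, term))
    scored.sort()
    return [term for _, _, term in scored[:limit]]
```

With term_freqs = {"the": 900, "tea": 50, "then": 400, "ten": 50, "hat": 10}, the misspelling "tha" yields ["the", "tea", "then", "ten", "hat"]: the two terms one edit away come first, and within each distance the more frequent term wins.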
- class semlix.spelling.MultiCorrector(correctors, op)
Merges suggestions from a list of sub-correctors.
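This page does not say how the op argument is used. One plausible reading, assumed in the sketch below, is that op combines the scores when more than one sub-corrector suggests the same word. The sub-correctors here are hypothetical callables returning {suggestion: score} dictionaries where a higher score is better; none of this is semlix's actual interface.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for a sub-corrector: maps a word to {suggestion: score},
# where a higher score is assumed to mean a better suggestion.
SubCorrector = Callable[[str], Dict[str, float]]


def merge_suggestions(text: str,
                      correctors: Iterable[SubCorrector],
                      op: Callable[[float, float], float],
                      limit: int = 5) -> List[str]:
    merged: Dict[str, float] = {}
    for corrector in correctors:
        for word, score in corrector(text).items():
            # When several sub-correctors suggest the same word, op combines
            # the existing score with the new one.
            merged[word] = op(merged[word], score) if word in merged else score
    # Best (highest) combined score first; ties broken alphabetically.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]
```

The choice of op changes the outcome: if one sub-corrector scores "ten" at 50 and another at 400, adding the scores ranks it above a word scored 300 by a single sub-corrector, while min ranks it below.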
QueryCorrector objects
- class semlix.spelling.QueryCorrector(fieldname)
Base class for objects that correct words in a user query.
- correct_query(q, qstring)
Returns a Correction object representing the corrected form of the given query.
- Parameters:
q – the original semlix.query.Query tree to be corrected.
qstring – the original user query. This may be None if the original query string is not available, in which case the Correction.string attribute will also be None.
- Return type: Correction
- class semlix.spelling.SimpleQueryCorrector(correctors, terms, aliases=None, prefix=0, maxdist=2)
A simple query corrector based on a mapping of field names to Corrector objects, and a list of ("fieldname", "text") tuples to correct. Any terms in the query that appear in the list of term tuples are corrected using the appropriate corrector.
- Parameters:
correctors – a dictionary mapping field names to Corrector objects.
terms – a sequence of ("fieldname", "text") tuples representing terms to be corrected.
aliases – a dictionary mapping field names in the query to field names for spelling suggestions.
prefix – suggested replacement words must share this number of initial characters with the original word. Increasing this even to just 1 can dramatically speed up suggestions, and may be justifiable since spelling mistakes rarely involve the first letter of a word.
maxdist – the maximum number of “edits” (insertions, deletions, substitutions, or transpositions of letters) allowed between the original word and any suggestion. Values higher than 2 may be slow.
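The selection logic described above (only listed terms are corrected, and aliases redirect a query field to the field used for suggestions) can be sketched in plain Python. This is an illustration under stated assumptions, not SimpleQueryCorrector's code: the helper correct_terms is hypothetical, correctors are modelled as callables returning best-first suggestion lists, and the top suggestion is simply taken.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Term = Tuple[str, str]  # ("fieldname", "text")


def correct_terms(query_terms: Sequence[Term],
                  terms: Sequence[Term],
                  correctors: Dict[str, Callable[[str], List[str]]],
                  aliases: Optional[Dict[str, str]] = None) -> List[Term]:
    """Hypothetical helper: replace listed terms with their best suggestion."""
    aliases = aliases or {}
    to_correct = set(terms)
    result: List[Term] = []
    for fieldname, text in query_terms:
        if (fieldname, text) in to_correct:
            # Suggestions may come from a different field via aliases.
            corrector = correctors.get(aliases.get(fieldname, fieldname))
            suggestions = corrector(text) if corrector else []
            if suggestions:
                text = suggestions[0]  # assume best-first ordering
        result.append((fieldname, text))
    return result
```

For instance, with only a "body" corrector and aliases={"title": "body"}, the query terms [("body", "quikc"), ("body", "brown"), ("title", "fxo")] with "quikc" and "fxo" listed for correction become [("body", "quick"), ("body", "brown"), ("title", "fox")]: the title term keeps its own field name but borrows suggestions from body, and "brown" is left alone because it was not listed.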
- class semlix.spelling.Correction(q, qstring, corr_q, tokens)
Represents the corrected version of a user query string. Has the following attributes:
query – The corrected semlix.query.Query object.
string – The corrected user query string.
original_query – The original semlix.query.Query object that was corrected.
original_string – The original user query string.
tokens – A list of token objects representing the corrected words.
You can also use the Correction.format_string() method to reformat the corrected query string using a semlix.highlight.Formatter class. For example, to display the corrected query string as HTML with the changed words emphasized:

```python
from semlix import highlight

# mysearcher is an open searcher, q is the parsed query, and qstring is
# the user's original query string.
correction = mysearcher.correct_query(q, qstring)
hf = highlight.HtmlFormatter(classname="change")
html = correction.format_string(hf)
```
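The idea behind emphasizing changed words can be shown without the library. The sketch below compares the original and corrected words position by position and wraps each changed word in a strong element carrying the class name. The function emphasize_changes is hypothetical, and the markup it produces is an assumption for illustration, not the exact output of HtmlFormatter.

```python
import html


def emphasize_changes(original_words, corrected_words, classname="change"):
    """Hypothetical helper: HTML-escape the corrected words and wrap the
    ones that differ from the original in <strong class="...">."""
    parts = []
    for old, new in zip(original_words, corrected_words):
        escaped = html.escape(new)
        if old != new:
            parts.append(f'<strong class="{classname}">{escaped}</strong>')
        else:
            parts.append(escaped)
    return " ".join(parts)
```

Calling emphasize_changes(["quikc", "brown", "fxo"], ["quick", "brown", "fox"]) returns '<strong class="change">quick</strong> brown <strong class="change">fox</strong>', so a stylesheet rule for .change can highlight exactly the corrected words.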