rb_line_joiner.py
Rule-based line joiner that joins a list of text lines into a single line, adding spaces between lines only where necessary and rejoining hyphenated words.
- class malti.line_joiner.rb_line_joiner.rb_line_joiner.RBLineJoiner
Bases: LineJoiner
Rule-based line joiner that joins a list of text lines into a single line, adding spaces between lines only where necessary and rejoining hyphenated words.
- __init__() None
Initialiser.
- Return type:
None
- is_hyphenated_word_at_end(line: str) bool
Check if the line ends with a hyphenated (partial) word.
- Parameters:
line (str) – The line to check.
- Returns:
Whether the line ends with a hyphenated word.
- Return type:
bool
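The check above could be approximated as follows. This is only a naive sketch: the function name `ends_with_hyphenated_word_sketch` is hypothetical, and the assumption that "hyphenated word" means "a dash directly preceded by a letter" is not taken from the library, whose actual rules are not documented here and may be more specific to Maltese.

```python
import re

# Assumption: a line "ends with a hyphenated word" when its last character
# is a dash and the character just before it is a letter (not a space,
# digit, or underscore). The real RBLineJoiner rules may differ.
_TRAILING_HYPHEN = re.compile(r"[^\W\d_]-$")


def ends_with_hyphenated_word_sketch(line: str) -> bool:
    """Return True if the line ends in a letter followed by a dash."""
    return _TRAILING_HYPHEN.search(line) is not None
```

Under this assumption, 'il-' counts as ending in a partial word, while 'dak kelb -' does not, because the dash there is preceded by a space.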
- join_lines(lines: list[str], fix_hyphenated_words: bool = False) str
Join a list of Maltese text lines into one string, adding a space where necessary. Optionally, try to join hyphenated word segments back together as well.
No space is inserted between two lines if the first line ends in a dash, em dash, or slash (for URLs), provided that the character before that final character is not a space. Examples:
['dak', 'kelb'] -> 'dak kelb'
['il-', 'kelb'] -> 'il-kelb'
['dak kelb -', 'litteralment'] -> 'dak kelb - litteralment'
['- item 1', '- item 2'] -> '- item 1 - item 2'
- Parameters:
lines (list[str]) – A list of Maltese text lines.
fix_hyphenated_words (bool) – Whether to try to join hyphenated word segments back together as well.
- Returns:
The joined lines.
- Return type:
str
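The spacing rule described for join_lines can be illustrated with a small standalone function. This is a sketch of the documented rule only, not the library's implementation: the name `join_lines_sketch` is hypothetical, the `fix_hyphenated_words` option is left out because its exact behaviour is not specified here, and treating a line that consists of a single dash as needing a space is an assumption.

```python
# Hypothetical stand-in for RBLineJoiner.join_lines; not the library's code.
# Line endings after which no space is added: dash, em dash, slash.
NO_SPACE_ENDINGS = ("-", "\u2014", "/")


def join_lines_sketch(lines: list[str]) -> str:
    """Join lines, adding a space only where the documented rule asks for one."""
    joined = ""
    for line in lines:
        if not joined:
            # Nothing accumulated yet, so there is no boundary to decide on.
            joined = line
            continue
        # Glue directly only if the text so far ends in one of the special
        # characters AND the character before it is not whitespace.
        # Assumption: a lone "-" (no preceding character) still gets a space.
        glue_directly = (
            len(joined) >= 2
            and joined.endswith(NO_SPACE_ENDINGS)
            and not joined[-2].isspace()
        )
        joined += line if glue_directly else " " + line
    return joined
```

Applied to the four documented examples, this sketch gives the same results as listed above, e.g. `join_lines_sketch(['il-', 'kelb'])` returns `'il-kelb'`.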