rb_line_joiner.py

Rule-based line joiner that joins a list of text lines into a single line, adding spaces between lines only where necessary and rejoining hyphenated words.

class malti.line_joiner.rb_line_joiner.rb_line_joiner.RBLineJoiner

Bases: LineJoiner

Rule-based line joiner that joins a list of text lines into a single line, adding spaces between lines only where necessary and rejoining hyphenated words.

__init__() -> None

Initialiser.

Return type:

None

is_hyphenated_word_at_end(line: str) -> bool

Check if the line ends with a hyphenated (partial) word.

Parameters:

line (str) – The line to check.

Returns:

Whether the line ends with a hyphenated word.

Return type:

bool
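The exact rule behind is_hyphenated_word_at_end is not spelled out here. As a rough illustration only, the sketch below treats a line as ending in a partial word when its last character is a hyphen directly preceded by a word character. The function name ends_with_word_hyphen and the regular expression are assumptions made for this example, not the library's implementation.

```python
import re

# Assumption: a "hyphenated (partial) word" is a word character followed
# immediately by a hyphen at the very end of the line.
_WORD_HYPHEN_AT_END = re.compile(r"\w-\Z")


def ends_with_word_hyphen(line: str) -> bool:
    """Hypothetical stand-in for is_hyphenated_word_at_end."""
    return _WORD_HYPHEN_AT_END.search(line) is not None


print(ends_with_word_hyphen("il-"))         # hyphen attached to a word
print(ends_with_word_hyphen("dak kelb -"))  # free-standing dash after a space
print(ends_with_word_hyphen("dak kelb"))    # no hyphen at all
```

A real implementation may need further rules, for instance to tell a Maltese article such as il- apart from a word that was split only because of a line break.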

join_lines(lines: list[str], fix_hyphenated_words: bool = False) -> str

Join a list of Maltese text lines into one string, adding a space where necessary. Optionally, try to join hyphenated word segments back together as well.

No space is inserted between two lines if the first line ends in a dash, em dash, or slash (as in URLs), provided that the character before that ending is not a space. Examples:

  • ['dak', 'kelb'] -> 'dak kelb'

  • ['il-', 'kelb'] -> 'il-kelb'

  • ['dak kelb -', 'litteralment'] -> 'dak kelb - litteralment'

  • ['- item 1', '- item 2'] -> '- item 1 - item 2'

Parameters:

  • lines (list[str]) – A list of Maltese text lines.

  • fix_hyphenated_words (bool) – Whether to also try to join hyphenated word segments back together. Defaults to False.

Returns:

The joined lines.

Return type:

str
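The spacing rule described for join_lines can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the library's code: the names needs_space and join_lines_sketch are hypothetical, empty lines are skipped, a line consisting of a single dash or slash is followed by a space, and the fix_hyphenated_words behaviour is not covered.

```python
# Endings after which no space is added: dash, em dash, slash.
NO_SPACE_ENDINGS = ("-", "\u2014", "/")


def needs_space(text: str) -> bool:
    """Return True if a space should follow `text` before the next line."""
    if text.endswith(NO_SPACE_ENDINGS):
        # Glue the next line on only when the ending is attached to a
        # preceding non-space character (e.g. 'il-', 'https://example.org/').
        attached = len(text) >= 2 and not text[-2].isspace()
        return not attached
    return True


def join_lines_sketch(lines: list[str]) -> str:
    """Hypothetical re-implementation of the documented spacing rule."""
    result = ""
    for line in lines:
        if not line:
            continue  # assumption: empty lines contribute nothing
        if result and needs_space(result):
            result += " "
        result += line
    return result


print(join_lines_sketch(["dak", "kelb"]))
print(join_lines_sketch(["il-", "kelb"]))
print(join_lines_sketch(["dak kelb -", "litteralment"]))
print(join_lines_sketch(["- item 1", "- item 2"]))
```

Run on the four documented examples, the sketch yields 'dak kelb', 'il-kelb', 'dak kelb - litteralment', and '- item 1 - item 2'.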