Sentence splitters
Sentence splitters are used to break up text represented as a single string (such as from a text file) into a list of sentences.
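To make the idea concrete, here is a minimal sketch of sentence splitting that does not use malti at all. The function name naive_split and its rule (split after '.', '!' or '?' when whitespace follows) are assumptions made for illustration only, and say nothing about how malti's own splitters work.

```python
import re
from typing import List


def naive_split(text: str) -> List[str]:
    """Illustrative only: split after '.', '!' or '?' when whitespace follows."""
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # Drop the empty string produced when the input is empty.
    return [part for part in parts if part]


text = "Eżempju ta' sentenza. Eżempju ta' sentenza oħra."
print(naive_split(text))
```

A rule this simple also splits after abbreviations that end in a full stop, which is one reason to prefer a dedicated sentence splitter over an ad hoc regular expression.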
The split function
The simplest way to split a text into sentences with malti is as follows:
import malti.sent_splitter

text = 'Eżempju ta\' sentenza. Eżempju ta\' sentenza oħra.'
sentences = malti.sent_splitter.split(text)
print(sentences)
["Eżempju ta' sentenza.", "Eżempju ta' sentenza oħra."]
The SentSplitter class
The above is a convenience function that makes use of a default sentence splitter (KMSentSplitter in this version).
To access all the features of a sentence splitter, create an instance of its class and call that instance directly, for example:
import malti.sent_splitter

splitter = malti.sent_splitter.KMSentSplitter()

text = 'Eżempju ta\' sentenza. Eżempju ta\' sentenza oħra.'
sentences = splitter.split(text)
print(sentences)
["Eżempju ta' sentenza.", "Eżempju ta' sentenza oħra."]
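One practical reason to hold a splitter object is that a single instance can be reused across many texts. The sketch below is an illustration under stated assumptions: the helper split_many, the SupportsSplit protocol and the stand-in PeriodSplitter are hypothetical names that are not part of malti. The only thing assumed about a real splitter is what the example above shows, namely a split(text) method that returns a list of strings.

```python
from typing import Iterable, List, Protocol


class SupportsSplit(Protocol):
    """Hypothetical protocol: any object with a split(text) method."""

    def split(self, text: str) -> List[str]: ...


def split_many(splitter: SupportsSplit, texts: Iterable[str]) -> List[List[str]]:
    """Apply one splitter instance to several texts, one result list per text."""
    return [splitter.split(text) for text in texts]


class PeriodSplitter:
    """Hypothetical stand-in so the sketch runs without malti installed."""

    def split(self, text: str) -> List[str]:
        # Cut on full stops, discard empty pieces, and restore the full stop.
        return [piece.strip() + '.' for piece in text.split('.') if piece.strip()]


results = split_many(PeriodSplitter(), ['Waħda. Tnejn.', 'Tlieta.'])
print(results)
```

With malti installed, an instance such as malti.sent_splitter.KMSentSplitter() could presumably be passed in place of the stand-in, provided its split method behaves as in the example above.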
Available sentence splitters
The following sentence splitters are available:
malti.sent_splitter.KMSentSplitter (km_sent_splitter.py): A SentSplitter that is equivalent to the one used to split sentences in the Korpus Malti.