Serbian Language Pipeline for spaCy

Work in progress. Far from production ready.

How to use with spaCy?

...

Data files

To test training, we're using the UD (Universal Dependencies) dataset, which has been automatically converted to Cyrillic. This is temporary; we will eventually use our own training data.
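For illustration only, here is a minimal sketch of what an automatic Latin-to-Cyrillic conversion could look like. It is not the script used for this repository; the function name `latin_to_cyrillic` and the mapping tables are assumptions made for the example. It is a naive character-level mapping, so it will wrongly merge letter pairs that are not true digraphs (for example the `nj` in "injekcija").

```python
# Hypothetical sketch: naive Serbian Latin -> Cyrillic transliteration.
# Not the converter used for the UD data in this repository.

# Digraphs must be checked before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и",
    "j": "ј", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у",
    "v": "в", "z": "з", "ž": "ж",
}


def latin_to_cyrillic(text: str) -> str:
    """Transliterate Serbian Latin text to Cyrillic, character by character."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            # Case follows the first letter of the digraph (Lj, LJ -> Љ).
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLES.get(ch.lower())
        if cyr is None:
            # Leave digits, punctuation and unknown characters untouched.
            out.append(ch)
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(latin_to_cyrillic("Ljubav i njiva"))
```

A real converter would need an exception list for words where `nj`, `lj` or `dž` span a morpheme boundary.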

Lemmatizer data

  • data originates from Morpho-SLaWS (Tasovac, Rudan and Rudan 2015) and Transpoetika (Tasovac 2012)
  • currently includes both Ekavian and Jekavian forms; I may move the Jekavian forms to the normalization function
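The README does not say how the normalization function would handle Jekavian forms. One possible approach, sketched here purely as an assumption, is a lookup table that maps Jekavian tokens to their Ekavian equivalents before lemmatization. The names `JEKAVIAN_TO_EKAVIAN` and `normalize_jekavian` are hypothetical, and the four entries are only sample pairs, not the project's data.

```python
# Hypothetical sketch: lookup-based Jekavian -> Ekavian normalization.
# The table below holds a few sample pairs, not the project's real data.
JEKAVIAN_TO_EKAVIAN = {
    "млијеко": "млеко",
    "лијепо": "лепо",
    "дијете": "дете",
    "пјесма": "песма",
}


def normalize_jekavian(token: str, table: dict = JEKAVIAN_TO_EKAVIAN) -> str:
    """Return the Ekavian form of a token if the table knows it, else the token."""
    mapped = table.get(token.lower())
    if mapped is None:
        return token
    # Only first-letter capitalisation is preserved in this sketch.
    return mapped.capitalize() if token[:1].isupper() else mapped


if __name__ == "__main__":
    print([normalize_jekavian(t) for t in ["Млијеко", "је", "лијепо"]])
```

With this design the lemmatizer tables would only need to store Ekavian forms, at the cost of maintaining the mapping separately.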