Welcome to sulci’s documentation!

Sulci is a French text mining tool, initially designed for the analysis of the corpus and thesaurus of Libération, a French newspaper.

This code is a work in progress, but it is already used in production at Libération.

A demo page is available for the frozen 0.1 alpha version:

Sulci provides four algorithms, designed to be run in sequence: each algorithm needs the data produced by the previous one:

  1. Part of Speech tagging
  2. Lemmatization
  3. Collocation and key entities extraction
  4. Semantic tagging
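
The sequential dependency between the four steps can be pictured with a small sketch. This is not Sulci's API: every function name below (`pos_tag`, `lemmatize`, `extract_entities`, `semantic_tag`, `run_pipeline`) is a hypothetical stand-in, and each stage is a toy rule rather than a trained model. The only point it illustrates is that each stage reads the output of the one before it.

```python
# Hypothetical sketch: these names and toy rules are NOT part of Sulci.
# Each stage adds one key to the document and relies on the key added
# by the previous stage.

def pos_tag(doc):
    # Toy tagger: capitalised tokens are "NOUN", everything else "X".
    tokens = doc["text"].split()
    tagged = [(tok, "NOUN" if tok[0].isupper() else "X") for tok in tokens]
    return {**doc, "tagged": tagged}

def lemmatize(doc):
    # Toy lemmatizer: lowercase each token; requires the POS-tagged tokens.
    lemmas = [(tok.lower(), tag) for tok, tag in doc["tagged"]]
    return {**doc, "lemmas": lemmas}

def extract_entities(doc):
    # Toy extraction: keep the lemmas tagged as nouns; requires the lemmas.
    entities = [lemma for lemma, tag in doc["lemmas"] if tag == "NOUN"]
    return {**doc, "entities": entities}

def semantic_tag(doc):
    # Toy semantic tagging: deduplicated, sorted entities become descriptors.
    return {**doc, "descriptors": sorted(set(doc["entities"]))}

# The order matters: each stage consumes what the previous one produced.
PIPELINE = [pos_tag, lemmatize, extract_entities, semantic_tag]

def run_pipeline(text):
    doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

result = run_pipeline("Paris accueille le sommet")
print(result["descriptors"])
```

Calling a later stage on a document that skipped an earlier one (for example `lemmatize({"text": "Paris"})`) fails with a `KeyError` in this sketch, which mirrors the constraint that the algorithms are meant to run in order.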

