#+TITLE: Simple Mattér Parser
#+AUTHOR: Lucien Cartier-Tilet
[[http://spacemacs.org][file:https://cdn.rawgit.com/syl20bnr/spacemacs/442d025779da2f62fc86c2082703697714db6514/assets/spacemacs-badge.svg]]
* Simple Mattér parser
/Simple Mattér Parser/, or /SMP/ for short, is exactly what you think it is: a
simple parser for Mattér. You might be wondering what Mattér is: it is a
constructed language I am working on, inspired by Nordic languages, especially
Old Norse and Icelandic.
* What it does
/SMP/ first loads a gloss dictionary from a CSV file named
~matter-dict.csv~. This file contains two columns:
1. the clitics of the language or, if you prefer, the roots of its words
2. the linguistic gloss of these roots
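As a rough illustration of this step, here is a minimal sketch of how such a
two-column file could be read into a lookup table using only the standard
library. The function name ~parse_dictionary~ is hypothetical, and the sketch
assumes comma-separated, unquoted fields; the actual layout of
~matter-dict.csv~ and the way /SMP/ reads it may differ.
#+BEGIN_SRC rust
use std::collections::HashMap;

/// Parses the contents of a two-column CSV file into a root -> gloss map.
/// Assumption: fields are separated by a comma and are never quoted.
fn parse_dictionary(csv: &str) -> HashMap<String, String> {
    let mut dict = HashMap::new();
    for line in csv.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue; // skip blank lines
        }
        // Split at the first comma only: root on the left, gloss on the right.
        let mut parts = line.splitn(2, ',');
        if let (Some(root), Some(gloss)) = (parts.next(), parts.next()) {
            dict.insert(root.trim().to_string(), gloss.trim().to_string());
        }
        // Lines without a comma are silently ignored in this sketch.
    }
    dict
}
#+END_SRC
The file contents could be obtained with ~std::fs::read_to_string~ and passed
to this function as a string slice.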
Using this dictionary, the program then tries to detect which words appear in
the input text, whose path is passed to the program as its first (and only)
argument. If several clitics can be detected in a single word, the longest one
is used. For each word, the program then separates the root from its suffixes
and analyzes the suffixes to detect the word’s number suffix, possessive
suffix, and declension. Any of these that is detected is added to the word’s
gloss.
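The longest-match rule can be sketched as follows. This is only an
illustration, not the code used by /SMP/: the function name ~split_root~ is
hypothetical, and the sketch assumes that the root always sits at the start of
the word, which ignores the verb prefixes handled separately by the program.
#+BEGIN_SRC rust
use std::collections::HashMap;

/// Returns (root, gloss, remaining suffix) for the longest dictionary root
/// that `word` starts with, or `None` if no known root matches.
fn split_root<'a>(
    word: &'a str,
    dict: &'a HashMap<String, String>,
) -> Option<(&'a str, &'a str, &'a str)> {
    dict.iter()
        // Keep only the roots that are a prefix of the word.
        .filter(|(root, _)| word.starts_with(root.as_str()))
        // Among those candidates, prefer the longest one.
        .max_by_key(|(root, _)| root.len())
        // Whatever follows the root is left for the suffix analysis.
        .map(|(root, gloss)| (root.as_str(), gloss.as_str(), &word[root.len()..]))
}
#+END_SRC
With both ~fertið~ and ~fertiðoċ~ in the dictionary, the word ~fertiðoċ~ is
matched against the longer entry, while ~eppeleþant~ is split into the root
~eppel~ and the suffix string ~eþant~.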
Verbs can also be analyzed to a certain extent: prefixes are analyzed
separately, and irregular words are marked as being unknown to the program.
* Usage example
To build this program, you will need Rust >= 1.30 with cargo. To compile and
run it, execute the following command in your shell:
#+BEGIN_SRC sh
cargo run --release -- matter.txt
#+END_SRC
Here, ~matter.txt~ is any text file containing some Mattér text. This
generates a file, ~output.xml~, which contains the resulting parse of the
text and its gloss.
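For readers curious about the shape of ~output.xml~, each analyzed word is
written as a ~word~ element with ~text~, ~morpheme~ and ~gloss~ attributes. A
minimal sketch of how one such element could be serialized is given below; the
helper names ~escape_attr~ and ~word_element~ are hypothetical and this is not
necessarily how /SMP/ writes its output.
#+BEGIN_SRC rust
/// Escapes the characters that are not allowed verbatim in an XML attribute.
fn escape_attr(value: &str) -> String {
    value
        .replace('&', "&amp;") // must come first to avoid double escaping
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Builds a single `<word ... />` element from the three analyzed fields.
fn word_element(text: &str, morpheme: &str, gloss: &str) -> String {
    format!(
        "<word text=\"{}\" morpheme=\"{}\" gloss=\"{}\" />",
        escape_attr(text),
        escape_attr(morpheme),
        escape_attr(gloss)
    )
}
#+END_SRC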
* Example
As an example, the following text will be fed to the program:
#+BEGIN_QUOTE
Em meþ Gunnarac annéðant þynea. An ænant caupage, ar annéð caupe. Fe en eppelant etano Éþtrið fent etano? Þror eppelant feþ geffo? Du feċ gei? Hint fec fém gér? Fon landytoċ beþt bƿand? Feren Mattérant frégei? Ferve Mattérant frégei? Eppeleþant eða cirþabérant, fertið y caupei? Ferden urbyþ gon? Fertið bryðdegdynant haþt? Fertiðoċ Mattérant frégei? Fertiðac y ċilde?
#+END_QUOTE
And here is the result of the parse on this text:
#+BEGIN_SRC xml
<?xml version="1.0" encoding="utf-8"?>
<text>
<sentence>
<word text="em" morpheme="em" gloss="art.dem.sg" />
<word text="meþ" morpheme="meþ" gloss="n" />
<word text="gunnarac" morpheme="gunnar" gloss="np-ABL" />
<word text="annéðant" morpheme="annéð" gloss="adj-ACC" />
<word text="þynea" morpheme="þyn" gloss="vt-3sg.imperf" />
</sentence>
<sentence>
<word text="an" morpheme="an" gloss="art.dem.sg.near" />
<word text="ænant" morpheme="æn" gloss="nbr-ACC" />
<word text="caupage" morpheme="caup" gloss="vt-sg.IMPER" />
<word text="ar" morpheme="ar" gloss="conj" />
<word text="annéð" morpheme="annéð" gloss="adj" />
<word text="caupe" morpheme="caup" gloss="vt-1sg.imperf" />
</sentence>
<sentence>
<word text="fe" morpheme="fe" gloss="pron.q.nom" />
<word text="en" morpheme="en" gloss="art.def.sg.nhum" />
<word text="eppelant" morpheme="eppel" gloss="n-ACC" />
<word text="etano" morpheme="et" gloss="vt" />
<word text="éþtrið" morpheme="éþtrið" gloss="np" />
<word text="fent" morpheme="fent" gloss="pron.q.acc" />
<word text="etano" morpheme="et" gloss="vt" />
</sentence>
<sentence>
<word text="þror" morpheme="þror" gloss="np" />
<word text="eppelant" morpheme="eppel" gloss="n-ACC" />
<word text="feþ" morpheme="feþ" gloss="pron.q.dat" />
<word text="geffo" morpheme="geff" gloss="vt-1+3sg.perf" />
</sentence>
<sentence>
<word text="du" morpheme="du" gloss="pron.2sg.nom" />
<word text="feċ" morpheme="feċ" gloss="pron.q.loc" />
<word text="gei" morpheme="g" gloss="vi-2sg.imperf" />
</sentence>
<sentence>
<word text="hint" morpheme="hint" gloss="pron.3sg.n.acc" />
<word text="fec" morpheme="fec" gloss="pron.q.abl" />
<word text="fém" morpheme="fém" gloss="pron.q.limit" />
<word text="gér" morpheme="gér" gloss="vt" />
</sentence>
<sentence>
<word text="fon" morpheme="fon" gloss="pron.q.gen" />
<word text="landytoċ" morpheme="landyt" gloss="n-LOC" />
<word text="beþt" morpheme="beþt" gloss="unknown" />
<word text="bƿand" morpheme="bƿ" gloss="vi-part.prog" />
</sentence>
<sentence>
<word text="feren" morpheme="feren" gloss="pron.q.goal" />
<word text="mattérant" morpheme="mattér" gloss="np-ACC" />
<word text="frégei" morpheme="frég" gloss="vt-2sg.imperf" />
</sentence>
<sentence>
<word text="ferve" morpheme="ferve" gloss="pron.q.motivation" />
<word text="mattérant" morpheme="mattér" gloss="np-ACC" />
<word text="frégei" morpheme="frég" gloss="vt-2sg.imperf" />
</sentence>
<sentence>
<word text="eppeleþant" morpheme="eppel" gloss="n-pl-ACC" />
<word text="eða" morpheme="eða" gloss="adv" />
<word text="cirþabérant" morpheme="cirþabér" gloss="n-ACC" />
<word text="fertið" morpheme="fertið" gloss="pron.q.loc.temp" />
<word text="y" morpheme="y" gloss="aux.fut" />
<word text="caupei" morpheme="caup" gloss="vt-2sg.imperf" />
</sentence>
<sentence>
<word text="ferden" morpheme="ferden" gloss="pron.q.instr" />
<word text="urbyþ" morpheme="urby" gloss="n-pl" />
<word text="gon" morpheme="g" gloss="vi-3pl.perf" />
</sentence>
<sentence>
<word text="fertið" morpheme="fertið" gloss="pron.q.loc.temp" />
<word text="bryðdegdynant" morpheme="bryðdeg" gloss="n-POSS.2sg-ACC" />
<word text="haþt" morpheme="haþt" gloss="unknown" />
</sentence>
<sentence>
<word text="fertiðoċ" morpheme="fertiðoċ" gloss="pron.q.abl.temp" />
<word text="mattérant" morpheme="mattér" gloss="np-ACC" />
<word text="frégei" morpheme="frég" gloss="vt-2sg.imperf" />
</sentence>
<sentence>
<word text="fertiðac" morpheme="fertiðac" gloss="pron.q.limit.temp" />
<word text="y" morpheme="y" gloss="aux.fut" />
<word text="ċilde" morpheme="ċild" gloss="vi-1sg.imperf" />
</sentence>
</text>
#+END_SRC
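Judging from this output alone, the input appears to be split into sentences
at ~.~ and ~?~, stripped of its remaining punctuation, and lowercased before
the words are looked up. The following sketch reproduces that behavior under
these assumptions; the function name ~tokenize~ is hypothetical and the
actual tokenization rules of /SMP/ may differ.
#+BEGIN_SRC rust
/// Splits a text into sentences, and each sentence into lowercased words.
/// Assumption: sentences end with '.' or '?', as in the example above.
fn tokenize(text: &str) -> Vec<Vec<String>> {
    text.split(|c: char| c == '.' || c == '?')
        .map(|sentence| {
            sentence
                .split_whitespace()
                // Drop surrounding punctuation such as commas, then lowercase.
                .map(|w| w.trim_matches(|c: char| !c.is_alphabetic()).to_lowercase())
                .filter(|w| !w.is_empty())
                .collect::<Vec<String>>()
        })
        // Ignore the empty fragment left after the final punctuation mark.
        .filter(|words| !words.is_empty())
        .collect()
}
#+END_SRC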