Abstract:
Although large language models answer general questions well, their
deployment in specialized domains such as law faces several challenges: they may
generate answers that are inaccurate or unsupported by legal texts, and they struggle
with complex questions because high-quality specialized data is scarce. These
challenges are even more pronounced in the Algerian legal context, where Arabic
legal texts are often limited and poorly digitized. This thesis aims to develop a
legal question-answering system in Arabic based on Algerian tax law by combining
dense semantic retrieval with a generative language model. The work includes
several phases: collecting legal texts from a reliable source, preprocessing them,
segmenting them into legal articles, embedding them with models adapted to
Arabic, such as AraBERT and E5, and indexing the embeddings with FAISS
for efficient retrieval. A generative model then formulates the answer
from the retrieved article. The system was implemented using Python in the
Google Colab environment and was evaluated based on retrieval quality and answer
accuracy.
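The retrieval step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the three-dimensional vectors are hypothetical stand-ins for AraBERT or E5 embeddings, and the exhaustive cosine-similarity search in plain Python only mimics what a FAISS index would do at scale. The names `normalize` and `top_k` are assumptions introduced for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query_vec, article_vecs, k=1):
    """Return the indices of the k articles most similar to the query (exhaustive search)."""
    q = normalize(query_vec)
    scores = [
        (sum(a * b for a, b in zip(q, normalize(vec))), idx)
        for idx, vec in enumerate(article_vecs)
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scores[:k]]


# Hypothetical embeddings for three legal articles (assumption: real models
# produce vectors with hundreds of dimensions, not three).
article_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.9],
]
query_vec = [0.2, 0.9, 0.0]  # hypothetical embedding of a user question

retrieved = top_k(query_vec, article_vecs, k=2)
```

In a full pipeline, the text of the top-ranked article would be passed to the generative model as context for the answer.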
The experimental results showed that semantic retrieval with the E5 model
achieved a recall of 91%, significantly outperforming keyword-based methods
such as BM25. Furthermore, integrating the retrieved content with a
fine-tuned generative model produced more legally grounded and fluent answers,
especially for multi-layered questions. These findings highlight the
effectiveness of combining semantic search with generative modeling in addressing
the unique challenges of Arabic legal question answering in the Algerian tax context.
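As an illustration of how a retrieval recall figure can be computed, the sketch below measures recall at k as the fraction of questions whose reference article appears among the top k retrieved articles. This definition is an assumption, since the abstract does not state the cutoff or the exact formula used, and the article identifiers are hypothetical toy data, not results from the thesis.

```python
def recall_at_k(retrieved_lists, gold_ids, k):
    """Fraction of questions whose gold article id is among the top-k retrieved ids."""
    if not gold_ids:
        raise ValueError("at least one question is required")
    hits = sum(
        1
        for retrieved, gold in zip(retrieved_lists, gold_ids)
        if gold in retrieved[:k]
    )
    return hits / len(gold_ids)


# Hypothetical ranked article ids returned for four questions.
retrieved_lists = [[12, 7, 3], [5, 9, 1], [8, 2, 4], [6, 0, 11]]
# Hypothetical id of the correct article for each question.
gold_ids = [7, 5, 10, 11]

score_at_1 = recall_at_k(retrieved_lists, gold_ids, k=1)
score_at_3 = recall_at_k(retrieved_lists, gold_ids, k=3)
```

The same function can score a dense retriever and a keyword baseline such as BM25 on the same question set, which makes the two directly comparable.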