Welcome to the homepage of the Turkish National Corpus.

The Turkish National Corpus (TNC), with a size of 50 million words, is a balanced and representative corpus of contemporary Turkish. It consists of samples of textual data across a wide variety of genres, covering a period of 20 years (1990-2009). The written component consists of texts produced in different domains on various topics. Transcriptions of spoken data constitute 2% of TNC’s database and comprise spontaneous, everyday conversations and speeches collected in particular communicative settings.
The TNC-Demo Version, with its 4438 different text samples, represents 9 domains and 34 different genres. Working from a collection of 48 million words, users can run queries and restrict the output by media, text sample, domain, derived text type, sex of author, type of author, text genre, and audience of the text.
The TNC-Demo Version is RELEASED
For registration and full access, check out the “Query Interface” menu on the left.
Publishing TNC-based studies:

(i) Concordance lines and statistical values provided by TNC must not be manipulated or changed.

(ii) TNC-based studies should cite the following publication:

Aksan, Y. et al. (2012). Construction of the Turkish National Corpus (TNC). In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012). İstanbul, Türkiye. http://www.lrec-conf.org/proceedings/lrec2012/papers.html

(iii) TNC-based publications should be submitted to, and will be announced on, the TNC website. Please use the “Add Publication” menu on the left for submission.