
What Is a Corpus?



Corpus:

“Corpus” is a Latin word meaning “body” [McEnery & Wilson, 1996]; hence a text corpus is any discrete body of text. In computing, the term refers to a large collection of machine-readable electronic text. Corpora (the plural of corpus) can be drawn from any medium, such as text, speech or microfilm.

Corpora are the knowledge base that corpus linguistics uses to analyze and study language. The linguistic processing of a corpus is called annotation, in which techniques such as part-of-speech tagging, stemming or lemmatization are applied. Annotation also includes reformulating corpora into new linguistic forms [McEnery & Oakes, 1996].
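To make the idea of annotation concrete, here is a minimal sketch in Python that tokenizes a tiny corpus and attaches a stem to each token. The sample sentences, the `tokenize`, `naive_stem` and `annotate` helpers, and the suffix list are all assumptions made for illustration. The stemmer is deliberately crude and is not the Porter algorithm or any other published method, so some stems (for example "tagg" for "tagging") are rough.

```python
import re
from collections import Counter

# Toy corpus (hypothetical sample data): a discrete body of text
# held as a list of short documents.
corpus = [
    "Linguists annotated the texts.",
    "Tagging words helps linguists.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(token):
    """Strip one common English suffix, keeping at least three letters.

    Illustrative only: a real stemmer applies many more rules.
    """
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def annotate(text):
    """Return (token, stem) pairs, a very small form of annotation."""
    return [(tok, naive_stem(tok)) for tok in tokenize(text)]

# Annotate every document, then count how often each stem occurs.
annotated = [annotate(doc) for doc in corpus]
stem_counts = Counter(stem for doc in annotated for _, stem in doc)

if __name__ == "__main__":
    for doc in annotated:
        print(doc)
    print(stem_counts.most_common(3))
```

In practice you would likely replace `naive_stem` with a stemmer or lemmatizer from an established NLP library and add part-of-speech tags as a third field in each tuple.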

Corpora are used in a variety of fields, including computational linguistics, speech recognition, information retrieval and machine translation.

References:


  • [Abusalah, 2008] Abusalah M. (2008). “Cross Language Information Retrieval Using Ontologies”. PhD Thesis, University of Sunderland.
  • [McEnery & Oakes, 1996] McEnery T. and Oakes M. (1996). “Sentence and word alignment in the CRATER Project”. In: J. Thomas and M. Short (eds), Using Corpora for Language Research. London: Longman, pp. 211–231.
  • [McEnery & Wilson, 1996] McEnery T. and Wilson A. (1996). “Corpus Linguistics”. Edinburgh: Edinburgh University Press, ISBN 0-7486-0808-7.
