
What Is a Corpus?



Corpus:

“Corpus” is a Latin word meaning “body” [McEnery & Wilson, 1996]; hence a text corpus is any discrete body of text. In computing, the term “corpus” refers to a large collection of machine-readable electronic text. Corpora (the plural of corpus) can take the form of any medium, such as text, speech or microfilm.

Corpora are the knowledge base used in corpus linguistics to analyze and study language. The linguistic processing of corpora is called annotation, in which techniques such as part-of-speech tagging, stemming or lemmatization are applied. Annotation also includes reformulating corpora into new linguistic forms [McEnery & Oakes, 1996].
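To make the idea of annotation concrete, here is a minimal sketch in Python. It is only an illustration, not a real tagger or stemmer: the lexicon `TOY_LEXICON`, the suffix list `SUFFIXES`, and the functions `tokenize`, `naive_stem` and `annotate` are all hypothetical names invented for this example, and the suffix stripping is a naive stand-in for a proper stemming algorithm.

```python
import re

# Hypothetical toy lexicon mapping words to part-of-speech tags (illustration only).
TOY_LEXICON = {
    "the": "DET",
    "cats": "NOUN",
    "were": "VERB",
    "chasing": "VERB",
    "mice": "NOUN",
}

# Hypothetical suffix list for a naive stemmer; this is not an established algorithm.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def annotate(text):
    """Return (token, tag, stem) triples; words missing from the lexicon get 'UNK'."""
    return [
        (tok, TOY_LEXICON.get(tok, "UNK"), naive_stem(tok))
        for tok in tokenize(text)
    ]


if __name__ == "__main__":
    for row in annotate("The cats were chasing mice."):
        print(row)
```

Note that the stem of “chasing” comes out as “chas”, which is not a real word. That is typical of stemming, which only trims word endings; lemmatization would instead map the word to its dictionary form, “chase”.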

Corpora are used in a variety of fields, including computational linguistics, speech recognition, information retrieval and machine translation.

References:


  • [Abusalah, 2008] Abusalah M. (2008). “Cross Language Information Retrieval Using Ontologies”. PhD Thesis, University of Sunderland.
  • [McEnery & Oakes, 1996] McEnery T. and Oakes M. (1996). “Sentence and word alignment in the CRATER Project”. In: J. Thomas and M. Short (eds), Using Corpora for Language Research. London: Longman, pp. 211–231.
  • [McEnery & Wilson, 1996] McEnery T. and Wilson A. (1996). “Corpus Linguistics”. Edinburgh: Edinburgh University Press, ISBN 0-7486-0808-7.

