collocations              package:none              R Documentation

Extract collocations for a target word from a given raw text.

Description:

     collocates receives a text and a target word and selects the
     sentences from the text that contain the target word. Within those
     sentences, the words whose co-occurrence with the target word is
     above a certain threshold constitute the set of collocations.

Usage:

     collocates(thetext, targetword, ncollmax)

Arguments:

     thetext: character. Name of the text file supplied by the user, in
          .txt format and UTF-8 encoding.

     targetword: character. Any word the user has chosen from the text.
          It will be the reference for the extraction of the
          collocations.

     ncollmax: numeric. Maximum number of collocates to be displayed on
          the graph generated by the function. If fewer collocates are
          extracted than the stipulated maximum, ncollmax is ignored.

Details:

     The function may not work well on large text files, even though
     some optimizations were attempted, such as using hashed
     environments to count word occurrences faster.

Value:

     Instead of returning values, collocates generates one text file and
     one file containing a barplot in PNG format. Both are saved in the
     working directory in use when the function is run.

Warning:

     Depending on the size of the text file, the function may become too
     slow or fail to work. As a suggestion, the user can experiment with
     the function on texts of different sizes. See Examples for a simple
     test of the function.

Author:

     Viviane Santos da Silva
     viviane.sds90@gmail.com
     viviane.santos.silva@usp.br

References:

     http://en.wikibooks.org/wiki/R_Programming/Text_Processing
     Last accessed May 18, 2014.

     About environments and the hash argument:
     http://adv-r.had.co.nz/Environments.html
     (A hash function has been created to optimize the use of hashes,
     but it only works with later versions of R. See "See Also".)

     Download of non-annotated corpora for testing the function:
     http://corpora.informatik.uni-leipzig.de/download.html
     Last accessed May 15, 2014.

     For a more intuitive introduction to collocations:
     http://esl.fis.edu/grammar/easy/colloc.htm

See Also:

     For more information on hash usage in R, see:
     http://cran.r-project.org/web/packages/hash/index.html,
     http://cran.r-project.org/web/packages/hash/hash.pdf and
     http://opendatagroup.wordpress.com/2009/07/26/hash-package-for-r/.

Examples:

     # Download the file "teste-texto-bbc.txt" from
     # http://ecologia.ib.usp.br/bie5782/doku.php?id=bie5782:01_curso_atual:alunos:trabalho_final:viviane.santos.silva:start
     # and save it to your R working directory to run this example.
     # The file name passed to thetext must match the name of the saved file.
     collocates(thetext="test-text-bbc.txt", targetword="fiction", ncollmax=10)
     # Generates a barplot of the first 10 collocates that co-occur with
     # the target word "fiction" in the given text.
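
Sketch of the procedure:

     The steps named in the Description and Details (selecting the sentences that contain the target word, counting co-occurring words in a hashed environment, and keeping those above a threshold) can be sketched roughly as follows. This is not the source of collocates; the function name sketch_collocates, the argument min_count, the sentence-splitting rule and the choice to count each word once per sentence are all assumptions made for illustration. The sketch takes the text as a string rather than a file name and returns the counts instead of writing files.

```r
# Hypothetical helper, NOT the actual collocates() implementation.
# Assumptions: sentences end at ., ! or ?; a word counts once per sentence;
# min_count stands in for the unspecified threshold.
sketch_collocates <- function(text, targetword, min_count = 1, ncollmax = 10) {
  # Split the raw text into sentences at runs of sentence-final punctuation.
  sentences <- unlist(strsplit(text, "[.!?]+"))
  target <- tolower(targetword)
  counts <- new.env(hash = TRUE)  # hashed environment used as word -> count

  for (s in sentences) {
    # Lower-case the sentence and split it into words on non-letter runs.
    words <- unlist(strsplit(tolower(s), "[^[:alpha:]]+"))
    words <- words[nzchar(words)]
    if (!(target %in% words)) next  # keep only sentences with the target word
    for (w in unique(words[words != target])) {
      old <- if (exists(w, envir = counts, inherits = FALSE)) {
        get(w, envir = counts)
      } else {
        0
      }
      assign(w, old + 1, envir = counts)
    }
  }

  keys <- ls(counts)
  if (length(keys) == 0) return(numeric(0))
  # Collect the counts into a named vector, apply the threshold, and sort.
  freq <- vapply(keys, function(k) get(k, envir = counts), numeric(1))
  freq <- sort(freq[freq >= min_count], decreasing = TRUE)
  head(freq, ncollmax)  # at most ncollmax collocates, fewer if fewer exist
}

txt <- paste("I love science fiction. Science fiction is fun.",
             "I read a book. Fiction and science go together.")
res <- sketch_collocates(txt, "fiction", min_count = 2)
print(res)
```

     The named vector returned by the sketch could be passed to barplot() between calls to png() and dev.off() to produce a PNG file comparable to the one described under Value. Because head() is applied last, ncollmax has no effect when fewer collocates pass the threshold, which matches the behaviour described under Arguments.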