
Create a word cloud with R (wordcloud)



This article presents a simple application that turns a text into a word cloud using R. The goal is to find the most frequent words in the text.
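The idea behind a word cloud is a plain word count. The following is a minimal sketch in base R, assuming a short sample sentence; the variable names are illustrative and are not part of the function presented in this article.

```r
# Minimal sketch (base R only): count how often each word occurs in a sentence.
# The sample sentence and the variable names are illustrative assumptions.
sentence <- "Let freedom ring. Let freedom ring from every hill."

# Lower-case the text, replace punctuation with spaces, then split on whitespace
clean <- gsub("[[:punct:]]", " ", tolower(sentence))
tokens <- strsplit(trimws(clean), "\\s+")[[1]]

# Tabulate the words and sort them from most to least frequent
freq <- sort(table(tokens), decreasing = TRUE)
print(freq)
```

In a word cloud, each word is drawn at a size that grows with its count in such a table.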




For example, suppose we want to know the most frequent words in Martin Luther King's "I Have a Dream" speech:


Code TEXT :
 
And so even though we face the difficulties of today and tomorrow, I still have a dream. It is a dream deeply rooted in the American dream.
 
I have a dream that one day this nation will rise up and live out the true meaning of its creed:
 
We hold these truths to be self-evident, that all men are created equal.
 
I have a dream that one day on the red hills of Georgia, the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.
 
I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.
 
I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.
 
I have a dream today!
 
I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification - one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers.
 
I have a dream today!
 
I have a dream that one day every valley shall be exalted, and every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight; and the glory of the Lord shall be revealed and all flesh shall see it together.
 
This is our hope, and this is the faith that I go back to the South with.
 
With this faith, we will be able to hew out of the mountain of despair a stone of hope. With this faith, we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith, we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day.
 
And this will be the day - this will be the day when all of God's children will be able to sing with new meaning:
 
My country 'tis of thee, sweet land of liberty, of thee I sing.
Land where my fathers died, land of the Pilgrim's pride,
From every mountainside, let freedom ring!
And if America is to be a great nation, this must become true.
And so let freedom ring from the prodigious hilltops of New Hampshire.
Let freedom ring from the mighty mountains of New York.
Let freedom ring from the heightening Alleghenies of Pennsylvania.
Let freedom ring from the snow-capped Rockies of Colorado.
Let freedom ring from the curvaceous slopes of California.
 
But not only that:
Let freedom ring from Stone Mountain of Georgia.
Let freedom ring from Lookout Mountain of Tennessee.
Let freedom ring from every hill and molehill of Mississippi.
From every mountainside, let freedom ring.
And when this happens, when we allow freedom ring, when we let it ring from every village and every hamlet, from every state and every city, we will be able to speed up that day when all of God's children, black men and white men, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the old Negro spiritual:
Free at last! Free at last!
 
Thank God Almighty, we are free at last! 
 


Simply copy the text above, paste it into the application's text area, then click the 'Send' button to see the image.

To access the application, click here.


The function used is shown below.
It can be downloaded by clicking here: wordcloud.r


Code R :
 
#----------------------------------
# wordcloud
#----------------------------------
# min.freq   = 2 : minimum frequency for a word to be kept
# min.length = 4 : minimum length (in characters) for a word to be kept
sthda.wordcloud <- function(text, min.freq = 2, min.length = 4)
{
  library(wordcloud)
  library(RColorBrewer)
  # special characters we want to delete (written as regular expressions)
  sent <- c(",", "\\.", ";", "=", ":", "\\?", "!", "-", "\\(", "\\)", "\\*", "&", "%", "\\$", "\\+", "\"", "'", "<", ">", "\\[", "\\]", "\\{", "\\}", "\\/", "\\\\")
  # and of course delete HTML tags
  tags <- c("a", "b", "br", "strong", "em", "i", "p", "more", "td", "table", "tr", "th", "script", "h1", "h2", "h3", "h4", "h5", "h6", "div", "span", "small", "img")
  tags <- paste("</?", tags, "[^>]*>", sep = "")
  # combine all purge regexes (tags first, so "<" and ">" are still there when the tags are matched)
  repl <- c(tags, sent)
 
  # collapse the input into a single string
  text <- paste(as.matrix(text), collapse = " ")
  # replace all unwanted stuff with a space
  for (r in repl) text <- gsub(r, " ", text)
  # here are our words, with their counts
  words <- table(strsplit(tolower(text), "\\s+")[[1]])
  # remove words containing multibyte (non-ASCII) characters
  words <- words[nchar(names(words), "chars") == nchar(names(words), "bytes")]
  # remove words shorter than min.length characters
  words <- words[nchar(names(words), "chars") >= min.length]
  # remove words occurring fewer than min.freq times
  words <- words[words >= min.freq]
 
  # create the image (uncomment png() and dev.off() to write it to a file)
  #png("cloud.png", width=580, height=580)
  pal2 <- brewer.pal(8, "Dark2")
  #wordcloud(names(words), words, scale=c(9,.1), min.freq=min.freq, max.words=Inf, random.order=FALSE, rot.per=.3, colors=pal2)
  wordcloud(names(words), words, scale = c(4, 0.5), min.freq = min.freq, max.words = Inf, random.order = FALSE, rot.per = .2, colors = pal2)
  #dev.off()
}
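
To check which words would end up in the cloud without drawing anything, the cleaning and filtering steps can be reproduced in base R. This is only a sketch under stated assumptions: `word_frequencies` is a hypothetical helper that is not part of wordcloud.r, it handles a shortened tag list, and it uses the `[[:punct:]]` class in place of the explicit character list.

```r
# Sketch of the cleaning and filtering steps in base R, without drawing the cloud.
# `word_frequencies` is a hypothetical helper, not part of wordcloud.r.
word_frequencies <- function(text, min.freq = 2, min.length = 4) {
  # One regex per HTML tag, e.g. "</?p[^>]*>" matches <p>, </p> and <p class="x">
  tags <- paste0("</?", c("p", "br", "strong", "em"), "[^>]*>")
  for (pattern in tags) text <- gsub(pattern, " ", text)
  # Replace the remaining punctuation with spaces
  text <- gsub("[[:punct:]]", " ", text)
  # Count the lower-cased words
  words <- table(strsplit(tolower(text), "\\s+")[[1]])
  # Keep words that are long enough and frequent enough
  words <- words[nchar(names(words)) >= min.length]
  words <- words[words >= min.freq]
  sort(words, decreasing = TRUE)
}

# Illustrative input: a small HTML fragment
html <- "<p>Let freedom ring.</p><p>Let <strong>freedom</strong> ring from every hill!</p>"
freq <- word_frequencies(html)
print(freq)
```

With the default thresholds, "let" is dropped because it has fewer than four characters, and words that occur only once are dropped by the frequency filter.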
 



Sources
http://www.r-bloggers.com/wordpress-wordcloud-with-r/

http://www.r-bloggers.com/text-mining-to-word-cloud-app-with-r/

License - NonCommercial - ShareAlike
Creative Commons License