Statistical Analysis of Texts
What the software is doing
First, the entered text is cleaned: all punctuation is removed, so that only words are left. Then each distinct word is listed together with the number of times it is used in the text.
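The cleaning and counting step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code the software itself runs: the function name `count_words` and the exact rule for what counts as a word (letters and digits, with apostrophes allowed inside a word) are assumptions.

```python
import re
from collections import Counter

def count_words(text):
    """Strip punctuation and count how often each word occurs.

    Assumption: a word is a run of letters or digits, optionally
    joined by apostrophes (as in "don't"). Upper and lower case are
    NOT merged here; that is left to the later cleanup.
    """
    words = re.findall(r"[^\W_]+(?:'[^\W_]+)*", text)
    # Counter keeps the words in order of first appearance,
    # which gives an unsorted listing.
    return Counter(words)

counts = count_words("The cat, the dog. The end!")
for word, n in counts.items():
    print(word, n)
print("The text contains", sum(counts.values()), "words")
```

Words that occur several times are counted each time in the total, while the listing shows each distinct word once.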
To analyse a text, you only need to enter it into the form field and push the button. The text is then analysed and the result is shown. There is no limit on the length of the text you can enter. Most likely you will not type the text but copy and paste it into the field from some other place. The best way is to try it: the result explains itself.
What can I do with the data?
We have analyzed different texts from different categories, such as news, literature, technical texts and so on.
One example is news from an online news service on the internet.
We analyzed each text, copied and pasted the resulting table into an Excel sheet and cleaned the data further. One example is capitalization: a word at the beginning of a sentence is written with a capital first letter, although it is usually written in lower case. Where we had two versions of the same word, we kept one basic version and deleted the other, without forgetting to add the number of occurrences of the deleted version to the one we kept.
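We did this step by hand, but the capitalization part could be sketched in Python as follows. The function name `merge_case_variants` and the merging rule are assumptions made for illustration: a capitalized word is folded into its lower-case form only if that lower-case form also occurs in the table, so that names such as "Berlin" are left alone.

```python
from collections import Counter

def merge_case_variants(counts):
    """Fold capitalized variants into the lower-case entry.

    Assumed rule: merge only when the lower-case spelling was also
    seen, and add the occurrences of the removed variant to it.
    """
    merged = Counter()
    for word, n in counts.items():
        lower = word.lower()
        key = lower if lower in counts else word
        merged[key] += n
    return merged

table = Counter({"The": 2, "the": 5, "Berlin": 1})
print(merge_case_variants(table))
```

A word that only ever appears at the start of a sentence stays capitalized under this rule, so the result still needs a manual check.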
For nouns we kept only the singular form and deleted the plural. We then added the number of occurrences of the plural to the singular form.
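Folding plurals into singulars can be sketched the same way. Because plural forms are irregular, this sketch assumes a hand-made lookup table from plural to singular rather than an automatic rule; the names `merge_forms` and `base_form` are hypothetical.

```python
from collections import Counter

def merge_forms(counts, base_form):
    """Add the count of each listed form to its basic form.

    base_form is a hand-made dict such as {"houses": "house"};
    words that are not listed in it are kept unchanged.
    """
    merged = Counter()
    for word, n in counts.items():
        merged[base_form.get(word, word)] += n
    return merged

table = Counter({"house": 3, "houses": 2, "mice": 1, "mouse": 1})
plurals = {"houses": "house", "mice": "mouse"}
print(merge_forms(table, plurals))
```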
In short, we cleaned up the data in the Excel table manually.
To analyze a large number of texts you could analyze them one by one and insert the results into one Excel sheet. There you would need to eliminate duplicates and add their number of occurrences to the first entry of the word. Duplicates are easy to see after sorting the data from A to Z. When you delete a duplicate, don't forget to update the number of occurrences in the entry you keep.
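Combining several result tables and removing duplicates amounts to adding up the counts per word. A minimal sketch, assuming each table is available as a mapping from word to number of occurrences (the function names are made up for this example):

```python
from collections import Counter

def combine_tables(tables):
    """Merge several word tables, summing the counts of duplicates."""
    total = Counter()
    for table in tables:
        total.update(table)  # update() adds counts, it does not overwrite
    return total

def sorted_rows(counts):
    """Return (word, occurrences) rows sorted from A to Z."""
    return sorted(counts.items(), key=lambda row: (row[0].lower(), row[0]))

first = {"cat": 2, "dog": 1}
second = {"cat": 3, "bird": 1}
for word, n in sorted_rows(combine_tables([first, second])):
    print(word, n)
```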
Analyzing 10'000 or even 50'000 words with this method is very time-consuming. A faster and easier method is to first collect all the texts you want to analyze and combine them into one big text file. Then copy and paste the content of that file into our software and let it process the whole text.
Only then copy and paste the resulting listing into Excel for further manual cleanup.
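If the texts are stored as files, joining them and preparing a table for Excel could look like the sketch below. The function names, the UTF-8 encoding of the input files and the CSV layout are assumptions for illustration, not part of our software.

```python
import csv
from pathlib import Path

def combine_texts(paths, encoding="utf-8"):
    """Join the content of several text files into one big text."""
    return "\n".join(Path(p).read_text(encoding=encoding) for p in paths)

def write_count_table(counts, out_path):
    """Write word/occurrences rows to a CSV file that Excel can open.

    "utf-8-sig" adds a byte order mark, which helps Excel recognize
    the encoding of non-English words.
    """
    with open(out_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "occurrences"])
        for word, n in counts.items():
            writer.writerow([word, n])
```

The combined text can be pasted into the form field as before, and the CSV file replaces the copy-and-paste step into Excel.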
For a text of 46'560 words the program took about 8 seconds. The time depends on server load and the quality of the internet connection.
For us, analyzing texts to find the most common (most frequently used) words is research for an educational project about learning the most important words of a foreign language first: the "1000 Words Project".
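Once a cleaned table exists, picking out the most common words is a simple ranking. A minimal sketch, assuming the table is held in a Python `Counter`; the function name `top_words` and the default limit of 1000 are chosen here only to match the project name.

```python
from collections import Counter

def top_words(counts, limit=1000):
    """Return the `limit` most frequent words, most frequent first."""
    return [word for word, _ in Counter(counts).most_common(limit)]

table = {"the": 7, "house": 5, "mouse": 2, "Berlin": 1}
print(top_words(table, limit=2))
```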
Number of words (words that occur several times are counted too)
The text contains 0 words
Unsorted listing of all words together with their number of occurrences in the text