tf-idf = term frequency x inverse document frequency
For example, if the word 'peace' appears 6 times in a 100-word document, its tf is 6/100 = 0.06. If the corpus (the document list) contains 1000 documents and 'peace' appears in 200 of them, the idf is 1000/200 = 5. (Many implementations take the logarithm of this ratio; this example uses the plain ratio.) Hence the tf-idf for the word is 0.06 x 5 = 0.3. The tf-idf weight is directly related to the importance of the word: the higher the tf-idf, the more important the word is to that document.
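The arithmetic in the 'peace' example can be written out as a few lines of Python. This is only a sketch of the plain-ratio formula described here; the function names `tf`, `idf` and `tf_idf` are made up for illustration and are not taken from any library.

```python
# Hypothetical helper names, used only to illustrate the arithmetic above.

def tf(term_count, total_terms):
    """Term frequency: share of the document's words that are this term."""
    return term_count / total_terms


def idf(total_docs, docs_with_term):
    """Inverse document frequency as a plain ratio (no logarithm)."""
    return total_docs / docs_with_term


def tf_idf(term_count, total_terms, total_docs, docs_with_term):
    """Weight of a term in one document: tf multiplied by idf."""
    return tf(term_count, total_terms) * idf(total_docs, docs_with_term)


# 'peace' appears 6 times in a 100-word document,
# and in 200 of the 1000 documents in the corpus.
score = tf_idf(6, 100, 1000, 200)
print(round(score, 2))  # 0.3
```

Swapping the body of `idf` for `math.log(total_docs / docs_with_term)` would give the log-scaled variant that is common elsewhere.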
I've written a Python module to extract keywords from a given corpus. This is useful if you want to extract keywords from a set of website links and categorize the links according to those keywords. You can use the code freely by downloading it from the following GitHub location.
A complete usage sample can be found here: https://github.com/ludmal/pylib/blob/master/sample.py
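To give a feel for what keyword extraction with this weighting involves, here is a minimal standalone sketch. It is not the module linked above and does not reflect its API; the function `top_keywords` and the tiny three-document corpus are assumptions made purely for illustration, and it uses the same plain-ratio idf as the 'peace' example.

```python
from collections import Counter


def top_keywords(docs, n=3):
    """Return the n highest-weighted (word, tf-idf) pairs for each document.

    docs is a list of documents, each already split into a list of words.
    This is an illustrative helper, not part of any published module.
    """
    total_docs = len(docs)
    # For each word, count how many documents contain it at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))

    results = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            word: (count / len(doc)) * (total_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results.append(ranked[:n])
    return results


# Made-up corpus for demonstration only.
corpus = [
    "peace talks bring peace".split(),
    "talks on trade continue".split(),
    "trade talks stall".split(),
]

keywords = top_keywords(corpus, n=1)
for doc_keywords in keywords:
    print(doc_keywords)
```

In this toy corpus 'talks' appears in every document, so it never ranks first, while 'peace' and 'stall' each appear in only one document and come out on top for it.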