Getting regular text from a Wikipedia page
I am trying to get the text (or the summary text) of a random Wikipedia page. In the end I need it as a list of lists of words, i.e. a list of sentences. I am using the following code:

    import wikipedia

    def get_random_pages_summary(pages=0):
        # Pick `pages` random page titles and pair each title with its summary.
        page_names = [wikipedia.random(1) for i in range(pages)]
        return [[p, wikipedia.page(p).summary] for p in page_names]

    def text_to_list_of_words_without_new_line(text):
        # Replace newlines with spaces, then split on whitespace.
        t = text.replace("\n", " ").strip()
        t1 = t.split()
        t2 = ["".join(w) for w in t1]
        return t2

    text = get_random_pages_summary(1)
    for i, row in enumerate(text):
        # row is [title, summary]; only the summary string is tokenized.
        text[i] = text_to_list_of_words_without_new_line(row[1])
    print(text)

I am getting weird tokens, which I assume are a relic of the markup of the Wikipedia page, e.g.

    Russian:', u'\u0418\u0432\u0430\u043d

I found that this probably happens when the English page contains a quote in another language. It also happens when the page contains a range of years, e.g. 2015-2016.

I would like to convert all of these to regular words, and remove those that I cannot convert to regular words. Thanks.
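To make the desired result concrete, here is a minimal sketch of the cleanup described above. It assumes that "regular words" means plain ASCII tokens, and that the year ranges contain a typographic dash such as U+2013 rather than an ASCII hyphen. The helper names `to_ascii_words` and `summary_to_sentences` are hypothetical and not part of the `wikipedia` package, and the sentence split is a naive one on `.`, `!` or `?` followed by whitespace. Note that the `u'\u0418...'` form is how Python 2 shows a unicode string inside a printed list, so the underlying text may already be valid.

```python
import re
import unicodedata

# Typographic hyphens and dashes (U+2010..U+2015) plus the minus sign (U+2212).
_DASHES = re.compile(u"[\u2010-\u2015\u2212]")


def to_ascii_words(sentence):
    """Return the words of one sentence that can be reduced to ASCII."""
    # Turn typographic dashes (as in a year range) into a plain hyphen.
    sentence = _DASHES.sub(u"-", sentence)
    words = []
    for token in sentence.split():
        # Decompose accented letters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", token)
        stripped = u"".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        )
        try:
            stripped.encode("ascii")
        except UnicodeEncodeError:
            # Still contains non-ASCII characters (e.g. Cyrillic): drop it.
            continue
        words.append(stripped)
    return words


def summary_to_sentences(text):
    """Split text into sentences, each one a list of ASCII words."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    cleaned = [to_ascii_words(s) for s in sentences]
    # Skip sentences that ended up with no words at all.
    return [words for words in cleaned if words]


sample = u"Ivan (Russian: \u0418\u0432\u0430\u043d) ruled in 2015\u20132016. He ran a caf\u00e9."
print(summary_to_sentences(sample))
# [['Ivan', '(Russian:', 'ruled', 'in', '2015-2016.'], ['He', 'ran', 'a', 'cafe.']]
# (Python 3 output; Python 2 shows the same words with a u prefix)
```

With this approach a token is dropped whole if any of its characters cannot be reduced to ASCII, so the closing parenthesis attached to the Cyrillic name disappears with it. Punctuation on the remaining tokens is left untouched.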
forms in django, overriding validation on file upload to make sure just one value is there
How to animate font size without text reordering, when `text_size=self.size`
How to import the last row from array of excel files to another excel using openpyxl
how to terminate a thread which calls the webbrowser in python
Can I load HTML on Ghost.py
Combining two lists of names and sorting them to make one sorted list of names
Python equivalent of bash sort lexicographical and numerical
Why isn't my frames background showing?
Trying to do a natural join using python standard library
How to combine several querysets by key in common?
How I can speed up row column access to pandas dataframe?
Create list with combinations of 3 elements of other list with repetitions
PyQt5 does not change gifs
Pydub - combine split_on_silence with minimum length / file size
how to choose python version accordingly in pycharm?
Unable to import Flask to Kivy iOS