Extracting Book (epub, pdf) Content
To convert the content of any epub or pdf book or article into plain text with Python, first install the textract package:
pip install textract
Then define the following function and call it:
import codecs
import os
import textract

def book_extract(filein, fromperc, toperc, fileout):
    # textract.process returns bytes, so decode to get a text string
    text = textract.process(filein, encoding='ascii').decode('utf-8')
    L = len(text)
    # Convert the two percentages into character offsets
    froml = int((L * fromperc) / 100.0)
    tol = int((L * toperc) / 100.0)
    t = text[froml:tol]
    fout = codecs.open(fileout, "w", "utf-8")
    fout.write(t)
    fout.close()

book_extract(os.environ['HOME'] + "/kitap.epub", 10, 11, "out.txt")
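fromperc and toperc are percentages of the total text length. The call above writes the part of the book between the 10% and 11% marks to out.txt.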
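To see how the two percentage arguments turn into character offsets, here is a minimal sketch that applies the same arithmetic to an in-memory string instead of a book file. The helper name `percent_slice` and the sample text are made up for illustration and are not part of textract.

```python
def percent_slice(text, fromperc, toperc):
    """Return the part of text between two percentage positions."""
    n = len(text)
    # Same arithmetic as above: percentage -> character index
    start = int((n * fromperc) / 100.0)
    end = int((n * toperc) / 100.0)
    return text[start:end]

# 100 characters, so each 1% corresponds to exactly one character
sample = "0123456789" * 10
print(percent_slice(sample, 10, 15))  # prints 01234
```

Because the offsets are truncated with int(), a very narrow range on a short text can come out empty. On a full book, a one percent window usually spans a few pages.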