
Extracting Book (epub, pdf) Content

To convert the content of any epub- or pdf-based book or article to plain text with Python, first install textract:

pip install textract

Then

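import codecs
import os
import textract

def book_extract(filein, fromperc, toperc, fileout):
    # textract.process returns bytes, so decode before slicing
    text = textract.process(filein, encoding='ascii').decode('ascii')
    L = len(text)
    # convert the percentages into character offsets
    froml = int((L * fromperc) / 100.0)
    tol = int((L * toperc) / 100.0)
    # write only the selected slice to the output file
    with codecs.open(fileout, "w", "utf-8") as fout:
        fout.write(text[froml:tol])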

book_extract(os.path.join(os.environ['HOME'], "kitap.epub"), 10, 11, "out.txt")

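fromperc and toperc are percentages of the total text length: the call above writes the portion of the book between the 10% and 11% marks to out.txt.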
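The percentage arithmetic can be tried on its own, without textract or a real book. The sketch below uses a hypothetical helper, slice_by_percent, which is not part of the original code; it only reproduces the offset calculation on a plain string.

```python
def slice_by_percent(text, fromperc, toperc):
    """Return the part of text between two percentage marks.

    Hypothetical helper for illustration; it mirrors the offset
    arithmetic used in book_extract.
    """
    L = len(text)
    froml = int((L * fromperc) / 100.0)  # start offset in characters
    tol = int((L * toperc) / 100.0)      # end offset (exclusive)
    return text[froml:tol]


sample = "abcdefghij"  # 10 characters, so each character is 10%
print(slice_by_percent(sample, 10, 30))
```

Because the offsets are truncated to integers, a narrow range over a short text can come out empty, and the slice can begin or end in the middle of a word.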

