Convert a webpage to EPUB
Use dotEPUB
Convert multiple web pages to EPUB
ⓘ Percollate is used instead of pandoc because pandoc always combines all input files before converting. In particular, it seems incapable of generating a proper EPUB table of contents from multiple files.
-
Save each web page
👉 Don’t bother reformatting the web page before saving it; the browser always saves it in its original format
-
Combine the pages using percollate, e.g.
percollate epub -o book.epub --title Title *.html
Encoding is wrong
If the final EPUB has encoding errors:
-
Edit the source HTML file
-
Add this line under the html > head element:
<meta charset="UTF-8" />
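If several saved pages need the same fix, the edit above can be scripted. The following is a minimal sketch, not a tested tool: `add_charset` is a hypothetical helper name, and it assumes the file has an opening `<head>` tag and really is UTF-8 encoded.

```shell
#!/bin/sh
# add_charset FILE (hypothetical helper): insert <meta charset="UTF-8" />
# right after the opening <head> tag, unless a charset is already declared.
add_charset() {
    file=$1
    # Leave the file alone if it already contains a charset declaration.
    if grep -qi '<meta[^>]*charset' "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    awk '
        !inserted && match(tolower($0), /<head([ \t][^>]*)?>/) {
            # Print everything up to and including the <head> tag,
            # then the meta line, then whatever followed on that line.
            print substr($0, 1, RSTART + RLENGTH - 1)
            print "<meta charset=\"UTF-8\" />"
            rest = substr($0, RSTART + RLENGTH)
            if (rest != "") print rest
            inserted = 1
            next
        }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back instead of mv so the original file permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage would look like `for f in *.html; do add_charset "$f"; done`, followed by re-running percollate.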
Page content is missing
-
Open the page in the browser
-
Open the dev tools and find the div with the page content
-
Right-click the div and copy the HTML
-
Paste it into a new HTML file and use that instead
Update source reference in EPUB
Use --url, e.g.
percollate epub -o book.epub page1.html --url http://example.org/page1.html page2.html --url http://example.org/page2.html
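With many pages, typing each file and `--url` pair by hand gets tedious. The sketch below builds the same argument order as the command above from a list file. The helper name `epub_with_urls` and the list-file format (one `local-file source-URL` pair per line) are assumptions made for illustration, not part of percollate.

```shell
#!/bin/sh
# epub_with_urls OUTPUT MAPFILE (hypothetical helper)
# MAPFILE holds one "local-file source-URL" pair per line, separated by
# whitespace; file names containing spaces are not supported in this sketch.
epub_with_urls() {
    output=$1
    map=$2
    # Rebuild the positional parameters as the percollate argument list.
    set -- epub -o "$output"
    while read -r page url || [ -n "$page" ]; do
        # Skip blank lines and lines without a URL.
        [ -n "$page" ] && [ -n "$url" ] || continue
        set -- "$@" "$page" --url "$url"
    done < "$map"
    percollate "$@"
}
```

For example, `epub_with_urls book.epub pages.txt`, where `pages.txt` is a hypothetical file containing lines such as `page1.html http://example.org/page1.html`.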
Convert MOBI to EPUB
-
Install and open Calibre
-
Drag the file into Calibre to add it
-
Right-click the book in the list of books > Convert books > Convert individually
-
In the Metadata section adjust the title and author as desired
-
Change the settings in EPUB Output as desired
-
Under Page setup set Output profile to Tablet (this will disable resizing of images)
-
Click OK
-
The generated EPUB will be located in your home folder/Calibre Library/author/title
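Calibre also installs a command-line converter, `ebook-convert`, which can stand in for the GUI steps above. This is a rough sketch under a few assumptions: `mobi_to_epub` is a hypothetical wrapper name, the title and author arguments are optional, and the output is written next to the input file rather than into the Calibre Library.

```shell
#!/bin/sh
# mobi_to_epub INPUT.mobi [TITLE] [AUTHOR] (hypothetical helper)
# Converts with the Tablet output profile, which leaves images unresized.
mobi_to_epub() {
    input=$1
    title=$2
    author=$3
    # Same path as the input, with the extension swapped for .epub.
    output="${input%.*}.epub"
    set -- "$input" "$output" --output-profile tablet
    # Only override metadata that was actually supplied.
    if [ -n "$title" ]; then
        set -- "$@" --title "$title"
    fi
    if [ -n "$author" ]; then
        set -- "$@" --authors "$author"
    fi
    ebook-convert "$@"
}
```

A call such as `mobi_to_epub book.mobi "Title" "Author"` would then produce `book.epub` in the same folder.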
Convert document (.doc, .docx, .odt, .rtf) to EPUB
-
Using Calibre
This will preserve most formatting (text styling, line spacing, font, text size, line justification).
-
Install and open Calibre
-
If it’s not a .docx file, open it in LibreOffice and save it as a .docx file first
- .doc files must be converted because Calibre doesn’t support them
- .odt files are supported by Calibre but converting to .docx first seems to give more consistent formatting
-
(Optional) Edit the paragraphs so there’s spacing between them. In LibreOffice:
- Edit > Select All
- Format > Paragraph
- Spacing > Above paragraph > 0.50 cm (or 0.19”)
- Spacing > Below paragraph > 0.50 cm (or 0.19”) > OK
-
Add books > browse to the .docx or .rtf file > Open
-
Right-click the document in the list of books > Convert books > Convert individually
-
In the Metadata section adjust the title and author as desired
-
If you don’t want an autogenerated cover, go to the EPUB Output section and check No default cover
-
Click OK
-
Once the conversion is finished, double-click the entry in Calibre to preview it with the built-in EPUB reader
-
The generated EPUB will be located in your home folder/Calibre Library/author/title
-
Using pandoc
This will preserve basic text styling (bold, italics, etc.) but lose most other formatting.
pandoc -o output.epub input.docx
To set the title/author:
pandoc -o output.epub input.docx -M author="Fartrell Cluggins" -M title="The Greatest Book Ever Written"
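The Calibre route can also be approximated from the command line, using LibreOffice in headless mode for the "save as .docx" step and Calibre's `ebook-convert` for the conversion. This is only a sketch: `doc_to_epub` is a hypothetical helper, it converts every non-.docx input to .docx first, and it does not apply the optional paragraph-spacing edit.

```shell
#!/bin/sh
# doc_to_epub INPUT (hypothetical helper)
# Writes the EPUB next to the input file, not into the Calibre Library.
doc_to_epub() {
    input=$1
    dir=$(dirname "$input")
    base=$(basename "$input")
    stem=${base%.*}
    case $base in
        *.docx)
            docx=$input
            ;;
        *)
            # Mirror the "save as .docx" step with a headless LibreOffice run.
            soffice --headless --convert-to docx --outdir "$dir" "$input" || return 1
            docx="$dir/$stem.docx"
            ;;
    esac
    # --no-default-epub-cover matches the "No default cover" checkbox;
    # drop the flag to keep the autogenerated cover.
    ebook-convert "$docx" "$dir/$stem.epub" --no-default-epub-cover
}
```

For instance, `doc_to_epub report.odt` would first create `report.docx` and then `report.epub` in the same folder.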
Convert PDF to EPUB
-
Using Calibre
-
Install and open Calibre
-
Drag the file into Calibre to add it
-
Right-click the PDF in the list of books > Convert books > Convert individually
-
In the Metadata section adjust the title and author as desired
-
Change the settings in PDF Input and EPUB Output as desired
-
Click OK
-
The generated EPUB will be located in your home folder/Calibre Library/author/title
-
Using pandoc
pdftohtml -noframes input.pdf
pandoc -o output.epub input.html
To combine multiple PDFs:
pdftohtml -noframes input1.pdf
pdftohtml -noframes input2.pdf
pandoc -o output.epub input1.html input2.html
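For more than a couple of PDFs, the same sequence can be wrapped in a loop. The following sketch assumes, as in the commands above, that `pdftohtml -noframes NAME.pdf` produces `NAME.html`, and that the inputs use a lowercase `.pdf` extension; `pdfs_to_epub` is a hypothetical helper name.

```shell
#!/bin/sh
# pdfs_to_epub OUTPUT.epub INPUT.pdf... (hypothetical helper)
pdfs_to_epub() {
    output=$1
    shift
    count=$#
    for pdf in "$@"; do
        # Convert each PDF to a single HTML file, stopping on the first failure.
        pdftohtml -noframes "$pdf" || return 1
        # Append the generated HTML file name to the end of the argument list.
        set -- "$@" "${pdf%.pdf}.html"
    done
    # Drop the original PDF arguments, leaving only the HTML files in order.
    shift "$count"
    pandoc -o "$output" "$@"
}
```

For example, `pdfs_to_epub output.epub input1.pdf input2.pdf` runs the same three commands as the manual version.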