若忘雜文: 2015

8/19/2015

Typesetting a Chinese e-book (EPUB format) in vertical layout

/* Vertical typesetting */
html {
  -epub-writing-mode: vertical-rl;
  -webkit-writing-mode: vertical-rl;
  writing-mode: vertical-rl;
  -epub-line-break: strict;
  line-break: strict;
  -epub-word-break: normal;
  word-break: normal;
}

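② In body, set the bottom margin to one character so that the last punctuation mark at the end of a line is not "eaten" (i.e., fails to display).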

body {
  word-break: normal;
  margin: 0;
  margin-left: 0%;
  margin-right: 0%;
  margin-top: 1.5%;
  margin-bottom: 1em; /* one character of bottom margin, per step ② */
  text-align: left;
}

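③ Build the e-book's table of contents. You can change the TOC entries by editing the TOC's ncx file and generate an HTML table of contents, which can also be linked to the CSS stylesheet.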

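I. In the metadata editor, the language field can be set to "Chinese" or "Chinese - Taiwan". The two behave differently in some important ways.

If you use the default fonts and add no custom fonts, set the language of a vertical e-book to "Chinese - Taiwan"; in content.opf the value is zh-TW. The advantage is that horizontal punctuation is converted to vertical punctuation automatically, so the punctuation in the source files needs no conversion, and the punctuation is centered automatically, which looks better. The drawback is that a Kindle reader then offers only two font options, 宋體 (Song) and 黑體 (Hei), and custom font styles cannot be used.

II. If you need to add custom fonts, set the language to "Chinese"; in content.opf the value is zh. The advantage is full use of the Kindle reader's four fonts plus custom fonts. The drawback is that horizontal punctuation must be converted to vertical forms beforehand, and the punctuation may not be centered.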
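The punctuation conversion mentioned in II can be sketched in Python. The post does not say which tool or mapping was used, so the table below is an assumption: it maps common fullwidth marks to the Unicode vertical presentation forms, and the names `VERTICAL_FORMS` and `to_vertical_punctuation` are hypothetical. Whether a given font actually contains these glyphs has to be checked on the device.

```python
# Hypothetical mapping from horizontal CJK punctuation to Unicode
# vertical presentation forms; extend or trim it to suit the book's font.
VERTICAL_FORMS = {
    "\uFF0C": "\uFE10",  # fullwidth comma
    "\u3001": "\uFE11",  # ideographic comma
    "\u3002": "\uFE12",  # ideographic full stop
    "\uFF1A": "\uFE13",  # fullwidth colon
    "\uFF1B": "\uFE14",  # fullwidth semicolon
    "\uFF01": "\uFE15",  # fullwidth exclamation mark
    "\uFF1F": "\uFE16",  # fullwidth question mark
    "\uFF08": "\uFE35",  # fullwidth left parenthesis
    "\uFF09": "\uFE36",  # fullwidth right parenthesis
    "\u300C": "\uFE41",  # left corner bracket
    "\u300D": "\uFE42",  # right corner bracket
    "\u300E": "\uFE43",  # left white corner bracket
    "\u300F": "\uFE44",  # right white corner bracket
}

# str.translate needs a table keyed by code point; maketrans builds it.
_TABLE = str.maketrans(VERTICAL_FORMS)


def to_vertical_punctuation(text: str) -> str:
    """Return text with each mapped punctuation mark replaced by its vertical form."""
    return text.translate(_TABLE)
```

Characters outside the table, including all markup characters, pass through unchanged, so the function can be applied to the text of an XHTML file as a whole.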

III. To mimic old books, which are paged from right to left, edit the spine element in content.opf and add page-progression-direction="rtl" to it.

⑤ Add the property that lets KindleGen recognize vertical layout during conversion. It goes into content.opf as a meta tag:

Enabled support for JP vertical rendering and multiple writing modes. Values can be specified in OPF files using meta tags (ex: ). The valid values are horizontal-lr, vertical-lr, horizontal-rl, vertical-rl.

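With this property set, margins become hard to define in CSS: the document's left and right margins use default values, while the top and bottom margins can still be adjusted (CSS properties margin-top and margin-bottom).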
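The content.opf changes described above (language, right-to-left paging, writing mode) could be applied with a short script instead of by hand. This is only a sketch under stated assumptions: the example tags are missing from the text, so the meta name `primary-writing-mode` is taken from memory of KindleGen's documentation, not from this post, and the function name `patch_opf` is hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Keep the usual prefixes on output. Caveat: ElementTree may still rewrite
# prefixes on namespaced attributes, so inspect the result before packaging.
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)


def patch_opf(opf_xml: str, language: str = "zh-TW") -> str:
    """Return content.opf text adjusted for a vertical, right-to-left book."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    spine = root.find(f"{{{OPF_NS}}}spine")
    if metadata is None or spine is None:
        raise ValueError("content.opf needs <metadata> and <spine> elements")

    # Language: zh-TW or zh, as discussed in the metadata step.
    lang = metadata.find(f"{{{DC_NS}}}language")
    if lang is None:
        lang = ET.SubElement(metadata, f"{{{DC_NS}}}language")
    lang.text = language

    # Page from right to left, like a traditional book.
    spine.set("page-progression-direction", "rtl")

    # Assumed meta name (not shown in the post): primary-writing-mode.
    for meta in metadata.findall(f"{{{OPF_NS}}}meta"):
        if meta.get("name") == "primary-writing-mode":
            meta.set("content", "vertical-rl")
            break
    else:
        ET.SubElement(
            metadata,
            f"{{{OPF_NS}}}meta",
            {"name": "primary-writing-mode", "content": "vertical-rl"},
        )
    return ET.tostring(root, encoding="unicode")
```

Running the function twice gives the same result as running it once, because an existing meta element is updated in place and not duplicated.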

⑥ Check the EPUB file for errors. If any are reported, save the offending page as XHTML and upload it to The W3C Markup Validation Service to validate and correct it.
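Before uploading anything, a quick local pass can show which pages are not even well-formed XML. The sketch below is not a replacement for the W3C validator or for a full EPUB check; it only parses each page with Python's standard XML parser. The function name `find_malformed_pages` is hypothetical, and pages that use named entities such as `&nbsp;` may be reported even when a validator would accept them.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_pages(epub_path):
    """Return (file name, parser message) for each page that fails to parse as XML."""
    problems = []
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                try:
                    ET.fromstring(book.read(name))
                except ET.ParseError as err:
                    problems.append((name, str(err)))
    return problems
```

The parser message includes a line and column number, which helps locate the problem in the page before it goes to the validator.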

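⑦ Drag the EPUB file into Kindle Previewer to convert it and check the result. If anything is unsatisfactory, go back and fix it in the EPUB.

⑧ Produce the final MOBI file. There are two ways:

I. Run KindleGen with the relevant parameters in a command-line window to convert to MOBI, or take the converted file produced by Kindle Previewer (that file is larger, because by default kindlegen embeds the conversion source, i.e. the epub file, in the generated mobi).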
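The command-line route in I might look like the following when scripted. The post does not list the parameters it used, so the flags here are assumptions recalled from kindlegen's own help text: `-o` for the output file name and `-dont_append_source` to leave the source epub out of the result. Running `kindlegen` with no arguments prints the options your version supports. The function names are hypothetical.

```python
import shutil
import subprocess


def kindlegen_command(epub_path, output_name, keep_source=False):
    """Build the kindlegen argument list (flags are assumptions; see kindlegen's help)."""
    cmd = ["kindlegen", epub_path, "-o", output_name]
    if not keep_source:
        # Assumed flag: skip embedding the source epub, which keeps the file smaller.
        cmd.append("-dont_append_source")
    return cmd


def convert(epub_path, output_name):
    """Run kindlegen if it is on PATH and return its exit code."""
    if shutil.which("kindlegen") is None:
        raise FileNotFoundError("kindlegen not found on PATH")
    # check=False: a non-zero exit code does not raise; the caller inspects it.
    return subprocess.run(kindlegen_command(epub_path, output_name), check=False).returncode
```

Keeping the argument list in its own function makes it easy to print and inspect the command before anything is executed.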

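II. Use the Sigil plugin kindlegen with its parameters configured (plugin settings file: C:\Users\<username>\AppData\Local\sigil-ebook\sigil\plugins\kindlegen\kindlegen.ini) to get a reasonably satisfactory AZW3 file.

⑨ Finally, put the finished AZW3 file on a Kindle reader and compare it against the original book or PDF to check the result further and proofread for errors.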

7/21/2015

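After extracting Chinese text fragments from a web page or PDF file, use Calibre to merge them into correct paragraphs

Open the text file in Calibre.
Import the text file's contents and convert them to an EPUB file.

Edit book in Calibre.
1. In Edit book, enable the "Marked text" option.
2. Use the mouse together with Ctrl+Shift+M to mark the text to be processed.
3. Use Calibre's regular expression (Regex) option to search & replace.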

Regular expression routine
Find: (.*)([^。！”：])\n\n
Replace: \1\2

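Explanation
If the character immediately before \n\n in a text fragment is none of 「。」, 「！」, 「”」 or 「：」, the \n characters that follow it are removed. \n is the line-break character (i.e., newline).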

Click on the "Find" box.
Click on the "Replace all" box. Every pattern in the marked text that matches the regular expression above will be processed.
Repeat the procedure until the search finds nothing more to replace.
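The same find-and-replace can be tried outside Calibre with Python's `re` module. This is a minimal sketch, not Calibre's own code: the character class is written without the `|` and `^` separators because both are literal characters inside brackets, and the function name `merge_fragments` is hypothetical. The loop mirrors the instruction to repeat until nothing more is replaced.

```python
import re

# A line whose last character is not sentence-final punctuation, followed by
# a blank line, is treated as a fragment that was cut off mid-sentence.
BROKEN_LINE = re.compile(r"(.*)([^。！”：])\n\n")


def merge_fragments(text: str) -> str:
    """Join cut-off fragments by removing the blank line after them."""
    while True:
        merged = BROKEN_LINE.sub(r"\1\2", text)
        if merged == text:
            return merged
        text = merged
```

For example, `merge_fragments("今天天氣\n\n很好。\n\n明天也\n\n不錯。\n\n")` returns `"今天天氣很好。\n\n明天也不錯。\n\n"`: the two fragments are rejoined, while the blank lines after 「。」 stay as paragraph breaks.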

3/10/2015

Using Regex search & replacement in Sigil

To turn plain "Chapter XY" text into level-3 headings in Sigil, run the following Regex find-and-replace:

Find: Chapter (\d+)
Replace: Chapter \1

\d tells Sigil to find a digit.
\d+ tells Sigil to find one or more digits.
() captures the enclosed match as a group for later retrieval.
\1 is used in Replace to retrieve the value of a saved group. (Use \2 for the second group, etc.)
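The Find and Replace strings above appear without any markup, so the sketch below fills in tags purely as an assumption: chapter titles sitting in plain `<p>` paragraphs and being promoted to `<h3>`. It uses Python's `re` module, whose `\d`, `+`, `()` and `\1` behave as described above; the function name `promote_chapter_headings` is hypothetical.

```python
import re

# Assumed markup: the post's own Find/Replace strings show no tags.
FIND = r"<p>Chapter (\d+)</p>"
REPLACE = r"<h3>Chapter \1</h3>"


def promote_chapter_headings(xhtml: str) -> str:
    """Rewrite '<p>Chapter N</p>' paragraphs as level-3 headings."""
    return re.sub(FIND, REPLACE, xhtml)
```

Calling `promote_chapter_headings("<p>Chapter 12</p><p>Text</p>")` gives `"<h3>Chapter 12</h3><p>Text</p>"`: the digits captured by the group are carried into the replacement, and ordinary paragraphs are left alone.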

2/24/2015

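Teaching Calibre to recognize Chinese chapter divisions (for generating the TOC of an ePub e-book)

The following XPath expression lets Calibre analyze the structure of a .txt or .pdb file on import and generate the TOC (Table of Contents) automatically: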

//h:h2[re:test(., "第(零|一|二|三|四|五|六|七|八|九|十|百|[0-9])+(章)\s", "i")]
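To see which headings the regular expression inside the XPath accepts, the pattern can be tried on its own in Python. This is a sketch that assumes `re:test` behaves like a search anywhere in the heading text with the "i" flag mapped to case-insensitive matching; it does not reproduce Calibre's XPath handling, and the function name `is_chapter_heading` is hypothetical.

```python
import re

# Same pattern as in the XPath expression, compiled case-insensitively.
CHAPTER = re.compile(
    r"第(零|一|二|三|四|五|六|七|八|九|十|百|[0-9])+(章)\s",
    re.IGNORECASE,
)


def is_chapter_heading(text: str) -> bool:
    """Return True if the text contains a chapter marker followed by whitespace."""
    return CHAPTER.search(text) is not None
```

Note that the pattern requires whitespace after 章, so "第十二章 風起" matches but a bare "第十二章" does not; numerals outside the listed set (for example 千) are not recognized either.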

1/23/2015

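Converting a Simplified Chinese e-book to Traditional Chinese

Use Calibre to convert e-book files in .mobi, .azw, .azw3 and similar formats to .epub. E-books from China usually have file names in UTF-8-encoded Simplified characters on a Windows system as well.
After downloading the files to the local computer, first use ConvertZ.exe to convert the names to UTF-8-encoded Traditional characters.
Once all the file names have been converted to Traditional characters, use epub Converter 1.5 (by Denny Su) to convert the file contents from Simplified to Traditional.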