Customizing the Calibre Associated Press (AP) Recipe
I’ve been using Calibre to manage my eBook library and download daily news to my Kindle. It’s great. If you have a Kindle or other eBook reader, I recommend that you check it out.
Calibre comes stocked with a number of “recipes” for popular news sites. One of those is the Associated Press (AP). Since the AP creates a good deal of the news in many popular dailies, I’ve found the AP recipe most useful.
I noticed today, though, that the AP recipe that comes with Calibre is incomplete! It omits at least three sections: Sports, Business, and Entertainment. I don’t know whether the creator, Kovid Goyal, just doesn’t like these sections, or whether they weren’t available when he wrote the original recipe.
(When I wrote my custom recipe for the Minneapolis Star-Tribune, I omitted the sections that weren’t of interest to me.)
In any event, I’ve customized his recipe script to include the omitted sections. It was a very easy customization – I only had to add three lines to the original Python script, one feed entry per missing section. The only tricky part was picking a local “news source” for the AP’s RSS feed – otherwise, the AP site serves up a random source, which can lead to problems with processing the page.
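As a rough sketch of what those three added lines amount to: a Calibre recipe lists its sources in a `feeds` attribute made of (title, URL) pairs, so adding a section means adding one pair. The URL template, the lineup codes, and the `LOCALSITE` value below are placeholders I made up for illustration, not the AP's real addresses, and `build_feeds` is a hypothetical helper rather than part of Calibre.

```python
# Hypothetical URL template: the host, path, and query parameters are
# placeholders for illustration, not the AP's actual feed addresses.
FEED_TEMPLATE = 'http://example.com/lineups/{lineup}-rss_2.0.xml?SITE={site}'


def build_feeds(sections, site):
    """Return (title, url) pairs, the shape a recipe's `feeds` list uses.

    Every URL is pinned to the same `site`, so each feed comes from one
    fixed news source instead of whichever one the server picks.
    """
    return [(title, FEED_TEMPLATE.format(lineup=lineup, site=site))
            for title, lineup in sections]


# The three sections missing from the stock recipe; the lineup codes
# on the right are assumed, not taken from the AP site.
EXTRA_SECTIONS = [
    ('Sports', 'SPORTS'),
    ('Business', 'BUSINESS'),
    ('Entertainment', 'ENTERTAINMENT'),
]

# 'LOCALSITE' stands in for whatever local news source you choose.
feeds = build_feeds(EXTRA_SECTIONS, site='LOCALSITE')

for title, url in feeds:
    print(title, url)
```

In an actual recipe, pairs like these would simply be appended to the existing `feeds` list inside the class, with the real feed URLs in place of the placeholders.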
import re
from calibre.web.feeds.news import BasicNewsRecipe

class AssociatedPress(BasicNewsRecipe):
    title = u'Associated Press'
    description = 'Global news'
    __author__ = 'Kovid Goyal'
    use_embedded_content = False
    language = 'en'
    max_articles_per_feed = 15
    html2lrf_options = ['--force-page-break-before-tag="chapter"']
    preprocess_regexps = [ (re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
        [
(r'.*?' , lambda match : ''),
(r'.*?', lambda match : ''),
(r'.*?', lambda match : ''),
(r'.*?', lambda match : ''),
(r'.*?', lambda match : ''),
(r'
.*?
', lambda match : '
'),
(r'
', lambda match : '
'),
(r'Learn more about our Privacy Policy.*?', lambda match : '

