I’ve been using Calibre to manage my eBook library and download daily news to my Kindle. It’s great. If you have a Kindle or another eBook reader, I recommend checking it out.

Calibre comes stocked with a number of “recipes” for popular news sites. One of those is for the Associated Press (AP). Since the AP supplies a good deal of the news that runs in many popular dailies, I’ve found the AP recipe the most useful.

I noticed today, though, that the AP recipe that comes with Calibre is incomplete! It omits at least three sections: Sports, Business, and Entertainment. I don’t know if its creator, Kovid Goyal, just doesn’t like these sections, or if they weren’t available when he wrote the original recipe.

(When I wrote my custom recipe for the Minneapolis Star-Tribune, I omitted the sections that weren’t of interest to me.)

In any event, I’ve customized his recipe script to include the omitted sections. It was a very easy customization: I only had to add three lines to the original Python script. The only tricky part was picking a local “news source” for each of the AP’s RSS feeds (the SITE parameter in the feed URL). Otherwise (for example, with SITE=RANDOM), the AP site serves up a random source, which can cause problems when Calibre processes the page.
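The SITE choice can also be made explicit in code. The sketch below uses a hypothetical helper, `ap_feed`, which is not part of Calibre or of the original recipe; it builds a `(title, url)` pair in the same form as the recipe’s `feeds` list, pinned to one SITE code. The default `CAVIC` is simply the code used for the added feeds; that any other valid member-site code behaves the same way is an assumption.

```python
from urllib.parse import urlencode

# URL pattern taken from the feed list in the recipe; hosted.ap.org lineups.
BASE = 'http://hosted.ap.org/lineups/{lineup}-rss_2.0.xml'


def ap_feed(title, lineup, site='CAVIC'):
    """Return a (title, url) pair for an AP lineup pinned to one SITE code.

    Hypothetical helper: pinning SITE avoids letting the AP pick a random
    source for the feed. The default 'CAVIC' is an assumption, not a rule.
    """
    query = urlencode({'SITE': site, 'SECTION': 'HOME'})
    return (title, BASE.format(lineup=lineup) + '?' + query)


# The three sections added to the recipe, built with the helper.
FEEDS = [
    ap_feed('AP Sports News', 'SPORTSHEADS'),
    ap_feed('AP Business News', 'BUSINESSHEADS'),
    ap_feed('AP Entertainment News', 'ENTERTAINMENT'),
]

for title, url in FEEDS:
    print(title, url)
```

Because `urlencode` keeps the dictionary’s insertion order, the generated query string matches the `SITE=...&SECTION=HOME` form used in the recipe.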

```python
import re
from calibre.web.feeds.news import BasicNewsRecipe

class AssociatedPress(BasicNewsRecipe):

    title = u'Associated Press'
    description = 'Global news'
    __author__ = 'Kovid Goyal'
    use_embedded_content = False
    language = 'en'

    max_articles_per_feed = 15
    html2lrf_options = ['--force-page-break-before-tag="chapter"']

    # Clean-up rules run on each downloaded article page: a list of
    # (compiled regex, replacement function) pairs. Use the pairs from the
    # AP recipe that ships with Calibre, unchanged; their patterns match the
    # AP's HTML markup (for example, to strip the "Learn more about our
    # Privacy Policy" footer).
    preprocess_regexps = [
        (re.compile(pattern, re.IGNORECASE | re.DOTALL), repl)
        for pattern, repl in [
            # (pattern, replacement function) pairs from the bundled recipe
        ]
    ]

    # Feed titles and RSS URLs. SITE pins each lineup to one member site;
    # SITE=RANDOM lets the AP pick a source at random, which can break
    # page processing.
    feeds = [
        ('AP Headlines', 'http://hosted.ap.org/lineups/TOPHEADS-rss_2.0.xml?SITE=ORAST&SECTION=HOME'),
        ('AP US News', 'http://hosted.ap.org/lineups/USHEADS-rss_2.0.xml?SITE=CAVIC&SECTION=HOME'),
        ('AP World News', 'http://hosted.ap.org/lineups/WORLDHEADS-rss_2.0.xml?SITE=SCAND&SECTION=HOME'),
        ('AP Political News', 'http://hosted.ap.org/lineups/POLITICSHEADS-rss_2.0.xml?SITE=ORMED&SECTION=HOME'),
        ('AP Technology News', 'http://hosted.ap.org/lineups/TECHHEADS-rss_2.0.xml?SITE=CTNHR&SECTION=HOME'),
        # The three sections added to the bundled recipe:
        ('AP Sports News', 'http://hosted.ap.org/lineups/SPORTSHEADS-rss_2.0.xml?SITE=CAVIC&SECTION=HOME'),
        ('AP Business News', 'http://hosted.ap.org/lineups/BUSINESSHEADS-rss_2.0.xml?SITE=CAVIC&SECTION=HOME'),
        ('AP Entertainment News', 'http://hosted.ap.org/lineups/ENTERTAINMENT-rss_2.0.xml?SITE=CAVIC&SECTION=HOME'),
        ('AP Science News', 'http://hosted.ap.org/lineups/SCIENCEHEADS-rss_2.0.xml?SITE=OHCIN&SECTION=HOME'),
        ('AP Strange News', 'http://hosted.ap.org/lineups/STRANGEHEADS-rss_2.0.xml?SITE=WCNC&SECTION=HOME'),
    ]
```
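According to Calibre’s recipe documentation, `preprocess_regexps` is a list of two-element tuples: a compiled regular expression and a callable that receives the match object and returns the replacement text, run on the downloaded HTML. The sketch below applies rules of that shape with plain `re`, outside Calibre, to a hypothetical page; the sample markup and both rules are illustrative assumptions, not the AP’s real markup or the bundled recipe’s rules.

```python
import re

# Hypothetical article page; real AP markup will differ.
SAMPLE_HTML = (
    "<html><head><script>track()</script></head>"
    "<body><p>Story text.</p>"
    "<p>Learn more about our Privacy Policy and cookies.</p></body></html>"
)

# Same shape as preprocess_regexps: (compiled pattern, function of the match).
RULES = [
    (re.compile(pattern, re.IGNORECASE | re.DOTALL), repl)
    for pattern, repl in [
        # Drop <script> elements; the lazy .*? stops at the first </script>.
        (r'<script.*?>.*?</script>', lambda match: ''),
        # Drop the privacy-policy paragraph and everything up to </body>.
        (r'<p>Learn more about our Privacy Policy.*?</body>', lambda match: '</body>'),
    ]
]


def preprocess(html):
    """Apply each rule in order with re.sub, mimicking a preprocess step."""
    for regex, repl in RULES:
        html = regex.sub(repl, html)
    return html


print(preprocess(SAMPLE_HTML))
# prints "<html><head></head><body><p>Story text.</p></body></html>"
```

A lazy `.*?` on its own matches the empty string, so each rule needs literal text (usually HTML tags) around it to anchor what gets removed; `re.DOTALL` lets the match span line breaks in the page.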