PyCon-Tech (the Python behind PyCon) is an open source project for managing community-run conferences. It is actually a framework for developing conference websites and managing them collaboratively, written on top of the Django web framework. The project is broken into multiple applications, most of which can be used independently. The problem is that not many people even know this resource exists.
One of the stated goals of PyCon-Tech is to give back to the Python community which makes the conference possible. This series is my attempt to shine a light on some of the general-use applications under the hood and how you can use them in other projects. The first app in this series is feedutil, a lightweight generic RSS/Atom feed puller. (It is also the only app in PyCon-Tech that I know of being used in other projects.)
Overview
Feedutil is a lightweight app for pulling RSS/Atom feeds onto your site with Django template tags or custom views. It is not a full feed aggregator like feedjack, but you could write one with it. It is more along the lines of the Blogger plugin that puts the latest 5 entries from an RSS feed in your sidebar. We use it on the PyCon website for the main about page, which shows summaries of the latest PyCon Blog entries via Atom, and on an organizer page which replicates a Trac RSS issue feed for open website bugs. It does not use any Django models, and there is no database interaction. You could create your own models for managing your feeds, but that is not the purpose of feedutil.
Feedutil provides two primary template tags: {% feed feed_url [posts_to_show] [cache_expires] %} and {% get_feed feed_url [posts_to_show] [cache_expires] as var %}. There is also a higher-level interface to feedparser which adds caching: pull_feed(feed_url, posts_to_show=None, cache_expires=None) => list of entry dicts.
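To make the caching idea concrete, here is a minimal sketch of a time-limited cache wrapped around a feed fetch. This is not feedutil's code: the names cached_pull, fetch and _cache are made up for illustration, the cache is a plain dict rather than Django's cache backend, and the fetcher is passed in so the sketch runs without feedparser or a network connection.

```python
import time

# Hypothetical in-memory cache: {feed_url: (expires_at, posts)}.
# feedutil itself relies on Django's cache backend instead.
_cache = {}


def cached_pull(feed_url, fetch, cache_minutes=30, now=time.time):
    """Return posts for feed_url, calling fetch(feed_url) only on a cache miss.

    fetch is any callable returning a list of posts (an assumption of this
    sketch); cache_minutes=0 disables caching entirely.
    """
    current = now()
    hit = _cache.get(feed_url)
    # Serve from cache while the stored entry has not expired yet.
    if cache_minutes and hit is not None and hit[0] > current:
        return hit[1]
    posts = fetch(feed_url)
    if cache_minutes:
        _cache[feed_url] = (current + cache_minutes * 60, posts)
    return posts
```

With a 30 minute limit, two calls for the same URL inside that window trigger a single fetch; a call after the window has passed polls the feed again.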
Requirements
Before we go any further, there are some requirements for feedutil:
- Python 2.5 (sorry, I like typing ‘res = x if test else y’)
- Django 0.96 or greater
- a properly configured Django cache backend (not strictly needed, but highly recommended)
- feedparser (feedparser.py needs to be on your Python path)
Settings
There are some configuration settings you can set in your settings.py (plah). They all have reasonable? defaults:
- FEEDUTIL_NUM_POSTS (default: -1)
Max number of entries to pull from the feed. Use -1 to pull all entries.
- FEEDUTIL_CACHE_MIN (default: 30)
Number of minutes to hold the feed in cache before polling it again. This sets the default cache limit; each feed can have its own cache time/poll frequency set independently. Use 0 to never cache. A cache backend must be configured for any non-zero value.
- FEEDUTIL_SUMMARY_LEN (default: 150)
Each post gets an HTML-stripped text ‘summary’, and this is the character limit on those summaries. A ‘…’ will be appended if the summary does not fit in this limit. This uses the Django template filter striptags to remove the HTML tags before applying the character limit.
- FEEDUTIL_SUMMARY_HTML_WORDS (default: 25)
Each post gets a ‘summary_html’ entry which preserves the HTML tags but limits the number of words in the rendered HTML output. This uses the Django template filter truncatewords_html to get the work done.
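As an illustration, overriding all four settings in settings.py might look like the fragment below. The setting names come from the list above; the values are arbitrary examples, not recommendations.

```python
# settings.py (fragment): override only the values you want to change.
FEEDUTIL_NUM_POSTS = 5            # at most 5 entries per feed (-1 pulls all)
FEEDUTIL_CACHE_MIN = 60           # poll each feed at most once an hour (0 = never cache)
FEEDUTIL_SUMMARY_LEN = 200        # character limit for the plain-text 'summary'
FEEDUTIL_SUMMARY_HTML_WORDS = 40  # word limit for 'summary_html'
```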
The {% feed %} tag
{% feed feed_url [posts_to_show] [cache_expires] %}
{% feed "http://www.dougma.com/feed/atom/" 1 30 %}
- feed_url - literal string or variable which resolves to a valid feed url (required)
- posts_to_show - number of entries you want shown (default: FEEDUTIL_NUM_POSTS)
- cache_expires - number of minutes before the cache expires (default: FEEDUTIL_CACHE_MIN)
{% load feedutil %}
{% feed "http://www.dougma.com/feed/atom/" %}
Yup, that is all you need to reproduce a feed on a website. It's minimal, but it works. The default HTML template used to render this tag will produce something like the following per feed entry:
<p class="feed-post">
<h2><a href="http://www.dougma.com/archives/47" title="HELP!">HELP!</a></h2>
<cite>doug</cite>
... post content omitted....
<h3>Published: Thu, 20 Sep 2007 08:57:05 -0500</h3>
This is using a Django inclusion tag and the default template ‘feedutil/feed.html’. You can override this template to make a new custom default, or you can pull the feed into your Django template as a variable and customize it per feed (next section). The {% feed %} tag takes two optional arguments, ‘posts_to_show’ and ‘cache_expires’, which have the obvious effect.
The {% get_feed %} tag
{% get_feed feed_url [posts_to_show] [cache_expires] as var_name %}
{% get_feed "http://www.dougma.com/feed/atom/" 5 30 as posts %}
- feed_url - literal string or variable which resolves to a valid feed url (required)
- posts_to_show - number of entries you want shown (default: FEEDUTIL_NUM_POSTS)
- cache_expires - number of minutes before the cache expires (default: FEEDUTIL_CACHE_MIN)
- var_name - name of the variable to assign the post list to (required)
The variable will be assigned a list of dictionaries, each entry in the feed being a dict with the following keys:
- title - the post title text.
- url - the url for the entry.
- author - the author (if available)
- published - datetime object for the publish date
- summary - text summary generated from the post content with tags removed, and limited to FEEDUTIL_SUMMARY_LEN characters. (see settings above)
- summary_html - The content html limited to FEEDUTIL_SUMMARY_HTML_WORDS words. (see settings above)
- content - the content of the entry.
- comments - sometimes a link to the comments section, sometimes the number of comments as a string, sometimes the actual HTML for the comments, and sometimes something weird. It all depends on what is generating the feed and whether or not it conforms to the standards.
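To show roughly how the ‘summary’ key relates to ‘content’, here is a plain-Python sketch of the strip-then-truncate behaviour. It is an approximation, not feedutil's implementation: feedutil uses Django's striptags filter, while this sketch uses a crude regular expression, and the name text_summary and the choice to append the ellipsis after the limit are assumptions.

```python
import re

SUMMARY_LEN = 150  # mirrors the FEEDUTIL_SUMMARY_LEN default


def text_summary(html, limit=SUMMARY_LEN):
    """Strip tags from html and cap the result at limit characters."""
    # Crude tag removal for illustration; feedutil uses Django's striptags.
    text = re.sub(r"<[^>]*>", "", html)
    if len(text) <= limit:
        return text
    # Assumption: the ellipsis is added on top of the character limit.
    return text[:limit] + "..."
```

A short post comes back unchanged apart from the removed tags, while a long one is cut at the limit and marked with the trailing ellipsis.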
With this list of entry dictionaries you can build whatever custom HTML your feed display needs. Here is the template code from the actual PyCon 2008 website:
{% load feedutil %}{% load appmedia %}
{% get_feed "http://pycon.blogspot.com/feeds/posts/default" 5 as news %}
<div id="lfcolumn">
  <h3>Latest News
    (<a href="http://pycon.blogspot.com/">PyCon Blog</a>)
    <a href="http://pycon.blogspot.com/feeds/posts/default">
      <img src="{% app_media_prefix 'website' %}img/rss.png"
           alt="Subscribe to the RSS Feed" />
    </a>
  </h3>
  <ul id="feed_entries">
    {% for entry in news %}
    <li> <a href="{{ entry.url }}" title="{{ entry.title|escape }}">
        {{ entry.title|escape }}</a>
      <br/> {{ entry.published|date }}
      <br/> <em>{{entry.summary}}</em>
    </li>
    {% endfor %}
  </ul>
</div>
The pull_feed() function
pull_feed(feed_url, posts_to_show=None, cache_expires=None) => list of entry dicts
This is the worker function behind feedutil. You can use it in your Django Python code to grab a feed as a list of Python dictionaries with a common interface across RSS, RSS2, and Atom feeds, with the added benefits of summary, summary_html, and caching. Simple, elegant, and gets the job done. This function is most often used in custom views where you want to merge the entries from multiple feeds and sort them by the ‘published’ datetime object. This is a simple way to create a feed aggregator.
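from itertools import chain
from operator import itemgetter
from django.shortcuts import render_to_response
from django.template import RequestContext
# pull_feed comes from feedutil; import it from wherever the app lives in your project.

def aggregator_view(request, myfeeds):
    """myfeeds must be a dictionary of {'title': 'http://feed/url/'}"""
    feeds = [pull_feed(url) for url in myfeeds.itervalues()]
    entries = sorted(chain(*feeds), key=itemgetter('published'), reverse=True)  # newest first
    return render_to_response('feeds.html', {'feeds': myfeeds, 'entries': entries},
                              context_instance=RequestContext(request))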
There: a feed aggregator in 5 SLOC, not counting the imports, the actual feed data and, well, the HTML.
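The merge-and-sort step at the heart of that view can be tried on its own, without Django or a network connection. In this sketch the helper name merge_feeds and the sample entries are made up; they only stand in for the lists of entry dicts that pull_feed() returns, and every entry is assumed to carry a ‘published’ datetime.

```python
from datetime import datetime
from itertools import chain
from operator import itemgetter


def merge_feeds(feeds):
    """Flatten several lists of entry dicts and sort them newest first."""
    return sorted(chain(*feeds), key=itemgetter("published"), reverse=True)


# Made-up entries standing in for what pull_feed() would return.
blog = [
    {"title": "Talks announced", "published": datetime(2007, 9, 20, 8, 57)},
    {"title": "Call for papers", "published": datetime(2007, 8, 1, 12, 0)},
]
tracker = [
    {"title": "Fix sidebar bug", "published": datetime(2007, 9, 1, 9, 30)},
]

for entry in merge_feeds([blog, tracker]):
    print(entry["published"].date(), entry["title"])
```

Note that sorted() takes a callable for key (here itemgetter) and the keyword reverse, so entries from both feeds end up interleaved by date rather than grouped by source.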
