Feedparser

Parse Atom and RSS feeds in Python.

Install

pip install feedparser

Basic Usage

Parsing content

Parse a feed from a remote URL

>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> d['feed']['title']
u'Sample Feed'

Parse a feed from a string

>>> import feedparser
>>> rawdata = """<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>"""
>>> d = feedparser.parse(rawdata)
>>> d['feed']['title']
u'Sample Feed'

Access common elements

The most commonly used elements in RSS feeds (regardless of version) are title, link, description, publication date, and entry ID.

Channel elements

>>> d.feed.title
u'Sample Feed'
>>> d.feed.link
u'http://example.org/'
>>> d.feed.description
u'For documentation <em>only</em>'
>>> d.feed.published
u'Sat, 07 Sep 2002 00:00:01 GMT'
>>> d.feed.published_parsed
(2002, 9, 7, 0, 0, 1, 5, 250, 0)

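All parsed dates can be converted to datetime with the following snippet. The parsed values are in UTC, so use calendar.timegm rather than time.mktime, which would interpret them as local time:

from calendar import timegm
from datetime import datetime, timezone
# item is a parsed feed or entry, for example d.entries[0]
dt = datetime.fromtimestamp(timegm(item['updated_parsed']), tz=timezone.utc)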

Item elements

>>> d.entries[0].title
u'First item title'
>>> d.entries[0].link
u'http://example.org/item/1'
>>> d.entries[0].description
u'Watch out for <span>nasty tricks</span>'
>>> d.entries[0].published
u'Thu, 05 Sep 2002 00:00:01 GMT'
>>> d.entries[0].published_parsed
(2002, 9, 5, 0, 0, 1, 3, 248, 0)
>>> d.entries[0].id
u'http://example.org/guid/1'
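
To process every item rather than only the first, the entries list can be iterated directly. The following is a minimal sketch, assuming each entry behaves like a mapping with the keys shown above. The helper name summarize_entries is hypothetical, and a plain dictionary stands in for a parsed entry so the example runs without network access.

```python
from calendar import timegm
from datetime import datetime, timezone


def summarize_entries(entries):
    """Return (title, link, published datetime or None) for each entry."""
    rows = []
    for entry in entries:
        parsed = entry.get("published_parsed")
        # Parsed dates are reported in UTC, so timegm() is used instead of
        # mktime(), which would assume local time.
        when = (
            datetime.fromtimestamp(timegm(parsed), tz=timezone.utc)
            if parsed
            else None
        )
        rows.append((entry.get("title", ""), entry.get("link", ""), when))
    return rows


# Stand-in for d.entries; values copied from the example above.
sample_entries = [
    {
        "title": "First item title",
        "link": "http://example.org/item/1",
        "published_parsed": (2002, 9, 5, 0, 0, 1, 3, 248, 0),
    }
]
rows = summarize_entries(sample_entries)
```

With a real feed, summarize_entries(d.entries) should work the same way, since entries support .get() like dictionaries.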

An RSS feed can specify a small image which some aggregators display as a logo.

>>> d.feed.image
{'title': u'Example banner',
'href': u'http://example.org/banner.png',
'width': 80,
'height': 15,
'link': u'http://example.org/'}

Feeds and entries can be assigned to multiple categories, and in some versions of RSS, categories can be associated with a “domain”.

>>> d.feed.categories
[(u'Syndic8', u'1024'),
(u'dmoz', 'Top/Society/People/Personal_Homepages/P/')]
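
Since the categories arrive as (domain, category) pairs, it can be handy to group them by domain. This is a small sketch under that assumption; categories_by_domain is a hypothetical helper, and the pairs are copied from the output above.

```python
from collections import defaultdict


def categories_by_domain(categories):
    """Group (domain, category) pairs into {domain: [category, ...]}."""
    grouped = defaultdict(list)
    for domain, category in categories:
        grouped[domain].append(category)
    return dict(grouped)


# Same pairs as in the example output above.
pairs = [
    ("Syndic8", "1024"),
    ("dmoz", "Top/Society/People/Personal_Homepages/P/"),
]
grouped = categories_by_domain(pairs)
```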

As feeds in the real world may be missing some elements, you may want to test for the existence of an element before getting its value.

>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> 'title' in d.feed
True
>>> 'ttl' in d.feed
False
>>> d.feed.get('title', 'No title')
u'Sample Feed'
>>> d.feed.get('ttl', 60)
60
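
When several optional elements are needed at once, the same checks can be wrapped in a small helper. This is a sketch, assuming the parsed feed supports `in` and key access as shown above; first_present is a hypothetical name, and the dictionary stands in for d.feed.

```python
def first_present(mapping, keys, default=None):
    """Return the value of the first key in `keys` found in `mapping`."""
    for key in keys:
        if key in mapping:
            return mapping[key]
    return default


# Stand-in for d.feed from a feed that has no ttl element.
feed = {"title": "Sample Feed", "published": "Sat, 07 Sep 2002 00:00:01 GMT"}

title = first_present(feed, ["title"], "No title")
ttl = first_present(feed, ["ttl"], 60)
# Prefer the update date and fall back to the publication date.
date = first_present(feed, ["updated", "published"])
```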

Advanced usage

It is possible to interact with feeds that are protected with credentials.
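
One way to do this is to send HTTP Basic credentials with the request. The following is a minimal sketch, assuming the installed feedparser accepts a request_headers argument to parse(), as the 6.0 series does; other versions may differ. The helper names, the URL, and the credentials are placeholders.

```python
import base64


def basic_auth_headers(username, password):
    """Build an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}


def parse_protected_feed(url, username, password):
    """Parse a feed behind HTTP Basic auth (performs a network request)."""
    # Imported here so the header helper above works without feedparser.
    import feedparser

    # Assumption: parse() accepts request_headers, as in feedparser 6.0.
    return feedparser.parse(
        url, request_headers=basic_auth_headers(username, password)
    )


headers = basic_auth_headers("user", "pass")
```

Calling parse_protected_feed('https://example.org/private.xml', 'user', 'pass') would then return the usual parsed result. Basic credentials are only base64-encoded, so they should be sent over HTTPS.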

Issues

Using updated_parsed triggers a deprecation warning. Once it is solved, tweak updated_at in airss/adapters/extractor.py#RSS.get.

Links

Git
Docs