expand and clean up date validation routines
We used to have a long list of fields. Now we just iterate over the item, then the feed, and look for the fields we want. It's cleaner visually and might even make some feeds validate, as we now look...
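The lookup order described above (item first, then feed) can be sketched as follows. This is a minimal illustration, not feed2exec's code: the `DATE_FIELDS` tuple and the `find_date()` helper are hypothetical, and plain dicts stand in for feedparser's result objects.

```python
from typing import Any, Mapping, Optional

# Hypothetical list of date fields to look for; the commit does not name them.
DATE_FIELDS = ("updated", "published", "created")


def find_date(item: Mapping[str, Any], feed: Mapping[str, Any]) -> Optional[Any]:
    """Return the first non-empty date field, checking the item before the feed."""
    for source in (item, feed):
        for field in DATE_FIELDS:
            value = source.get(field)
            if value:
                return value
    # Neither the item nor the feed carries a usable date.
    return None


# An item without its own date falls back to the feed-level date.
fallback = find_date({"title": "hello"}, {"updated": "Mon, 20 Nov 1995 19:12:08 -0500"})
```

Iterating over a short tuple of sources keeps the order explicit and avoids repeating the field list for the item and the feed.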
use dateparser module if available
This behaves better than the feedparser date parser in most scenarios. Still needs unit tests and dependency checks. Closes: #6
handle broken pipe correctly from plugins
Before this, doing "feed2exec parse foo | head" would yield an error message for *every* feed item. This silences the warnings completely.
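One way to get that behaviour is to catch `BrokenPipeError` once, around the whole output loop, and stop writing instead of reporting an error per item. This is a hedged sketch: `write_items()` is a hypothetical helper, not feed2exec's plugin interface.

```python
import sys
from typing import Iterable, TextIO


def write_items(lines: Iterable[str], stream: TextIO = sys.stdout) -> int:
    """Write each line to stream and return how many were written.

    If the reader goes away (e.g. `| head` exits early), stop quietly
    instead of reporting an error for every remaining item.
    """
    written = 0
    try:
        for line in lines:
            stream.write(line + "\n")
            stream.flush()  # surface EPIPE now rather than at interpreter exit
            written += 1
    except BrokenPipeError:
        pass  # downstream closed the pipe: there is nobody left to write to
    return written
```

When the stream is the real `sys.stdout`, the Python `signal` module documentation (its note on SIGPIPE) also suggests redirecting stdout to `os.devnull` before exiting, so the interpreter's final flush does not raise the same error again.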
add JSON output plugin
This plugin is very simple, if not trivial: it dumps all the feed items as a JSON stream. This can be parsed by `jq` on the command line to diagnose feed problems, do scripting or whatever. This...
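A sketch of such a plugin, assuming each item arrives as a dict-like object. The `output()` signature is hypothetical (feed2exec's real plugin interface may differ). `default=str` makes values that JSON cannot encode natively, such as `datetime` objects or sets, come out as strings instead of raising `TypeError`.

```python
import json
import sys
from typing import Any, Mapping, TextIO


def output(item: Mapping[str, Any], stream: TextIO = sys.stdout) -> None:
    """Dump one feed item as a single line of JSON (hypothetical plugin hook)."""
    json.dump(dict(item), stream, default=str, sort_keys=True)
    # One object per line: jq happily reads a stream of concatenated JSON values.
    stream.write("\n")
```

Piping the result through `jq '.title'` would then print the title of every item.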
switch to dateparser for PyPI and tests as well
Because the Debian package recommends dateparser, I had different results running tests natively on Debian and within tox. This harmonizes things and makes use of dateparser everywhere, warts and...
reduce noise level of 'missing time' problems
It seems like previous versions of feedparser would never trigger that problem, and would fill in the date instead. Now I have feeds that have this on *every* item and it generates a lot of noise in...
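The message is truncated, so the exact change is not shown. One plausible approach, assumed here, is to demote the message to debug level so it only appears when debugging output is enabled. `check_item_time()` and the `updated` key are hypothetical.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger(__name__)


def check_item_time(item: Mapping[str, Any]) -> bool:
    """Return True if the item carries a date; quietly note it otherwise."""
    if item.get("updated"):
        return True
    # Missing dates are common in some feeds; log at DEBUG rather than
    # WARNING so that every such item does not produce user-visible noise.
    logger.debug("item %r has no date", item.get("title"))
    return False
```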
split large feeds.py into model.py and controller.py
This rearchitecture seems to make sense to me. I would like to keep those files smaller and that naming will force me to follow that model/controller distinction more clearly. Already, the parse/fetch...
rename FeedCacheStorage to FeedItemCacheStorage
This is, effectively, a per-item cache, not a full feed cache. We want to implement the latter as part of #10 so it makes sense to rename this first. This is an API breaking change.
factor out getters/setters in the base sqlite class
This should pave the way for reusing this class in a caching backend like cachecontrol.
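A minimal sketch of what a shared base class with get/set helpers might look like, using the standard `sqlite3` module. The class names, the `key`/`value` table layout and the in-memory default path are assumptions for illustration. The `HttpCache` subclass only shows the reuse idea: its method names resemble cachecontrol's cache interface, but the exact signatures there vary between cachecontrol versions, so treat that correspondence as an assumption too.

```python
import sqlite3
from typing import Optional


class SqliteStore:
    """Hypothetical key/value base class backed by a single sqlite table."""

    table = "store"  # subclasses override this to get their own table

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(key TEXT PRIMARY KEY, value BLOB)"
        )
        self.conn.commit()

    def get(self, key: str) -> Optional[bytes]:
        row = self.conn.execute(
            f"SELECT value FROM {self.table} WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None  # None when the key is unknown

    def set(self, key: str, value: bytes) -> None:
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (key, value) VALUES (?, ?)",
            (key, value),
        )
        self.conn.commit()

    def delete(self, key: str) -> None:
        self.conn.execute(f"DELETE FROM {self.table} WHERE key = ?", (key,))
        self.conn.commit()


class HttpCache(SqliteStore):
    """Example reuse: a caching backend only needs a different table."""

    table = "http_cache"
```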
implement thread-level locking
This is not absolutely necessary as we don't do thread-level parallelism. But if we ever want to switch back to doing that, this is an elegant way of supporting that. Inspired by cachecontrol-sqlite.
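A sketch of thread-level locking around a shared sqlite connection, written for illustration and not copied from cachecontrol-sqlite or feed2exec: a `threading.Lock` guards every use of the connection, so two threads never run statements on it at the same time. `check_same_thread=False` is needed because Python's `sqlite3` otherwise refuses to use a connection from a thread other than the one that created it. `LockedDB` and `connection()` are hypothetical names.

```python
import sqlite3
import threading
from contextlib import contextmanager
from typing import Iterator


class LockedDB:
    """Hypothetical wrapper serializing access to one sqlite connection."""

    def __init__(self, path: str = ":memory:") -> None:
        # Allow use from several threads; the lock below provides the safety.
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.lock = threading.Lock()

    @contextmanager
    def connection(self) -> Iterator[sqlite3.Connection]:
        with self.lock:  # only one thread at a time inside this block
            yield self.conn


db = LockedDB()
with db.connection() as conn:
    conn.execute("CREATE TABLE items (n INTEGER)")


def worker(n: int) -> None:
    with db.connection() as conn:
        conn.execute("INSERT INTO items VALUES (?)", (n,))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

with db.connection() as conn:
    total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

A single coarse lock per connection is simple and costs almost nothing while, as the commit notes, there is no thread-level parallelism to contend for it.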
enforce commit in context manager unless explicitly disabled
This makes sure we never, ever forget to commit unless we *explicitly* disable it. This is also inspired by cachecontrol-sqlite, except the latter uses False as a default for the autocommit, which...
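A sketch of a context manager that commits on exit unless the caller explicitly opts out, i.e. with the default set to committing rather than to `False`. The `transaction()` name and `commit` parameter are hypothetical, and rolling back when the block raises is an extra assumption of this sketch.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def transaction(conn: sqlite3.Connection, commit: bool = True) -> Iterator[sqlite3.Connection]:
    """Yield conn and commit on success unless commit=False was passed explicitly."""
    try:
        yield conn
    except Exception:
        conn.rollback()  # assumption: discard partial work on errors
        raise
    if commit:
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

with transaction(conn) as c:
    c.execute("INSERT INTO t VALUES (1)")
committed = not conn.in_transaction  # default path: committed on exit

with transaction(conn, commit=False) as c:
    c.execute("INSERT INTO t VALUES (2)")
still_pending = conn.in_transaction  # opt-out path: left uncommitted
```

Making the safe behaviour the default means forgetting a keyword argument can no longer silently lose writes; skipping the commit now requires visibly passing `commit=False`.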
first attempt at using cachecontrol, failing
It seems we need to provide the timestamp: cachecontrol doesn't store it in the database, so it doesn't send If-Modified-Since headers, and the caching fails. Maybe we are better off implementing this on our own?
fix broken cache adapter support
We did not need to pass the If-Modified-Since header. All that was needed was that we look up (and return!) the cache value properly. So also remove that from the database. The way things were set up,...
forcibly preset the builtin feed session
Without this setting, the wrong session gets initialized in the new Feed object. Before the caching layer was implemented, this didn't matter much because those sessions were never called. But since...
install python3-dev, required for compiling regex
Not sure why all that junk is necessary, but I want to fix the build.
avoid newer feedparser versions
feedparser 6.0+ removed the FeedParserDict which we depend on (see https://github.com/kurtmckee/feedparser/issues/197). Until we refactor the Feed class, stick with older versions of feedparser.
move session and fetching to the feed manager
Having the session and the network code in the "model" makes no sense: that stuff belongs in the "controller". Having it there made it particularly difficult to implement the caching layer, as I had...
remove class-level sticky session parameter
This cleans up a lot of stuff. Now we can treat the session as a normal feed_manager parameter. Since there is usually only one feed_manager in operation at any time, it is basically a static member....
make test suite pass again
This was failing because hooking up the cache into the session completely obliterates our poor old betamax cache. Instead of doing that, we politely queue the cache layer behind it... ... except that...
rename feeds to feed_manager in main
The "feeds" appellation is an old remnant of the previous data structures. Now we do use a FeedManager everywhere and we should name it as such. It will be easier to grep for it and will more obviously...
reuse feed_manager object in fetch as well
I see no reason why we need to construct a different object in this specific class; let's just reuse the one already created. This should make a tiny improvement on the startup latency, but hasn't...