If anyone wants, I can post my hacked-together Ruby script that will download your feed and all the associated entries and images.
Josh Haley,
Eric @ CS Techcast,
Bruce Lewis,
Steve C, Team Marina,
Stephen Mack,
Micah,
Benny Bucko (Josh),
Meg VMeg,
Sig. N,
Osiride,
((°},
ufuk,
Big Joe Silenced,
Maitani,
fewdayslate,
Jenny H.,
Spidra Webster,
Jennifer Dittrich,
Andrew C (see frenf.it),
Nils Sandin,
Kobayashi,
t-ra: sentimental gorgon,
John (bird whisperer),
Stephan Planken,
and
Greg GuitarBuster
liked this
It is pretty rough.
- Victor Ganata
So far known bug: I think any entry where the post text is just "." will result in weirdness
- Victor Ganata
I'd like to see it, Victor.
- Micah
It depends on the nokogiri gem.
- Victor Ganata
OK, I will find somewhere to upload it to in a bit.
- Victor Ganata
Cool. Thanks.
- Micah
There is a superfluous "Â" in line 29 and in line 285. It should be "Older items »" not "Older items Â»". This won't matter for the main feeds, but it will break things when trying to download comment feeds.
- Victor Ganata
Ooooh, sweet.
- Jennifer Dittrich
I've never done anything with a Ruby script. Are there newbie instructions, or is it the kind of thing you need to know how to use so I shouldn't even try? :-)
- bentley
I guess it depends on how adventurous you are. Right now it's just a command line script, and as I've just discovered, it can be messy trying to install ruby. And this is in OS X. I wouldn't know where to start in Windows.
- Victor Ganata
I'm on a Mac but I know nothing about Ruby, so I think I'll pass for now. Thanks for writing it, though.
- bentley
I installed ruby and I think I figured out how to install Nokogiri too ("gem install nokogiri" should "just work" according to the nokogiri page) but I'm running into problems when I run the script: "C:/Ruby22/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- nokogiri/nokogiri (LoadError)" =P
- Andrew C (see frenf.it)
It's a Windows incompatibility thing http://stackoverflow.com/a...
- Micah
Ah, so install an older version of Ruby?
- Andrew C (see frenf.it)
That may work.
- Micah
I just posted the most recent version of the script at the same URL http://fatoprofugus.net/testing... — most of the updates are for rewriting URLs to be more self-contained on a local file system, although all those ./../../ paths would probably be really nasty on an actual webserver….
- Victor Ganata
I wrote another version of the script that downloads entries through the API just for the sake of completeness because I realize that there's a bunch of metadata that gets lost when you just scrape the HTML. http://fatoprofugus.net/testing... (Entries are stored as XML only because I know how to work with XML but have no idea how to work with JSON.)
- Victor Ganata
Does that second version go past the 10K limit? (Does Claudio's now? I haven't tried any exporters since the first week post-announcement.)
- Andrew C (see frenf.it)
The API itself seems hard coded to 10,000 entries max so I still have to use the HTML feed directly to get the entry IDs
- Victor Ganata
Oh, so you grab the HTML feed to get the entry IDs, then download the individual entries via the API? Clever.
- Andrew C (see frenf.it)
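A minimal sketch of the two-phase approach described above: page through the HTML feed to collect entry IDs, then fetch each entry individually. The `fetch_page` and `fetch_entry` callables and the page size of 30 are assumptions made for illustration; the thread does not show how the real script talks to the site or the API.

```ruby
# Sketch only: fetch_page and fetch_entry are hypothetical callables,
# not functions from the actual script or the FriendFeed API.

# Walk the HTML feed page by page and collect every entry ID.
# fetch_page.call(start) must return [ids_on_page, has_older_page].
def collect_entry_ids(fetch_page, page_size: 30)
  ids = []
  start = 0
  loop do
    page_ids, has_older = fetch_page.call(start)
    ids.concat(page_ids)
    # Stop when there is no "Older items" link or the page came back empty.
    break unless has_older && !page_ids.empty?
    start += page_size
  end
  ids.uniq
end

# Download each entry individually; fetch_entry.call(id) returns the body.
def download_entries(ids, fetch_entry)
  ids.each_with_object({}) { |id, out| out[id] = fetch_entry.call(id) }
end
```

Because the ID list comes from the HTML pages rather than from the API's feed endpoint, the 10,000-entry cap on the API feed would not limit how far back the list goes.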
How hard would it be to change the 10K limit in the FF API?
- Greg GuitarBuster
Maybe not that hard for someone who has access to the source code on the server.
- Victor Ganata
Updated the script that downloads entries through the API so that it skips already downloaded entries faster (ffbackup 2015.04.02) http://fatoprofugus.net/testing...
- Victor Ganata
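One plausible way to skip already-downloaded entries quickly is to check for the output file before making any request. This is a sketch under assumptions: the one-XML-file-per-entry layout, the `backup_entries` name, and the `fetch_entry` callable are all hypothetical and may not match what ffbackup does.

```ruby
require "fileutils"

# Sketch only: the one-file-per-entry layout and the fetch_entry callable
# are assumptions, not how ffbackup is known to work.
def backup_entries(ids, dir, fetch_entry)
  FileUtils.mkdir_p(dir)
  downloaded = []
  ids.each do |id|
    path = File.join(dir, "#{id}.xml")
    next if File.exist?(path) # cheap local check, no API request made
    File.write(path, fetch_entry.call(id))
    downloaded << id
  end
  downloaded # IDs that were actually fetched on this run
end
```

With this shape, re-running after an interruption only costs a file-existence check per entry that is already on disk.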
Is it... out of beta? =)
- Andrew C (see frenf.it)
lol
- Victor Ganata
Downloading entries via the API seems much slower; it's been almost three hours and I've only gotten to late 2013.
- Victor Ganata
Sheesh, still in early 2013.
- Victor Ganata
Is it looping, or just taking that long? (I'm tempted to get this running at home tonight and just let it run until it finishes.)
- Jennifer Dittrich
I think it's just taking a long time because requests through the API are slower. I've already crossed the 10k threshold and am around 12k http://friendfeed.com/aswang...
- Victor Ganata
6 hours out and still in mid 2012 :D
- Victor Ganata
8 hours out—early 2012 >.<
- Victor Ganata
Sounds like for heavy users, we may want to get started at least six days before the shutdown...
- Andrew C (see frenf.it)
Is this the only script that goes past the 10,000 post mark?
- Eric @ CS Techcast
Nokogiri runs on Ruby 2.0.0. But now I'm getting some kind of SSL problem: "certificate verify failed". (for the API version. In the meantime, I'm running scrape-ff, which seems to be working.)
- Andrew C (see frenf.it)
scrape-ff got me to mid-2009 before failing. That's good!
- Andrew C (see frenf.it)
Oooh, how does the script die?
- Victor Ganata
Error in mkdir.
- Andrew C (see frenf.it)
Hmm. What directory does it choke on?
- Victor Ganata
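To find out which directory a mkdir error comes from, directory creation can be wrapped so that a failure reports the path and its length. This is only a sketch; `make_dir` is a hypothetical helper and not part of scrape-ff.

```ruby
require "fileutils"

# Sketch only: make_dir is a hypothetical helper, not part of scrape-ff.
# Returns true on success; on failure prints the path and the OS error.
def make_dir(path)
  FileUtils.mkdir_p(path)
  true
rescue SystemCallError => e
  warn "mkdir failed for #{path.inspect} (#{path.length} chars): #{e.message}"
  false
end
```

Printing the path length may help tell a filesystem limit apart from a bad character in a directory name.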
I'm uploading my feed to http://fatoprofugus.net/friendf... — it's still rsync'ing and it hasn't gotten to the stylesheets yet so it's going to be pretty raw
- Victor Ganata
Looks like it finally uploaded the stylesheets. So this is an example of the final output of scrape-ff http://fatoprofugus.net/friendf...
- Victor Ganata
Wow. That looks really good.
- Greg GuitarBuster
Does it handle DMs, Victor?
- Stephen Mack
It can only download entries that were posted publicly, unfortunately.
- Victor Ganata
I don't think it choked on any particular directory. Maybe it hit the limits of Windows' filesystem?
- Andrew C (see frenf.it)
Huh. Maybe it is an NTFS limitation? I ran the script on your feed and it got me all the way to 2008 http://friendfeed.com/arc315...
- Victor Ganata
I've installed the 32-bit version for Windows and both the nokogiri and fileutils gems (open-uri is supposedly part of the base install?). I've saved the script on your website to an rb file (FFExport.rb), and executed it from a command prompt as follows: "FFExport.rb mellissad". I did this because I noticed that the filename variable in the script seems to assume you're passing a command line argument to specify which username to use? Even though all the required libraries are seemingly installed, I still get a page of errors when the script is run. First it barks about an invalid UTF-8 character, then about a page of syntax errors (presumably because of the character/encoding error?). Anyone have any thoughts?
- Benny Bucko (Josh)
Is it choking on line 34 and line 319 by any chance? I'm having problems with getting my web host to display plain-text files in UTF-8 so there might be a couple of stray invalid characters in those lines.
- Victor Ganata
On mine it's line 45 which reads: if @index.xpath('(//div[@class="pager bottom"]/a)[1]').first.content == "Older items »" then
- Benny Bucko (Josh)
The character that seems to be causing the face-plant is the » character. This character is in the UTF-8 character specification though. :S Removing it only causes other problems, e.g. Nokogiri isn't recognised as being installed, even though it installed successfully. It will also make the if statement fail since it's looking for specific text. Having devkit installed or not makes no difference. :-(
- Benny Bucko (Josh)
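One way to sidestep the encoding trouble is to keep the script source pure ASCII and write the » character as a Unicode escape, so a web host or editor cannot corrupt it in transit. The sketch below is illustrative only; the constant and helper names are hypothetical and do not come from the script.

```ruby
# Sketch only: these constants and helpers are illustrative, not the
# script's actual code.
OLDER_ITEMS = "Older items \u00BB" # U+00BB is the » character
MOJIBAKE    = "\u00C2\u00BB"       # "Â»", what » looks like when UTF-8 is misread as Latin-1

# Repair the stray "Â" and trim surrounding whitespace.
def normalize_label(text)
  text.gsub(MOJIBAKE, "\u00BB").strip
end

# True when the pager link text is the "Older items »" label.
def older_items_link?(text)
  normalize_label(text) == OLDER_ITEMS
end
```

The comparison on the line quoted above could then use `older_items_link?(...)` on the link's content instead of a string literal containing the raw character.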
It took a while to figure out how to do it, but I think I finally got the scrape-ff script running.
- John (bird whisperer)
Ugh, I'm not sure how to fix the UTF-8 issue. I think my webhost is doing something really weird with the » character. Maybe if I upload it as a zip file, the encoding won't get borked? http://fatoprofugus.net/testing...
- Victor Ganata
Turns out Ruby on Windows is rubbish. Switched to my Ubuntu install and after installing all the appropriate packages it worked first attempt. Melly's feed is downloading as we speak. You have no idea how elated I am! Thanks so much for writing this script Victor! It's nice to be able to get around the 10k API limit. Melly has posted so many of the kids' quotes over the years that I feel it would be a travesty to lose them. Thanks again!
- Benny Bucko (Josh)
I have no way of running the Ruby script if anyone would do me the pleasure of grabbing my feed?
- Josh Haley
I meant to add please to that. Please?
- Josh Haley
ugh, just noticed that the newer and older post links are messed up. but all the data should be there.
- Victor Ganata
I think this fixes it http://fatoprofugus.net/testing... but I don't know if it's worth running now in case they shut down the site soon.
- Victor Ganata
I noticed the same, but a quick google of sed will show how to find/replace across multiple files.
- Benny Bucko (Josh)
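For anyone without sed handy, a find/replace across many files can be sketched in Ruby as well. The glob pattern and the old/new strings are placeholders here, since the thread does not say what the broken links actually look like, and `replace_in_files` is a hypothetical helper.

```ruby
# Sketch only: the glob pattern and the old/new strings are placeholders;
# the thread does not say what the broken links actually look like.
def replace_in_files(glob, old, new)
  changed = []
  Dir.glob(glob).sort.each do |path|
    text = File.read(path, encoding: "UTF-8")
    next unless text.include?(old)
    # Block form keeps the replacement literal (no backslash escapes).
    File.write(path, text.gsub(old) { new })
    changed << path
  end
  changed # paths of the files that were rewritten
end
```

A call might look like `replace_in_files("backup/**/index.html", "old-link", "new-link")`, with both strings standing in for the real values. It rewrites files in place, so running it on a copy of the backup is the safer choice.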
I will write a quick script to fix up the already downloaded index files. Watch this space: http://fatoprofugus.net/testing...
- Victor Ganata
Have my babies.
- Benny Bucko (Josh)
Well, new ones I mean. I'm not trying to sell my kids on the black market.
- Benny Bucko (Josh)
^ Although that is tempting...
- Melly #FForever
hah!
- Victor Ganata
well it seems fix-indices works http://fatoprofugus.net/testing... — this is probably unnecessarily complicated but oh well (might want to do this on a backup of your backup, though)
- Victor Ganata
ok, I think I really figured out how to get around any 403s. http://fatoprofugus.net/testing... better late than never, right? :D
- Victor Ganata
I'm giving it another run. It occurred to me after I started the new version that I probably should have run it in a different directory, but at least I made a backup copy of the previous run.
- John (bird whisperer)
It got into June 2009 before it hit a mkdir error. That's about 3 months more than it got before.
- John (bird whisperer)