Copytables Simplifies Extracting Tabular Data from Web Pages

Originally published at: Copytables Simplifies Extracting Tabular Data from Web Pages - TidBITS

If you’ve ever been frustrated by trying to copy a column of data—or even just a handful of cells—from within a table on a Web page, get the Copytables browser extension for Chrome-based browsers, Firefox, or Safari.

I’m a learning technologist at a university where we use Blackboard Learn as the virtual learning environment (VLE), and it’s not uncommon for me to have to grab large swathes of data from it. Not unlike you, I’ve often done a cmd-A, cmd-C, and cmd-V into BBEdit, gone through the rigmarole of excluding the bits I don’t need, and then moved the data over to Excel. I’m pretty adept at using BBEdit to whittle it down, but…

This seems pretty awesome! I’ve just had a quick play with it on tables that extend to 1,000 rows. (A kilotable?) It dealt with selecting the first few columns from a wider table without breaking a sweat.

One observation - it initially seemed more miss than hit for making its selection. And then I twigged - at least in the case of these Blackboard tables, you seem to have to start your initial click in the upper left corner of the first table cell. Clicking in the centre of the top cell (typically over the text in the column header) doesn’t seem to work.

Ooh - awesome+ You can select discontiguous columns.

I’ve noticed that it can be a little finicky about where you have to click too, but I’ve chalked that up to tables being formatted in odd ways.

And yes, discontiguous columns! Forgot to mention that; I’ll add it.

Not a biggie, but it isn’t free. $2.99 in the App Store for Safari.

Oops! Updating—the Chrome extension is free.

Does it work with the web interface to Gmail?

In other words, if you use it to copy a table from a web page, can the table be pasted into a Gmail message (within a browser window)?

Copying and pasting directly from an HTML table to Gmail (at least with Mailplane) does not give you what you are probably looking for. However, pasting into BBEdit and then copying and pasting from there works fine. Columns are separated by tabs.

Interestingly enough, copying from html and pasting into TextEdit produces something that looks more like a table.

And then copying that and pasting it into Gmail produces the identical format. Might be more like what you are seeking.

Thanks, @hartley. I’m not getting the same results in TextEdit, but I’m not entirely surprised—clipboard formats are a black box, and I’m probably doing something slightly differently.

What worked for me was to run the HTML code through HTML Cleaner, a Web utility I found long ago, and then to copy the rendered HTML again. That put formatted HTML on the clipboard for pasting into Mimestream, TextEdit, and yes, Gmail’s Web interface.

Thanks, Adam! I have shelled out for the Safari version and it’s working well for me. Regular expressions in BBEdit will clean up anything given enough patience (I use BBEdit’s Text Filters for frequent jobs), but Copytables is easier.

Thanks @ace, what a fantastic extension - I use it with Safari, and it works great with HTML tables so far. It spares me from wrangling the formatting and such - a great productivity and quality-of-life improvement.

I’m once again displaying my acumen for not understanding even the Dummies for Dummies series books. I bought and downloaded it and installed the extension. Great–it selected the columns or the cells I clicked in (Option or Command key). But Command-C copied nothing. I went to the Edit menu, and all options are grayed out except Select All. I saw a comment in the reviews on the App Store: “I only wish it would use the regular command C to copy the selection, though.” My thoughts exactly, but I can’t see how to copy it using ANY command or menu option. Nor do I see the “Popup Menu” referred to by the creator for copying in various formats. Please take pity on me and be kind with the “You Dummy!” responses I know some are probably formulating. I cry easily. Thanks for any advice on how to do more than just select. I’d like to copy and paste as well.

Never mind. I knew that as soon as I broadcast the fool that I am, the answer would appear. The little icon that shows up next to the URL in Safari seems to provide the answer. Apologies for wasting your time.

I find it a bit erratic depending on the table and the website. On some sites it works fine; on others it’s as if I weren’t using it at all.

It is not clear to me how you copy an entire table that spans several web pages. For example, Strava segment results can span several pages, with each page containing 25 results. I can copy the bit of the table displayed on a single web page, but how do I get the entire table without having to do each individual page?
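
If you do end up copying page by page, a short script can at least take care of the reassembly. A minimal sketch, assuming you copy each page with Copytables’ CSV option, that the CSV is standard comma-separated text, and that every page repeats the same header row. The helper name `merge_pages` and the sample rows are made up for illustration.

```python
import csv
import io
from typing import List, Optional


def merge_pages(pages: List[str]) -> List[List[str]]:
    """Stitch together CSV text copied page by page from one paginated table.

    Hypothetical helper. Assumes each page has the same columns and starts
    with the same header row; repeats of that header after the first are dropped.
    """
    merged: List[List[str]] = []
    header: Optional[List[str]] = None
    for page in pages:
        # Parse the page and skip any blank lines.
        rows = [row for row in csv.reader(io.StringIO(page)) if row]
        if not rows:
            continue
        if header is None:
            # The first page supplies the header for the combined table.
            header = rows[0]
            merged.extend(rows)
        elif rows[0] == header:
            # Later pages: drop the repeated header row.
            merged.extend(rows[1:])
        else:
            # No header on this page; keep every row.
            merged.extend(rows)
    return merged


# Made-up data standing in for two pages of results.
page1 = "Rank,Name,Time\n1,Alice,5:01\n2,Bob,5:10\n"
page2 = "Rank,Name,Time\n3,Carol,5:20\n"
for row in merge_pages([page1, page2]):
    print(",".join(row))
```

This doesn’t avoid visiting each page; it only removes the manual cleanup of the repeated headers.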

Wow, this takes me back to 1984 and many long hours studying the Inside Mac volumes. A long-forgotten item in IM concerned the clipboard. IM described THREE types of basic content there: text, pictures, and (ta-da!) tables! However, this was not actually implemented in the API. There was an Apple-run discussion among some Mac developers called the Table Group that tried to hammer out specs, but it rapidly became unwieldy and overcomplicated, and nothing came of it. I think Apple sent us a certificate or some such in thanks.

I use an extension called Table Capture. It’s not free, but it handles multi-page and scrolling tables. You have to scroll or page through them yourself, but as you’re doing so, Table Capture is grabbing them. It can also export to Excel or Google Sheets (as well as just copying the table). As I said, it’s not free (it’s $12/year), but I copy tables, particularly scrolling tables, enough that it’s worth it to me.

Paradoxically, the paid-for Safari extension is significantly less capable than the free plugins for other browsers, presumably thanks to App Store hoop-jumping. It’s still useful, but the paste formats are limited to Formatted (= As is), CSV regular and “transposed” (= Swap), HTML, and Markdown; there are no markup-free plain-text options at all, so some kind of regex cleanup is needed in most everyday use cases. There are also no Find or Capture options in the Copytables window.
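
Until a plain-text option appears, the CSV output is probably the easiest starting point for that cleanup. A minimal sketch, assuming Copytables’ CSV follows the usual convention (commas between fields, double quotes around fields that contain commas): Python’s `csv` module, used here instead of a regex, turns it into tab-separated plain text, so a quoted cell such as "Smith, Jane" stays in one column. The function name `csv_to_tsv` is my own.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Convert CSV text into tab-separated plain text with no quoting.

    Hypothetical helper; assumes standard comma-separated, double-quoted CSV.
    """
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Collapse whitespace runs (including any tabs or newlines inside a
        # quoted cell) to single spaces so each row stays on one line.
        lines.append("\t".join(" ".join(cell.split()) for cell in row))
    return "\n".join(lines)


print(csv_to_tsv('Name,Score\n"Smith, Jane",90\n'))
```

The result pastes into BBEdit or Excel as plain columns separated by tabs.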

That may be due to truly wacky HTML that it can’t parse. I’ve not had problems, but my suggestion would be to try alternative selections (just columns instead of the entire table, for instance).

@ace Having written a web page table parser, I can assure you that wacky HTML is the norm rather than the exception. Good ol’ simple HTML tables are a comparative breeze because they’re a simple structure (cough), but modern CSS-driven tables are a nightmare because cells can be placed visually anywhere, with no relation to the normal left-to-right, top-to-bottom expectation. (I gave up.)
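
To illustrate why plain HTML tables are the easy case: with Python’s standard-library `html.parser`, a few dozen lines collect the text of each `<td>`/`<th>` cell row by row. This is a minimal sketch, not how Copytables works; the class name `SimpleTableParser` is made up, and it assumes explicit closing tags and no nested tables, rowspan, or colspan. A table laid out visually with CSS, as described above, would defeat it entirely.

```python
from html.parser import HTMLParser
from typing import List, Optional


class SimpleTableParser(HTMLParser):
    """Collect the text of <td>/<th> cells from simple HTML tables, row by row.

    Hypothetical sketch: assumes explicit </td>, </th>, and </tr> tags, and
    ignores nesting, rowspan, colspan, and any CSS positioning.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the row being read
        self._cell: Optional[List[str]] = None  # text chunks of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside inline tags such as <b> or <a> still lands in the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the text chunks and normalize whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = SimpleTableParser()
parser.feed(
    "<table><tr><th>Name</th><th>Score</th></tr>"
    "<tr><td>Ann <b>Lee</b></td><td> 42 </td></tr></table>"
)
parser.close()
print(parser.rows)
```

Everything it relies on (row and cell tags in source order) is exactly what CSS-positioned layouts throw away.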

I noticed mention of clipboards somewhere in here, and the modern clipboard is nothing like the old one. You think “a” clipboard; the reality is 6, 8, 10, or more different expressions of the same stuff. It wouldn’t surprise me to discover an EBCDIC clipboard at some point. In fact, the clipboards are a pretty miraculous piece of engineering. Complicated translation on the fly. . . .

As far as grabbing tables goes, I’ve found that copying a web page (not necessarily a table) into Pages gives pretty amazing results, both in accuracy and in lack of hair-pulling. I’m not sure why TextEdit results and Pages results are different (given the probably identical engines), but they are. You then copy that out of Pages into BBEdit and away you go. . . . 🙂

Dave