Originally published at: Modern CSV Lets You Manipulate CSV Files Directly - TidBITS
If you don’t know what CSV stands for, stop right here. I’ll save you the time of even reading that it’s “comma-separated values” and is a funny, least-common-denominator format used to pass field-based data in rows—tabular data—between apps. For those who read “CSV” and thought, “Aha!” I want to introduce you to an excitingly useful new app.
If you’re only roughly familiar with CSV, let’s review the basics. A CSV file represents values for fields, like those used in a spreadsheet, as a set of items identically presented row after row in a text file. These fields have to be separated from one another on each row, or delimited, using characters that let a CSV parser know that they’re used for field division.
Rows are always delimited by a Return character. In modern macOS and Unix, a Return character is actually a Line Feed (LF), while Windows uses a Carriage Return/Line Feed (CR LF) combination. (These characters came into computing from typewriter technology, where a carriage return moved the carriage back to the starting point and a line feed advanced the paper.) I’m unaware of any controversy surrounding using Return characters to delimit rows.
Delimiting columns is trickier. Any character used to delimit fields could also appear in the data. In the data part of a column, you have to encode the delimiter character in a special way, such as enclosing it in quotation marks or preceding it with a backslash, among other techniques, a process called “escaping.”
For many years, I thought tab-delimited files (TSV or tab-separated values) ruled the roost. Because tabs are used primarily for formatting in text, they seemed to me to be much less likely to appear in tabular data. Using a tab limited the need for escaping in the data portion of a file. However, I think that’s changed. Many apps and services only offer CSV as an export option and list CSV as the only plain-text import supported. (Some of those also quietly support TSV even when they don’t use the terminology). I’m not a fan of CSV because so many characters require escaping—starting with the frequently used comma. To avoid escaping commas, CSV exports often add quotation marks around all the fielded portions—like "lastname, firstname", "phone"
—but then you have to deal with quotation marks instead. But there’s no point in arguing for tabs or other, more obscure characters at this point: it seems that commas have won.
Despite its uptick in popularity, the CSV style of encoding tabular data isn’t new; it was first used by the IBM Fortran compiler in 1972. The names “comma-separated values” and “CSV” date to 1983 and the SuperCalc spreadsheet. It wasn’t until 2005 that CSV files were formally standardized with RFC 4180—perhaps getting a standard gave CSV the advantage over TSV in the market. Whether most products and Web apps conform accurately to this standard is unknown.
CSV files can be opened in any text editor and are human-readable, but few people look at raw CSV files because they’re primarily used as an interchange format for moving data between systems. To work with the data in a CSV file, you generally import it into a spreadsheet or database-driven system. In today’s world, most people work with tabular data in Microsoft Excel, Apple’s Numbers, or Google Sheets. BBEdit has a few helpful features for working with columnar data directly, but they are unusual among text editors.
Because of my event management work for the Finger Lakes Runners Club, I frequently handle CSV files. For races where I’m managing the timing system, I need to download registration data to merge onto bib number labels and print lists of runners so volunteers can give each runner the correct bib number, so the local Wilderness Search & Rescue team can identify which runners have passed which checkpoints, so clothing coordinators know who ordered race shirts, and for any number of other reasons. After a race, I often have to download race results from the timing system and tweak them before uploading the processed information to the club website.
Because CSV is an interchange format, I usually import files into a spreadsheet, make any necessary changes, and then save, print, or export for whatever my next step is. I usually use Excel for processing because it can save an opened CSV without a separate export step, which Numbers requires. Google Sheets would also require exporting and would clutter my Google Drive with temporary documents that I need only briefly.
Please don’t interpret my usage of Excel as an endorsement, though. I have a fractious relationship with Excel, particularly when working with running times, which spreadsheets treat like times of day and often reformat in weird ways. A 5:24 mile has nothing in common with 5:24 AM, so it’s essential to block any automatic date formatting. And don’t get me started about ZIP codes that start with a zero!
Might there be a better way of working with CSV files?
Introducing Modern CSV
I ran across an intriguing app earlier this year that has become my go-to tool for working with CSV files: Modern CSV. When you launch it, it looks like a spreadsheet, displaying data in rows and columns, but it doesn’t require that you write formulas to manipulate data. Instead, it has an extensive set of data manipulation capabilities that you apply directly to the contents of a CSV file. In essence, Modern CSV uses CSV as its native format and lets you choose common data transformation, conversion, concatenation, and other actions from menus instead of pre-parsing files in a text editor or building formulas and juggling results columns in a spreadsheet app. With CSV as the native file format, you skip all that to work directly with tabular data.
I have to note that Modern CSV is a cross-platform app with support for macOS, Windows, and Linux. That’s great for those who shift between platforms, but it means that Modern CSV doesn’t look particularly Mac-like. You won’t have trouble using Modern CSV, but aspects of its interface are both unfamiliar and ugly. The speed and flexibility with which I can work with CSV files in Modern CSV outweigh such aesthetic considerations for me, so I have changed my settings in the Finder so that all .csv files open in Modern CSV. (Select a CSV file, choose File > Get Info, choose Modern CSV from the Open With pop-up menu, and click Change All under “Use this application to open all documents like this one.”)
To help you understand what Modern CSV offers, I’m going to walk you through a sampling of real files that I need to work with, nicely anonymized by asking ChatGPT to generate random names to replace the actual ones. Some of these actions are simple, others complex, and a few somewhat contrived, but overall, they’re the sort of tasks I need to accomplish regularly. Advanced features in Modern CSV that go beyond what I need include a high-performance read-only mode for huge files, support for multiple delimiters and line endings, transposition of columns and rows, export and import of JSON and XML files, and support for multiple character encodings.
It’s worth keeping in mind that most of what I find so helpful in Modern CSV can be accomplished in a spreadsheet. But unless you’re already an Excel or Numbers expert, figuring out how to perform some action efficiently in a spreadsheet may take longer than doing it inefficiently or in another app. Feature discoverability in spreadsheets is extremely low. So, if you notice me praising Modern CSV for something you know a spreadsheet can do, it’s likely because I didn’t know it was possible.
Modern CSV offers a solid set of features for free, but you can increase its functionality with the $39 Premium Personal or the $59 Premium Business licenses—both are one-time fees, not subscriptions. Whether you need to upgrade to a paid plan depends entirely on the transformations you commonly make. Although I tested the Premium Business version of Modern CSV, I think the Premium Personal version would suffice for my needs.
Sort Columns
I frequently want to sort columns numerically or alphabetically. Although this is simple enough in most spreadsheets, Modern CSV makes it even easier by sorting the entire table when you double-click a column header. Double-click it a second time to reverse the sort order.
Many CSV files have a header row describing each column’s contents. That’s helpful, and it’s often essential to retain it if you’re uploading the edited CSV file to a system that requires a header row to understand the column layout. However, having text like “Bib” at the top of a numeric column of bib numbers can sometimes confuse sorting unless the app handling knows that row is header information, as most spreadsheets do (or can be convinced to do).
Header information isn’t usually a problem because Modern CSV usually assumes the first line is a header and excludes it from sorting. If your CSV file contains multiple non-data rows or multiple internal tables, you may have to identify a particular row as a header. Control-click the desired row and choose Set Header Rows to Selected Row. You can do the same thing with columns.
As you’ll see in the Data > Sort menu, simple column sorting is just the start for Modern CSV. It can also sort rows and selected cells, and, for the truly persnickety, make all sorts case-sensitive.
Delete Columns and Rows
My downloaded data usually contains numerous extra columns that get in the way. Although Modern CSV doesn’t have much of an advantage over spreadsheets here, deleting columns is a breeze. Just select the desired columns or rows by Shift- or Command-clicking, Control-click one, and choose Remove Column(s) or Remove Row(s). I do this often enough that I’ve memorized the Command-Shift-L and Command-Shift-K keyboard shortcuts.
However, Modern CSV takes deletion to new levels. With a single command from Data > Delete, you can remove duplicate or empty rows or columns, which could make cleaning up a file far quicker. Although the menu options are extremely wordy, their long names help you understand what they’ll do.
Move Columns, Rows, and More
When developing a list of runners for bib pickup, I want to ensure the Bib column is first. It’s not always in that position to start, so I need to move it. To move a column in Modern CSV, select a column, press Shift, and then drag it to the new position. Moving rows is just as easy, but I usually rely on sorting for rows. While Numbers and Google Sheets don’t require pressing Shift, I find Excel’s interface maddening: you must select the column, move the pointer out of the column header area, position it over the edge of the column such that it turns into a hand, press Shift, and then drag the column. If you fail to press Shift, Excel warns that it will overwrite the data in the destination column, which is an insane bit of user interface.
What’s really cool, though, is that Modern CSV can also move selected cells around. If you use the Data > Move commands, Modern CSV shuffles the data to avoid leaving empty space; Shift-dragging the selected cells replaces the contents in the destination cells (indicated by highlighting). This capability could be helpful with CSV files with a lot of manual formatting that breaks across rows and columns, something I’ve seen occasionally in output from older systems. In Excel and Numbers, dragging cells allows only moving or copying to overwrite the destination.
Join, Split, and Trim Columns
Whenever you receive data with people’s names as a combination of first and last (and any initials or other pieces before, after, or in between), you know you’ll need it broken into first name and last name for the ultimate source. Likewise, if you get a CSV with first and last names in separate columns, you know you’ll have to combine them. It’s just a principle of the universe.
In my running work, I often need to save space on printed lists or results tables, so I like to combine separate first and last name columns. Doing that in a spreadsheet requires creating a new Full Name column, writing a formula that concatenates the two names with a space between them, and then filling the formula to the bottom of the sheet. Then, I have to select that entire column (excluding header data) and use a special paste operation to insert the formula’s results while severing the relationship with the original columns—only then can I delete the original columns.
Joining columns in Modern CSV involves much less work: select them and choose Data > String Manipulation > Join Selected Cells Horizontally. A dialog prompts you for what text you want to separate the joined data; for a name, it’s a space. Press Return and the joined data appears in the first column; the other one becomes empty.
Modern CSV’s other string manipulation commands let you split and trim selected cells in various ways.
The trimming is beneficial for me because I often need to remove tenths of seconds from the end of times (turning 17:56.3 into 17:56, for instance). There’s no easy way to perform that operation on duration in a spreadsheet app, so in the past, I would have used a grep search in BBEdit to remove the tenths. It’s much easier to use the Keep Only Left Part of Selected Cells option in Modern CSV, entering -2 for the index to tell it to delete the two rightmost characters. Note that header rows are subject to string manipulation, so you must deselect them to avoid unwanted changes.
One final note. When race results include times that are under and over an hour, such as 0:58:36 and 1:02:46, I want to remove the 0: from the under-hour times. I could use Modern CSV’s Find and Replace commands, but it’s even easier to select just the under-hour times and trim off the first two characters with Keep Only Right Part of Selected Cells.
Normalize Case
I’ll admit to a certain level of obsessiveness here, but as a publishing professional, I can’t stand publishing results with random capitalization, often the result of a Caps Lock key being left on. Fixing such mistakes is harder than it’s worth in a spreadsheet, but Modern CSV can normalize case easily.
Select some data (probably a column or two) and choose Data > Case > Title Case. (Yes, I realize that a few people prefer their names in a specific case, but no one has ever complained, and without such a complaint, I’m going to set the house style for my results.)
Other case options include UPPER CASE, lower case, UpperCamelCase, lowerCamelCase, UPPER_SNAKE_CASE, lower_snake_case, and Convert to Acronym, which produces an uppercase word using the first letter of each word.
Find and Replace
In addition to string manipulation commands, Modern CSV provides powerful find and replace capabilities for when you want to match and make changes instead of applying an operation across a range of cells. The buttons below the sidebar let you perform case-sensitive searches, regular expression searches, match only whole words, match only entire cells, search only within selected cells, highlight matches, and preview replacements.
Imagine I have a set of times where some contain a leading zero that I want to remove. In the screenshot below, I’m searching for the string 0:
within selected cells and highlighting the matches. The red highlighted cells indicate that there are several unwanted matches. Modern CSV has at least two ways to remove just leading zeros and their colons. First, what I show below restricts the replacement to the selected cells (highlighted in blue). A second approach would be to use a regular expression search for ^0:
since the ^
indicates the following characters must be at the start of a line.
Filter Columns
I often need to work with a subset of my data—everyone who purchased shirts, for instance. Sorting sometimes helps with that, but a better approach is to use a filter that restricts the visible rows to just those that match the filter. Modern CSV has a plethora of filtering commands that I don’t entirely understand yet, but the easiest one lets you select one or more cells in a column and then choose Data > Filter > Filter Selected Column(s) with Selected Cell Contents. That displays only the rows whose cells in the selected column match those you had selected.
For a more visible and flexible way of doing that, choose Data > Filter > Select Values in Selected Column to Filter. That brings up the File Analysis window’s Unique Values tab. Along with showing statistics about the contents of the column, the window lets you select the desired items (everything but None, shown below) and then click the Filter Selected Values In button to show just the matching rows. (Clicking Filter Selected Values Out would show only the rows containing None.)
If all that’s more than you need, Modern CSV has simple Hide/Show commands that let you control what you see without deleting data or structure.
Other Features
It’s impossible to list all the Modern CSV features that you might find helpful. Others that I’ve appreciated on occasion include:
- The status bar at the bottom of the window (see below) displays the file grid size and selected cell grid. Cell length is the number of characters in the current cell. The Sum and Mean fields report on those calculations if you select cells containing numbers.
- If you click the tiny button in the lower-right corner of the Modern CSV window, it displays a File Metadata sidebar with helpful information and controls. For instance, if you’ve opened a tab-delimited file but need to save a CSV, you can change the delimiter in the Save Parameters section to a comma. This feature might even help me with one app—the crufty HyTek Meet Manager—that requires a semicolon as a delimiter.
- The Data > Add or Update ID Column command is an easy way to create a column of sequential numbers starting with 1. If you’d prefer to start with another number or increment by something other than 1, choose Data > Multi-cell > Fill Selected Cells with Series, and enter your starting number and increment value. The Data > Numbers commands also provide interesting options for filling columns with random numbers.
- Those who work with huge CSV files might appreciate Modern CSV’s extensive set of commands for scrolling or jumping around in a file, like Go To > Go To Cell. The Go To menu also has commands for following hyperlinks, emailing selected addresses, searching the Web for selected text, and searching for places on a map.
What about Printing?
It might seem as though Modern CSV has an answer to nearly every CSV-related need you could have. Unfortunately, there is one lapse: printing. I often need to print lists of runners for race coordination, and Modern CSV can’t print at all. To an extent, this isn’t entirely surprising since printing probably differs between macOS, Windows, and Linux.
As a result, I often find myself opening a CSV file downloaded from a race registration system in Modern CSV, quickly removing unnecessary columns and rows, sorting and editing appropriately, saving, and opening in Excel, which has fairly flexible printing options. I appreciate that Excel can print just selected cells and can scale a printout to a single page wide. (Google Sheets can print selected cells, too, but Numbers can’t.)
Although this lack of printing support reduces the utility of filtering and hiding data in Modern CSV, its File > Save Visible Data As command provides a workaround, albeit at the cost of creating another file to print externally.
Happily, when I asked Modern CSV’s developer about printing, he said it was at the top of his priority list and would be included in the next revision to the app, which is due in a few months. I may never launch Excel after that.
Despite its current inability to print and occasionally wacky cross-platform interface, Modern CSV is a powerful tool to have at your disposal. It may be too specialized for those who work with CSV files only occasionally, but for people like me whose Downloads folders are littered with CSVs, it’s easy to look past Modern CSV’s peculiarities.