Fetch and Combine CSV Files from the Web

Written by , November 11, 2016

Sceenshot of output.csv

If you’ve ever analyzed large amounts of data, you know that it can be time consuming to get all the data you need into one place.  Data providers (often government agencies) frequently provide the data online in a comma delimited format (aka CSV).  The data is regularly broken down into multiple files to provide smaller downloads or to provide more manageable data sets for users.  It is a time-consuming task to fetch and combine CSV files from the web into one file for large scale analysis.

A recent discussion with an engineer inspired us to put together a simple program to solve this problem.  We created a tool to fetch and combine CSV files from the web into one locally stored file.  This can save you lots of time if you routinely need to retrieve the same data sets over and over for analysis.  The tool can be configured to fetch the files you need and then scheduled to run at routine intervals.

Everybody likes free stuff, so we’re making this tool available free of charge to any of our readers.  If this interests you, we have some instructions below to help you get started.

Instructions

Popular data sets include files put out by the Census Bureau, Department of Energy and many other governmental agencies that distribute data for public consumption. The Michigan Public Service Commission (MPSC) compiles data from monthly natural gas production reports submitted by oil and gas operators in the state of Michigan. We’ll be using some CSV files from their website as an example.

Step 1: Download the Tool

You can download the zip file containing the program files here:

After you unzip the file, you will see that the tool consists of two files: a “main.exe” file and a text file called “urls.txt.”

Files for tool for combining multiple CSV files

The executable “main.exe” is what you run to fetch your files.  The file “urls.txt” is the file you use to configure what files the program will fetch.

Step 2: Locate the desired CSV files

For this step we will use an example site to demonstrate how you can use the tool.  When you go to the Michigan Natural Gas Production Reports page of the MPSC’s website, you’ll see that the Commission has compiled natural gas production data from July 1990 through March 2016. There are over a million rows of data for this time period and the Commission has broken it down into five CSV files:

MPSC website - example to Fetch and Combine CSV Files from the Web

Step 3:  Inspect the CSV files to make sure they all have the same columns

When using this tool, you should keep in mind that the tool is designed to fetch and combine multiple CSV files that use the same schema. That means that all of your CSV files should have the same columns in the same order (when opened in Microsoft Excel). You can see that all five of the files we are combining from the MPSC site have the following columns:

CSV column headers

Step 4: Copy your URL(s) from the web by right-clicking on the link and selecting “Copy link address”

'copy link as' screenshot

Step 5: Paste the URL for each file you want included in your output onto a new line in the urls.txt file and save the file

Open the “urls.txt” file in a standard text editor such as Notepad. After you’ve pasted the URL for each file onto a new line, save and close the file.

*PLEASE NOTE:  You should paste the URL with the column labels that you want first. The tool will use these column labels and all subsequent labels from the other URLs will be lost.

Text file screenshot - Fetch and Combine CSV Files from the Web

Step 6: Run the “main.exe” file

Executing the “main.exe” file will initiate a command line dialogue. This dialogue may appear to be doing nothing at first, but be patient. When it is finished with one file, it will show the progress to the next file. No user action is needed in this screen. When the operation is finished, the screen will automatically close.

Screenshot of tool to Fetch and Combine CSV Files from the Web

Step 7: Open “output.csv”

Once the operation is finished a new file will be in the folder, named “output.csv.” This file will have all of the data from the multiple CSVs.

Screenshot of result of Fetch and Combine CSV Files from the Web

Tip

If you find yourself routinely (weekly, monthly… etc.) needing to fetch and combine CSV files from the web because the provider is frequently posting updates, try using Window’s Task Scheduler.

Windows Task Scheduler

You can set tasks and triggers with Task Scheduler that Windows will automatically run based on set time intervals or in response to other actions.

Additional Help

We hope this is a useful tool for you to fetch and combine CSV files from the web. If you have any issues, would like additional support or would like to us to help you develop a more powerful tool, contact us. We’d love to hear from you!

– Jess

PLEASE NOTE:  This application is provided free of charge.  It is still in beta mode and may contain bugs or issues.  Users should review the output.csv file to ensure that all data was fetched and combined correctly.