When speaking with clients, customers and friends about Alteryx, I always mention that adding multiple files in bulk with ease is a nice feature which Alteryx does well.
However, there are a couple of ways to do this, and in this post I'm going to explain two of them, and discuss the merits/demerits of each.
Using the * wildcard input.
A neat feature to input multiple files is to use *.<filetype> within the input tool. Let me show this example.
I have multiple CSVs, similar to shown below. I would like to bring these all through, and I know each file has the same data structure - by this, I mean each file has the same column headers.
One solution here could be to bring these all into Alteryx as individual inputs, and then use the Union tool to bring these all together. But what happens when I have hundreds of csv files, such as below?
The trick here is to use *.csv in the input, and this will union the files as they come in. One thing to be aware of is that Alteryx will use the first file it reads in (which will be based alphabetically) as the one it uses as the 'master' to union the columns)
Below is how we'd normally input a file..
But when we want to bring in multiple files, we replace the text (in this case Transactions 2015 Apr.csv) with a * -- as below.
So this works nicely when you know that each file in your set has an identical structure, and is a godsend when you wish to combine/input/union many files at once within seconds.
However, if my CSVs are not in one place, but in multiple folders (perhaps categorised by country) the *.csv trick will not work.
So let's talk about the Dynamic Input tool.
The Dynamic Input tool isn't as straightforward as the *.<filetype> trick. In order to input multiple files, it requires the Directory tool (to read in file names from a folder) and a 'master' file to compare data structures - similar to the *.<filetype>, but more explicitly.
For this example, I am sharing the process I use when I input data from a folder which contains two seasons worth of football data. I'm interested in 2 things;
- The 'chances' file as this gives me shot level data BUT
- Only for the current season
So the tool I need here in Alteryx to read all of these fields in is the File Directory tool and the Dynamic Input tool. These two in tandem can help me bring in ONLY the data I'm interested in seeing.
In order to only look at the .csv files, we use a similar syntax to the asterisk trick, and add a *.csv for the wildcard. In this case, we also want SubDirectories included. The output is below:
The output will look like something like this, with the full path, directory, file name - etc all in the output. We're interested in the 'FullPath' field here - but in my case I want to exclude all filepaths which don't include "2016-17 Data" - so a filter tool comes in before my Dynamic Input tool.
There's a few configuration points to make here:
- The Dynamic Input tool has to have a 'Source Template' - this is basically the .csv file you want to dynamically input with a guideline on the columns, and any other input configs you would like to make (ie. delimiters, whether the file name should be outputted as a field etc.)
- We want to set the tool to 'Read a List of Data Sources' - which in this case is the chances dataset for this season we've filtered to.
- The 'Action' drop down should then change the Entire File Path (of the Input Source) and the field we're using to do this is FullPath
0 comments:
Post a Comment