The gist for this lambda function can be found here.

One-hot encoding. Create as many columns as there are unique values in a variable. Put a 1 in a cell if the column and row represent the same value, otherwise put a zero in the cell. Use these new columns to create ML models.

Do it in Python. Do it in R. Do it in Excel, if the mood takes you. That’s right. You can one-hot encode categorical data in Excel.

Use this:

=LAMBDA(rng,
		LET(
			var,INDEX(rng,1,1),
			vals,UNIQUE(INDEX(rng,2,1):INDEX(rng,ROWS(rng),1)),
			heads,var&"_"&TRANSPOSE(SUBSTITUTE(vals," ","_")),
			MAKEARRAY(
				ROWS(rng),
				COLUMNS(heads),
				LAMBDA(r,
					   c,
					   IFS(r=1,INDEX(heads,1,c),
						   INDEX(rng,r,1)=INDEX(TRANSPOSE(vals),1,c),1,
						   TRUE,0)
						)
					)
			)
		)

Let’s break it down:

=LAMBDA(rng,
		LET(
			var,INDEX(rng,1,1),
			vals,UNIQUE(INDEX(rng,2,1):INDEX(rng,ROWS(rng),1)),
			heads,var&"_"&TRANSPOSE(SUBSTITUTE(vals," ","_")),

ONEHOT has one argument: a single-column range of data that includes a column header.

In the gif above, you can see that I select the “country” column and all the rows beneath.

Immediately following the arguments we have a LET function, which defines:

  • var – the name of the variable we are encoding. This is the first item in the rng array – the column header. In the example above: “country”
  • vals – the unique list of items in the array from row 2 to the end. This is the unique list of countries
  • heads – here we are concatenating the variable (country) with a transposed array of the values (e.g. united kingdom), and replacing any spaces with an underscore

Next we have:

			MAKEARRAY(
				ROWS(rng),
				COLUMNS(heads),
				LAMBDA(r,
					   c,
					   IFS(r=1,INDEX(heads,1,c),
						   INDEX(rng,r,1)=INDEX(TRANSPOSE(vals),1,c),1,
						   TRUE,0)
						)
					)
			)
		)

Here we are creating an array that is the same height as rng and the same width as heads.

If the row of the new array is 1, then we will place the column header there: INDEX(heads,1,c).

Otherwise, if the value in rng on row r is equal to the value represented by the column header for column c, then we place a 1. Otherwise, we place a 0.

That’s pretty much it. Another learning exercise using MAKEARRAY to shortcut a task.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>