## Transformation: Group By

The group by transformation calculates summary statistics per group. You can use it to add group statistics, such as totals, to every row of the table, or to summarize the table into one row of statistics per group. The examples below show both uses.

#### Group By Add to Columns

Say we have the following table of hotel reservations. For each hotel room type, we want to calculate the average tip and the total number of nights stayed.

|<b></b>|name<br><span style="font-weight:normal;color:gray">&lt;character&gt;</span>|arrival date<br><span style="font-weight:normal;color:gray">&lt;character&gt;</span>|hotel room<br><span style="font-weight:normal;color:gray">&lt;character&gt;</span>|Nights<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|tip<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|breakfast y/n<br><span style="font-weight:normal;color:gray">&lt;logical&gt;</span>|
|:---|:---|:---|:---|---:|---:|:---|
|0|Kamile Sanford|1999-04-21|suite A|5|_\<null>_|true|
|1|Kody Tucker|1999-03-01|suite B|4|_\<null>_|true|
|2|Mandy Hewitt|1999-10-29|suite B|5|_\<null>_|true|
|3|Payton Sellers|1999-11-14|suite B|5|11.07|false|
|4|Jarrad Chang|1999-05-07|suite B|4|8.47|true|
|5|Hakeem Whitley|1999-10-27|suite C|2|_\<null>_|true|
|6|Fred Pruitt|1999-03-26|suite B|5|10.55|false|
|7|Mina Guevara|1999-09-13|suite C|3|_\<null>_|true|
|8|Flynn Oakley|1999-10-26|suite C|3|10.78|true|
|9|Hector Stuart|1999-10-07|suite A|1|11.87|true|

The following settings in the transformation do this.

![GroupBy](images/group-by-settings.png)

These settings group the table by 'hotel room' and calculate the `mean` and `sum` of the 'tip' and 'Nights' columns.
The result is shown below, with the name, arrival date, and breakfast y/n columns hidden.

|&nbsp;|hotel room<br><span style="font-weight:normal;color:gray">&lt;character&gt;</span>|Nights<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|tip<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|tip (mean)<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|tip (sum)<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|Nights (mean)<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|Nights (sum)<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|
|:---|:---|---:|---:|---:|---:|---:|---:|
|0|suite A|5|_\<null>_|11.87|11.87|3|6|
|1|suite B|4|_\<null>_|10.03|30.09|4.6|23|
|2|suite B|5|_\<null>_|10.03|30.09|4.6|23|
|3|suite B|5|11.07|10.03|30.09|4.6|23|
|4|suite B|4|8.47|10.03|30.09|4.6|23|
|5|suite C|2|_\<null>_|10.78|10.78|2.67|8|
|6|suite B|5|10.55|10.03|30.09|4.6|23|
|7|suite C|3|_\<null>_|10.78|10.78|2.67|8|
|8|suite C|3|10.78|10.78|10.78|2.67|8|
|9|suite A|1|11.87|11.87|11.87|3|6|

The transformation calculates the statistics per group and appends them as new columns, repeating each group's values on every row of that group. You can then use these columns in the tooltip of a chart or in another transformation.
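To make the "add to columns" behavior concrete, here is a minimal sketch in plain Python. It is not the tool's implementation; the names `ROWS`, `group_stats`, and `add_to_columns` are hypothetical, and the table is reduced to the 'hotel room', 'Nights', and 'tip' columns, with `None` standing for _\<null\>_. It assumes every group has at least one non-null tip, as in the example.

```python
from statistics import mean

# The example table reduced to (hotel room, Nights, tip); None stands for <null>.
ROWS = [
    ("suite A", 5, None), ("suite B", 4, None), ("suite B", 5, None),
    ("suite B", 5, 11.07), ("suite B", 4, 8.47), ("suite C", 2, None),
    ("suite B", 5, 10.55), ("suite C", 3, None), ("suite C", 3, 10.78),
    ("suite A", 1, 11.87),
]


def group_stats(rows):
    """Return {room: (tip mean, tip sum, Nights mean, Nights sum)}.

    Null tips are skipped. Assumes each group has at least one non-null tip,
    otherwise statistics.mean would raise on an empty list.
    """
    tips, nights = {}, {}
    for room, n, tip in rows:
        nights.setdefault(room, []).append(n)
        tips.setdefault(room, [])
        if tip is not None:  # aggregations skip null values
            tips[room].append(tip)
    return {
        room: (mean(tips[room]), sum(tips[room]), mean(nights[room]), sum(nights[room]))
        for room in nights
    }


def add_to_columns(rows):
    """Keep every input row and append its group's statistics as extra columns."""
    stats = group_stats(rows)
    return [row + stats[row[0]] for row in rows]


for row in add_to_columns(ROWS):
    print(row)
```

Every row keeps its original values, and rows of the same room share the same four appended statistics, matching the table above up to display rounding.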

Note that 'tip (mean)' ignores the _\<null\>_ values, so it is the average tip among guests who left a tip. To get the average tip over all guests, first replace the _\<null\>_ values in the tip column with zero using the [replace null transformation](TransformationReplaceNull.md). In general, the aggregations skip null values.
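The difference between the two averages can be checked with the tips of the five 'suite B' rows. This is a plain-Python sketch, not the tool itself; replacing `None` with `0` stands in for what the replace null transformation is described as doing.

```python
from statistics import mean

# Tips for the five 'suite B' rows of the example table; None stands for <null>.
suite_b_tips = [None, None, 11.07, 8.47, 10.55]

# Skipping nulls: the sum (30.09) is divided by the three guests who tipped.
mean_skipping_nulls = mean(t for t in suite_b_tips if t is not None)

# Replacing nulls with zero first: every guest counts, so 30.09 is divided by five.
mean_with_zeros = mean(0 if t is None else t for t in suite_b_tips)

print(round(mean_skipping_nulls, 2))  # 10.03
print(round(mean_with_zeros, 2))      # 6.02
```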

#### Group By Summarize

We can also summarize the table so that only the results remain, one row per group. In the example above, setting 'Summarize method' to 'summarize' produces the following result.

|&nbsp;|hotel room<br><span style="font-weight:normal;color:gray">&lt;character&gt;</span>|tip (mean)<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|tip (sum)<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|Nights (mean)<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|Nights (sum)<br><span style="font-weight:normal;color:gray">&lt;numeric&gt;</span>|
|:---|:---|---:|---:|---:|---:|
|0|suite A|11.87|11.87|3|6|
|1|suite B|10.03|30.09|4.6|23|
|2|suite C|10.78|10.78|2.67|8|

With summarize, the result contains only the group-by columns and the aggregation columns, with one row per group.
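As a rough illustration of the 'summarize' method, the sketch below groups rows by a key column and keeps one row per group with the aggregated values. It is plain Python, not the tool's code; `summarize`, `ROWS`, and `SPECS` are hypothetical names, and groups are emitted in the order they first appear, which happens to match the table above.

```python
from statistics import mean

# The example table reduced to (hotel room, Nights, tip); None stands for <null>.
ROWS = [
    ("suite A", 5, None), ("suite B", 4, None), ("suite B", 5, None),
    ("suite B", 5, 11.07), ("suite B", 4, 8.47), ("suite C", 2, None),
    ("suite B", 5, 10.55), ("suite C", 3, None), ("suite C", 3, 10.78),
    ("suite A", 1, 11.87),
]


def summarize(rows, key_index, specs):
    """Return one row per group: the group key followed by one value per spec.

    specs is a list of (column index, aggregation function) pairs; null values
    are dropped before each aggregation is applied.
    """
    groups = {}
    for row in rows:
        # dicts preserve insertion order, so groups appear in first-seen order
        groups.setdefault(row[key_index], []).append(row)
    result = []
    for key, members in groups.items():
        values = [key]
        for col, agg in specs:
            non_null = [m[col] for m in members if m[col] is not None]
            values.append(agg(non_null))
        result.append(tuple(values))
    return result


# tip (mean), tip (sum), Nights (mean), Nights (sum)
SPECS = [(2, mean), (2, sum), (1, mean), (1, sum)]

for row in summarize(ROWS, 0, SPECS):
    print(row)
```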

#### Aggregation Functions

The transformation supports the following aggregation functions:

* **min**: take the minimum of the values, skipping *nulls*
* **max**: take the maximum of the values, skipping *nulls*
* **sum**: calculate the sum-total, skipping *nulls*
* **mean**: calculate the sample mean, skipping *nulls*
* **count**: count the number of non-_null_ values
* **first**: take the first value
* **last**: take the last value
* **list**: combine the values into a list
* **median**: calculate the sample median, skipping *nulls*
* **variance**: calculate the unbiased sample variance, skipping *nulls*
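The list above can be approximated with Python's `statistics` module, as in the sketch below. It is an assumption-laden illustration rather than the tool's implementation: the `AGGREGATIONS` dictionary and `_non_null` helper are hypothetical, and because the list does not say how **first**, **last**, and **list** treat nulls, this sketch keeps nulls for those three.

```python
from statistics import mean, median, variance


def _non_null(values):
    """Drop None (standing for <null>) before aggregating."""
    return [v for v in values if v is not None]


# Hypothetical mapping from aggregation name to function.
# Assumption: first, last and list keep nulls; the others skip them.
AGGREGATIONS = {
    "min": lambda vs: min(_non_null(vs)),
    "max": lambda vs: max(_non_null(vs)),
    "sum": lambda vs: sum(_non_null(vs)),
    "mean": lambda vs: mean(_non_null(vs)),
    "count": lambda vs: len(_non_null(vs)),
    "first": lambda vs: vs[0],          # first value in row order
    "last": lambda vs: vs[-1],          # last value in row order
    "list": lambda vs: list(vs),
    "median": lambda vs: median(_non_null(vs)),
    "variance": lambda vs: variance(_non_null(vs)),  # unbiased: divides by n - 1
}

# Tips of the 'suite B' rows from the example table.
suite_b_tips = [None, None, 11.07, 8.47, 10.55]
for name, agg in AGGREGATIONS.items():
    print(name, agg(suite_b_tips))
```

`statistics.variance` uses the n − 1 denominator, which is what makes it the unbiased sample variance.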

### See Also

[Add (Sub)Totals](TransformationAddTotal.md), [Group to Other](TransformationGroupToOther.md)
