Filtering and Selecting Data
The Tidyverse is a collection of R packages designed for data science that share a common philosophy, grammar, and data structures. The core package for data manipulation in this suite is called dplyr.
dplyr introduces a set of functions ("verbs") that make data frame manipulation highly intuitive.
The Pipe Operator (%>% or |>)
The Tidyverse philosophy centers on chaining operations together. The pipe operator %>% (provided by the magrittr package and re-exported by dplyr) and the native pipe |> (available since R 4.1.0) take the result of the expression on the left and pass it as the first argument to the function on the right. This avoids deeply nested function calls and keeps your workspace free of temporary variables.
# Without pipe:
filter(select(df, name, age), age > 20)
# With pipe:
df %>%
  select(name, age) %>%
  filter(age > 20)
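To make the equivalence concrete, here is a minimal sketch on a small made-up data frame (the contents of df are an assumption for illustration, not data from this lesson). It checks that the nested call, the %>% pipeline, and the native |> pipeline return the same result.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
df <- data.frame(
  name = c("Ana", "Ben", "Cleo"),
  age = c(19, 25, 31),
  salary = c(1800, 2400, 3100)
)

# Nested call: has to be read from the inside out
nested <- filter(select(df, name, age), age > 20)

# magrittr pipe: reads top to bottom, one step per line
piped <- df %>%
  select(name, age) %>%
  filter(age > 20)

# Native pipe (requires R 4.1.0 or later)
native <- df |>
  select(name, age) |>
  filter(age > 20)

identical(nested, piped)   # TRUE
identical(nested, native)  # TRUE
```

All three forms run the same two function calls; the pipe only changes how the code reads.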
dplyr Verbs for Filtering and Selecting
The three fundamental verbs for choosing columns, filtering rows, and ordering rows of a data frame are:
1. select()
Selects specific columns from a data frame. You can list the column names to keep or use the - prefix to exclude columns.
# Select the 'name' and 'salary' columns
select(df, name, salary)
# Remove the 'address' column
select(df, -address)
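As a rough illustration of both forms, the sketch below runs select() on a small invented data frame (the column values are assumptions made up for this example) and inspects the resulting column names.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
df <- data.frame(
  name = c("Ana", "Ben"),
  age = c(19, 25),
  salary = c(1800, 2400),
  address = c("12 Oak St", "7 Elm Rd")
)

# Keep two columns; they come back in the order they are listed
kept <- select(df, salary, name)
names(kept)     # "salary" "name"

# Drop one column and keep everything else
dropped <- select(df, -address)
names(dropped)  # "name" "age" "salary"

# Drop several columns at once
slim <- select(df, -age, -address)
names(slim)     # "name" "salary"
```

Note that select() never changes the rows; only the set and order of columns differ.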
2. filter()
Filters rows based on one or more logical conditions.
# Filter rows where age is greater than 30
filter(df, age > 30)
# Filter using multiple conditions (logical AND)
filter(df, age > 30, department == "HR")
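The following sketch, on invented data, shows two details that are easy to miss: comma-separated conditions behave like a single condition joined with &, and rows where the condition evaluates to NA are dropped. The data frame and the missing age value are assumptions added for illustration.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration; Dev's age is missing
df <- data.frame(
  name = c("Ana", "Ben", "Cleo", "Dev"),
  age = c(45, 28, 39, NA),
  department = c("HR", "HR", "IT", "IT")
)

# Comma-separated conditions are combined with logical AND
both_comma <- filter(df, age > 30, department == "HR")
both_amp <- filter(df, age > 30 & department == "HR")
identical(both_comma, both_amp)  # TRUE

# A condition that evaluates to NA drops the row (Dev is excluded)
over_30 <- filter(df, age > 30)
over_30$name  # "Ana" "Cleo"

# Use | for logical OR
either <- filter(df, age > 30 | department == "HR")
either$name   # "Ana" "Ben" "Cleo"
```

If rows with missing values should be kept, the condition has to say so explicitly, for example filter(df, age > 30 | is.na(age)).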
3. arrange()
Sorts rows based on the values of one or more columns. The sorting order is ascending by default. To sort in descending order, wrap the column name in desc().
# Sort by age (ascending)
arrange(df, age)
# Sort by salary (descending)
arrange(df, desc(salary))
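A short sketch of arrange() on invented data follows, including sorting by more than one column: later columns break ties in the earlier ones. The data frame is an assumption made up for this example.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
df <- data.frame(
  name = c("Ana", "Ben", "Cleo", "Dev"),
  department = c("IT", "HR", "IT", "HR"),
  salary = c(3100, 2400, 3600, 2900)
)

# Highest salary first
by_salary <- arrange(df, desc(salary))
by_salary$name  # "Cleo" "Ana" "Dev" "Ben"

# Department ascending, then salary descending within each department
by_dept <- arrange(df, department, desc(salary))
by_dept$name    # "Dev" "Ben" "Cleo" "Ana"
```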
Try it yourself
Exercise 1: Select columns
Given the data frame df, select the name and age columns using select() and save the result in df_selected.
Hint: Use df_selected <- select(df, name, age)
Exercise 2: Filter rows
Filter the rows of the data frame df where the age column is strictly greater than 18, and save the result in df_adults.
Hint: Call filter(df, age > 18) and assign the result to df_adults.
Exercise 3: Chain operations with the pipe
Use the pipe operator %>% to chain operations: first filter df to keep only the records where age > 18, then select the name column. Save the result in res.
Hint: Write res <- df %>% filter(age > 18) %>% select(name)
Exercise 4: Sort rows
Sort the rows of the data frame df by the salary column in descending order using arrange() and desc(). Save the result in df_sorted.
Hint: Use desc(salary) inside arrange(), either in a pipeline (df %>% arrange(desc(salary))) or with df as the first argument (arrange(df, desc(salary))).
Exercise 5: Build a complete pipeline
Write a complete pipeline on df: filter the records where department equals 'IT', select the columns name and salary, and sort the result by salary (ascending). Save the final result in res.
Hint: Use the pipe %>% to chain filter(department == 'IT'), select(name, salary), and arrange(salary).