Module lesson (1/2)
Filtering and Selecting Data
The Tidyverse is a collection of R packages designed for data science that share a common philosophy, grammar, and data structures. The core package for data manipulation in this suite is called dplyr.
dplyr introduces a set of functions ("verbs") that make data frame manipulation highly intuitive.
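The examples in this lesson assume a data frame named df with columns such as name, age, department, salary, and address, but the lesson never shows one. To run the examples yourself, you could build a small one like the sketch below. The column names come from the lesson; the values are hypothetical and made up only for illustration. It assumes dplyr is already installed (for example with install.packages("dplyr")).

```r
library(dplyr)

# Hypothetical sample data: column names match those used in this lesson,
# but every value is invented for illustration.
df <- data.frame(
  name       = c("Ana", "Ben", "Chiara", "Dev", "Eli"),
  age        = c(17, 25, 34, 41, 29),
  department = c("IT", "HR", "IT", "HR", "Sales"),
  salary     = c(1200, 3100, 4200, 3900, 2800),
  address    = c("Via Roma 1", "Main St 5", "Via Po 3", "High St 9", "Elm St 2")
)

# dplyr verbs return a new data frame; df itself is left unchanged.
adults <- filter(df, age > 18)
glimpse(df)
```

Because verbs never modify df in place, you must assign the result (as with adults above) if you want to keep it.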
The Pipe Operator (%>% or |>)
The Tidyverse philosophy centers on chaining operations together. The pipe operator %>% (from the magrittr package, available as soon as dplyr is loaded) and the native pipe |> (built into base R since version 4.1.0) both take the result of one expression and pass it as the first argument to the next function call. This avoids deeply nested function calls and saves you from cluttering your workspace with temporary variables.
library(dplyr)

# Without pipe (nested calls, read from the inside out):
filter(select(df, name, age), age > 20)
# With pipe (read from top to bottom):
df %>%
  select(name, age) %>%
  filter(age > 20)
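To check that the nested form, the magrittr pipe, and the native pipe really do the same thing, you could run a small comparison like the one below. It is a sketch on a hypothetical three-row data frame (values invented for illustration) and assumes R 4.1.0 or later for |>.

```r
library(dplyr)

# Hypothetical sample data (values invented for illustration).
df <- data.frame(
  name = c("Ana", "Ben", "Chiara"),
  age  = c(17, 25, 34)
)

# Nested form: the innermost call runs first.
nested <- filter(select(df, name, age), age > 20)

# magrittr pipe (%>%), available once dplyr is loaded.
piped <- df %>%
  select(name, age) %>%
  filter(age > 20)

# Native pipe (|>), part of base R since 4.1.0.
native <- df |>
  select(name, age) |>
  filter(age > 20)

identical(nested, piped)   # TRUE
identical(piped, native)   # TRUE
native$name                # "Ben" "Chiara"
```

All three versions compute the same result; the pipes only change how the code reads, not what it does.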
dplyr Verbs for Filtering and Selecting
Three fundamental verbs for extracting and ordering data in a data frame are:
1. select()
Selects specific columns from a data frame. You can list the column names to keep or use the - prefix to exclude columns.
# Select the 'name' and 'salary' columns
select(df, name, salary)
# Remove the 'address' column
select(df, -address)
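The sketch below shows what select() returns in each case, using a hypothetical two-row data frame (values invented for illustration). Note that kept columns come back in the order you list them, and that several columns can be excluded in one call.

```r
library(dplyr)

# Hypothetical sample data (values invented for illustration).
df <- data.frame(
  name    = c("Ana", "Ben"),
  age     = c(17, 25),
  salary  = c(1200, 3100),
  address = c("Via Roma 1", "Main St 5")
)

# Keep columns: the result follows the order you list them in.
kept <- select(df, salary, name)
names(kept)      # "salary" "name"

# Drop a column with the - prefix; the remaining columns keep their order.
dropped <- select(df, -address)
names(dropped)   # "name" "age" "salary"

# Exclude several columns at once.
names(select(df, -address, -age))   # "name" "salary"
```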
2. filter()
Filters rows based on one or more logical conditions, keeping only the rows for which every condition is TRUE.
# Filter rows where age is greater than 30
filter(df, age > 30)
# Filter using multiple conditions (logical AND)
filter(df, age > 30, department == "HR")
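The following sketch, on a hypothetical four-row data frame (values invented for illustration), confirms that comma-separated conditions behave like an explicit &. It also shows that a row whose condition evaluates to NA (here, a missing age) is dropped rather than kept.

```r
library(dplyr)

# Hypothetical sample data; Dev's age is deliberately missing (NA).
df <- data.frame(
  name       = c("Ana", "Ben", "Chiara", "Dev"),
  age        = c(45, 25, 34, NA),
  department = c("HR", "HR", "IT", "HR")
)

# Comma-separated conditions are combined with AND ...
by_comma <- filter(df, age > 30, department == "HR")
# ... so this explicit & gives the same rows.
by_and <- filter(df, age > 30 & department == "HR")

identical(by_comma, by_and)   # TRUE
by_comma$name                 # "Ana"

# Dev is excluded: NA > 30 is NA, and filter() drops rows
# whose condition is not TRUE.
```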
3. arrange()
Sorts rows based on the values of one or more columns. The sorting order is ascending by default. To sort in descending order, wrap the column name in desc().
# Sort by age (ascending)
arrange(df, age)
# Sort by salary (descending)
arrange(df, desc(salary))
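Since arrange() accepts more than one column, later columns act as tie-breakers within groups of equal values in earlier ones. A minimal sketch, assuming a hypothetical data frame with invented values:

```r
library(dplyr)

# Hypothetical sample data (values invented for illustration).
df <- data.frame(
  name       = c("Ana", "Ben", "Chiara", "Dev"),
  department = c("IT", "HR", "IT", "HR"),
  salary     = c(4200, 3100, 1200, 3900)
)

# Sort by department (ascending), then by salary (descending)
# within each department.
sorted <- arrange(df, department, desc(salary))
sorted$name   # "Dev" "Ben" "Ana" "Chiara"
```

HR sorts before IT, and inside each department the higher salary comes first.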
Try it yourself
Exercise 1: Select columns
Given the data frame df, select the name and age columns using select() and save the result in df_selected.
Hint: Use df_selected <- select(df, name, age)
Filter the rows of the data frame df where the age column is strictly greater than 18, saving the result in df_adults.
Hint: Call filter(df, age > 18) and assign the result to df_adults.
Use the pipe operator %>% to chain operations: first filter df keeping only records where age > 18, and then select the name column. Save the result in res.
Hint: Write res <- df %>% filter(age > 18) %>% select(name)
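The order of the steps in a pipeline matters. The sketch below (hypothetical data, values invented for illustration) shows that filtering first works, while selecting name first removes the age column that filter() needs. It assumes no object called age exists in your workspace; if one did, filter() could silently use it instead of raising an error.

```r
library(dplyr)

# Hypothetical sample data (values invented for illustration).
df <- data.frame(
  name = c("Ana", "Ben", "Chiara"),
  age  = c(17, 25, 34)
)

# filter() first: age is still present when the condition is evaluated.
res <- df %>% filter(age > 18) %>% select(name)
res$name   # "Ben" "Chiara"

# Reversed order: select(name) drops age, so filter() cannot find it.
reversed <- tryCatch(
  df %>% select(name) %>% filter(age > 18),
  error = function(e) "error: age not found"
)
reversed   # "error: age not found"
```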
Sort the rows of the data frame df based on the salary column in descending order using arrange() and desc(). Save the result in df_sorted.
Hint: Use arrange(desc(salary)) inside a pipeline, or pass df directly as the first argument: arrange(df, desc(salary)).
Write a complete pipeline on df: filter the records where department equals 'IT', select the columns name and salary, and sort the result by salary (ascending). Save the final result in res.
Hint: Use the pipe %>% to chain filter(department == 'IT'), select(name, salary), and arrange(salary).