
Filtering and Selecting Data

The Tidyverse is a collection of R packages designed for data science that share a common philosophy, grammar, and data structures. The core package for data manipulation in this suite is called dplyr.

dplyr introduces a set of functions ("verbs") that make data frame manipulation highly intuitive.


The Pipe Operator (%>% or |>)

The Tidyverse philosophy centers on chaining operations together. The pipe operator %>% (provided by the magrittr package and re-exported by dplyr) and the native pipe |> (available since R 4.1) both take the result of the expression on their left and pass it as the first argument to the function on their right. This avoids deeply nested function calls and spares you from creating temporary variables for intermediate results.

Code
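# Load dplyr, which provides select(), filter(), and %>%
library(dplyr)

# Without pipe: nested calls are read from the inside out
filter(select(df, name, age), age > 20)

# With pipe: steps are read from top to bottom
df %>%
  select(name, age) %>%
  filter(age > 20)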
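
To check that the nested and piped forms really return the same result, here is a minimal runnable sketch. The data frame df and its values are invented for illustration; the lesson does not specify them. The last pipeline uses the native pipe |>, so it assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical toy data: column names follow the lesson, values are invented
df <- data.frame(
  name = c("Ana", "Ben", "Cleo"),
  age  = c(19, 25, 31)
)

# Nested call
nested <- filter(select(df, name, age), age > 20)

# magrittr pipe
piped <- df %>%
  select(name, age) %>%
  filter(age > 20)

# Native pipe (R >= 4.1)
native <- df |>
  select(name, age) |>
  filter(age > 20)

# All three should describe the same data frame
identical(nested, piped)
identical(piped, native)
```

The pipe only changes how the call is written, not what is computed, so choosing between the forms is a matter of readability.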

dplyr Verbs for Filtering and Selecting

The three fundamental verbs for extracting and ordering data from a data frame are:

1. select()

Selects specific columns from a data frame. You can list the column names to keep or use the - prefix to exclude columns.

Code
# Select the 'name' and 'salary' columns
select(df, name, salary)

# Remove the 'address' column
select(df, -address)
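
As a minimal sketch of both forms, the example below runs select() on a small data frame and inspects the resulting column names. The data values are invented for illustration. It also shows that select() returns columns in the order you list them.

```r
library(dplyr)

# Hypothetical toy data: values are invented for illustration
df <- data.frame(
  name    = c("Ana", "Ben"),
  salary  = c(52000, 48000),
  address = c("1 Main St", "2 Oak Ave")
)

kept    <- select(df, name, salary)   # keep only these two columns
dropped <- select(df, -address)       # keep everything except 'address'
swapped <- select(df, salary, name)   # columns come back in the order listed

names(kept)     # "name" "salary"
names(dropped)  # "name" "salary"
names(swapped)  # "salary" "name"
```

select() never changes the number of rows; it only changes which columns are present and in what order.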

2. filter()

Filters rows based on one or more logical conditions.

Code
# Filter rows where age is greater than 30
filter(df, age > 30)

# Filter using multiple conditions (logical AND)
filter(df, age > 30, department == "HR")
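
The sketch below, on an invented data frame, illustrates two details of filter(). Conditions separated by commas behave like a single condition joined with &, and a logical OR has to be written explicitly with |. In addition, filter() keeps only rows where the condition is TRUE, so a row whose condition evaluates to NA (here the row with a missing age) is dropped.

```r
library(dplyr)

# Hypothetical toy data: values are invented, one age is missing on purpose
df <- data.frame(
  name       = c("Ana", "Ben", "Cleo", "Dev"),
  age        = c(28, 35, 41, NA),
  department = c("HR", "HR", "IT", "HR")
)

# Comma-separated conditions are combined with logical AND
both_comma <- filter(df, age > 30, department == "HR")
both_amp   <- filter(df, age > 30 & department == "HR")

# Logical OR must be spelled out with |
either <- filter(df, age > 30 | department == "IT")

both_comma$name   # "Ben"
either$name       # "Ben" "Cleo"
```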

3. arrange()

Sorts rows based on the values of one or more columns. The sorting order is ascending by default. To sort in descending order, wrap the column name in desc().

Code
# Sort by age (ascending)
arrange(df, age)

# Sort by salary (descending)
arrange(df, desc(salary))
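
A minimal sketch of sorting on one column and on several, using invented data. When more than one column is given, later columns are used to break ties in the earlier ones; here two people share the same age, and desc(salary) decides their order.

```r
library(dplyr)

# Hypothetical toy data: values are invented, two rows share age 35
df <- data.frame(
  name   = c("Ana", "Ben", "Cleo", "Dev"),
  age    = c(35, 28, 35, 41),
  salary = c(52000, 48000, 61000, 45000)
)

by_age             <- arrange(df, age)                # ascending by default
by_salary_desc     <- arrange(df, desc(salary))       # highest salary first
by_age_then_salary <- arrange(df, age, desc(salary))  # ties in age broken by salary

by_age$age                # 28 35 35 41
by_salary_desc$name       # "Cleo" "Ana" "Ben" "Dev"
by_age_then_salary$name   # "Ben" "Cleo" "Ana" "Dev"
```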

Try it yourself

Exercise 1: Select columns


Given the data frame df, select the name and age columns using select() and save the result in df_selected.

Hint: df_selected <- select(df, name, age)

Exercise 2: Filter rows

Filter the rows of the data frame df where the age column is strictly greater than 18, saving the result in df_adults.

Hint: call filter(df, age > 18) and assign the result to df_adults.

Exercise 3: Chain operations with the pipe

Use the pipe operator %>% to chain operations: first filter df keeping only records where age > 18, and then select the name column. Save the result in res.

Hint: res <- df %>% filter(age > 18) %>% select(name)

Exercise 4: Sort rows

Sort the rows of the data frame df based on the salary column in descending order using arrange() and desc(). Save the result in df_sorted.

Hint: use arrange(desc(salary)) inside a pipeline, or call arrange(df, desc(salary)) directly.

Exercise 5: Build a complete pipeline

Write a complete pipeline on df: filter the records where department equals 'IT', select the columns name and salary, and sort the result by salary (ascending). Save the final result in res.

Hint: use the pipe %>% to chain filter(department == 'IT'), select(name, salary), and arrange(salary).