
Filtering and Selecting Data

The Tidyverse is a collection of R packages designed for data science that share a common philosophy, grammar, and data structures. The core package for data manipulation in this suite is called dplyr.

dplyr introduces a set of functions ("verbs") that make data frame manipulation highly intuitive.


The Pipe Operator (%>% or |>)

The Tidyverse philosophy centers on chaining operations together. The pipe operator %>% (from the magrittr package, re-exported by dplyr) or the native pipe |> (available since R 4.1) takes the result of the expression on its left and passes it as the first argument to the function on its right. This avoids deeply nested function calls and the temporary variables that would otherwise clutter your workspace.

Code
library(dplyr)

# Without pipe:
filter(select(df, name, age), age > 20)

# With pipe:
df %>%
  select(name, age) %>%
  filter(age > 20)
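To make the equivalence concrete, here is a minimal sketch that runs the nested call and both pipe forms on a small data frame. The data frame and its values are hypothetical, invented only for illustration, and the native pipe form assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
df <- data.frame(
  name = c("Ana", "Ben", "Cleo", "Dev"),
  age  = c(19, 34, 27, 18),
  city = c("Lima", "Oslo", "Rome", "Pune")
)

# Nested call: has to be read from the inside out
nested <- filter(select(df, name, age), age > 20)

# magrittr pipe: reads top to bottom, one step per line
piped <- df %>%
  select(name, age) %>%
  filter(age > 20)

# Native pipe (assumes R 4.1 or later)
native <- df |>
  select(name, age) |>
  filter(age > 20)

print(piped)

# All three forms should produce the same data frame
identical(nested, piped)
identical(nested, native)
```

Each pipe step receives the previous result as its first argument, which is why df never has to be repeated inside select() or filter().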

dplyr Verbs for Filtering and Selecting

The three fundamental verbs for choosing columns, keeping rows, and ordering the result are:

1. select()

Selects specific columns from a data frame. You can list the column names to keep or use the - prefix to exclude columns.

Code
# Select the 'name' and 'salary' columns
select(df, name, salary)

# Remove the 'address' column
select(df, -address)
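A short sketch of how select() treats column order, using a hypothetical data frame whose columns and values are made up for this example. The name:address range form is not covered in the text above; it is included as an extra, on the understanding that dplyr's selection syntax supports it.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
df <- data.frame(
  name    = c("Ana", "Ben", "Cleo"),
  salary  = c(52000, 61000, 58000),
  address = c("1 Elm St", "22 Oak Ave", "7 Pine Rd"),
  age     = c(29, 41, 35)
)

# Keep two columns; the result follows the order given in the call
kept <- select(df, salary, name)
names(kept)

# Drop one column; the remaining columns keep their original order
dropped <- select(df, -address)
names(dropped)

# Keep a contiguous range of columns with the : operator
ranged <- select(df, name:address)
names(ranged)
```

Because select() returns columns in the order they are listed, it can also be used to reorder columns, not only to keep or drop them.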

2. filter()

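Keeps the rows that satisfy one or more logical conditions. Conditions separated by commas are combined with a logical AND, and rows for which a condition evaluates to NA are dropped.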

Code
# Filter rows where age is greater than 30
filter(df, age > 30)

# Filter using multiple conditions (logical AND)
filter(df, age > 30, department == "HR")
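The sketch below shows how filter() behaves with several conditions and with a missing value. The data frame is hypothetical and deliberately contains an NA in age to show that such rows are not kept.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration; Cleo's age is missing
df <- data.frame(
  name       = c("Ana", "Ben", "Cleo", "Dev"),
  age        = c(29, 41, NA, 35),
  department = c("HR", "IT", "HR", "HR")
)

# Single condition: the row with a missing age is dropped, not kept
over_30 <- filter(df, age > 30)

# Comma-separated conditions are combined with AND
hr_over_30 <- filter(df, age > 30, department == "HR")

# The same filter written with an explicit & operator
same_with_and <- filter(df, age > 30 & department == "HR")

# Logical OR needs the | operator inside a single condition
either <- filter(df, age > 30 | department == "IT")

print(over_30)
print(hr_over_30)
print(either)
```

To keep rows with missing values on purpose, one option is to test for them explicitly, for example filter(df, is.na(age) | age > 30).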

3. arrange()

Sorts rows based on the values of one or more columns. The sorting order is ascending by default. To sort in descending order, wrap the column name in desc().

Code
# Sort by age (ascending)
arrange(df, age)

# Sort by salary (descending)
arrange(df, desc(salary))
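As an illustration of sorting on one or more columns, the following sketch uses a hypothetical data frame with invented values. The salaries are chosen without ties so the resulting order is unambiguous.

```r
library(dplyr)

# Hypothetical sample data, invented for illustration
df <- data.frame(
  name       = c("Ana", "Ben", "Cleo", "Dev"),
  department = c("HR", "IT", "HR", "IT"),
  salary     = c(52000, 61000, 58000, 47000)
)

# Ascending by salary (the default)
by_salary <- arrange(df, salary)

# Descending by salary
by_salary_desc <- arrange(df, desc(salary))

# Two sort keys: department ascending, then salary descending within each department
by_dept_then_salary <- arrange(df, department, desc(salary))

print(by_salary)
print(by_salary_desc)
print(by_dept_then_salary)
```

When several columns are given, later columns only break ties left by the earlier ones.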

Try it yourself

Exercise 1: Select columns


Given the data frame df, select the name and age columns using select() and save the result in df_selected.

Hint: Use df_selected <- select(df, name, age)

Exercise 2: Filter rows

Filter the rows of the data frame df where the age column is strictly greater than 18, saving the result in df_adults.

Hint: Use filter(df, age > 18) and assign the result to df_adults.

Exercise 3: Chain operations with the pipe

Use the pipe operator %>% to chain operations: first filter df keeping only records where age > 18, and then select the name column. Save the result in res.

Hint: Write res <- df %>% filter(age > 18) %>% select(name)

Exercise 4: Sort in descending order

Sort the rows of the data frame df based on the salary column in descending order using arrange() and desc(). Save the result in df_sorted.

Hint: Use arrange(desc(salary)) inside a pipeline, or call arrange(df, desc(salary)) directly.

Exercise 5: Build a complete pipeline

Write a complete pipeline on df: filter the records where department equals 'IT', select the columns name and salary, and sort the result by salary (ascending). Save the final result in res.

Hint: Use the pipe %>% to chain filter(department == 'IT'), select(name, salary), and arrange(salary).