Exam Databricks-Certified-Data-Engineer-Associate Topic 3 Question 6 Discussion
Actual exam question for Databricks's Databricks-Certified-Data-Engineer-Associate exam
Question #: 6
Topic #: 3
A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance.
Which of the following keywords can be used to compact the small files?
Suggested Answer: B
The keyword that compacts the small files associated with a Delta table is OPTIMIZE. The OPTIMIZE command performs file compaction by rewriting a set of small files into a smaller number of larger files [1]. This speeds up queries that scan the table, because fewer files have to be opened and less file metadata has to be processed [1]. OPTIMIZE also accepts an optional ZORDER BY clause, which co-locates rows with similar values in the given columns in the same files and so makes data skipping more effective [1]. The command can be applied to the whole table or, with a WHERE clause on partition columns, to specific partitions of the table [1].
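As a rough illustration of the forms described above, the following sketch builds OPTIMIZE statements as strings. The helper `build_optimize_sql`, the table name `events`, and the columns `date` and `user_id` are hypothetical names chosen for the example; only the `OPTIMIZE ... WHERE ... ZORDER BY (...)` statement shape comes from the command itself.

```python
def build_optimize_sql(table, where=None, zorder_by=None):
    """Return an OPTIMIZE statement as a string (hypothetical helper)."""
    parts = [f"OPTIMIZE {table}"]
    if where:
        # The predicate is assumed to reference partition columns only.
        parts.append(f"WHERE {where}")
    if zorder_by:
        parts.append(f"ZORDER BY ({', '.join(zorder_by)})")
    return " ".join(parts)


# Compact the whole (hypothetical) table.
whole_table = build_optimize_sql("events")

# Compact a single partition and co-locate rows by user_id.
one_partition = build_optimize_sql(
    "events", where="date = '2024-01-01'", zorder_by=["user_id"]
)

print(whole_table)
print(one_partition)
```

In a Databricks notebook, where a `spark` session is already defined, such a string could be passed to `spark.sql(...)`; the sketch itself only prints the statements and does not touch a cluster.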
The other keywords are not suitable for compacting the small files associated with a Delta table. REDUCE is not a table command: in Databricks SQL, reduce is a higher-order function that aggregates the elements of an array using a lambda function [2]. COMPACTION is not a Databricks SQL command or a Python keyword. REPARTITION changes the number of partitions of a DataFrame or an RDD (the repartition method in PySpark) and does not by itself compact the files of an existing Delta table [3]. VACUUM removes files that are no longer referenced by a Delta table and are older than a retention threshold [4]; it deletes files but never merges them.
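The difference between compacting files and removing unreferenced files can be sketched with a toy model. This is not how Delta Lake is implemented: the class `ToyDeltaTable`, the greedy packing rule, the 128 MB target, and the file names are all assumptions made for illustration, and the retention threshold is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    storage: dict = field(default_factory=dict)  # every file on disk: name -> size in MB
    active: set = field(default_factory=set)     # files the current table version references
    rewrites: int = 0                            # counter used to name compacted files

    def add_file(self, name, size_mb):
        self.storage[name] = size_mb
        self.active.add(name)

    def optimize(self, target_mb=128):
        """Greedily pack active files smaller than target_mb into larger files."""
        small = sorted(n for n in self.active if self.storage[n] < target_mb)
        groups, current, current_mb = [], [], 0
        for name in small:
            size = self.storage[name]
            if current and current_mb + size > target_mb:
                groups.append(current)
                current, current_mb = [], 0
            current.append(name)
            current_mb += size
        if current:
            groups.append(current)
        for group in groups:
            if len(group) < 2:
                continue  # a single file has nothing to merge with
            self.rewrites += 1
            merged = f"compacted-{self.rewrites}.parquet"
            self.storage[merged] = sum(self.storage[n] for n in group)
            # The old files stay on disk but are no longer referenced.
            self.active = (self.active - set(group)) | {merged}

    def vacuum(self):
        """Delete unreferenced files (the retention threshold is ignored here)."""
        for name in [n for n in self.storage if n not in self.active]:
            del self.storage[name]


table = ToyDeltaTable()
for i in range(10):
    table.add_file(f"part-{i}.parquet", 10)

table.vacuum()    # nothing is unreferenced, so nothing changes
files_after_vacuum_only = len(table.active)
table.optimize()  # ten 10 MB files become one 100 MB file
files_after_optimize = len(table.active)
on_disk_before_cleanup = len(table.storage)
table.vacuum()    # the ten superseded small files are deleted
on_disk_after_cleanup = len(table.storage)
print(files_after_vacuum_only, files_after_optimize,
      on_disk_before_cleanup, on_disk_after_cleanup)
```

In this toy run, vacuuming first leaves all ten small files in place, which mirrors why VACUUM is not the answer here: it only cleans up files that a compaction (or another rewrite) has already made obsolete.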
1: OPTIMIZE | Databricks on AWS
2: REDUCE | Databricks on AWS
3: repartition | Databricks on AWS
4: VACUUM | Databricks on AWS
by Anastasia at Mar 07, 2026, 09:27 AM