Databricks-Certified-Data-Engineer-Associate by Databricks Actual Free Exam Questions And Answers

Question 1

A data engineer is designing a streaming pipeline and wants to limit how long Spark maintains state information for aggregation queries. Which Structured Streaming feature defines how long late data can be processed?

A. Cache

B. Partition

C. Watermark

D. Checkpoint

Correct Answer: C
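
In Spark itself, a watermark is declared with `DataFrame.withWatermark`, for example `withWatermark("event_time", "10 minutes")` before a windowed `groupBy`. The pure-Python sketch below is only a toy model of the idea, not Spark's implementation, and rests on assumptions: 10-minute tumbling windows, a 10-minute delay, and a watermark that advances after every event rather than between micro-batches. The watermark trails the largest event time seen so far. Window state is evicted once the window ends at or before the watermark, and events that arrive later for an evicted window are dropped. All names in it are hypothetical.

```python
from datetime import datetime, timedelta

# Toy model only: every value here is an illustrative assumption.
WINDOW = timedelta(minutes=10)  # tumbling window size
DELAY = timedelta(minutes=10)   # watermark delay threshold
EPOCH = datetime(1970, 1, 1)


def window_start(ts):
    """Align an event timestamp down to the start of its tumbling window."""
    return ts - (ts - EPOCH) % WINDOW


def aggregate_with_watermark(event_times):
    """Count events per window while a watermark bounds how much state is kept."""
    state = {}       # open windows: window start -> running count
    finalized = {}   # windows evicted from state once the watermark passed them
    dropped = []     # events that arrived after their window was evicted
    max_event_time = None
    watermark = None

    for ts in event_times:
        start = window_start(ts)
        if watermark is not None and start + WINDOW <= watermark:
            dropped.append(ts)  # too late: its window is already closed
        else:
            state[start] = state.get(start, 0) + 1

        # The watermark trails the largest event time seen so far by DELAY.
        max_event_time = ts if max_event_time is None else max(max_event_time, ts)
        watermark = max_event_time - DELAY

        # Evict every window that ends at or before the watermark.
        for closed in [s for s in state if s + WINDOW <= watermark]:
            finalized[closed] = state.pop(closed)

    return state, finalized, dropped


def at(hour, minute):
    return datetime(2024, 1, 1, hour, minute)


state, finalized, dropped = aggregate_with_watermark(
    [at(12, 1), at(12, 7), at(12, 25), at(12, 3)]
)
```

With these inputs, the 12:25 event moves the watermark to 12:15, so the 12:00 window is finalized with two events, and the 12:03 event that arrives afterwards is dropped instead of reopening that window.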

Question 2

A data engineer is onboarding a new bronze ingestion pipeline in Databricks with Unity Catalog.
The team wants Databricks to handle storage layout, apply platform optimizations over time, and simplify lifecycle management so that when a table is dropped, its underlying data is also cleaned up according to Databricks-managed retention policies.
Which table type should the data engineer create for these ingestion tables?

A. Temporary views over files to avoid table-level governance and lifecycle coupling.

B. External tables with a LOCATION pointing to an external volume for full control of file layout.

C. Managed tables so that Unity Catalog manages both metadata and underlying data lifecycle.

D. Foreign tables federated from an external catalog to delegate optimization to the source system.

Correct Answer: C

Question 3

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE.
Three datasets are defined against Delta Lake table sources using LIVE TABLE.
The pipeline is configured to run in Development mode using Continuous pipeline mode.
Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

A. All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B. All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

C. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

D. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.

E. All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

Correct Answer: C

Question 4

A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.
Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?

A. if day_of_week = 1 and review_period:

B. if day_of_week = 1 & review_period: = "True":

C. if day_of_week == 1 and review_period:

D. if day_of_week = 1 and review_period = "True":

E. if day_of_week == 1 and review_period == "True":

Correct Answer: C
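
The difference between the options can be checked in plain Python. In the sketch below, the helper name `should_run_final_block` is hypothetical. It shows that a single `=` inside an `if` condition does not compile (options A, B, and D), that a bool never equals the string "True" (option E), and that option C's condition behaves as required.

```python
def should_run_final_block(day_of_week, review_period):
    # `==` tests equality; `review_period` is already a bool, so it is used directly.
    return day_of_week == 1 and review_period


# Option A uses `=` (assignment) inside the condition, which is a SyntaxError.
try:
    compile("if day_of_week = 1 and review_period:\n    pass\n", "<option A>", "exec")
    option_a_compiles = True
except SyntaxError:
    option_a_compiles = False

day_of_week = 1
review_period = True

# Option E compares a bool with the string "True", which is never equal.
option_e_result = day_of_week == 1 and review_period == "True"

# Option C: equality test plus a direct truth test of the bool.
if day_of_week == 1 and review_period:
    final_block_ran = True
else:
    final_block_ran = False
```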

Question 5

A data engineer has created a new database using the following command:
CREATE DATABASE IF NOT EXISTS customer360;
In which of the following locations will the customer360 database be located?

A. More information is needed to determine the correct response

B. dbfs:/user/hive/database/customer360

C. dbfs:/user/hive/warehouse

D. dbfs:/user/hive/database

E. dbfs:/user/hive/customer360

Correct Answer: C
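
Under the legacy Hive metastore, a database created without a LOCATION clause gets a `<name>.db` directory inside the warehouse directory, which defaults to dbfs:/user/hive/warehouse on Databricks. That is why the warehouse path is the expected answer, even though the full directory ends in customer360.db. The helper below is a hypothetical sketch of that convention. It assumes the warehouse directory has not been overridden; Unity Catalog catalogs use their own managed storage locations instead.

```python
# Assumption: the legacy Hive metastore with its default warehouse directory.
DEFAULT_WAREHOUSE_DIR = "dbfs:/user/hive/warehouse"


def default_database_location(db_name, warehouse_dir=DEFAULT_WAREHOUSE_DIR):
    """Directory used for a database created without a LOCATION clause."""
    return f"{warehouse_dir.rstrip('/')}/{db_name}.db"


location = default_database_location("customer360")
```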

Question 6

A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.
Which of the following approaches can the data engineer use to set up the new task?

A. They can create a new task in the existing Job and then add it as a dependency of the original task.

B. They can clone the existing task to a new Job and then edit it to run the new notebook.

C. They can create a new job from scratch and add both tasks to run concurrently.

D. They can clone the existing task in the existing Job and update it to run the new notebook.

E. They can create a new task in the existing Job and then add the original task as a dependency of the new task.

Correct Answer: A
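
Jobs with several tasks express ordering through task dependencies. In the Jobs API, each entry in `tasks` has a `task_key` and can list upstream tasks under `depends_on`. The sketch below uses hypothetical job definitions loosely shaped like that payload (the job and task names are made up). It derives a run order with the standard library's `graphlib.TopologicalSorter` to show that making the new task a dependency of the original task (A) runs the new notebook first, while the reverse wiring (E) does not.

```python
from graphlib import TopologicalSorter

# Hypothetical job definitions; job and task names are made up for illustration.
fixed_job = {
    "name": "morning_job",
    "tasks": [
        # Answer A: the original task now waits for the new notebook task.
        {"task_key": "original_task", "depends_on": [{"task_key": "new_notebook_task"}]},
        {"task_key": "new_notebook_task"},
    ],
}

reversed_job = {
    "name": "morning_job",
    "tasks": [
        # Answer E: the new task waits for the original task, so the order is wrong.
        {"task_key": "original_task"},
        {"task_key": "new_notebook_task", "depends_on": [{"task_key": "original_task"}]},
    ],
}


def run_order(job_spec):
    """Return task keys in an order that respects every `depends_on` edge."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in job_spec["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())
```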

Question 7

A data engineer needs to process SQL queries on a large dataset with fluctuating workloads. The workload requires automatic scaling based on the volume of queries, without the need to manage or provision infrastructure. The solution should be cost-efficient and charge only for the compute resources used during query execution. Which compute option should the data engineer use?

A. Databricks Jobs

B. Serverless SQL Warehouse

C. Databricks SQL Analytics

D. Databricks Runtime for ML

Correct Answer: B

Question 8

A data engineer is reading a very large dataset that contains skewed key values. Some partitions contain significantly more records than others, causing uneven workload distribution across executors. Which Spark issue does this scenario describe?

A. Data skew

B. Schema evolution

C. Broadcast failure

D. Lazy evaluation

Correct Answer: A
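
The effect of a hot key can be seen in a small pure-Python model. This is an illustration under assumptions, not Spark code: `zlib.crc32` stands in for Spark's hash partitioner, the key names and counts are invented, and key salting (appending a suffix to a hot key so its rows spread over several keys) is shown as one common mitigation. After salting, an aggregation has to be computed per salted key and then combined per original key. On Spark 3, adaptive query execution can also split skewed partitions in joins when `spark.sql.adaptive.skewJoin.enabled` is on.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4   # illustrative shuffle partition count
SALT_BUCKETS = 8     # how many salted variants each hot key is split into


def partition_of(key):
    """Deterministic hash partitioning; a stand-in for Spark's hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt_hot_keys(keys, hot_keys):
    """Append a round-robin suffix to hot keys so their rows spread over several keys."""
    return [f"{k}#{i % SALT_BUCKETS}" if k in hot_keys else k for i, k in enumerate(keys)]


# Skewed input: one hot key carries 900 of the 1,000 rows.
keys = ["hot"] * 900 + [f"k{i}" for i in range(100)]
before = partition_sizes(keys)

salted = salt_hot_keys(keys, {"hot"})
after = partition_sizes(salted)
largest_key_before = max(Counter(keys).values())
largest_key_after = max(Counter(salted).values())
```

Which partitions the salted keys land on depends on their hash values, so in this toy the reliable measure is the largest per-key row count, which drops from 900 to 113.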

Question 9

A data engineer is building a streaming pipeline that reads JSON files arriving in a cloud storage location. The pipeline must automatically process new files as they arrive without manual intervention. Which Databricks feature is designed for this use case?

A. Structured Streaming Auto Loader

B. Broadcast Join

C. Spark Cache

D. Delta Time Travel

Correct Answer: A
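
In Databricks, Auto Loader is used through the `cloudFiles` source, for example `spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load(path)`, and it records which files it has already ingested in the stream's checkpoint. The toy model below sketches only that incremental-discovery idea in plain Python. The function names are hypothetical, a Python `set` stands in for the checkpointed file log, and a temporary directory stands in for the cloud storage location.

```python
import json
import os
import tempfile


def process_new_files(landing_dir, processed):
    """Load JSON files that have not been seen yet and record them as processed.

    `processed` is a set standing in for the checkpointed log of ingested files.
    """
    records = []
    for name in sorted(os.listdir(landing_dir)):
        if not name.endswith(".json") or name in processed:
            continue
        with open(os.path.join(landing_dir, name), encoding="utf-8") as f:
            records.append(json.load(f))
        processed.add(name)
    return records


def write_json(path, obj):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f)


with tempfile.TemporaryDirectory() as landing:
    seen = set()
    write_json(os.path.join(landing, "part-001.json"), {"id": 1})
    first_batch = process_new_files(landing, seen)    # picks up part-001.json
    write_json(os.path.join(landing, "part-002.json"), {"id": 2})
    second_batch = process_new_files(landing, seen)   # only the newly arrived file
```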

Question 10

A data engineer is building an ETL pipeline in Databricks that loads raw JSON files from cloud object storage into a Delta Lake table. The pipeline must ensure ACID transactions and schema enforcement while supporting scalable reads and writes from multiple concurrent jobs. Which storage format should the engineer use?

A. Avro

B. CSV

C. Parquet

D. Delta Lake

Correct Answer: D

Question 11

A data engineer is developing an ETL process based on Spark SQL. The execution fails. Checking the Spark UI, the data engineer sees the following error:
"java.lang.OutOfMemoryError: Java heap space"
Which two corrective actions should the data engineer perform to resolve this issue? (Choose two.)

A. Upsize the worker nodes and activate autoshuffle partitions

B. Cache the dataset in order to boost the query performance

C. Upsize the driver node and deactivate autoshuffle partitions

D. Narrow the filters in order to collect less data in the query

E. Fix the shuffle partitions to 50 to ensure the allocation

Correct Answer: A,D

Question 12

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database.
They run the following command:

Which of the following lines of code fills in the above blank to successfully complete the task?

A. org.apache.spark.sql.jdbc

B. sqlite

C. autoloader

D. DELTA

E. org.apache.spark.sql.sqlite

Correct Answer: A

Question 13

A data engineer has left the organization. The data team needs to transfer ownership of the data engineer's Delta tables to a new data engineer. The new data engineer is the lead engineer on the data team.
Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data Explorer?

A. Workspace administrator

B. Databricks account representative

C. New lead data engineer

D. Original data engineer

E. This transfer is not possible

Correct Answer: A

Question 14

A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:
DROP TABLE IF EXISTS my_table;
After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.
Which of the following describes why all of these files were deleted?

A. The table's data was larger than 10 GB

B. The table was external

C. The table's data was smaller than 10 GB

D. The table was managed

E. The table did not have a location

Correct Answer: D
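
The distinction between managed and external tables comes down to who owns the files. The `ToyCatalog` class below is a hypothetical, minimal model, not how Spark or Databricks implement DROP TABLE. Dropping either kind of table removes its metadata entry, but only a managed table's data directory is deleted, while an external table's files stay at their LOCATION. Real systems may defer the physical deletion of managed data rather than removing it immediately.

```python
import os
import shutil
import tempfile


class ToyCatalog:
    """Hypothetical, minimal model of managed vs external DROP TABLE behavior."""

    def __init__(self, warehouse_dir):
        self.warehouse_dir = warehouse_dir
        self.tables = {}  # table name -> (location, is_managed)

    def create_table(self, name, location=None):
        is_managed = location is None
        if is_managed:
            # Managed: the catalog picks a directory under its warehouse.
            location = os.path.join(self.warehouse_dir, name)
        os.makedirs(location, exist_ok=True)
        with open(os.path.join(location, "part-00000"), "w") as f:
            f.write("placeholder data")
        self.tables[name] = (location, is_managed)
        return location

    def drop_table_if_exists(self, name):
        entry = self.tables.pop(name, None)  # metadata is removed either way
        if entry is not None:
            location, is_managed = entry
            if is_managed:
                shutil.rmtree(location)      # only managed data is deleted


with tempfile.TemporaryDirectory() as root:
    catalog = ToyCatalog(os.path.join(root, "warehouse"))
    managed_path = catalog.create_table("my_table")
    external_path = catalog.create_table(
        "ext_table", location=os.path.join(root, "external", "ext_table")
    )
    catalog.drop_table_if_exists("my_table")
    catalog.drop_table_if_exists("ext_table")
    catalog.drop_table_if_exists("missing_table")  # IF EXISTS: no error
    managed_files_left = os.path.exists(managed_path)
    external_files_left = os.path.exists(external_path)
```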
