Databricks Certified Data Engineer Associate - Databricks-Certified-Data-Engineer-Associate Exam Practice Test
A data engineer is designing a streaming pipeline and wants to limit how long Spark maintains state information for aggregation queries. Which Structured Streaming feature defines how long late data can be processed?
Correct Answer: C
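In Structured Streaming this feature is the watermark, set on a streaming DataFrame with `withWatermark(eventTimeColumn, delayThreshold)`, for example `df.withWatermark("event_time", "10 minutes")`. The watermark trails the maximum event time seen so far by the delay threshold. Records older than the watermark may be dropped, and aggregation state for windows that end before it can be discarded. The plain-Python sketch below models that bookkeeping for tumbling-window counts. It is a simplified illustration, not Spark's implementation (Spark advances the watermark between micro-batches rather than after every record), and the function name and integer-second timestamps are assumptions.

```python
def windowed_counts(events, window_size, delay):
    """Simplified watermark model for tumbling-window counts.

    events: integer event-time timestamps (seconds) in arrival order.
    Returns (finalized, open_state, dropped).
    """
    max_event_time = None
    state = {}       # window_start -> running count (stands in for the state store)
    finalized = {}   # windows evicted once the watermark passed their end
    dropped = []     # events that arrived behind the watermark
    for ts in events:
        watermark = None if max_event_time is None else max_event_time - delay
        if watermark is not None and ts < watermark:
            dropped.append(ts)  # too late: behind the watermark
            continue
        start = ts - ts % window_size
        state[start] = state.get(start, 0) + 1
        max_event_time = ts if max_event_time is None else max(max_event_time, ts)
        watermark = max_event_time - delay
        # Windows ending at or before the watermark can no longer change, so drop their state.
        for s in [s for s in state if s + window_size <= watermark]:
            finalized[s] = state.pop(s)
    return finalized, state, dropped

print(windowed_counts([1, 4, 12, 8, 25, 3], window_size=10, delay=5))
```

With a 5-second delay, the late event at 8 is still counted because it is within the threshold, while the event at 3 arrives after the watermark has moved to 20 and is dropped.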
A data engineer is onboarding a new bronze ingestion pipeline in Databricks with Unity Catalog.
The team wants Databricks to handle storage layout, apply platform optimizations over time, and simplify lifecycle management so that when a table is dropped, its underlying data is also cleaned up according to Databricks-managed retention policies.
Which table type should the data engineer create for these ingestion tables?
Correct Answer: C
A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE.
Three datasets are defined against Delta Lake table sources using LIVE TABLE.
The pipeline is configured to run in Development mode using Continuous pipeline mode.
Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?
Correct Answer: C
A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True.
Which of the following control flow statements should the data engineer use to begin this conditionally executed code block?
Correct Answer: C
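The block should open with `if day_of_week == 1 and review_period:`. In Python a comparison uses `==` (a single `=` inside an `if` header is a syntax error), the logical operator is the lowercase keyword `and` (not `&&` or `&`), and the header ends with a colon. Because `review_period` already holds a boolean, it can be tested directly. A minimal sketch, wrapping the check in a hypothetical function so it can be called with different values:

```python
def should_run_final_block(day_of_week, review_period):
    # Comparison uses ==, the boolean operator is lowercase `and`, and the header ends with a colon.
    if day_of_week == 1 and review_period:
        return True  # the final block of the program would run here
    return False

print(should_run_final_block(1, True))   # True
print(should_run_final_block(1, False))  # False
print(should_run_final_block(2, True))   # False
```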
A data engineer has created a new database using the following command:
CREATE DATABASE IF NOT EXISTS customer360;
In which of the following locations will the customer360 database be located?
Correct Answer: C
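Without a `LOCATION` clause, Spark places a new database's directory under the warehouse directory (`spark.sql.warehouse.dir`) and names it `<database_name>.db`. On a workspace using the legacy Hive metastore the warehouse is typically `dbfs:/user/hive/warehouse`, so `customer360` would be stored at `dbfs:/user/hive/warehouse/customer360.db`; the actual path can be checked with `DESCRIBE DATABASE customer360`. The helper below is a hypothetical illustration of that naming convention, not a Databricks API, and it assumes the legacy Hive metastore (Unity Catalog schemas use the catalog's managed storage instead).

```python
def default_database_location(name, warehouse_dir="dbfs:/user/hive/warehouse"):
    """Hypothetical helper: location of a database created without LOCATION,
    following the <warehouse_dir>/<name>.db convention (warehouse_dir is an assumed default)."""
    return f"{warehouse_dir.rstrip('/')}/{name}.db"

print(default_database_location("customer360"))  # dbfs:/user/hive/warehouse/customer360.db
```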
A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.
Which of the following approaches can the data engineer use to set up the new task?
Correct Answer: A
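The new notebook can be added as another task in the same Job, with the original task made dependent on it. In the Jobs API (2.1), a job's `tasks` array gives each task a `task_key`, and a task's `depends_on` field lists the tasks that must finish first. The sketch below writes such a definition as a Python dict and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to show the resulting run order. The job name, task keys, and notebook paths are hypothetical, and the cluster settings a real job needs are omitted.

```python
from graphlib import TopologicalSorter

# Hypothetical job definition shaped like a Jobs API 2.1 payload (cluster settings omitted).
job = {
    "name": "morning_refresh",
    "tasks": [
        {
            "task_key": "original_task",
            "notebook_task": {"notebook_path": "/Repos/team/etl/original"},
            # The original task now waits for the new upstream task to finish.
            "depends_on": [{"task_key": "fix_upstream_data"}],
        },
        {
            "task_key": "fix_upstream_data",
            "notebook_task": {"notebook_path": "/Repos/team/etl/fix_upstream"},
        },
    ],
}

def run_order(job):
    """Return task keys in an order that respects every depends_on edge."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in job["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())

print(run_order(job))  # ['fix_upstream_data', 'original_task']
```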
A data engineer needs to process SQL queries on a large dataset with fluctuating workloads. The workload requires automatic scaling based on the volume of queries, without the need to manage or provision infrastructure. The solution should be cost-efficient and charge only for the compute resources used during query execution. Which compute option should the data engineer use?
Correct Answer: B
A data engineer is reading a very large dataset that contains skewed key values. Some partitions contain significantly more records than others, causing uneven workload distribution across executors. Which Spark issue does this scenario describe?
Correct Answer: A
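This is data skew. When rows are distributed by key (for example during the shuffle for a join or `groupBy`), every row with the same key lands in the same partition, so a few hot keys produce a few oversized, slow tasks. The sketch below models hash partitioning in plain Python and reports how much larger the biggest partition is than the average. It uses `zlib.crc32` as a deterministic stand-in for Spark's own hash function (Spark uses Murmur3), and the helper names and sample keys are made up for illustration.

```python
import zlib
from collections import Counter

def partition_sizes(keys, num_partitions):
    """Count rows per partition when each row goes to hash(key) % num_partitions."""
    counts = Counter(zlib.crc32(key.encode("utf-8")) % num_partitions for key in keys)
    return [counts[p] for p in range(num_partitions)]

def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 means perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0

# 900 of 1,000 rows share one hot key, so a single partition receives at least 900 rows.
keys = ["us"] * 900 + ["fr", "de", "jp", "br"] * 25
sizes = partition_sizes(keys, num_partitions=4)
print(sizes, round(skew_ratio(sizes), 2))
```

Common mitigations include salting hot keys and, on Spark 3.x, adaptive query execution's skew-join handling (`spark.sql.adaptive.skewJoin.enabled`).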
A data engineer is building a streaming pipeline that reads JSON files arriving in a cloud storage location. The pipeline must automatically process new files as they arrive without manual intervention. Which Databricks feature is designed for this use case?
Correct Answer: A
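This is Auto Loader, used through the `cloudFiles` streaming source, for example `spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load(input_path)`. Auto Loader records the files it has already ingested in the stream's checkpoint location, so each run picks up only newly arrived files. The plain-Python sketch below models only that bookkeeping, with a local temporary directory and an in-memory set of seen file names. It is a conceptual illustration, not how Auto Loader is implemented, and the function and file names are hypothetical.

```python
import json
import os
import tempfile

def discover_new_files(directory, seen):
    """Return JSON files in `directory` not processed before, and mark them as seen.

    `seen` plays the role of the checkpoint that remembers ingested files.
    """
    new_files = sorted(
        name for name in os.listdir(directory)
        if name.endswith(".json") and name not in seen
    )
    seen.update(new_files)
    return new_files

with tempfile.TemporaryDirectory() as landing:
    seen = set()
    for name in ("a.json", "b.json"):
        with open(os.path.join(landing, name), "w") as f:
            json.dump({"file": name}, f)
    first = discover_new_files(landing, seen)   # both files are new
    with open(os.path.join(landing, "c.json"), "w") as f:
        json.dump({"file": "c.json"}, f)
    second = discover_new_files(landing, seen)  # only the newly arrived file
    print(first, second)  # ['a.json', 'b.json'] ['c.json']
```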
A data engineer is building an ETL pipeline in Databricks that loads raw JSON files from cloud object storage into a Delta Lake table. The pipeline must ensure ACID transactions and schema enforcement while supporting scalable reads and writes from multiple concurrent jobs. Which storage format should the engineer use?
Correct Answer: D
A data engineer is developing an ETL process based on Spark SQL. The execution fails, and in the Spark UI the data engineer sees the following error:
"java.lang.OutOfMemoryError: Java heap space"
Which two corrective actions should the data engineer perform to resolve this issue? (Choose two.)
Correct Answer: A,D
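A heap-space error means a JVM (usually an executor, sometimes the driver) tried to hold more data than its memory allows. Typical remedies are giving executors more memory (`spark.executor.memory`) and splitting the work into more, smaller partitions (for shuffles, `spark.sql.shuffle.partitions`) so each task processes less data at once. The sketch below is a back-of-the-envelope helper for choosing a partition count from an estimated data size; its name is hypothetical, and the 128 MB target per partition is an assumed rule of thumb (it matches the default of `spark.sql.files.maxPartitionBytes`, but Spark does not compute this for you).

```python
import math

def suggested_partitions(total_bytes, target_partition_bytes=128 * 1024 * 1024, minimum=1):
    """Smallest partition count that keeps each partition at or under the target size.

    The 128 MB default is an assumed rule of thumb, not a value Spark derives.
    """
    if total_bytes <= 0:
        return minimum
    return max(minimum, math.ceil(total_bytes / target_partition_bytes))

# About 500 GB of shuffle data split into roughly 128 MB partitions.
print(suggested_partitions(500 * 1024**3))  # 4000
```

The result could then be applied with `spark.conf.set("spark.sql.shuffle.partitions", ...)` before the failing query runs.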
A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database.
They run the following command:

Which of the following lines of code fills in the above blank to successfully complete the task?
Correct Answer: A
A data engineer has left the organization. The data team needs to transfer ownership of the data engineer's Delta tables to a new data engineer. The new data engineer is the lead engineer on the data team.
Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data Explorer?
Correct Answer: A
A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:
DROP TABLE IF EXISTS my_table;
After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.
Which of the following describes why all of these files were deleted?
Correct Answer: D
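Both the data and metadata files disappear when `my_table` is a managed table: the metastore owns a managed table's data as well as its metadata, so `DROP TABLE` removes both. For an external table, created with a `LOCATION` clause pointing at existing storage, `DROP TABLE` removes only the metadata and leaves the files in place. (In Unity Catalog, a dropped managed table's files are cleaned up after a retention period rather than immediately.) The toy catalog below models that rule in plain Python; the class, paths, and file names are hypothetical, and storage is simulated with a dict rather than real files.

```python
from dataclasses import dataclass, field

@dataclass
class ToyCatalog:
    """Hypothetical catalog: table metadata plus a dict standing in for file storage."""
    tables: dict = field(default_factory=dict)   # table name -> {"location": ..., "managed": ...}
    storage: dict = field(default_factory=dict)  # path -> list of file names

    def create_table(self, name, files, location=None):
        # No LOCATION clause -> managed table stored under the warehouse directory.
        managed = location is None
        path = f"/warehouse/{name}" if managed else location
        self.tables[name] = {"location": path, "managed": managed}
        self.storage[path] = list(files)

    def drop_table_if_exists(self, name):
        meta = self.tables.pop(name, None)  # metadata is always removed
        if meta is not None and meta["managed"]:
            # Managed table: the catalog owns the data, so its files are deleted too.
            self.storage.pop(meta["location"], None)

catalog = ToyCatalog()
catalog.create_table("my_table", ["part-0000.parquet", "_delta_log/00000000000000000000.json"])
catalog.create_table("ext_table", ["part-0000.parquet"], location="/mnt/raw/ext_table")
catalog.drop_table_if_exists("my_table")
catalog.drop_table_if_exists("ext_table")
print(sorted(catalog.storage))  # ['/mnt/raw/ext_table']
```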