Download AWS Certified Data Engineer - Associate.DEA-C01.CertDumps.2023-07-14.37q.vcex

Vendor: Amazon
Exam Code: DEA-C01
Exam Name: AWS Certified Data Engineer - Associate
Date: Jul 14, 2023
File Size: 49 KB
Downloads: 4

How to open VCEX files?

Files with VCEX extension can be opened by ProfExam Simulator.

Demo Questions

Question 1
To advance the offset of a stream to the current table version without consuming the change data in a DML operation, which of the following operations can the Data Engineer perform? [Select 2]
  1. Recreate the stream using the CREATE OR REPLACE STREAM syntax.
  2. Insert the current change data into a temporary table. In the INSERT statement, query the stream but include a WHERE clause that filters out all of the change data (e.g. WHERE 0 = 1).
  3. A stream advances the offset only when it is used in a DML transaction, so none of the options works without consuming the change data of the table.
  4. Delete the offset using STREAM properties SYSTEM$RESET_OFFSET( <stream_id> )
Correct answer: AB
Explanation:
When created, a stream logically takes an initial snapshot of every row in the source object (e.g. table, external table, or the underlying tables for a view) by initializing a point in time (called an offset) as the current transactional version of the object. The change tracking system utilized by the stream then records information about the DML changes after this snapshot was taken. Change records provide the state of a row before and after the change. Change information mirrors the column structure of the tracked source object and includes additional metadata columns that describe each change event. 
Note that a stream itself does not contain any table data. A stream only stores an offset for the source object and returns CDC records by leveraging the versioning history for the source object. 
A new table version is created whenever a transaction that includes one or more DML statements is committed to the table. 
In the transaction history for a table, a stream offset is located between two table versions. Querying a stream returns the changes caused by transactions committed after the offset and at or before the current time. 
Multiple queries can independently consume the same change data from a stream without changing the offset. A stream advances the offset only when it is used in a DML transaction. This behavior applies to both explicit and autocommit transactions. (By default, when a DML statement is executed, an autocommit transaction is implicitly started and the transaction is committed at the completion of the statement. This behavior is controlled with the AUTOCOMMIT parameter.) Querying a stream alone does not advance its offset, even within an explicit transaction; the stream contents must be consumed in a DML statement. 
To advance the offset of a stream to the current table version without consuming the change data in a DML operation, complete either of the following actions:
· Recreate the stream (using the CREATE OR REPLACE STREAM syntax). 
· Insert the current change data into a temporary table. In the INSERT statement, query the stream but include a WHERE clause that filters out all of the change data (e.g. WHERE 0 = 1). 
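As a minimal sketch of the two valid actions (the names my_stream, src_table, and tmp_consume, and the id column, are hypothetical):
-- Option 1: recreate the stream; the offset is reset to the current table version
CREATE OR REPLACE STREAM my_stream ON TABLE src_table;
-- Option 2: consume the stream in a DML statement while filtering out every change record
CREATE OR REPLACE TEMPORARY TABLE tmp_consume (id NUMBER);
INSERT INTO tmp_consume
  SELECT id FROM my_stream WHERE 0 = 1;  -- inserts nothing, yet the offset still advances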
Question 2
A Data Engineer is performing the below steps in sequence while working on stream s1 created on table t1. 
Step 1: Begin transaction.
Step 2: Query stream s1 on table t1.
Step 3: Update rows in table t1.
Step 4: Query stream s1.
Step 5: Commit transaction.
Step 6: Begin transaction.
Step 7: Query stream s1.
Mark the incorrect operational statement:
  1. For Step 2, the stream returns the change data capture records between the current position and the Transaction 1 start time. If the stream is used in a DML statement, the stream is then locked to avoid changes by concurrent transactions.
  2. For Step 4, the stream returns the CDC records including the rows updated in Step 3, because streams work in repeated committed mode, in which statements see any changes made by previous statements executed within the same transaction, even though those changes are not yet committed.
  3. For Step 5, If the stream was consumed in DML statements within the transaction, the stream position advances to the transaction start time.
  4. For Step 7, Results do include table changes committed by Transaction 1.
  5. If Transaction 2 had begun before Transaction 1 was committed, queries to the stream would have returned a snapshot of the stream from the position of the stream to the beginning time of Transaction 2 and would not see any changes committed by Transaction 1.
Correct answer: B
Explanation:
Streams support repeatable read isolation. In repeatable read mode, multiple SQL statements within a transaction see the same set of records in a stream. This differs from the read committed mode supported for tables, in which statements see any changes made by previous statements executed within the same transaction, even though those changes are not yet committed. 
The delta records returned by streams in a transaction is the range from the current position of the stream until the transaction start time. The stream position advances to the transaction start time if the transaction commits; otherwise, it stays at the same position. 
Within Transaction 1, all queries to stream s1 see the same set of records. DML changes to table t1 are recorded to the stream only when the transaction is committed. 
In Transaction 2, queries to the stream see the changes recorded to the table in Transaction 1. Note that if Transaction 2 had begun before Transaction 1 was committed, queries to the stream would have returned a snapshot of the stream from the position of the stream to the beginning time of Transaction 2 and would not see any changes committed by Transaction 1.
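As a sketch, the step sequence from the question maps to the following statements (table t1 with a numeric column v and stream s1 are assumed):
BEGIN;                     -- Transaction 1
SELECT * FROM s1;          -- Step 2: changes between the stream offset and the Transaction 1 start time
UPDATE t1 SET v = v + 1;   -- Step 3: this change is not visible to s1 inside the same transaction
SELECT * FROM s1;          -- Step 4: same result set as Step 2 (repeatable read)
COMMIT;                    -- Step 5: the offset advances only if s1 was consumed in a DML statement
BEGIN;                     -- Transaction 2
SELECT * FROM s1;          -- Step 7: now includes the changes committed by Transaction 1
COMMIT;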
Question 3
Streams record the differences between two offsets. If a row is added and then updated in the current offset, what will be the value of the METADATA$ISUPDATE column in this scenario?
  1. TRUE
  2. FALSE
  3. UPDATE
  4. INSERT
Correct answer: B
Explanation:
Stream Columns 
A stream stores an offset for the source object and not any actual table columns or data. When queried, a stream accesses and returns the historic data in the same shape as the source object (i.e. the same column names and ordering) with the following additional columns:
METADATA$ACTION Indicates the DML operation (INSERT, DELETE) recorded. 
METADATA$ISUPDATE Indicates whether the operation was part of an UPDATE statement. Updates to rows in the source object are represented as a pair of DELETE and INSERT records in the stream with the metadata column METADATA$ISUPDATE set to TRUE. 
METADATA$ROW_ID Specifies the unique and immutable ID for the row, which can be used to track changes to specific rows over time. 
Note that streams record the differences between two offsets. If a row is added and then updated in the current offset, the delta change is a new row. The METADATA$ISUPDATE column records a FALSE value.
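A minimal sketch of the scenario, assuming a hypothetical table items(id, name) with a standard stream items_stream whose offset precedes both statements:
INSERT INTO items VALUES (1, 'a');          -- row added after the current offset
UPDATE items SET name = 'b' WHERE id = 1;   -- then updated within the same offset window
SELECT id, name,
       METADATA$ACTION,     -- INSERT: the delta is a single net new row
       METADATA$ISUPDATE,   -- FALSE in this scenario
       METADATA$ROW_ID
FROM items_stream;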
Question 4
Mark the incorrect statement with respect to the types of streams supported by Snowflake.
  1. Standard streams cannot retrieve update data for geospatial data.
  2. An append-only stream returns the appended rows only and therefore can be much more performant than a standard stream for extract, load, transform (ELT).
  3. Insert-only streams are supported on external tables only.
  4. An insert-only stream tracks row inserts & Delete ops only
Correct answer: D
Explanation:
Standard Stream:
Supported for streams on tables, directory tables, or views. A standard (i.e. delta) stream tracks all DML changes to the source object, including inserts, updates, and deletes (including table truncates). 
This stream type performs a join on inserted and deleted rows in the change set to provide the row level delta. As a net effect, for example, a row that is inserted and then deleted between two transactional points of time in a table is removed in the delta (i.e. is not returned when the stream is queried). 
Append-only Stream:
Supported for streams on standard tables, directory tables, or views. An append-only stream tracks row inserts only. Update and delete operations (including table truncates) are not recorded. For example, if 10 rows are inserted into a table and then 5 of those rows are deleted before the offset for an append-only stream is advanced, the stream records 10 rows. 
An append-only stream returns the appended rows only and therefore can be much more performant than a standard stream for extract, load, transform (ELT) and similar scenarios that depend exclusively on row inserts. For example, a source table can be truncated immediately after the rows in an append-only stream are consumed, and the record deletions do not contribute to the overhead the next time the stream is queried or consumed. 
Insert-only Stream:
Supported for streams on external tables only. An insert-only stream tracks row inserts only; it does not record delete operations that remove rows from an inserted set (i.e. no-ops). For example, in between any two offsets, if File1 is removed from the cloud storage location referenced by the external table, and File2 is added, the stream returns records for the rows in File2 only. Unlike when tracking CDC data for standard tables, Snowflake cannot access the historical records for files in cloud storage.
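For reference, the three stream types map to the following CREATE STREAM options (all object names below are placeholders):
CREATE STREAM std_stream ON TABLE src_table;                               -- standard (delta) stream
CREATE STREAM append_stream ON TABLE src_table APPEND_ONLY = TRUE;         -- append-only: row inserts only
CREATE STREAM ins_stream ON EXTERNAL TABLE ext_table INSERT_ONLY = TRUE;   -- insert-only: external tables only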
Question 5
Stuart, a Lead Data Engineer at MACRO Data Company, created streams on a set of external tables. He has been asked to extend the data retention period of the stream to 90 days. Which parameter can he utilize to enable this extension?
  1. MAX_DATA_EXTENSION_TIME_IN_DAYS
  2. DATA_RETENTION_TIME_IN_DAYS
  3. DATA_EXTENSION_TIME_IN_DAYS
  4. None of the above
Correct answer: D
Explanation:
External tables do not have a data retention period. 
It is good to understand the other options available. 
DATA_RETENTION_TIME_IN_DAYS 
Type: Object (for databases, schemas, and tables) — Can be set for Account » Database » Schema » Table
Description: Number of days for which Snowflake retains historical data for performing Time Travel actions (SELECT, CLONE, UNDROP) on the object. A value of 0 effectively disables Time Travel for the specified database, schema, or table. 
Values:
0 or 1 (for Standard Edition) 
0 to 90 (for Enterprise Edition or higher) 
Default: 1
MAX_DATA_EXTENSION_TIME_IN_DAYS 
Type: Object (for databases, schemas, and tables) — Can be set for Account » Database » Schema » Table 
Description: Maximum number of days for which Snowflake can extend the data retention period for tables to prevent streams on the tables from becoming stale. By default, if the DATA_RETENTION_TIME_IN_DAYS setting for a source table is less than 14 days, and a stream has not been consumed, Snowflake temporarily extends this period to the stream's offset, up to a maximum of 14 days, regardless of the Snowflake Edition for your account. The MAX_DATA_EXTENSION_TIME_IN_DAYS parameter enables you to limit this automatic extension period to control storage costs for data retention or for compliance reasons. 
This parameter can be set at the account, database, schema, and table levels. Note that setting the parameter at the account or schema level only affects tables for which the parameter has not already been explicitly set at a lower level (e.g. at the table level by the table owner). A value of 0 effectively disables the automatic extension for the specified database, schema, or table. 
Values:
0 to 90 (i.e. 90 days) — a value of 0 disables the automatic extension of the data retention period. To increase the maximum value for tables in your account, the client needs to contact Snowflake Support. 
Default: 14
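For a regular (non-external) table, both parameters could be set as sketched below (src_table is a placeholder; a 90-day retention value requires Enterprise Edition or higher):
ALTER TABLE src_table SET DATA_RETENTION_TIME_IN_DAYS = 90;      -- Time Travel retention
ALTER TABLE src_table SET MAX_DATA_EXTENSION_TIME_IN_DAYS = 90;  -- cap on automatic extension for streams
SHOW PARAMETERS LIKE '%DATA%' IN TABLE src_table;                 -- verify the effective values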
Question 6
Ron, a Snowflake Developer, needs to capture change data (insert only) on the source views. For that, he follows the below steps:
Enable change tracking on the source views and their underlying tables.
Insert the data via scripts scheduled with the help of tasks.
Then simply run the below SELECT statement.
select *
from test_table
changes(information => append_only)
at(timestamp => (select current_timestamp()));
Select the Correct Query Execution Output option below:
  1. The developer missed creating a stream on the source table, which could be further queried to capture DML records.
  2. Select query will fail with error: 'SQL compilation error - Incorrect Keyword "Changes()" found' 
  3. No error is reported; the select command gives the changed records with metadata columns, as change tracking is enabled on the source views and their underlying tables.
  4. Select statement compiled but gives erroneous results.
Correct answer: C
Explanation:
As an alternative to streams, Snowflake supports querying change tracking metadata for tables or views using the CHANGES clause for SELECT statements. The CHANGES clause enables querying change tracking metadata between two points in time without having to create a stream with an explicit transactional offset. 
To know more about the Snowflake CHANGES clause, please refer to the link below:
https://docs.snowflake.com/en/sql-reference/constructs/changes
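A hedged sketch of the pattern, using the table from the question and an example timestamp expression (normally a point in time after change tracking was enabled):
ALTER TABLE test_table SET CHANGE_TRACKING = TRUE;   -- prerequisite for the CHANGES clause
SELECT *
FROM test_table
  CHANGES(INFORMATION => APPEND_ONLY)
  AT(TIMESTAMP => DATEADD(hour, -1, CURRENT_TIMESTAMP()));  -- e.g. rows appended in the last hour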
Question 7
Which column provides information on when the stream became stale or may become stale if not consumed?
  1. STREAM_STALE_PERIOD
  2. STALE_PERIOD_AFTER
  3. STALE_STREAM_PERIOD
  4. STALE_AFTER
Correct answer: D
Explanation:
Execute the SHOW STREAMS command. 
Column Name: STALE_AFTER
Timestamp when the stream became stale or may become stale if not consumed. The value is calculated by adding the retention period for the source table (i.e. the larger of the DATA_RETENTION_TIME_IN_DAYS or MAX_DATA_EXTENSION_TIME_IN_DAYS parameter setting) to the last time the stream was read.
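A quick way to inspect the column (the RESULT_SCAN step is optional and only filters the SHOW output):
SHOW STREAMS;
SELECT "name", "stale_after", "stale"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));   -- STALE_AFTER = last read time + source retention period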
Question 8
When created, a stream logically takes an initial snapshot of every row in the source object and the contents of a stream change as DML statements execute on the source table. 
A Data Engineer, Sophie, created a view that queries the table and returns the CURRENT_USER and CURRENT_TIMESTAMP values for the query transaction. A stream has been created on the view to capture CDC. 
Tony, another user, inserted data, e.g.:
insert into <table> values (1),(2),(3);
Emily, another user, also inserted data, e.g.:
insert into <table> values (4),(5),(6);
What will happen when a different user queries the same stream after 1 hour?
  1. All the 6 records would be shown with METADATA$ACTION as 'INSERT' out of which 3 records would be displayed with username 'Tony' & rest 3 records would be displayed with username 'Emily'.
  2. All the six records would be displayed with CURRENT_USER & CURRENT_TIMESTAMP while querying the stream.
  3. All the six records would be displayed with user 'Sophie', who is the owner of the view.
  4. The user displayed would be the one who queried during the session, but the recorded timestamp would be from the past 1 hour, i.e. the actual record insertion time.
Correct answer: B
Explanation:
When a user queries the stream, the stream returns the username of the querying user. The stream also returns the current timestamp for the query transaction in each row, NOT the timestamp when each row was inserted.
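A minimal sketch of the setup, assuming hypothetical names base_t, v, and v_stream (change tracking must also be enabled on the view and its underlying table):
CREATE OR REPLACE TABLE base_t (id NUMBER);
CREATE OR REPLACE VIEW v AS
  SELECT id, CURRENT_USER() AS who, CURRENT_TIMESTAMP() AS at_time FROM base_t;
CREATE OR REPLACE STREAM v_stream ON VIEW v;
-- Tony: insert into base_t values (1),(2),(3);
-- Emily: insert into base_t values (4),(5),(6);
SELECT * FROM v_stream;   -- who / at_time reflect the querying user and query time, not the insert time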
Question 9
Which function would a Data Engineer use to recursively resume all tasks in a chain of tasks rather than resuming each task individually (using ALTER TASK … RESUME)?
  1. SYSTEM$TASK_DEPENDENTS
  2. SYSTEM$TASK_DEPENDENTS_ENABLE
  3. SYSTEM$TASK_DEPENDENTS_RESUME
  4. SYSTEM$TASK_RECURSIVE_ENABLE
Correct answer: B
Explanation:
To recursively resume all tasks in a DAG (a Directed Acyclic Graph is a series of tasks composed of a single root task and additional tasks, organized by their dependencies), query the SYSTEM$TASK_DEPENDENTS_ENABLE function rather than resuming each task individually (using ALTER TASK … RESUME).
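Usage sketch (the fully qualified root task name is a placeholder):
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('mydb.myschema.root_task');  -- resumes the root task and all dependent tasks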
Question 10
Steven created the task. What additional privileges are required by Steven on the task so that he can suspend or resume the task?
  1. Steven is already owner of the task; he can execute the task & suspend/resume the task without any additional privileges.
  2. In addition to the task owner, Steven's role must have the OPERATE privilege on the task so that he can suspend or resume the task.
  3. Steven must have SUSPEND privilege on the task so that he can suspend or resume the task.
  4. Steven needs to have Global Managed RESUME privilege by TASK administrator.
Correct answer: B
Explanation:
In addition to the task ownership privilege, a role that has the OPERATE privilege on the task can suspend or resume the task.
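A sketch of the grant and the resulting operations (task and role names are placeholders):
GRANT OPERATE ON TASK mydb.myschema.my_task TO ROLE etl_operator;  -- allows suspend/resume without ownership
ALTER TASK mydb.myschema.my_task SUSPEND;
ALTER TASK mydb.myschema.my_task RESUME;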