Image

Easy 🔗3475. DNA Pattern Recognition

📝문제 요약

Table: Samples

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| sample_id      | int     |
| dna_sequence   | varchar |
| species        | varchar |
+----------------+---------+
sample_id is the unique key for this table.
Each row contains a DNA sequence represented as a string of characters (A, T, G, C) and the species it was collected from.

Biologists are studying basic patterns in DNA sequences. Write a solution to identify sample_id with the following patterns:

  • Sequences that start with ATG (a common start codon)
  • Sequences that end with either TAATAG, or TGA (stop codons)
  • Sequences containing the motif ATAT (a simple repeated pattern)
  • Sequences that have at least 3 consecutive G (like GGG or GGGG)

Return the result table ordered by sample_id in ascending order.

The result format is in the following example.

Example:

Input:

Samples table:

+-----------+------------------+-----------+
| sample_id | dna_sequence     | species   |
+-----------+------------------+-----------+
| 1         | ATGCTAGCTAGCTAA  | Human     |
| 2         | GGGTCAATCATC     | Human     |
| 3         | ATATATCGTAGCTA   | Human     |
| 4         | ATGGGGTCATCATAA  | Mouse     |
| 5         | TCAGTCAGTCAG     | Mouse     |
| 6         | ATATCGCGCTAG     | Zebrafish |
| 7         | CGTATGCGTCGTA    | Zebrafish |
+-----------+------------------+-----------+

Output:

+-----------+------------------+-------------+-------------+------------+------------+------------+
| sample_id | dna_sequence     | species     | has_start   | has_stop   | has_atat   | has_ggg    |
+-----------+------------------+-------------+-------------+------------+------------+------------+
| 1         | ATGCTAGCTAGCTAA  | Human       | 1           | 1          | 0          | 0          |
| 2         | GGGTCAATCATC     | Human       | 0           | 0          | 0          | 1          |
| 3         | ATATATCGTAGCTA   | Human       | 0           | 0          | 1          | 0          |
| 4         | ATGGGGTCATCATAA  | Mouse       | 1           | 1          | 0          | 1          |
| 5         | TCAGTCAGTCAG     | Mouse       | 0           | 0          | 0          | 0          |
| 6         | ATATCGCGCTAG     | Zebrafish   | 0           | 1          | 1          | 0          |
| 7         | CGTATGCGTCGTA    | Zebrafish   | 0           | 0          | 0          | 0          |
+-----------+------------------+-------------+-------------+------------+------------+------------+

Explanation:

  • Sample 1 (ATGCTAGCTAGCTAA):
    • Starts with ATG (has_start = 1)
    • Ends with TAA (has_stop = 1)
    • Does not contain ATAT (has_atat = 0)
    • Does not contain at least 3 consecutive ‘G’s (has_ggg = 0)
  • Sample 2 (GGGTCAATCATC):
    • Does not start with ATG (has_start = 0)
    • Does not end with TAA, TAG, or TGA (has_stop = 0)
    • Does not contain ATAT (has_atat = 0)
    • Contains GGG (has_ggg = 1)
  • Sample 3 (ATATATCGTAGCTA):
    • Does not start with ATG (has_start = 0)
    • Does not end with TAA, TAG, or TGA (has_stop = 0)
    • Contains ATAT (has_atat = 1)
    • Does not contain at least 3 consecutive ‘G’s (has_ggg = 0)
  • Sample 4 (ATGGGGTCATCATAA):
    • Starts with ATG (has_start = 1)
    • Ends with TAA (has_stop = 1)
    • Does not contain ATAT (has_atat = 0)
    • Contains GGGG (has_ggg = 1)
  • Sample 5 (TCAGTCAGTCAG):
    • Does not match any patterns (all fields = 0)
  • Sample 6 (ATATCGCGCTAG):
    • Does not start with ATG (has_start = 0)
    • Ends with TAG (has_stop = 1)
    • Starts with ATAT (has_atat = 1)
    • Does not contain at least 3 consecutive ‘G’s (has_ggg = 0)
  • Sample 7 (CGTATGCGTCGTA):
    • Does not start with ATG (has_start = 0)
    • Does not end with TAA, “TAG”, or “TGA” (has_stop = 0)
    • Does not contain ATAT (has_atat = 0)
    • Does not contain at least 3 consecutive ‘G’s (has_ggg = 0)

Note:

  • The result is ordered by sample_id in ascending order
  • For each pattern, 1 indicates the pattern is present and 0 indicates it is not present


✏️문제 풀이

SELECT

SELECT sample_id, dna_sequence, species,
    CASE WHEN dna_sequence LIKE 'ATG%' THEN 1 ELSE 0 END AS has_start,
    CASE WHEN dna_sequence REGEXP 'TAA$|TAG$|TGA$' THEN 1 ELSE 0 END AS has_stop,
    CASE WHEN dna_sequence LIKE '%ATAT%' THEN 1 ELSE 0 END AS has_atat,
    CASE WHEN dna_sequence REGEXP 'GGG+' THEN 1 ELSE 0 END AS has_ggg
  • sample_id : 샘플 ID
  • dna_sequence : DNA 시퀀스
  • species : 종
  • has_start
    • dna_sequence LIKE 'ATG%' THEN 1 : dna_sequence가 ATG로 시작하는 경우는 1
    • ELSE 0 : 아닌 경우에는 0
  • has_stop
    • dna_sequence REGEXP 'TAA$|TAG$|TGA$' THEN 1 : 정규 표현식을 사용하여 dna_sequence가 TAA, TAG 또는 TGA로 끝나는 경우는 1
    • ELSE 0 : 아닌 경우에는 0
  • has_atat
    • dna_sequence LIKE '%ATAT%' THEN 1 : dna_sequence에 ATAT가 포함되는 경우는 1
    • ELSE 0 : 아닌 경우에는 0
  • has_ggg
    • dna_sequence REGEXP 'GGG+' THEN 1 : 정규 표현식을 사용하여 dna_sequence가 연속적으로 3개 이상의 G를 갖는 경우에는 1
    • ELSE 0 : 아닌 경우에는 0

FROM

FROM Samples
  • Samples 테이블의 데이터를 가져옴

ORDER BY

ORDER BY sample_id
  • sample_id를 기준으로 오름차순 정렬


💯제출 코드

SELECT sample_id, dna_sequence, species,
    CASE WHEN dna_sequence LIKE 'ATG%' THEN 1 ELSE 0 END AS has_start,
    CASE WHEN dna_sequence REGEXP 'TAA$|TAG$|TGA$' THEN 1 ELSE 0 END AS has_stop,
    CASE WHEN dna_sequence LIKE '%ATAT%' THEN 1 ELSE 0 END AS has_atat,
    CASE WHEN dna_sequence REGEXP 'GGG+' THEN 1 ELSE 0 END AS has_ggg
FROM Samples
ORDER BY sample_id

댓글남기기