https://medium.com/@hugmanskj/transformer의-큰-그림-이해-기술적-복잡함-없이-핵심-아이디어-파악하기-5e182a40459d
https://www.blossominkyung.com/deeplearning/transfomer-positional-encoding
https://www.blossominkyung.com/deeplearning/transformer-mha
https://www.blossominkyung.com/deeplearning/transfomer-last
https://medium.com/@mansoorsyed05/understanding-transformers-architecture-c571044a1c21
Attention Is All You Need: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
self-attention, positional encoding, Multi-Head Attention, Scaled Dot-Product Attention, Cross Attention
Conventional RNN models: sequential processing makes parallelization difficult and leads to long-range dependency problems (vanishing gradients).
Conventional seq2seq models: built as an encoder-decoder structure.
However, because the encoder compresses the entire input sequence into a single fixed-size vector, some of the input sequence's information is lost.
→ Attention was introduced to compensate for this loss.
<aside>
💡 The core idea of the Transformer:
What if, rather than using attention merely as a fix on top of an RNN, we built the encoder and the decoder out of attention alone?
</aside>
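The scaled dot-product attention listed among the keywords above is the building block that makes this possible. Below is a minimal NumPy sketch of Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V from the paper; the function name and tensor shapes are illustrative, not taken from the sources.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)      # (batch, q_len, k_len)
    # Softmax over the key axis -> attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values
    return weights @ V                                     # (batch, q_len, d_v)

# Illustrative shapes only: batch=1, 4 tokens, width 8.
# In self-attention, Q, K, and V all come from the same sequence.
Q = K = V = np.random.randn(1, 4, 8)
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (1, 4, 8)
```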

The Transformer does not use RNNs, but it keeps the encoder-decoder structure of the earlier seq2seq models.
Difference: in the earlier seq2seq structure, the encoder and the decoder were each a single RNN unrolled over t time steps; in the Transformer, the encoder and the decoder are each built as a stack of N layers.
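To make that stacking concrete, here is a minimal sketch using PyTorch's built-in encoder modules; the hyperparameters (d_model=512, 8 heads, N=6) follow the paper's base configuration, and everything else is illustrative rather than a definitive implementation.

```python
import torch
import torch.nn as nn

d_model, n_heads, N = 512, 8, 6   # N stacked layers instead of t RNN time steps

# One encoder unit: self-attention + feed-forward sublayers (details handled inside)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
# The encoder is simply N copies of this unit applied one after another
encoder = nn.TransformerEncoder(layer, num_layers=N)

x = torch.randn(1, 10, d_model)   # (batch, sequence length, d_model): whole sequence at once
out = encoder(x)                   # no per-time-step recurrence; all 10 positions in parallel
print(out.shape)                   # torch.Size([1, 10, 512])
```

Because the entire sequence enters each layer at once, there is no time-step-by-time-step loop to wait on, which is what removes the parallelization bottleneck of the RNN-based seq2seq models described above.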

