"Exploring the Architecture and Application of Transformer Models in NLP and Media Generation"
Authors: Zhairui Shen, Tianwei Wang
Institution: Arcadia University, Glenside, PA, USA
Introduction to the impact of Transformer models on NLP and AI-driven content generation, highlighting the significant advances in generating images and videos from textual descriptions using Transformers and GANs.
Reference: Vaswani, A., Shazeer, N., Parmar, N., et al. "Attention Is All You Need." NeurIPS 2017 [1].
Outline of the goals of optimizing the Transformer architecture: reducing the computational and memory cost of training and inference while preserving model quality.
Reference: Brown, T. B., Mann, B., Ryder, N., et al. "Language Models are Few-Shot Learners." NeurIPS 2020 [2].
Detailed breakdown of the key components of the optimization approach:
Mixed precision training balances computational efficiency and numerical accuracy by performing most operations in FP16 while keeping FP32 master weights and applying loss scaling.
Reference: Micikevicius, P., Narang, S., Alben, J., et al. "Mixed Precision Training." ICLR 2018 [3].
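A minimal sketch of how mixed precision training is typically wired up with PyTorch automatic mixed precision; the model, optimizer, and loss below are placeholders for illustration, not the configuration used in this work:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model and optimizer; any Transformer training loop applies.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()  # dynamically scales the loss to avoid FP16 underflow

def train_step(src: torch.Tensor) -> float:
    optimizer.zero_grad()
    with autocast():                      # run the forward pass in FP16 where safe
        out = model(src)
        loss = out.float().pow(2).mean()  # stand-in loss for illustration
    scaler.scale(loss).backward()         # backward pass on the scaled loss
    scaler.step(optimizer)                # unscale grads; skip the step on overflow
    scaler.update()                       # adapt the loss scale for the next step
    return loss.item()
```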
Distributed data-parallel training across multiple GPUs overcomes the computational challenges posed by large Transformer models by splitting each batch across devices and synchronizing gradients.
Reference: Goyal, P., Dollár, P., Girshick, R., et al. "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour." arXiv 2017 [4].
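A sketch of the standard PyTorch pattern for data-parallel training across GPUs, assuming a launch via `torchrun`; the wrapper function name is a placeholder:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_and_wrap(model: torch.nn.Module) -> DDP:
    # Launched with `torchrun --nproc_per_node=<num_gpus> train.py`;
    # torchrun populates RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # DDP all-reduces gradients across processes during backward(), so each
    # GPU sees a different slice of the batch but applies identical updates.
    return DDP(model, device_ids=[local_rank])
```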
Investigation into sparse attention mechanisms, which restrict each query to a subset of keys to reduce the quadratic cost of full self-attention and enhance computational efficiency.
Reference: Child, R., Gray, S., Radford, A., et al. "Generating Long Sequences with Sparse Transformers." arXiv 2019 [5].
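The idea can be illustrated with a banded (local) attention mask. This sketch shows only the masking pattern and still materializes the full score matrix; an efficient implementation, as in [5], would compute only the unmasked blocks:

```python
import math
import torch

def local_causal_attention(q, k, v, window: int = 128):
    """Scaled dot-product attention restricted to a causal band of `window`
    positions. q, k, v have shape (batch, heads, seq_len, head_dim)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    seq_len = q.size(-2)
    idx = torch.arange(seq_len, device=q.device)
    # Keep (query, key) pairs that are causal and at most `window` apart.
    band = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = scores.masked_fill(~band, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```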
Refinement of the model architecture using weight pruning and quantization to reduce parameter count, memory footprint, and inference cost.
Reference: Han, S., Pool, J., Tran, J., et al. "Learning both Weights and Connections for Efficient Neural Networks." NeurIPS 2015 [6].
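A minimal sketch using PyTorch's built-in utilities, assuming simple magnitude pruning followed by post-training dynamic quantization; the layer sizes and sparsity ratio are illustrative only:

```python
import torch
import torch.nn.utils.prune as prune

# Placeholder feed-forward block standing in for a Transformer sublayer.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048), torch.nn.GELU(), torch.nn.Linear(2048, 512)
)

# Magnitude pruning: zero out the 30% smallest-magnitude weights per layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Post-training dynamic quantization: Linear weights are stored in int8.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```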
Description of how images and videos are generated from textual descriptions using GANs and Transformers.
Reference: Ramesh, A., Pavlov, M., Goh, G., et al. "Zero-Shot Text-to-Image Generation." ICML 2021 [7].
Reference: Koh, J., Park, S., Song, J. "Improving Text Generation on Images with Synthetic Captions." arXiv 2024 [12].
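The DALL-E-style pipeline of [7] can be summarized as autoregressive sampling of discrete image tokens conditioned on text tokens, with a separate decoder (e.g. a discrete VAE) mapping tokens back to pixels. A simplified sketch of the sampling loop; the `transformer` callable and the vocabulary layout are placeholders, not the models used here:

```python
import torch

@torch.no_grad()
def sample_image_tokens(transformer, text_tokens, num_image_tokens=256,
                        temperature=1.0):
    """Autoregressively sample image tokens conditioned on text tokens.
    `transformer(seq)` is assumed to return logits of shape (batch, len, vocab)."""
    seq = text_tokens  # (batch, text_len)
    for _ in range(num_image_tokens):
        logits = transformer(seq)[:, -1, :]                # next-token logits
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        seq = torch.cat([seq, next_token], dim=1)
    return seq[:, text_tokens.size(1):]                    # the image token grid
```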
Use of memory-access optimizations and operator fusion to reduce redundant memory traffic and kernel-launch overhead, increasing computational efficiency.
Reference: Jia, Y., Shelhamer, E., Donahue, J., et al. "Caffe: Convolutional Architecture for Fast Feature Embedding." MM 2014 [8].
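In current PyTorch, one simple way to obtain this kind of fusion is to express adjacent elementwise operations in a single function and let the compiler emit one kernel; this is a sketch of the general idea, not the specific Caffe-era optimizations discussed in [8]:

```python
import torch

def bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    # Bias add and GELU written together so the compiler can fuse both
    # elementwise ops into one kernel, avoiding an extra round trip to memory.
    return torch.nn.functional.gelu(x + bias)

# torch.compile (PyTorch >= 2.0) traces the function and generates fused kernels.
fused_bias_gelu = torch.compile(bias_gelu)

x = torch.randn(8, 1024, 4096, device="cuda")
b = torch.randn(4096, device="cuda")
y = fused_bias_gelu(x, b)
```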
Analysis of improvements over the original Transformer model, evaluated with quantitative metrics such as the Fréchet Inception Distance (FID) for generated media:
Reference: Heusel, M., Ramsauer, H., Unterthiner, T., et al. "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium." NeurIPS 2017 [9].
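The FID of [9] compares Gaussian fits to Inception features of real and generated images (lower is better):

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^{2}
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

where (μ_r, Σ_r) and (μ_g, Σ_g) are the feature mean and covariance for real and generated images, respectively.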
Future expansion of Transformer models to other creative domains such as music and 3D modeling.
Reference: Dhariwal, P., Nichol, A. "Diffusion Models Beat GANs on Image Synthesis." NeurIPS 2021 [10].