Large language model - Wikipedia

https://en.wikipedia.org/wiki/Large_language_model

📅 2026-03-21 📁 ai 🏷️ Large language models, AI, machine learning, natural language processing, computational models
Large language models (LLMs) are computational models trained on vast data for natural language processing tasks, especially language generation.

Linked skills: Ai News Feed

url: "https://en.wikipedia.org/wiki/Large_language_model"

title: "Large language model - Wikipedia"

date_saved: 2026-03-21

category: ai

tags: [Large language models, AI, machine learning, natural language processing, computational models]

source: direct

reminder: false

cross_skills: [ai-news-feed]

session_mention: always

url_hash: "dcc1964c3e14"

Large language model - Wikipedia

**Summary**: Large language models (LLMs) are computational models trained on vast data for natural language processing tasks, especially language generation.

Key Points

Content

A large language model (LLM) is a computational model trained on a vast amount of data, designed for natural language processing tasks, especially language generation.[1][2] The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the core capabilities of modern chatbots. LLMs can be fine-tuned for specific tasks or guided by prompt engineering.[3] These models acquire predictive power regarding the syntax, semantics, and ontologies[4] inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on.[5]

LLMs consist of billions to trillions of parameters and operate as general-purpose sequence models, generating, summarizing, translating, and reasoning over text. They represent a significant new technology in their ability to generalize across tasks with minimal task-specific supervision, enabling capabilities like conversational agents, code generation, knowledge retrieval, and automated reasoning that previously required bespoke systems.[6]

LLMs evolved from earlier statistical and recurrent neural network approaches to language modeling. The transformer architecture, introduced in 2017, replaced recurrence with self-attention, allowing efficient parallelization, longer context handling, and scalable training on unprecedented data volumes.
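The self-attention computation that replaced recurrence can be sketched in a few lines. This is a minimal, single-head version with random weights, illustrative only; real transformers add multiple heads, causal masking, positional information, and learned projections trained end to end:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model) token embeddings; Wq/Wk/Wv: projection matrices.
    Every position attends to every other position in one matrix product,
    which is what makes the computation parallelizable across the sequence.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv            # (seq_len, d_k) each
    scores = q @ k.T / np.sqrt(k.shape[-1])     # pairwise attention logits
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ v                                # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because the attention matrix is computed for all positions at once, no step depends on the previous token's hidden state, unlike an RNN.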
[7] This innovation enabled models like GPT, BERT, and their successors, which demonstrated emergent behaviors at scale, such as few-shot learning and compositional reasoning.[8]

Reinforcement learning, particularly policy gradient algorithms, has been adapted to fine-tune LLMs for desired behaviors beyond raw next-token prediction.[9] Reinforcement learning from human feedback (RLHF) applies these methods to optimize a policy (the LLM's output distribution) against reward signals derived from human or automated preference judgments.[10] This has been critical for aligning model outputs with user expectations, improving factuality, reducing harmful responses, and enhancing task performance.

Benchmark evaluations for LLMs have evolved from narrow linguistic assessments toward comprehensive, multi-task evaluations measuring reasoning, factual accuracy, alignment, and safety.[11][12] Hill climbing, iteratively optimizing models against benchmarks, has emerged as a dominant strategy, producing rapid incremental performance gains but raising concerns of overfitting to benchmarks rather than achieving genuine generalization or robust capability improvements.[13]

History

[Figure: The number of publications about large language models by year, grouped by publication type.]

[Figure: Training compute of notable large models in FLOPs vs. publication date, 2010-2024, shown for overall notable models, frontier models, top language models, and top models within leading companies. The majority of these models are language models.]

[Figure: Training compute of notable large AI models in FLOPs vs. publication date, 2017-2024. The majority of large models are language models or multimodal models with language capacity.]

Before the emergence of transformer-based models in 2017, some language models were considered large relative to the computational and data constraints of their time.
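The policy gradient idea behind RLHF can be sketched with a toy example: here the "policy" is just a softmax over three canned responses, and `reward` stands in for a learned preference model. All names and numbers are illustrative assumptions; real RLHF optimizes a full LLM, typically with algorithms like PPO:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.zeros(3)                  # policy parameters (uniform start)
reward = np.array([0.1, 1.0, 0.2])    # stand-in preference scores per response
lr = 0.5

for _ in range(100):
    p = softmax(logits)
    # Expected policy gradient of E[reward]: p * (reward - E[reward]).
    # Responses scoring above the current average gain probability mass.
    logits += lr * p * (reward - p @ reward)

print(softmax(logits).round(3))       # mass concentrates on response 1
```

The same mechanics, scaled up, shift an LLM's output distribution toward responses the reward model prefers, which is why RLHF can improve alignment without changing the next-token training objective itself.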
In the early 1990s, IBM's statistical models pioneered word alignment techniques for machine translation, laying the groundwork for corpus-based language modeling. In 2001, smoothed n-gram models, such as those employing Kneser-Ney smoothing, trained on 300 million words achieved state-of-the-art perplexity on benchmark tests.[14] During the 2000s, with the rise of widespread internet access, researchers began compiling massive text datasets from the web ("web as corpus"[15]) to train statistical language models.[16][17]

Moving beyond n-gram models, researchers started in 2000 to use neural networks to learn language models.[18] Following the breakthrough of deep neural networks in image classification around 2012,[19] similar architectures were adapted for language tasks. This shift was m
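An n-gram model and its perplexity metric can be sketched as follows. This toy bigram model uses simple add-one (Laplace) smoothing rather than the Kneser-Ney method mentioned above, and a tiny made-up corpus, so it is illustrative only:

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams over a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, set(tokens)

def bigram_prob(w_prev, w, unigrams, bigrams, vocab):
    # Add-one smoothing: every bigram gets a pseudo-count of 1,
    # so unseen pairs still receive nonzero probability.
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))

def perplexity(tokens, model):
    """Geometric-mean inverse probability; lower means a better fit."""
    unigrams, bigrams, vocab = model
    logp = sum(math.log(bigram_prob(p, w, unigrams, bigrams, vocab))
               for p, w in zip(tokens, tokens[1:]))
    return math.exp(-logp / (len(tokens) - 1))

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(perplexity(corpus, model))
```

Perplexity on held-out text was the standard yardstick for these statistical models, which is why the 2001 result above is stated in those terms.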

Images


![image-4](https://upload.wikimedia.org/wikipedia/commons/thumb/8/81/The_number_of_publications_about_Large_Language_Models_by_year.png/250px-The_number_of_publications_about_Large_Language_Models_by_year.png)

![image-5](https://upload.wikimedia.org/wikipedia/commons/thumb/9/9b/Trends_in_AI_training_FLOP_over_time_%282010-2025%29.svg/250px-Trends_in_AI_training_FLOP_over_time_%282010-2025%29.svg.png)

![image-6](https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/Large-scale_AI_training_compute_%28FLOP%29_vs_Publication_date_%282017-2024%29.svg/250px-Large-scale_AI_training_compute_%28FLOP%29_vs_Publication_date_%282017-2024%29.svg.png)

![image-7](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8f/The-Transformer-model-architecture.png/330px-The-Transformer-model-architecture.png)

![image-8](https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/Estimated_training_cost_of_some_AI_models_-_2024_AI_index.jpg/500px-Estimated_training_cost_of_some_AI_models_-_2024_AI_index.jpg)

![image-9](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e9/Multiple_attention_heads.png/330px-Multiple_attention_heads.png)

Related Skills