DOCS = {
"transformer_architecture.md": textwrap.dedent("""
# Transformer Architecture
## Overview
The Transformer is a deep learning architecture introduced in "Attention Is All
You Need" (Vaswani et al., 2017). It replaced recurrent networks with a
self-attention mechanism, enabling parallel training and better long-range
dependency modelling.
## Key Components
- **Multi-Head Self-Attention**: Computes attention in h parallel heads, each
with its own learned Q/K/V projections, then concatenates and projects
(a minimal sketch follows this list).
- **Feed-Forward Network (FFN)**: Two linear layers with a ReLU activation,
applied position-wise.
- **Positional Encoding**: Sinusoidal or learned embeddings that inject
sequence-order information, since attention is permutation-invariant.
- **Layer Normalisation**: Applied before (Pre-LN) or after (Post-LN) each
sub-layer, stabilising gradients.
- **Residual Connections**: Added around each sub-layer to ease gradient flow.
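For concreteness, a minimal NumPy sketch of the multi-head computation; the
random weight matrices merely stand in for learned projections, and the
head-split/merge reshapes are the whole trick:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, h=4):
    n, d = x.shape                          # n tokens, model dim d
    d_k = d // h                            # per-head dim (h must divide d)
    rng = np.random.default_rng(0)          # random weights, illustration only
    W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) / np.sqrt(d)
                          for _ in range(4))
    def split(M):                           # (n, d) -> (h, n, d_k)
        return M.reshape(n, h, d_k).transpose(1, 0, 2)
    Q, K, V = split(x @ W_q), split(x @ W_k), split(x @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)    # (h, n, n)
    heads = softmax(scores) @ V                         # (h, n, d_k)
    concat = heads.transpose(1, 0, 2).reshape(n, d)     # concatenate heads
    return concat @ W_o                                 # final projection

out = multi_head_self_attention(np.ones((10, 64)))      # 10 tokens, d = 64
print(out.shape)                                        # (10, 64)
```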
## Encoder vs Decoder
The encoder stack processes input tokens bidirectionally (e.g. BERT).
The decoder stack uses causal (masked) attention over previous outputs;
encoder-decoder models (e.g. T5) add cross-attention over encoder outputs,
while decoder-only models (e.g. GPT) omit it.
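The causal mask is the only mechanical difference: position i may attend to
positions up to i and nothing later. A minimal sketch of masking the logits
before the softmax:

```python
import numpy as np

n = 5
mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # True strictly above diagonal
scores = np.zeros((n, n))                         # attention logits (dummy)
scores[mask] = -np.inf                            # future positions get -inf,
                                                  # so softmax zeroes them out
```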
## Scaling Laws
Kaplan et al. (2020) showed that model loss decreases predictably as a power
law with compute, data, and parameter count. This motivated GPT-3 (175B) and
subsequent large language models.
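The rough functional form is worth stating (exponent quoted approximately from
the paper): with data and compute ample, loss in parameter count N follows a
power law L(N) ~ N^(-alpha_N) with alpha_N around 0.076, and analogous
exponents govern dataset size and compute budget.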
## Limitations
- Quadratic complexity in sequence length: O(n^2)
- No inherent recurrence -> long-context challenges
- High memory footprint during training
## References
Vaswani et al. (2017). Attention Is All You Need. NeurIPS.
Kaplan et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
"""),
"rag_systems.md": textwrap.dedent("""
# Retrieval-Augmented Generation (RAG)
## Definition
RAG augments a generative LLM with a retrieval step: given a query, relevant
documents are fetched from a corpus and prepended to the prompt, giving the
model grounded context beyond its training data.
## Architecture
1. **Indexing Phase**: Documents are chunked, embedded via a bi-encoder
(e.g. text-embedding-3-large), and stored in a vector database (e.g.
Faiss, Pinecone, Weaviate).
2. **Retrieval Phase**: The user query is embedded; approximate
nearest-neighbour (ANN) search returns the top-k chunks.
3. **Generation Phase**: Retrieved chunks + query are passed to the LLM,
which synthesises a final answer (all three phases are sketched below).
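A minimal sketch of the three phases, assuming brute-force cosine similarity
in place of a real ANN index, and with `embed` and `llm` as placeholder
models rather than real ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(texts):
    # Placeholder bi-encoder: unit-norm random vectors, illustration only.
    v = rng.standard_normal((len(texts), 384))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# 1. Indexing phase: chunk (here one chunk per doc), embed, store.
corpus = ["The Transformer uses self-attention.",
          "RAG retrieves documents before generating."]
index = embed(corpus)                         # (n_docs, dim) matrix

# 2. Retrieval phase: embed the query, take top-k by cosine similarity.
def retrieve(query, k=1):
    q = embed([query])[0]
    scores = index @ q                        # cosine, since rows are unit-norm
    return [corpus[i] for i in np.argsort(-scores)[:k]]

# 3. Generation phase: prepend retrieved chunks to the prompt.
chunks = retrieve("What does RAG do?")
prompt = "Context: " + " ".join(chunks) + " Question: What does RAG do?"
# answer = llm(prompt)                        # hand off to any generative LLM
```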
## Variants
- **Dense Retrieval**: DPR, Contriever; queries and docs embedded in the
same space.
- **Sparse Retrieval**: BM25; term-frequency based, no embeddings needed.
- **Hybrid Retrieval**: Reciprocal Rank Fusion (RRF) combines dense + sparse
(sketched after this list).
- **Re-ranking**: A cross-encoder re-scores the top-k before the LLM sees them.
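RRF itself is only a few lines; k = 60 is the constant commonly used in the
literature:

```python
# Reciprocal Rank Fusion: each ranking contributes 1/(k + rank) per document.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:                  # each ranking: doc ids, best first
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d2"]                   # dense retriever's ranking
sparse = ["d1", "d4", "d3"]                   # BM25's ranking
print(rrf([dense, sparse]))                   # ['d1', 'd3', 'd4', 'd2']
```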
## Challenges
- Context window limits: long retrieved passages may not fit.
- Retrieval quality is a hard ceiling on generation quality.
- Chunking strategy significantly affects recall.
- Multi-hop questions require iterative retrieval (IRCoT, ReAct).
## Relationship to Transformers
RAG systems rely on transformer-based encoders for embedding and decoder
models for generation. The quality of the embedding model directly determines
retrieval precision and recall.
## References
Lewis et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive
NLP Tasks. NeurIPS.
Gao et al. (2023). Retrieval-Augmented Generation for Large Language Models:
A Survey. arXiv:2312.10997.
"""),
"knowledge_graph_integration.md": textwrap.dedent("""
# Knowledge Graphs and LLM Integration
## What is a Knowledge Graph?
A knowledge graph (KG) is a directed labelled graph of entities (nodes) and
relations (edges): (subject, predicate, object) triples, e.g.
(Vaswani, authored, "Attention Is All You Need").
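A toy in-memory triple store makes the model concrete; the `match` helper
(None acting as a wildcard) is an illustrative assumption, not a standard API:

```python
triples = {
    ("Vaswani", "authored", "Attention Is All You Need"),
    ("Attention Is All You Need", "introduced", "Transformer"),
}

def match(s=None, p=None, o=None):            # None acts as a wildcard
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

print(match(p="authored"))                    # all authorship edges
```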
## Why Combine KGs with LLMs?
LLMs hallucinate facts; KGs provide structured, verifiable ground truth.
KGs are hard to query in natural language; LLMs provide the interface.
Together they enable faithful, grounded, explainable question answering.
## Integration Strategies
### KG-Augmented Generation (KGAG)
Retrieve triples or sub-graphs instead of text chunks, serialise them into
text, then feed them to the LLM prompt.
### LLM-Assisted KG Construction
LLMs extract (subject, relation, object) triples from unstructured text,
reducing manual curation effort significantly.
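A sketch of what such an extraction step might look like; the pipe-separated
output format and both helper names are assumptions, not a fixed interface:

```python
def build_extraction_prompt(passage):         # hypothetical prompt template
    return ("Extract (subject, relation, object) triples from the passage. "
            "Return one triple per line as: subject | relation | object. "
            "Passage: " + passage)

def parse_triples(llm_output):                # parse the LLM reply into tuples
    rows = [line.split("|") for line in llm_output.splitlines()]
    return [tuple(cell.strip() for cell in row)
            for row in rows if len(row) == 3]
```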
### GraphRAG (Microsoft Research, 2024)
GraphRAG clusters documents into communities, generates community summaries,
and stores them in a KG. Queries answered by map-reduce over community
summaries outperform flat-vector RAG on sensemaking tasks.
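The map-reduce answering pattern, sketched with `llm` as a placeholder
callable rather than GraphRAG's actual API:

```python
def graphrag_answer(query, community_summaries, llm):
    # Map: answer the query against each community summary independently.
    partial = [llm("Using only this summary, answer: " + query +
                   " Summary: " + s)
               for s in community_summaries]
    # Reduce: combine the partial answers into one final response.
    return llm("Combine these partial answers into one: " +
               " | ".join(partial) + " Question: " + query)
```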
## Challenges
- KG construction quality depends on the extraction LLM's accuracy.
- Graph databases add infrastructure complexity.
- Ontology design requires domain expertise.
- KGs go stale without continuous update pipelines.
## Relation to RAG and Transformers
KG integration addresses two key RAG limitations: lack of structured reasoning
and inability to follow multi-hop relations.
## References
Pan et al. (2023). Unifying LLMs and KGs. IEEE Intelligent Systems.
"""),
}

