# The /llms.txt file

Jeremy Howard
2024-09-03

> A proposal that those interested in providing LLM-friendly content add a /llms.txt file to their site. This is a markdown file that provides brief background information and guidance, along with links to markdown files providing more detailed information.

## Background

Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.

While websites serve both human readers and LLMs, the latter benefit from more concise, expert-level information gathered in a single, accessible location. This is particularly important for use cases like development environments, where LLMs need quick access to programming documentation and APIs.

## Proposal
We propose adding a `/llms.txt` markdown file to websites to provide LLM-friendly content. This file offers brief background information, guidance, and links to detailed markdown files. llms.txt markdown is both human and LLM readable, but is also in a precise format allowing fixed processing methods (i.e. classical programming techniques such as parsers and regex).

We furthermore propose that pages on websites that have information that might be useful for LLMs to read provide a clean markdown version of those pages at the same URL as the original page, but with `.md` appended. (URLs without file names should append `index.html.md` instead.)

The [FastHTML project](https://fastht.ml) follows these two proposals for its documentation. For instance, here is the [FastHTML docs llms.txt](https://fastht.ml/docs/llms.txt). And here is an example of a [regular HTML docs page](https://fastht.ml/docs/tutorials/by_example.html), along with the exact same URL but with [a .md extension](https://fastht.ml/docs/tutorials/by_example.html.md).

This proposal does not include any particular recommendation for how to process the llms.txt file, since it will depend on the application. For example, the FastHTML project opted to automatically expand the llms.txt file into two markdown files containing the contents of the linked URLs, using an XML-based structure suitable for use in LLMs such as Claude. The two files are: [llms-ctx.txt](https://fastht.ml/docs/llms-ctx.txt), which does not include the optional URLs, and [llms-ctx-full.txt](https://fastht.ml/docs/llms-ctx-full.txt), which does include them. They are created using the [`llms_txt2ctx`](https://llmstxt.org/intro.html#cli) command line application, and the FastHTML documentation includes information for users about how to use them.

The versatility of llms.txt files means they can serve many purposes: from helping developers find their way around software documentation, to giving businesses a way to outline their structure, or even breaking down complex legislation for stakeholders. They're just as useful for personal websites where they can help answer questions about someone's CV, for e-commerce sites to explain products and policies, or for schools and universities to provide quick access to their course information and resources.

Note that all [nbdev](https://nbdev.fast.ai/) projects now create .md versions of all pages by default. All Answer.AI and fast.ai software projects using nbdev have had their docs regenerated with this feature. For an example, see the [markdown version](https://fastcore.fast.ai/docments.html.md) of [fastcore's docments module](https://fastcore.fast.ai/docments.html).

## Format

At the moment the most widely and easily understood format for language models is Markdown. Simply showing where key Markdown files can be found is a great first step. Providing some basic structure helps a language model find the information it needs.

The `llms.txt` file is unusual in that it uses Markdown to structure the information rather than a classic structured format such as XML. The reason for this is that we expect many of these files to be read by language models and agents. Having said that, the information in llms.txt follows a specific format and can be read using standard programmatic tools.

The llms.txt file spec is for files located in the root path `/llms.txt` of a website (or, optionally, in a subpath).
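To make these conventions concrete, here is a minimal sketch (illustrative only; the URLs and helper names are not part of the proposal) of how a client might derive the two kinds of LLM-friendly URLs described above:

``` python
from urllib.parse import urlsplit, urlunsplit

def llms_txt_url(site: str) -> str:
    "The site's llms.txt lives at the root path (a subpath is also allowed by the spec)."
    parts = urlsplit(site)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))

def md_twin_url(page: str) -> str:
    "Markdown twin of a page: append `.md`, or `index.html.md` for URLs without a file name."
    parts = urlsplit(page)
    path = parts.path or "/"
    path = path + "index.html.md" if path.endswith("/") else path + ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

# Hypothetical example URLs:
print(llms_txt_url("https://example.com/docs/page.html"))  # https://example.com/llms.txt
print(md_twin_url("https://example.com/docs/page.html"))   # https://example.com/docs/page.html.md
print(md_twin_url("https://example.com/docs/"))            # https://example.com/docs/index.html.md
```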
A file following the spec contains the following sections as markdown, in this specific order:

- An H1 with the name of the project or site. This is the only required section.
- A blockquote with a short summary of the project, containing key information necessary for understanding the rest of the file.
- Zero or more markdown sections (e.g. paragraphs, lists, etc.) of any type except headings, containing more detailed information about the project and how to interpret the provided files.
- Zero or more markdown sections delimited by H2 headers, containing "file lists" of URLs where further detail is available.
- Each "file list" is a markdown list, containing a required markdown hyperlink `[name](url)`, then optionally a `:` and notes about the file.

Here is a mock example:

``` markdown
# Title

> Optional description goes here

Optional details go here

## Section name

- [Link title](https://link_url): Optional link details

## Optional

- [Link title](https://link_url)
```

Note that the "Optional" section has a special meaning: if it's included, the URLs provided there can be skipped if a shorter context is needed. Use it for secondary information which can often be skipped.

## Existing standards

llms.txt is designed to coexist with current web standards. While sitemaps list all pages for search engines, `llms.txt` offers a curated overview for LLMs. It can complement robots.txt by providing context for allowed content (see the sketch at the end of this section). The file can also reference structured data markup used on the site, helping LLMs understand how to interpret this information in context.

The approach of standardising on a path for the file follows the approach of `/robots.txt` and `/sitemap.xml`. robots.txt and `llms.txt` have different purposes: robots.txt is generally used to let automated tools know what access to a site is considered acceptable, such as for search indexing bots. On the other hand, `llms.txt` information will often be used on demand when a user explicitly requests information about a topic, such as when including a coding library's documentation in a project, or when asking a chatbot with search functionality for information.

Our expectation is that `llms.txt` will mainly be useful for *inference*, i.e. at the time a user is seeking assistance, as opposed to for *training*. However, if `llms.txt` usage becomes widespread, future training runs could perhaps take advantage of the information in `llms.txt` files too.

sitemap.xml is a list of all the indexable human-readable information available on a site. This isn't a substitute for `llms.txt` since it:

- Often won't have the LLM-readable versions of pages listed
- Doesn't include URLs to external sites, even though they might be helpful to understand the information
- Will generally cover documents that in aggregate will be too large to fit in an LLM context window, and will include a lot of information that isn't necessary to understand the site.
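To tie this back to the robots.txt comparison above, a well-behaved consumer can treat the two files as complementary: consult robots.txt to check that access is acceptable, then fetch llms.txt for the curated overview. The sketch below uses a hypothetical site and user-agent string; it is an illustration, not part of the proposal:

``` python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

SITE = "https://example.com"   # hypothetical site
UA = "ExampleLLMClient/0.1"    # hypothetical user-agent string

# robots.txt answers "may I fetch this URL?"...
robots = RobotFileParser()
robots.set_url(f"{SITE}/robots.txt")
robots.read()

# ...while llms.txt answers "what should I read to understand this site?"
if robots.can_fetch(UA, f"{SITE}/llms.txt"):
    with urlopen(f"{SITE}/llms.txt") as resp:
        print(resp.read().decode()[:200])
```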
## Example

Here's an example of `llms.txt`, in this case a cut-down version of the file used for the FastHTML project (see also the [full version](https://fastht.ml/docs/llms.txt)):

``` markdown
# FastHTML

> FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore's `FT` "FastTags" into a library for creating server-rendered hypermedia applications.

Important notes:

- Although parts of its API are inspired by FastAPI, it is *not* compatible with FastAPI syntax and is not targeted at creating API services
- FastHTML is compatible with JS-native web components and any vanilla JS library, but not with React, Vue, or Svelte.

## Docs

- [FastHTML quick start](https://fastht.ml/docs/tutorials/quickstart_for_web_devs.html.md): A brief overview of many FastHTML features
- [HTMX reference](https://raw.githubusercontent.com/path/reference.md): Brief description of all HTMX attributes, CSS classes, headers, events, extensions, js lib methods, and config options

## Examples

- [Todo list application](https://raw.githubusercontent.com/path/adv_app.py): Detailed walk-thru of a complete CRUD app in FastHTML showing idiomatic use of FastHTML and HTMX patterns.

## Optional

- [Starlette full documentation](https://gist.githubusercontent.com/path/starlette-sml.md): A subset of the Starlette documentation useful for FastHTML development.
```

To create effective `llms.txt` files, consider these guidelines: use concise, clear language; when linking to resources, include brief, informative descriptions; and avoid ambiguous terms or unexplained jargon. Run a tool that expands your `llms.txt` file into an LLM context file and test a number of language models to see if they can answer questions about your content (a rough sketch of such a test loop appears just before the tooling documentation below).

## Directories

Here are a few directories that list the `llms.txt` files available on the web:

- [llmstxt.site](https://llmstxt.site/)
- [directory.llmstxt.cloud](https://directory.llmstxt.cloud/)

## Next steps

The `llms.txt` specification is open for community input. A [GitHub repository](https://github.com/AnswerDotAI/llms-txt) hosts [this informal overview](https://github.com/AnswerDotAI/llms-txt/blob/main/nbs/index.md), allowing for version control and public discussion. A [community Discord channel](https://discord.gg/aJPygMvPEN) is available for sharing implementation experiences and discussing best practices.
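Before moving on to the reference tooling, here is a rough sketch of the test loop suggested in the guidelines above. `create_ctx` comes from the `llms-txt` package documented in the next section; `ask_llm` is a stand-in for whichever model client you prefer, and the questions are examples rather than a required set:

``` python
from pathlib import Path
from llms_txt import create_ctx  # reference tool, documented below

def ask_llm(context: str, question: str) -> str:
    "Stand-in for a real LLM call: swap in the client for the model you want to test."
    return f"(model answer to {question!r}, given {len(context)} chars of context)"

# Expand llms.txt into a single context file, then spot-check it with questions
# you expect the linked docs to answer.
ctx = create_ctx(Path("llms.txt").read_text())
for q in ["How do I install this library?", "What does the quick start cover?"]:
    print(q, "->", ask_llm(ctx, q))
```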
# Python module & CLI

Given an `llms.txt` file, this provides a CLI and Python API to parse the file and create an XML context file from it. The input file should follow this format:

``` markdown
# FastHTML

> FastHTML is a python library which...

When writing FastHTML apps remember to:

- Thing to remember

## Docs

- [Surreal](https://host/README.md): Tiny jQuery alternative with Locality of Behavior
- [FastHTML quick start](https://host/quickstart.html.md): An overview of FastHTML features

## Examples

- [Todo app](https://host/adv_app.py)

## Optional

- [Starlette docs](https://host/starlette-sml.md): A subset of the Starlette docs
```

## Install

``` sh
pip install llms-txt
```

## How to use

### CLI

After installation, [`llms_txt2ctx`](https://llmstxt.org/core.html#llms_txt2ctx) is available in your terminal.

To get help for the CLI:

``` sh
llms_txt2ctx -h
```

To convert an `llms.txt` file to XML context and save it to `llms.md`:

``` sh
llms_txt2ctx llms.txt > llms.md
```

Pass `--optional True` to add the 'optional' section of the input file.

### Python module

``` python
from llms_txt import *
```

``` python
samp = Path('llms-sample.txt').read_text()
```

Use [`parse_llms_file`](https://llmstxt.org/core.html#parse_llms_file) to create a data structure with the sections of an llms.txt file (you can also add `optional=True` if needed):

``` python
parsed = parse_llms_file(samp)
list(parsed)
```

```
['title', 'summary', 'info', 'sections']
```

``` python
parsed.title,parsed.summary
```

```
('FastHTML', 'FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore\'s `FT` "FastTags" into a library for creating server-rendered hypermedia applications.')
```

``` python
list(parsed.sections)
```

```
['Docs', 'Examples', 'Optional']
```

``` python
parsed.sections.Optional[0]
```

``` json
{ 'desc': 'A subset of the Starlette documentation useful for FastHTML '
          'development.',
  'title': 'Starlette full documentation',
  'url': 'https://gist.githubusercontent.com/jph00/809e4a4808d4510be0e3dc9565e9cbd3/raw/9b717589ca44cedc8aaf00b2b8cacef922964c0f/starlette-sml.md'}
```

Use [`create_ctx`](https://llmstxt.org/core.html#create_ctx) to create an LLM context file with XML sections, suitable for systems such as Claude (this is what the CLI calls behind the scenes).

``` python
ctx = create_ctx(samp)
```

``` python
print(ctx[:300])
```

```
Remember:
- Use `serve()` for running uvicorn (`if __name__ == "__main__"` is not
```

### Implementation and tests

To show how simple it is to parse `llms.txt` files, here's a complete parser in under 20 lines of code with no dependencies:

``` python
from pathlib import Path
import re,itertools

def chunked(it, chunk_sz):
    it = iter(it)
    return iter(lambda: list(itertools.islice(it, chunk_sz)), [])

def parse_llms_txt(txt):
    "Parse llms.txt file contents in `txt` to a `dict`"
    def _p(links):
        link_pat = r'-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^\)]+)\)(?::\s*(?P<desc>.*))?'
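        # Each file-list entry has the form `- [title](url)`, optionally followed by
        # `: description`; the named groups above capture those pieces. The pattern is
        # applied to every non-empty line of the section to build a list of link dicts.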
        return [re.search(link_pat, l).groupdict()
                for l in re.split(r'\n+', links.strip()) if l.strip()]

    start,*rest = re.split(fr'^##\s*(.*?$)', txt, flags=re.MULTILINE)
    sects = {k: _p(v) for k,v in dict(chunked(rest, 2)).items()}
    pat = r'^#\s*(?P<title>.+?$)\n+(?:^>\s*(?P<summary>.+?$)$)?\n+(?P<info>.*)'
    d = re.search(pat, start.strip(), (re.MULTILINE|re.DOTALL)).groupdict()
    d['sections'] = sects
    return d
```

We have provided a test suite in `tests/test-parse.py` and confirmed that this implementation passes all tests.

# `ed`, the standard text editor

In order to understand how llms.txt can be used with editors and IDEs, let's look at how `ed`, the [standard text editor](https://www.gnu.org/fun/jokes/ed-msg.html), could work (assuming it's updated to use this proposal). In our example we will see how the user might tell `ed` to retrieve the LLM docs from [fastht.ml/docs](https://fastht.ml/docs), and then use the results to write a simple [FastHTML](https://fastht.ml) web app. Even if you use a non-standard editor or IDE such as VS Code, Cursor, Vim, or Emacs, your software's interaction with `/llms.txt` would look similar to this general approach.

``` sh
$ ed
* H
```

Our user starts `ed` and enables helpful error messages (just for the purpose of this walkthru - obviously a real `ed` user doesn't need "helpful error messages").

``` sh
* L fastht.ml/docs
Checking for /llms.txt at fastht.ml/docs...
Found /llms.txt. Parsing...
Fetching URLs from "Docs" section...
Fetching URLs from "Examples" section...
Skipping "Optional" section for brevity.
Creating XML-based context for Claude...
Context created and loaded.
```

The user invokes the hypothetical `L` (load) command, which in this LLM-enhanced version of `ed` retrieves and processes the `llms.txt` file. `ed` checks for the file (if it didn't exist, it would fall back to scraping the HTML of the website the old-fashioned way), parses it, fetches the relevant URLs, and creates an XML-based context suitable for Claude (perhaps an `ed` config file could be used to choose which LLM to use, and would determine how the context is formatted). All of this happens with the characteristic silence of `ed`, broken only by these reassuring progress messages.

``` sh
* x Create a simple FastHTML app which outputs 'Hello, World!', in a <div>.
Analyzing context and prompt...
Generating FastHTML app...
App written to buffer.
```

Next, our user invokes the hypothetical `x` (eXecute AI) command, providing instructions for the LLM to create a simple FastHTML app. In the world of LLM-enhanced `ed`, this is understood as a request to generate code based on the given prompt and the previously loaded context.

``` sh
* n
5
* p
from fasthtml.common import *
app,rt = fast_app()
@rt
def index(): return div("Hello, World!")
serve()
```

The editor analyzes the loaded context along with the provided prompt, generates the FastHTML app, and writes it to the buffer. The user then views the generated app's line count (`n`) and contents (`p`), marveling at how much functionality is packed into those 5 lines.

``` sh
*w hello_world.py
5
*q
```

Finally, our user saves the app to a file and quits `ed`, presumably to run their new FastHTML app and reflect on the unexpected productivity boost provided by their trusty line editor.