Source code for llms_txt Python module, containing helpers to create and use llms.txt files
Introduction
The llms.txt file spec is for files located in the path llms.txt of a website (or, optionally, in a subpath). llms-sample.txt is a simple example. A file following the spec contains the following sections as markdown, in the specific order:
An H1 with the name of the project or site. This is the only required section
A blockquote with a short summary of the project, containing key information necessary for understanding the rest of the file
Zero or more markdown sections (e.g. paragraphs, lists, etc) of any type, except headings, containing more detailed information about the project and how to interpret the provided files
Zero or more markdown sections delimited by H2 headers, containing “file lists” of URLs where further detail is available
Each “file list” is a markdown list, containing a required markdown hyperlink [name](url), then optionally a : and notes about the file.
Here’s the start of a sample llms.txt file we’ll use for testing:
# FastHTML
> FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore's `FT` "FastTags" into a library for creating server-rendered hypermedia applications.
Remember:
- Use `serve()` for running uvicorn (`if __name__ == "__main__"` is not needed since it's automatic)
- When a title is needed with a response, use `Titled`; note that that already wraps children in `Container`, and already includes both the meta title as well as the H1 element
Reading
We’ll implement parse_llms_file to pull out the sections of llms.txt into a simple data structure.
Create a function that uses the above steps to parse an llms.txt into start and a dict with keys like d and parsed list of links as values.
start, sects = _parse_llms(samp)start
'# FastHTML\n\n> FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore\'s `FT` "FastTags" into a library for creating server-rendered hypermedia applications.\n\nRemember:\n\n- Use `serve()` for running uvicorn (`if __name__ == "__main__"` is not needed since it\'s automatic)\n- When a title is needed with a response, use `Titled`; note that that already wraps children in `Container`, and already includes both the meta title as well as the H1 element.'
pat =fr'^#\s*{title}\n+{summ_pat}\n+{info}'search(pat, start, (re.MULTILINE|re.DOTALL))
{'title': 'FastHTML',
'summary': 'FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore\'s `FT` "FastTags" into a library for creating server-rendered hypermedia applications.',
'info': 'Remember:\n\n- Use `serve()` for running uvicorn (`if __name__ == "__main__"` is not needed since it\'s automatic)\n- When a title is needed with a response, use `Titled`; note that that already wraps children in `Container`, and already includes both the meta title as well as the H1 element.'}
Parse llms.txt file contents in txt to an AttrDict
llmsd = parse_llms_file(samp)llmsd.summary
'FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore\'s `FT` "FastTags" into a library for creating server-rendered hypermedia applications.'
llmsd.sections.Examples
(#1) [{'title': 'Todo list application', 'url': 'https://raw.githubusercontent.com/AnswerDotAI/fasthtml/main/examples/adv_app.py', 'desc': 'Detailed walk-thru of a complete CRUD app in FastHTML showing idiomatic use of FastHTML and HTMX patterns.'}]
XML conversion
For some LLMs such as Claude, XML format is preferred, so we’ll provide a function to create that format.
<project title="FastHTML" summary='FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore's `FT` "FastTags" into a library for creating server-rendered hypermedia applications.'>Remember:
- Use `serve()` for running uvic...