---
title: "MeSH Tables"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MeSH Tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  comment = "#>"
)
```
`puremoe` provides three MeSH reference tables: a thesaurus of descriptors and entry terms, a tree of hierarchical classifications, and a table of MeSH annotation counts per descriptor across PubMed. The first two are downloaded on demand; the third is bundled with the package.
```{r libs}
library(puremoe)
library(dplyr)
library(DT)
```
## MeSH thesaurus
`data_mesh_thesaurus()` downloads and combines the MeSH Descriptor Thesaurus and Supplementary Concept Records (SCR). The result has one row per term, so each descriptor appears alongside its synonyms and entry terms.
```{r thesaurus}
thesaurus <- puremoe::data_mesh_thesaurus()
```
```{r thesaurus-table}
thesaurus |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```
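
Because the table has one row per term, a typical use is mapping free-text synonyms back to their descriptor. The sketch below runs on a small hand-built stand-in rather than the downloaded table, and the column names `DescriptorName` and `TermName` are assumptions made for illustration; check `names(thesaurus)` for the actual ones.

```{r thesaurus-lookup-sketch}
library(dplyr)

# Toy stand-in for the thesaurus: one row per term.
# Column names are assumed for illustration, not taken from puremoe.
toy_thesaurus <- data.frame(
  DescriptorName = c("Neoplasms", "Neoplasms", "Neoplasms",
                     "Myocardial Infarction", "Myocardial Infarction"),
  TermName = c("Neoplasms", "Cancer", "Tumors",
               "Myocardial Infarction", "Heart Attack"),
  stringsAsFactors = FALSE
)

# Lower-cased free-text terms to resolve
queries <- c("cancer", "heart attack")

# Case-insensitive exact match of each query against the term column
matched <- toy_thesaurus |>
  dplyr::filter(tolower(TermName) %in% queries) |>
  dplyr::select(TermName, DescriptorName)

matched
```

Exact matching keeps the example short; partial or fuzzy matching would need a different filter.
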
## MeSH trees
`data_mesh_trees()` provides the hierarchical classification structure. Each descriptor can appear in multiple branches; `tree_location` encodes the full path (e.g., `I01.880.604` = Social Sciences > Political Science > Political Systems).
```{r trees}
trees <- puremoe::data_mesh_trees()
```
```{r trees-table}
trees |>
  head(20) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```
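
Since `tree_location` is a dot-separated path, depth, parent, and descendants can be derived with plain string operations. The following is a minimal sketch on toy rows; only the `tree_location` column comes from the description above, while `DescriptorName` and its placeholder values are hypothetical.

```{r trees-path-sketch}
library(dplyr)

# Toy stand-in for the tree table. Descriptor labels are placeholders,
# not real MeSH headings.
toy_trees <- data.frame(
  DescriptorName = c("Descriptor A", "Descriptor B",
                     "Descriptor C", "Descriptor D"),
  tree_location = c("I01", "I01.880", "I01.880.604", "I01.409"),
  stringsAsFactors = FALSE
)

toy_trees <- toy_trees |>
  dplyr::mutate(
    # Depth = number of dot-separated segments in the path
    depth = lengths(strsplit(tree_location, ".", fixed = TRUE)),
    # Parent = path with the last segment removed (NA at the top level)
    parent = ifelse(depth > 1,
                    sub("\\.[^.]+$", "", tree_location),
                    NA_character_)
  )

# Descendants of a branch share its path, plus a dot, as a prefix
branch <- "I01.880"
descendants <- toy_trees |>
  dplyr::filter(startsWith(tree_location, paste0(branch, ".")))

toy_trees
descendants
```

Appending the dot before the prefix match keeps sibling branches whose codes merely begin with the same characters out of the result.
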
## MeSH descriptor frequencies
`data_mesh_frequencies` is a bundled dataset giving the annotation frequency of
each MeSH descriptor across the full PubMed corpus (39.7 M PMIDs, April 2026).
Counts reflect the number of records indexed with each descriptor by NLM curators,
not text frequency, making them suitable as a baseline for enrichment analyses
against arbitrary PubMed subsets.
```{r frequencies}
puremoe::data_mesh_frequencies |>
  head(20) |>
  dplyr::mutate(prop_total = round(prop_total, 4)) |>
  DT::datatable(rownames = FALSE, options = list(scrollX = TRUE))
```
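
To illustrate the baseline idea, the sketch below compares descriptor proportions in a made-up subset against background proportions. All values are invented; `prop_total` mirrors the column used above, while `DescriptorName`, `n_records`, and `n_subset` are hypothetical names chosen for the example.

```{r frequencies-enrichment-sketch}
library(dplyr)

# Toy background frequencies standing in for data_mesh_frequencies.
# Values and descriptor labels are invented for illustration.
toy_background <- data.frame(
  DescriptorName = c("Descriptor A", "Descriptor B", "Descriptor C"),
  prop_total = c(0.10, 0.02, 0.005),
  stringsAsFactors = FALSE
)

# Toy subset: records (out of n_subset) annotated with each descriptor
n_subset <- 1000
toy_subset <- data.frame(
  DescriptorName = c("Descriptor A", "Descriptor B", "Descriptor C"),
  n_records = c(100, 60, 20),
  stringsAsFactors = FALSE
)

enrichment <- toy_subset |>
  dplyr::inner_join(toy_background, by = "DescriptorName") |>
  dplyr::mutate(
    prop_subset = n_records / n_subset,
    # Ratio > 1 means over-represented relative to the background
    ratio = prop_subset / prop_total
  ) |>
  dplyr::arrange(dplyr::desc(ratio))

enrichment
```

A ratio alone ignores sampling noise; a formal test such as `binom.test()` against `prop_total` would be a natural next step.
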
## Persistent storage
By default, the MeSH thesaurus and tree tables are fetched from GitHub on every
call. To avoid re-downloading them each session, set
`use_persistent_storage = TRUE`; the files are then cached in a system data
directory and reused on subsequent calls. `data_mesh_frequencies` is bundled with
the package and needs no download.
```{r persistent, eval=FALSE}
thesaurus <- puremoe::data_mesh_thesaurus(use_persistent_storage = TRUE)
trees <- puremoe::data_mesh_trees(use_persistent_storage = TRUE)
```