---
title: ArxivDigest-extra
emoji: 🔥
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 3.29.0
pinned: false
app_file: src/app.py
license: mit
---
<p align="center"><img src="./readme_images/banner.png" width=500 /></p>

**ArXiv Digest (extra version) and Personalized Recommendations using Large Language Models.**

*(Note: This is a fork adjusted to my own needs. For the original repository, please refer to **AutoLLM**, from which it was forked.)*

This repo aims to provide a better daily digest of newly published arXiv papers based on your own research interests and natural-language descriptions, using relevancy ratings from GPT.

You can try it out on [Hugging Face](https://huggingface.co/spaces/AutoLLM/ArxivDigest) using your own OpenAI API key.

You can also create a daily subscription pipeline to email you the results.

## 📚 Contents
- [What this repo does](#🔍-what-this-repo-does)
  * [Examples](#some-examples-of-results)
- [Usage](#💡-usage)
  * [Running as a GitHub Action using SendGrid (Recommended)](#running-as-a-github-action-using-sendgrid-recommended)
  * [Running as a GitHub Action with SMTP credentials](#running-as-a-github-action-with-smtp-credentials)
  * [Running as a GitHub Action without emails](#running-as-a-github-action-without-emails)
  * [Running from the command line](#running-from-the-command-line)
  * [Running with a user interface](#running-with-a-user-interface)
- [Roadmap](#✅-roadmap)
- [Extending and Contributing](#💁-extending-and-contributing)
## 🔍 What this repo does
Staying up to date on [arXiv](https://arxiv.org) papers can take a considerable amount of time, with on the order of hundreds of new papers to filter through each day. There is an [official daily digest service](https://info.arxiv.org/help/subscribe.html); however, large categories like [cs.AI](https://arxiv.org/list/cs.AI/recent) still have 50-100 papers a day. Determining whether these papers are relevant and important to you means reading through each title and abstract, which is time-consuming.

This repository offers a method to curate a daily digest, sorted by relevance, using large language models. The models are conditioned on your personal research interests, which you describe in natural language.

* You modify the configuration file `config.yaml` with an arXiv Subject, some set of Categories, and a natural language statement about the type of papers you are interested in.
* The code pulls all the abstracts for papers in those categories and ranks how relevant they are to your interest on a scale of 1-10 using `gpt-3.5-turbo-16k`.
* The code then emits an HTML digest listing all the relevant papers, and optionally emails it to you using [SendGrid](https://sendgrid.com). You will need to have a SendGrid account with an API key for this functionality to work.
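
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the `Paper` class, `rank_papers`, `render_digest`, and the default threshold of 7 are all hypothetical, and `keyword_score` is a trivial stand-in for the call to the language model.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Iterable, List, Tuple


@dataclass
class Paper:
    """Hypothetical container for one arXiv entry."""
    title: str
    abstract: str
    url: str


def rank_papers(
    papers: Iterable[Paper],
    score_fn: Callable[[Paper], int],
    threshold: int = 7,  # assumed cutoff, not taken from the repository
) -> List[Tuple[int, Paper]]:
    """Score each paper on a 1-10 scale and keep those at or above the threshold."""
    scored = []
    for paper in papers:
        score = score_fn(paper)
        if not 1 <= score <= 10:
            raise ValueError(f"score {score} is outside the 1-10 scale")
        if score >= threshold:
            scored.append((score, paper))
    # Highest relevancy first; the sort is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def render_digest(ranked: List[Tuple[int, Paper]]) -> str:
    """Render the ranked papers as a small HTML ordered list."""
    items = "\n".join(
        f'<li><a href="{escape(paper.url)}">{escape(paper.title)}</a> '
        f"(relevancy: {score}/10)</li>"
        for score, paper in ranked
    )
    return f"<ol>\n{items}\n</ol>"


def keyword_score(paper: Paper) -> int:
    """Stand-in for the LLM rating: 9 if the abstract mentions language models, else 3."""
    return 9 if "language model" in paper.abstract.lower() else 3


if __name__ == "__main__":
    sample = [
        Paper("Crop statistics", "Crop yield statistics.", "https://example.org/a"),
        Paper("LLM pretraining", "A large language model study.", "https://example.org/b"),
    ]
    print(render_digest(rank_papers(sample, keyword_score)))
```

Passing the scoring function in as a parameter keeps the filtering and rendering logic testable without an API key.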
### Testing it out with Hugging Face:
We provide a demo at [https://huggingface.co/spaces/AutoLLM/ArxivDigest](https://huggingface.co/spaces/AutoLLM/ArxivDigest). Simply enter your [OpenAI API key](https://platform.openai.com/account/api-keys) and then fill in the configuration on the right. Note that we do not store your key.

You can also send yourself an email of the digest by creating a SendGrid account and [API key](https://app.SendGrid.com/settings/api_keys).
### Some examples of results:
#### Digest Configuration:
- Subject/Topic: Computer Science
- Categories: Artificial Intelligence, Computation and Language, Machine Learning
- Interest:
  1. Large language model pretraining and finetunings
  2. Multimodal machine learning
  3. RAGs, Information retrieval
  4. Optimization of LLM and GenAI
  5. Do not care about specific application, for example, information extraction, summarization, etc.
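
To make the role of the interest statement concrete, here is a hedged sketch of how a configuration like the one above might be turned into a rating prompt. The dictionary keys (`topic`, `categories`, `interest`), the `build_prompt` helper, and the prompt wording are assumptions made for illustration; they are not the repository's actual configuration schema or prompt.

```python
# Hypothetical in-memory form of the digest configuration shown above.
example_config = {
    "topic": "Computer Science",
    "categories": [
        "Artificial Intelligence",
        "Computation and Language",
        "Machine Learning",
    ],
    "interest": "\n".join(
        [
            "1. Large language model pretraining and finetunings",
            "2. Multimodal machine learning",
            "3. RAGs, Information retrieval",
        ]
    ),
}


def build_prompt(config: dict, title: str, abstract: str) -> str:
    """Combine the free-text interests with one paper into a single rating prompt."""
    return (
        "Rate from 1 to 10 how relevant the paper below is to these interests.\n"
        f"Interests:\n{config['interest']}\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        "Answer with a single integer."
    )


if __name__ == "__main__":
    print(build_prompt(example_config, "An example title", "An example abstract."))
```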
#### Result:
<p align="left"><img src="./readme_images/example_custom_1.png" width=580 /></p>
## 💡 Usage
### Running as a GitHub Action using SendGrid (Recommended)
The recommended way to get started using this repository is to:
1. Fork the repository
2. Modify `config.yaml` and merge the changes into your main branch.
3. Set the following secrets [(under Settings, Secrets and variables, Repository secrets)](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository). See [Advanced Usage](./advanced_usage.md#create-and-fetch-your-api-keys) for more details on how to create and get OpenAI and SendGrid API keys:
   - `OPENAI_API_KEY` From [OpenAI](https://platform.openai.com/account/api-keys)
   - `SENDGRID_API_KEY` From [SendGrid](https://app.SendGrid.com/settings/api_keys)
   - `FROM_EMAIL` This value must match the email you used to create the SendGrid API Key.
   - `TO_EMAIL`
4. Manually trigger the action or wait until the scheduled action takes place.
See [Advanced Usage](./advanced_usage.md) for more details, including step-by-step images, further customization, and alternate usage.
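
Before triggering a run, it can help to confirm that all four secrets are actually visible to the process. The check below is a small sketch using only the standard library; the `missing_secrets` helper is hypothetical and is not part of this repository.

```python
import os
from typing import List, Mapping, Optional

# The four secret names listed in the setup steps above.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "SENDGRID_API_KEY", "FROM_EMAIL", "TO_EMAIL")


def missing_secrets(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required secrets that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        raise SystemExit("Missing secrets: " + ", ".join(missing))
    print("All required secrets are set.")
```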
### Running with a user interface
To run the same UI as the Hugging Face space locally:
1. Install the requirements in `src/requirements.txt` as well as `gradio`.
2. Run `python src/app.py` and go to the local URL. From there you will be able to preview the papers from today, as well as the generated digests.
3. If you want to use a `.env` file for your secrets, you can copy `.env.template` to `.env` and then set the environment variables in `.env`.
   - Note: This file may be hidden by default on some operating systems because of the dot prefix.
   - The `.env` file is listed in `.gitignore`, so git does not track it and it will not be uploaded to the repository.
   - Do not put your keys or your email address in the original `.env.template`: that file is tracked by git, so editing it might cause you to commit your secrets.
> **WARNING:** Do not edit and commit your `.env.template` with your personal keys or email address! Doing so may expose these to the world!
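
For readers unfamiliar with `.env` files, the sketch below shows one way simple `KEY=value` lines could be read into environment variables using only the standard library. It is an illustration under assumptions: `parse_env_text` and `load_env_file` are hypothetical helpers, they handle only plain `KEY=value` lines with optional quotes, and the application itself may load the file differently.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Copy values from the file into os.environ; existing variables win by default."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values


if __name__ == "__main__":
    loaded = load_env_file()
    print("Loaded variables:", sorted(loaded))
```

Printing only the variable names, never the values, avoids leaking secrets into logs.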
## ✅ Roadmap
- [x] Support personalized paper recommendation using LLM.
- [x] Send emails for daily digest.
- [x] Read further from the paper itself via its HTML version (PDF support is planned for the next phase).
- [ ] Implement a ranking factor to prioritize content from specific authors.
- [ ] Support open-source models, e.g., LLaMA, Vicuna, MPT, etc.
- [ ] Fine-tune an open-source model to better support paper ranking and stay updated with the latest research concepts.
## 💁 Extending and Contributing
You may (and are encouraged to) modify the code in this repository to suit your personal needs. If you think your modifications would be in any way useful to others, please submit a pull request.

These types of modifications include things like changes to the prompt, different language models, or additional ways for the digest to be delivered to you.