# Configuration Guide

This document provides detailed instructions for configuring and deploying Anveshak: Spirituality Q&A, covering environment setup, authentication, customization options, and deployment strategies.

## Environment Configuration

### Configuration Parameters

Anveshak: Spirituality Q&A uses the following configuration parameters, which can be set through environment variables or Hugging Face Spaces secrets:

| Parameter | Description | Example Value |
|-----------|-------------|---------------|
| `BUCKET_NAME_GCS` | GCS bucket name for data storage | `"your-bucket-name"` |
| `METADATA_PATH_GCS` | Path to metadata file in GCS | `"metadata/metadata.jsonl"` |
| `EMBEDDINGS_PATH_GCS` | Path to embeddings file in GCS | `"processed/embeddings/all_embeddings.npy"` |
| `INDICES_PATH_GCS` | Path to FAISS index in GCS | `"processed/indices/faiss_index.faiss"` |
| `CHUNKS_PATH_GCS` | Path to text chunks file in GCS | `"processed/chunks/text_chunks.txt"` |
| `RAW_TEXTS_UPLOADED_PATH_GCS` | Path to uploaded raw texts in GCS | `"raw-texts/uploaded"` |
| `RAW_TEXTS_DOWNLOADED_PATH_GCS` | Path to downloaded raw texts in GCS | `"raw-texts/downloaded/"` |
| `CLEANED_TEXTS_PATH_GCS` | Path to cleaned texts in GCS | `"cleaned-texts/"` |
| `EMBEDDING_MODEL` | Hugging Face model ID for embeddings | `"intfloat/e5-large-v2"` |
| `LLM_MODEL` | OpenAI model for answer generation | `"gpt-3.5-turbo"` |
| `OPENAI_API_KEY` | OpenAI API key | `"sk-..."` |
| `GCP_CREDENTIALS` | GCP service account credentials (JSON) | `{"type":"service_account",...}` |

### Streamlit Secrets Configuration (Optional)

If developing locally with Streamlit, you can create a `.streamlit/secrets.toml` file with the following structure:

```toml
# GCS Configuration
BUCKET_NAME_GCS = "your-bucket-name"
METADATA_PATH_GCS = "metadata/metadata.jsonl"
EMBEDDINGS_PATH_GCS = "processed/embeddings/all_embeddings.npy"
INDICES_PATH_GCS = "processed/indices/faiss_index.faiss"
CHUNKS_PATH_GCS = "processed/chunks/text_chunks.txt"
RAW_TEXTS_UPLOADED_PATH_GCS = "raw-texts/uploaded"
RAW_TEXTS_DOWNLOADED_PATH_GCS = "raw-texts/downloaded/"
CLEANED_TEXTS_PATH_GCS = "cleaned-texts/"
EMBEDDING_MODEL = "intfloat/e5-large-v2"
LLM_MODEL = "gpt-3.5-turbo"

# OpenAI API Configuration
openai_api_key = "your-openai-api-key"

# GCP Service Account Credentials (JSON format)
[gcp_credentials]
type = "service_account"
project_id = "your-project-id"
private_key_id = "your-private-key-id"
private_key = "your-private-key"
client_email = "your-client-email"
client_id = "your-client-id"
auth_uri = "https://accounts.google.com/o/oauth2/auth"
token_uri = "https://oauth2.googleapis.com/token"
auth_provider_x509_cert_url = "https://www.googleapis.com/oauth2/v1/certs"
client_x509_cert_url = "your-client-cert-url"
```

### Environment Variables for Alternative Deployments

For deployments that support environment variables (like Heroku or Docker), you can use the following environment variables:

```bash
# GCS Configuration
export BUCKET_NAME_GCS="your-bucket-name"
export METADATA_PATH_GCS="metadata/metadata.jsonl"
export EMBEDDINGS_PATH_GCS="processed/embeddings/all_embeddings.npy"
export INDICES_PATH_GCS="processed/indices/faiss_index.faiss"
export CHUNKS_PATH_GCS="processed/chunks/text_chunks.txt"
export RAW_TEXTS_UPLOADED_PATH_GCS="raw-texts/uploaded"
export RAW_TEXTS_DOWNLOADED_PATH_GCS="raw-texts/downloaded/"
export CLEANED_TEXTS_PATH_GCS="cleaned-texts/"
export EMBEDDING_MODEL="intfloat/e5-large-v2"
export LLM_MODEL="gpt-3.5-turbo"

# OpenAI API Configuration
export OPENAI_API_KEY="your-openai-api-key"

# GCP Service Account (as a JSON string)
export GCP_CREDENTIALS='{"type":"service_account","project_id":"your-project-id",...}'
```

## Authentication Setup

### Google Cloud Storage (GCS) Authentication

Anveshak: Spirituality Q&A supports multiple methods for authenticating with GCS:

#### Setting Up a GCP Service Account (Required)

Before configuring authentication methods, you'll need
to create a Google Cloud Platform (GCP) service account:

1. **Create a GCP project** (if you don't already have one):
   - Go to the [Google Cloud Console](https://console.cloud.google.com/)
   - Click on "Select a project" at the top of the page and then "New Project"
   - Enter a project name and click "Create"

2. **Enable the Cloud Storage API**:
   - Go to "APIs & Services" > "Library" in the left sidebar
   - Search for "Cloud Storage"
   - Click on "Cloud Storage API" and then "Enable"

3. **Create a service account**:
   - Go to "IAM & Admin" > "Service Accounts" in the left sidebar
   - Click "Create Service Account"
   - Enter a service account name and description
   - Click "Create and Continue"

4. **Assign roles to the service account**:
   - Add the "Storage Object Admin" role for access to GCS objects
   - Add the "Viewer" role for basic read permissions
   - Click "Continue" and then "Done"

5. **Create and download a service account key**:
   - Find your new service account in the list and click on it
   - Go to the "Keys" tab
   - Click "Add Key" > "Create new key"
   - Choose "JSON" as the key type
   - Click "Create" to download the key file (this is your GCP credentials JSON file)

6. **Create a GCS bucket**:
   - Go to "Cloud Storage" > "Buckets" in the left sidebar
   - Click "Create"
   - Enter a globally unique bucket name
   - Choose your settings for location, class, and access control
   - Click "Create"

Once you have created your service account and GCS bucket, you can use any of the following authentication methods:

#### Option 1: HF Spaces Environment Variable (Recommended Production Method)

For Hugging Face Spaces, set the `GCP_CREDENTIALS` environment variable in the Spaces UI:

1. Go to your Space settings
2. Find the "Repository secrets" section
3. Add a new secret named `GCP_CREDENTIALS` whose value is the full contents of your service account JSON key file

#### Option 2: Local Development with Application Default Credentials

For local development, you can use Application Default Credentials:

```bash
# Export path to your service account key file
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-service-account-file.json"
```

#### Option 3: Streamlit Secrets

Add your service account credentials to the `.streamlit/secrets.toml` file as shown in the example above.

The authentication logic is handled by the `setup_gcp_auth()` function in `utils.py`, which tries the three options in the order listed above and raises an error if none is configured:

```python
import json
import os

import streamlit as st
from google.oauth2 import service_account


def setup_gcp_auth():
    """
    Setup Google Cloud Platform (GCP) authentication using various methods.

    This function tries multiple authentication methods in order of preference:
    1. HF Spaces environment variable (GCP_CREDENTIALS) - primary production method
    2. Local environment variable pointing to credentials file (GOOGLE_APPLICATION_CREDENTIALS)
    3. Streamlit secrets (gcp_credentials)

    Note: In production, credentials are stored exclusively in HF Spaces secrets.
    """
    try:
        # Option 1: HF Spaces environment variable
        if "GCP_CREDENTIALS" in os.environ:
            gcp_credentials = json.loads(os.getenv("GCP_CREDENTIALS"))
            print("✅ Using GCP credentials from HF Spaces environment variable")
            credentials = service_account.Credentials.from_service_account_info(gcp_credentials)
            return credentials

        # Option 2: Local environment variable pointing to file
        elif "GOOGLE_APPLICATION_CREDENTIALS" in os.environ:
            credentials_path = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
            print(f"✅ Using GCP credentials from file at {credentials_path}")
            credentials = service_account.Credentials.from_service_account_file(credentials_path)
            return credentials

        # Option 3: Streamlit secrets
        elif "gcp_credentials" in st.secrets:
            gcp_credentials = st.secrets["gcp_credentials"]

            # Handle different secret formats
            if isinstance(gcp_credentials, dict) or hasattr(gcp_credentials, 'to_dict'):
                # Convert AttrDict to dict if needed
                if hasattr(gcp_credentials, 'to_dict'):
                    gcp_credentials = gcp_credentials.to_dict()

                print("✅ Using GCP credentials from Streamlit secrets (dict format)")
                credentials = service_account.Credentials.from_service_account_info(gcp_credentials)
                return credentials
            else:
                # Assume it's a JSON string
                try:
                    gcp_credentials_dict = json.loads(gcp_credentials)
                    print("✅ Using GCP credentials from Streamlit secrets (JSON string)")
                    credentials = service_account.Credentials.from_service_account_info(gcp_credentials_dict)
                    return credentials
                except json.JSONDecodeError:
                    # Not JSON: fall back to treating the value as a path to a key file
                    print("⚠️ GCP credentials in Streamlit secrets is not valid JSON, trying as file path")
                    if os.path.exists(gcp_credentials):
                        credentials = service_account.Credentials.from_service_account_file(gcp_credentials)
                        return credentials
                    else:
                        raise ValueError("GCP credentials format not recognized")

        else:
            raise ValueError("No GCP credentials found in environment or Streamlit secrets")

    except Exception as e:
        # Report the failure in both the logs and the Streamlit UI, then re-raise
        error_msg = f"❌ Authentication error: {str(e)}"
        print(error_msg)
        st.error(error_msg)
        raise
```

### OpenAI API Authentication

Similarly, OpenAI API authentication can be configured in multiple ways:

#### Option 1: HF Spaces Environment Variable (Recommended Production Method)

Set the `OPENAI_API_KEY` environment variable in the Hugging Face Spaces UI.

#### Option 2: Environment Variables

Set the `OPENAI_API_KEY` environment variable:

```bash
export OPENAI_API_KEY="your-openai-api-key"
```

#### Option 3: Streamlit Secrets

Add your OpenAI API key to the `.streamlit/secrets.toml` file:

```toml
openai_api_key = "your-openai-api-key"
```

The authentication logic is handled by the `setup_openai_auth()` function in `utils.py`, which checks `OPENAI_API_KEY` first, then the alternative name `OPENAI_KEY`, and finally the `openai_api_key` Streamlit secret:

```python
import os

import openai
import streamlit as st


def setup_openai_auth():
    """
    Setup OpenAI API authentication using various methods.

    This function tries multiple authentication methods in order of preference:
    1. Standard environment variable (OPENAI_API_KEY)
    2. HF Spaces environment variable (OPENAI_KEY) - primary production method
    3. Streamlit secrets (openai_api_key)

    Note: In production, the API key is stored exclusively in HF Spaces secrets.
    """
    try:
        # Option 1: Standard environment variable
        if "OPENAI_API_KEY" in os.environ:
            openai.api_key = os.getenv("OPENAI_API_KEY")
            print("✅ Using OpenAI API key from environment variable")
            return

        # Option 2: HF Spaces environment variable with different name
        elif "OPENAI_KEY" in os.environ:
            openai.api_key = os.getenv("OPENAI_KEY")
            print("✅ Using OpenAI API key from HF Spaces environment variable")
            return

        # Option 3: Streamlit secrets
        elif "openai_api_key" in st.secrets:
            openai.api_key = st.secrets["openai_api_key"]
            print("✅ Using OpenAI API key from Streamlit secrets")
            return

        else:
            raise ValueError("No OpenAI API key found in environment or Streamlit secrets")

    except Exception as e:
        # Report the failure in both the logs and the Streamlit UI, then re-raise
        error_msg = f"❌ OpenAI authentication error: {str(e)}"
        print(error_msg)
        st.error(error_msg)
        raise
```

## Application Customization

### UI Customization

Anveshak's UI can be customized through the CSS in the `app.py` file:

```python
# Custom CSS
st.markdown("""
Anveshak
Spirituality Q&A
""", unsafe_allow_html=True)
```

To change the appearance:

1. Modify the CSS variables in the `