Local OCR outputs can now be saved to file and reloaded to save preparation time. Bug fixes in logs and tabular data redaction. Documentation updated.
Added compatibility with gradio_image_annotation for passing id and text properties through to the annotator. Corrected CSV location for Textract API calls. Other minor changes.
Minor function documentation changes. Updated requirements for the new Gradio version and for a version of the Gradio annotator that allows saving the preferred redaction format and includes the box id.
Fixed incorrect folder permissions in locally built Docker containers. Improved config file. Updated Gradio version to fix an issue with selecting filtered rows. Minor bug fixes.
Implemented Textract document API calls and associated output tracking/download. Fixes to config and cost code implementation. General minor bug fixes.
Modified config entries so they no longer assume that an allow list or cost codes file exists. Reduced concurrency to 3 and placed input and output files in user subfolders by default.
Major update. General code revision. Improved config variables. The dataframe-based review frame now includes text, and items can be searched and excluded. Costs are now estimated. Added option for cost codes. Added option to extract text only.
Added features to the review dataframe for filtering and excluding items based on text. Text should now appear consistently in review_df (for unmodified boxes). Reverted to using the larger spaCy model. Upgraded Gradio.
Redaction now uses the full PDF mediabox (sometimes larger than the viewable area), then converts back to the cropbox size for printing and Adobe review. Improved some error raising and app flow.