document_redaction / tools /data_anonymise.py

Commit History

Now local OCR outputs can be saved to file and reloaded to save preparation time. Bug fixing in logs and tabular data redaction. Update to documentation
f93e49c

seanpedrickcase commited on

Major update. General code revision. Improved config variables. Dataframe based review frame now includes text, items can be searched and excluded. Costs now estimated. Option for adding cost codes added. Option to extract text only.
0ea8b9e

seanpedrickcase commited on

More config options. Fixed some bugs with removing elements from review page and Adobe export. Some UI rearrangements
6319afc

seanpedrickcase commited on

Integrated AWS Comprehend and fuzzy matching functions with tabular data redaction.
ff290e1

seanpedrickcase commited on

Allowed for output files to be saved into user-specific folders. Added deny list capability to xlsx/csv file redaction
dacc782

seanpedrickcase commited on

Zoom and rotate features from forked gradio_annotation package. Fixed csv/xlsx redaction. Updated guide on creating exe.
20d940b

seanpedrickcase commited on

Fuzzy match implementation for deny list. Added option to merge multiple review files. Review files from redaction step should now include text.
bde6e5b

seanpedrickcase commited on

Added support for AWS Comprehend for PII identification. OCR and detection results now written to main output
f0f9378

seanpedrickcase commited on

Optimised Textract and Tesseract workings
8652429

seanpedrickcase commited on

Handles multiple runs with multiple files correctly now. Logging and feedback improvements.
bbf818d

seanpedrickcase commited on

Updated decision making output files, log locations
93ac94f

seanpedrickcase commited on

Decision process now saved as log files. Other log files and feedback added
8c33828

seanpedrickcase commited on

Added logging, anonymising all Excel sheets, simple redaction tags, some Dockerfile optimisation
01c88c0

seanpedrickcase commited on

Can now redaction text or csv/xlsx files. Can redact multiple files. Embeds redactions as image-based file by default
7810536

seanpedrickcase commited on