---
title: Named Entity Recognition Tool
emoji: 🌍
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
tags:
- tool
---
# Advanced Named Entity Recognition (NER) Tool for smolagents
This repository contains an enhanced Named Entity Recognition tool built for the `smolagents` library from Hugging Face. This tool allows you to:
- Identify named entities (people, organizations, locations, dates, etc.) in text
- Choose from multiple NER models for different languages and use cases
- Configure different output formats and confidence thresholds
- Plug the tool into `smolagents` agents so they can extract entities from text
## Installation
```bash
pip install smolagents transformers torch gradio
```
For faster inference on GPU:
```bash
pip install smolagents transformers torch gradio accelerate
```
## Basic Usage
```python
from ner_tool import NamedEntityRecognitionTool
# Initialize the NER tool
ner_tool = NamedEntityRecognitionTool()
# Analyze text with default settings
result = ner_tool("Apple Inc. is planning to open a new store in Paris, France next year.")
print(result)
# Analyze with custom settings
detailed_result = ner_tool(
    text="Apple Inc. is planning to open a new store in Paris, France next year.",
    model="Babelscape/wikineural-multilingual-ner",  # Different model
    aggregation="detailed",  # More detailed output format
    min_score=0.7,  # Lower confidence threshold than the 0.8 default
)
print(detailed_result)
```
## Available Models
The tool includes several pre-configured models:
| Model ID | Description |
|----------|-------------|
| dslim/bert-base-NER | Standard NER (English) - Default |
| jean-baptiste/camembert-ner | French NER |
| Davlan/bert-base-multilingual-cased-ner-hrl | Multilingual NER |
| Babelscape/wikineural-multilingual-ner | WikiNeural Multilingual NER |
| flair/ner-english-ontonotes-large | OntoNotes English (fine-grained) |
| elastic/distilbert-base-cased-finetuned-conll03-english | CoNLL (fast) |
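As a rough illustration of choosing among these models, the sketch below maps a language code to a model ID from the table and falls back to a multilingual model. The `pick_model` helper, the `MODELS_BY_LANGUAGE` mapping, and the language codes are assumptions made for this example; they are not part of the tool.

```python
# Model IDs are copied from the table above; the language-code keys are an
# assumption made for this sketch.
MODELS_BY_LANGUAGE = {
    "en": "dslim/bert-base-NER",
    "fr": "jean-baptiste/camembert-ner",
}
MULTILINGUAL_FALLBACK = "Davlan/bert-base-multilingual-cased-ner-hrl"


def pick_model(language_code: str) -> str:
    """Return a model ID for the language, or the multilingual fallback."""
    key = language_code.strip().lower()
    return MODELS_BY_LANGUAGE.get(key, MULTILINGUAL_FALLBACK)


print(pick_model("fr"))
print(pick_model("de"))
```

The returned ID can then be passed as the `model` argument shown in Basic Usage.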
## Output Formats
The tool supports three output formats:
1. **Simple** - A flat list of the entities found, each with its type and confidence score
2. **Grouped** - Entities grouped by category (default)
3. **Detailed** - A fuller analysis that includes the original text with entity markers
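To make the difference between the three formats concrete, here is a minimal sketch that renders the same entities each way. The function names, the marker syntax `[text|TYPE]`, and the sample scores are assumptions for illustration; the tool's actual output layout may differ.

```python
from collections import defaultdict

text = "Apple Inc. is planning to open a new store in Paris, France next year."


def make_entity(word, label, score):
    """Build a hypothetical entity record with character offsets into `text`."""
    start = text.index(word)
    return {"word": word, "entity_group": label, "score": score,
            "start": start, "end": start + len(word)}


# Illustrative entities and scores, not real model output.
entities = [
    make_entity("Apple Inc.", "ORG", 0.98),
    make_entity("Paris", "LOC", 0.95),
    make_entity("France", "LOC", 0.97),
]


def format_simple(entities):
    """One line per entity: text, type, and confidence score."""
    return [f"{e['word']} ({e['entity_group']}, {e['score']:.2f})" for e in entities]


def format_grouped(entities):
    """Entity texts collected under their type."""
    groups = defaultdict(list)
    for e in entities:
        groups[e["entity_group"]].append(e["word"])
    return dict(groups)


def format_detailed(text, entities):
    """Original text with each entity wrapped in a [text|TYPE] marker."""
    marked = text
    # Work right to left so earlier offsets stay valid after each insertion.
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        marker = f"[{e['word']}|{e['entity_group']}]"
        marked = marked[:e["start"]] + marker + marked[e["end"]:]
    return marked


print(format_simple(entities))
print(format_grouped(entities))
print(format_detailed(text, entities))
```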
## Using with an Agent
```python
from smolagents import CodeAgent, InferenceClientModel
from ner_tool import NamedEntityRecognitionTool
# Initialize the NER tool
ner_tool = NamedEntityRecognitionTool()
# Create an agent model
model = InferenceClientModel(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    token="your_huggingface_token"
)
# Create the agent with our NER tool
agent = CodeAgent(tools=[ner_tool], model=model)
# Run the agent
result = agent.run(
    "Analyze this text and identify all entities: 'The European Union and United Kingdom finalized a trade deal on Tuesday.'"
)
print(result)
```
## Interactive Gradio Interface
For an interactive experience, run the Gradio app:
```bash
python gradio_app.py
```
This provides a web interface where you can:
- Enter custom text or select from samples
- Choose different NER models
- Configure display formats and confidence thresholds
- See immediate results
## Customization Options
### Entity Confidence Score
- Use the `min_score` parameter to filter entities by confidence
- Range: 0.0 (include all) to 1.0 (only highest confidence)
- Default: 0.8
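The effect of the threshold can be sketched in a few lines. The sample predictions below are made up, and the helper `filter_by_score` is hypothetical; whether the tool uses an inclusive (`>=`) comparison is also an assumption.

```python
# Hypothetical predictions, shaped like the dictionaries a Hugging Face
# token-classification pipeline returns when aggregation is enabled.
predictions = [
    {"word": "Apple Inc.", "entity_group": "ORG", "score": 0.98},
    {"word": "Paris", "entity_group": "LOC", "score": 0.91},
    {"word": "next year", "entity_group": "MISC", "score": 0.42},
]


def filter_by_score(entities, min_score=0.8):
    """Keep entities whose confidence is at least `min_score` (assumed inclusive)."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    return [e for e in entities if e["score"] >= min_score]


print([e["word"] for e in filter_by_score(predictions)])       # default 0.8
print([e["word"] for e in filter_by_score(predictions, 0.0)])  # include all
```

With the default of 0.8 the low-confidence "next year" span is dropped, while a threshold of 0.0 keeps every prediction.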
### Entity Types
The tool can identify various entity types including:
- People (PER, PERSON)
- Organizations (ORG, ORGANIZATION)
- Locations (LOC, LOCATION, GPE)
- Dates and Times (DATE, TIME)
- Money and Percentages (MONEY, PERCENT)
- Products (PRODUCT)
- Events (EVENT)
- Works of Art (WORK_OF_ART)
- Laws (LAW)
- Languages (LANGUAGE)
- Facilities (FAC)
- Miscellaneous (MISC)
The exact entity types available depend on the chosen model.
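Because different models emit different labels for the same category (for example `PER` versus `PERSON`), downstream code may want to normalize them. The following is a minimal sketch assuming the label aliases listed above; the `normalize_label` helper, the `"Other"` fallback, and the handling of `B-`/`I-` prefixes are assumptions for illustration, not the tool's own logic.

```python
# Aliases taken from the list above, mapped to one readable category name.
LABEL_TO_CATEGORY = {
    "PER": "People", "PERSON": "People",
    "ORG": "Organizations", "ORGANIZATION": "Organizations",
    "LOC": "Locations", "LOCATION": "Locations", "GPE": "Locations",
    "DATE": "Dates and Times", "TIME": "Dates and Times",
    "MONEY": "Money and Percentages", "PERCENT": "Money and Percentages",
    "PRODUCT": "Products",
    "EVENT": "Events",
    "WORK_OF_ART": "Works of Art",
    "LAW": "Laws",
    "LANGUAGE": "Languages",
    "FAC": "Facilities",
    "MISC": "Miscellaneous",
}


def normalize_label(label: str) -> str:
    """Map a raw model label to a category, dropping any B-/I- tagging prefix."""
    if label[:2] in ("B-", "I-"):
        label = label[2:]
    # Labels not in the mapping fall back to "Other" (an assumed default).
    return LABEL_TO_CATEGORY.get(label.upper(), "Other")


print(normalize_label("B-PER"))
print(normalize_label("GPE"))
print(normalize_label("NORP"))
```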
## Sharing Your Tool
You can share your tool on the Hugging Face Hub:
```python
ner_tool.push_to_hub("your-username/advanced-ner-tool", token="your_huggingface_token")
```
## Limitations
- The first load of each model may take some time, since its weights must be downloaded and initialized
- Some models may require significant memory (especially larger ones)
- Entity recognition accuracy varies by model and language
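One common way to soften the first two limitations is to cache loaded models so each is loaded once, while capping how many stay in memory. The sketch below uses `functools.lru_cache` with a stand-in loader; `_load_model`, `get_model`, and the cache size of 2 are assumptions for illustration, and in practice the stand-in would be replaced by the real, expensive model load.

```python
from functools import lru_cache

load_calls = []  # Records each real load so the caching effect is visible.


def _load_model(model_id: str) -> dict:
    """Stand-in for an expensive model load (hypothetical placeholder)."""
    load_calls.append(model_id)
    return {"model_id": model_id}


@lru_cache(maxsize=2)  # Keep at most two models in memory at once.
def get_model(model_id: str) -> dict:
    """Return a cached model, loading it only on first request."""
    return _load_model(model_id)


first = get_model("dslim/bert-base-NER")
second = get_model("dslim/bert-base-NER")  # Served from the cache.
print(first is second, len(load_calls))
```

A small `maxsize` trades repeated loads for lower memory use, which matters most with the larger models.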
## Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request.
## License
MIT