---
title: Named Entity Recognition Tool
emoji: 🌍
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
tags:
- tool
---

# Advanced Named Entity Recognition (NER) Tool for smolagents

This repository contains a Named Entity Recognition (NER) tool built for Hugging Face's `smolagents` library. With it you can:

- Identify named entities (people, organizations, locations, dates, etc.) in text
- Choose from multiple NER models for different languages and use cases
- Configure different output formats and confidence thresholds
- Plug the tool into a `smolagents` agent so the agent can extract entities from text

## Installation

```bash
pip install smolagents transformers torch gradio
```

For faster inference on a GPU, also install `accelerate`:
```bash
pip install smolagents transformers torch gradio accelerate
```

## Basic Usage

```python
from ner_tool import NamedEntityRecognitionTool

# Initialize the NER tool
ner_tool = NamedEntityRecognitionTool()

# Analyze text with default settings
result = ner_tool("Apple Inc. is planning to open a new store in Paris, France next year.")
print(result)

# Analyze with custom settings
detailed_result = ner_tool(
    text="Apple Inc. is planning to open a new store in Paris, France next year.",
    model="Babelscape/wikineural-multilingual-ner",  # Different model
    aggregation="detailed",  # More detailed output format
    min_score=0.7  # Lower confidence threshold (the default is 0.8)
)
print(detailed_result)
```

## Available Models

The tool includes several pre-configured models:

| Model ID | Description |
|----------|-------------|
| dslim/bert-base-NER | Standard NER (English) - Default |
| jean-baptiste/camembert-ner | French NER |
| Davlan/bert-base-multilingual-cased-ner-hrl | Multilingual NER |
| Babelscape/wikineural-multilingual-ner | WikiNeural Multilingual NER |
| flair/ner-english-ontonotes-large | OntoNotes English (fine-grained) |
| elastic/distilbert-base-cased-finetuned-conll03-english | CoNLL-2003 English (DistilBERT, fast) |
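
As an illustration of picking a model by language, here is a minimal sketch. The `pick_model` helper, the language-code mapping, and the choice of fallback are hypothetical and not part of the tool. Only the model IDs come from the table above.

```python
# Hypothetical mapping from ISO 639-1 language codes to model IDs from the table.
MODEL_BY_LANGUAGE = {
    "en": "dslim/bert-base-NER",
    "fr": "jean-baptiste/camembert-ner",
}

# Assumed fallback for any language without a dedicated entry.
MULTILINGUAL_FALLBACK = "Davlan/bert-base-multilingual-cased-ner-hrl"


def pick_model(language_code: str) -> str:
    """Return a model ID for the given language code, or the multilingual fallback."""
    return MODEL_BY_LANGUAGE.get(language_code.strip().lower(), MULTILINGUAL_FALLBACK)


print(pick_model("fr"))
print(pick_model("de"))
```

The returned ID can be passed as the `model` argument shown in Basic Usage.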

## Output Formats

The tool supports three output formats, selected with the `aggregation` parameter:

1. **Simple** - A flat list of the entities found, with their types and confidence scores
2. **Grouped** - Entities grouped by their category (default)
3. **Detailed** - A detailed analysis including the original text with entity markers
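
To make the difference between the three formats concrete, the sketch below builds each one from the same list of entities. It is not the tool's actual implementation: the function names, the marker syntax, and the entity records (shaped like the aggregated output of a token-classification pipeline, with made-up scores) are assumptions for illustration.

```python
from collections import defaultdict

text = "Apple Inc. is planning to open a new store in Paris, France next year."

# Hypothetical entity records with illustrative scores and character offsets.
entities = [
    {"word": "Apple Inc.", "entity_group": "ORG", "score": 0.99, "start": 0, "end": 10},
    {"word": "Paris", "entity_group": "LOC", "score": 0.98, "start": 46, "end": 51},
    {"word": "France", "entity_group": "LOC", "score": 0.97, "start": 53, "end": 59},
]


def format_simple(entities):
    """Flat list: one line per entity with its type and score."""
    return [f"{e['word']} ({e['entity_group']}, {e['score']:.2f})" for e in entities]


def format_grouped(entities):
    """Entities collected under their category label."""
    groups = defaultdict(list)
    for e in entities:
        groups[e["entity_group"]].append(e["word"])
    return dict(groups)


def format_detailed(text, entities):
    """Original text with each entity wrapped in a [span](LABEL) marker."""
    marked = text
    # Work right to left so earlier character offsets stay valid.
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        span = marked[e["start"]:e["end"]]
        marked = marked[:e["start"]] + f"[{span}]({e['entity_group']})" + marked[e["end"]:]
    return marked


print(format_simple(entities))
print(format_grouped(entities))
print(format_detailed(text, entities))
```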

## Using with an Agent

```python
from smolagents import CodeAgent, InferenceClientModel
from ner_tool import NamedEntityRecognitionTool

# Initialize the NER tool
ner_tool = NamedEntityRecognitionTool()

# Create an agent model
model = InferenceClientModel(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    token="your_huggingface_token"
)

# Create the agent with our NER tool
agent = CodeAgent(tools=[ner_tool], model=model)

# Run the agent
result = agent.run(
    "Analyze this text and identify all entities: 'The European Union and United Kingdom finalized a trade deal on Tuesday.'"
)
print(result)
```

## Interactive Gradio Interface

For an interactive experience, run the Gradio app:

```bash
python gradio_app.py
```

This provides a web interface where you can:
- Enter custom text or select from samples
- Choose different NER models
- Configure display formats and confidence thresholds
- See immediate results

## Customization Options

### Entity Confidence Score

- Use the `min_score` parameter to filter entities by confidence
- Range: 0.0 (include all) to 1.0 (only highest confidence)
- Default: 0.8
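
The effect of the threshold can be sketched as a simple filter. This assumes each entity carries a `score` between 0.0 and 1.0; the `filter_by_score` helper and the sample records are hypothetical, and the tool's own filtering may differ in detail.

```python
def filter_by_score(entities, min_score=0.8):
    """Keep only entities whose confidence is at least min_score."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    return [e for e in entities if e["score"] >= min_score]


# Hypothetical entity records with illustrative scores.
sample = [
    {"word": "European Union", "entity_group": "ORG", "score": 0.99},
    {"word": "United Kingdom", "entity_group": "LOC", "score": 0.85},
    {"word": "Tuesday", "entity_group": "MISC", "score": 0.62},
]

print([e["word"] for e in filter_by_score(sample)])        # default threshold of 0.8
print([e["word"] for e in filter_by_score(sample, 0.5)])   # more permissive
```

Lowering the threshold surfaces more candidate entities at the cost of more false positives.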

### Entity Types

The tool can identify various entity types including:
- People (PER, PERSON)
- Organizations (ORG, ORGANIZATION)
- Locations (LOC, LOCATION, GPE)
- Dates and Times (DATE, TIME)
- Money and Percentages (MONEY, PERCENT)
- Products (PRODUCT)
- Events (EVENT)
- Works of Art (WORK_OF_ART)
- Laws (LAW)
- Languages (LANGUAGE)
- Facilities (FAC)
- Miscellaneous (MISC)

The exact entity types available depend on the chosen model.
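
Because models use different label names for the same category (for example `PER` versus `PERSON`), it can help to normalize labels before comparing results across models. The sketch below is one possible approach. The `categorize` helper is hypothetical, the handling of `B-`/`I-` prefixes assumes a model that emits BIO-style tags, and mapping unknown labels to "Miscellaneous" is an arbitrary choice.

```python
# Label aliases taken from the list above, mapped to a shared category name.
LABEL_CATEGORIES = {
    "PER": "People", "PERSON": "People",
    "ORG": "Organizations", "ORGANIZATION": "Organizations",
    "LOC": "Locations", "LOCATION": "Locations", "GPE": "Locations",
    "DATE": "Dates and Times", "TIME": "Dates and Times",
    "MONEY": "Money and Percentages", "PERCENT": "Money and Percentages",
    "PRODUCT": "Products",
    "EVENT": "Events",
    "WORK_OF_ART": "Works of Art",
    "LAW": "Laws",
    "LANGUAGE": "Languages",
    "FAC": "Facilities",
    "MISC": "Miscellaneous",
}


def categorize(label: str) -> str:
    """Map a raw model label to a shared category name."""
    bare = label.upper()
    # Strip BIO prefixes such as "B-PER" or "I-ORG" if present (assumption).
    if bare.startswith(("B-", "I-")):
        bare = bare[2:]
    # Unknown labels fall back to "Miscellaneous" (assumption).
    return LABEL_CATEGORIES.get(bare, "Miscellaneous")


print(categorize("PERSON"))
print(categorize("B-PER"))
print(categorize("GPE"))
```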

## Sharing Your Tool

You can share your tool on the Hugging Face Hub:

```python
ner_tool.push_to_hub("your-username/advanced-ner-tool", token="your_huggingface_token")
```

## Limitations

- The first use of a model may be slow, because its weights have to be downloaded and loaded
- Some models may require significant memory (especially larger ones)
- Entity recognition accuracy varies by model and language
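
One common way to soften the loading and memory costs is to cache loaded models and cap how many stay resident. The sketch below uses `functools.lru_cache` for this. The `make_cached_loader` helper is hypothetical, and nothing here is taken from the tool's code. A stand-in loader keeps the example runnable without downloading a model.

```python
from functools import lru_cache


def make_cached_loader(load_fn, maxsize=2):
    """Wrap an expensive loader so repeated requests for the same model ID reuse
    the cached result; at most `maxsize` models are kept, evicting the least
    recently used one first."""
    return lru_cache(maxsize=maxsize)(load_fn)


# Stand-in loader so the sketch runs offline. With transformers installed,
# load_fn could instead be something like:
#   lambda model_id: pipeline("token-classification", model=model_id,
#                             aggregation_strategy="simple")
load_calls = []


def fake_load(model_id):
    load_calls.append(model_id)
    return f"<pipeline for {model_id}>"


get_pipeline = make_cached_loader(fake_load)
get_pipeline("dslim/bert-base-NER")
get_pipeline("dslim/bert-base-NER")  # served from the cache, no second load
print(load_calls)
```

A small `maxsize` trades reload time for lower memory use when switching between several large models.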

## Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

## License

MIT