feat: Enable parsing of rich (smart) tags in google docs #215
feat: Enable parsing of rich (smart) tags in google docs #215MrwanBaghdad wants to merge 3 commits intogemini-cli-extensions:mainfrom
Conversation
Currently docs.getText skip over smart chip elements within Google docs. Leaving out information. This commit adds support to render smart chip elements in google docs to simple text
Summary of ChangesHello @MrwanBaghdad, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request successfully adds support for parsing rich 'smart tags' (like person, date, and rich link elements) from Google Docs, which was previously a missing feature. The implementation correctly extracts text from these elements and includes fallback logic. The accompanying tests are thorough and cover the new functionality and its edge cases.
I've provided a couple of suggestions to improve code consistency and reduce duplication in tests, which should enhance long-term maintainability.
| if (pElement.textRun && pElement.textRun.content) { | ||
| text += pElement.textRun.content; | ||
| } else if (pElement.person?.personProperties) { | ||
| const { name, email } = pElement.person.personProperties; | ||
| text += name || email || ''; | ||
| } else if (pElement.richLink?.richLinkProperties) { | ||
| const { title, uri } = pElement.richLink.richLinkProperties; | ||
| text += title || uri || ''; | ||
| } else if (pElement.dateElement?.dateElementProperties) { | ||
| const { displayText, timestamp } = | ||
| pElement.dateElement.dateElementProperties; | ||
| text += displayText || timestamp || ''; | ||
| } |
There was a problem hiding this comment.
For improved consistency and robustness, consider using optional chaining (?.) for pElement.textRun and the nullish coalescing operator (??) for fallbacks.
Using pElement.textRun?.content aligns with how other optional properties are checked in this block.
Using ?? instead of || is generally safer as it only falls back for null or undefined, not for other falsy values like an empty string (''). While || works in this specific case, ?? is more explicit about the intended behavior and prevents potential bugs if an empty string becomes a valid, distinct value in the future.
| if (pElement.textRun && pElement.textRun.content) { | |
| text += pElement.textRun.content; | |
| } else if (pElement.person?.personProperties) { | |
| const { name, email } = pElement.person.personProperties; | |
| text += name || email || ''; | |
| } else if (pElement.richLink?.richLinkProperties) { | |
| const { title, uri } = pElement.richLink.richLinkProperties; | |
| text += title || uri || ''; | |
| } else if (pElement.dateElement?.dateElementProperties) { | |
| const { displayText, timestamp } = | |
| pElement.dateElement.dateElementProperties; | |
| text += displayText || timestamp || ''; | |
| } | |
| if (pElement.textRun?.content) { | |
| text += pElement.textRun.content; | |
| } else if (pElement.person?.personProperties) { | |
| const { name, email } = pElement.person.personProperties; | |
| text += name ?? email ?? ''; | |
| } else if (pElement.richLink?.richLinkProperties) { | |
| const { title, uri } = pElement.richLink.richLinkProperties; | |
| text += title ?? uri ?? ''; | |
| } else if (pElement.dateElement?.dateElementProperties) { | |
| const { displayText, timestamp } = | |
| pElement.dateElement.dateElementProperties; | |
| text += displayText ?? timestamp ?? ''; | |
| } |
There was a problem hiding this comment.
Double check your understanding from here, most of these fields "always exist" according to the client library
https://github.com/googleapis/google-api-nodejs-client/blob/main/src/apis/docs/v1.ts#L2204
| it('should fall back to email when person name is not available', async () => { | ||
| const mockDoc = { | ||
| data: { | ||
| tabs: [ | ||
| { | ||
| documentTab: { | ||
| body: { | ||
| content: [ | ||
| { | ||
| paragraph: { | ||
| elements: [ | ||
| { | ||
| person: { | ||
| personProperties: { | ||
| email: 'jane@example.com', | ||
| }, | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }; | ||
| mockDocsAPI.documents.get.mockResolvedValue(mockDoc); | ||
|
|
||
| const result = await docsService.getText({ documentId: 'test-doc-id' }); | ||
|
|
||
| expect(result.content[0].text).toBe('jane@example.com'); | ||
| }); | ||
|
|
||
| it('should fall back to uri when rich link title is not available', async () => { | ||
| const mockDoc = { | ||
| data: { | ||
| tabs: [ | ||
| { | ||
| documentTab: { | ||
| body: { | ||
| content: [ | ||
| { | ||
| paragraph: { | ||
| elements: [ | ||
| { | ||
| richLink: { | ||
| richLinkProperties: { | ||
| uri: 'https://docs.google.com/spreadsheets/d/xyz', | ||
| }, | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }; | ||
| mockDocsAPI.documents.get.mockResolvedValue(mockDoc); | ||
|
|
||
| const result = await docsService.getText({ documentId: 'test-doc-id' }); | ||
|
|
||
| expect(result.content[0].text).toBe( | ||
| 'https://docs.google.com/spreadsheets/d/xyz', | ||
| ); | ||
| }); | ||
|
|
||
| it('should fall back to timestamp when date displayText is not available', async () => { | ||
| const mockDoc = { | ||
| data: { | ||
| tabs: [ | ||
| { | ||
| documentTab: { | ||
| body: { | ||
| content: [ | ||
| { | ||
| paragraph: { | ||
| elements: [ | ||
| { | ||
| dateElement: { | ||
| dateElementProperties: { | ||
| timestamp: '1736899200', | ||
| }, | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }, | ||
| }, | ||
| ], | ||
| }, | ||
| }; | ||
| mockDocsAPI.documents.get.mockResolvedValue(mockDoc); | ||
|
|
||
| const result = await docsService.getText({ documentId: 'test-doc-id' }); | ||
|
|
||
| expect(result.content[0].text).toBe('1736899200'); | ||
| }); |
There was a problem hiding this comment.
The tests for fallback scenarios are quite similar and contain a lot of boilerplate code. To improve maintainability and reduce duplication, you could use Jest's it.each to parameterize these tests. This would make the test suite more concise and easier to extend with new fallback cases in the future.
it.each([
{
description: 'should fall back to email when person name is not available',
elements: [
{
person: {
personProperties: {
email: 'jane@example.com',
},
},
},
],
expectedText: 'jane@example.com',
},
{
description: 'should fall back to uri when rich link title is not available',
elements: [
{
richLink: {
richLinkProperties: {
uri: 'https://docs.google.com/spreadsheets/d/xyz',
},
},
},
],
expectedText: 'https://docs.google.com/spreadsheets/d/xyz',
},
{
description: 'should fall back to timestamp when date displayText is not available',
elements: [
{
dateElement: {
dateElementProperties: {
timestamp: '1736899200',
},
},
},
],
expectedText: '1736899200',
},
])('$description', async ({ elements, expectedText }) => {
const mockDoc = {
data: {
tabs: [
{
documentTab: {
body: {
content: [
{
paragraph: {
elements,
},
},
],
},
},
},
],
},
};
mockDocsAPI.documents.get.mockResolvedValue(mockDoc);
const result = await docsService.getText({ documentId: 'test-doc-id' });
expect(result.content[0].text).toBe(expectedText);
});|
Hey @allenhutchison I am submitting this change as it's a required feature by my organisation. Please let me know if you have any comments or feedbacks. I'm happy to incorporate them in the PR |
allenhutchison
left a comment
There was a problem hiding this comment.
Hey @MrwanBaghdad, thanks for the PR! This is a really useful addition — smart chips were definitely a gap in the text extraction. A few things to address before we merge:
Must Fix
1. CI — Prettier Formatting
The verify step is failing because of formatting issues in the test file. Please run:
npx prettier --write workspace-server/src/__tests__/services/DocsService.test.ts2. Rich Link Fallback Bug
The person and dateElement cases handle missing fields with fallbacks (name || email, displayText || timestamp), but the richLink case doesn't handle a missing title:
// Current — if title is undefined, renders as [undefined](url)
text += `[${title}](${uri})`;
// Should be
text += `[${title || uri}](${uri})`;Also consider what the Gemini bot suggested here — using ?? instead of || across all three cases for more precise nullish handling.
3. Missing Fallback Test for Rich Links
There are fallback tests for person (no name → email) and dateElement (no displayText → timestamp), but no test for richLink without a title. Please add one to match the coverage pattern.
Suggested Improvement
4. Use it.each for Fallback Tests
+1 to the Gemini bot's suggestion here — the fallback tests share a lot of boilerplate. Using it.each would make the test suite more concise and easier to extend when new smart chip types are added. It also makes it trivial to add the missing rich link fallback case. The template from the bot's review is a great starting point.
Overall this is a clean, well-scoped change. Looking forward to the update! 🚀
Follow-up: On the Gemini Bot's First Suggestion (
|
| Type | Field | API Doc |
|---|---|---|
PersonProperties |
email |
"This field is always present." |
PersonProperties |
name |
"The name...if it's displayed" (not always present) |
RichLinkProperties |
title |
"This field is always present." |
RichLinkProperties |
uri |
"This is always present." |
DateElementProperties |
displayText |
"Output only." (no always-present guarantee) |
DateElementProperties |
timestamp |
(no always-present guarantee) |
Verdict: @MrwanBaghdad is right that most of these fields are documented as always present. However, the TypeScript types are still string | null, so defensive fallbacks are still good practice.
More importantly, the Gemini bot's suggestion to switch to plain text (title ?? uri ?? '') would lose the markdown link rendering, which is the better design for LLM consumption. The current approach of rendering [title](uri) and [name](mailto:email) is clearly preferable.
That said, please still add a title || uri fallback for the rich link case to match the defensive pattern used in the person and dateElement cases. And on the ?? vs || question — || is actually better here since falling back on empty string is the right behavior.
TL;DR on the Gemini bot's suggestion #1
- ❌ Don't switch to plain text — markdown links are better
- ❌
??vs||— not worth changing,||is arguably more correct here - ✅ Do add a
title || urifallback forrichLinkto be consistent
Reiteration: the Gemini bot's suggestion #2 (it.each)
- ✅ Strongly recommend adopting the
it.eachpattern for the fallback tests — it reduces boilerplate and makes adding the missing rich link fallback test trivial
Currently docs.getText skip over smart chip elements within Google docs.
Leaving out information.
This commit adds support to render smart chip elements in google docs to
simple text