The Real Deal About Word Counts: A Practical Guide
Hello Navigators! 👋
While the industry is starting to embrace AI-powered services with new pricing structures (hourly rates, tokens), word counts remain the fundamental metric for project planning. Understanding your word count helps you navigate traditional per-word pricing. It also gives you a point of comparison when weighing the pros and cons of human, AI-driven, and machine translation approaches.
Word counts are also the perfect lead-in to program building. This short article covers the main points to consider if you are scoping a localization project for the first time.
The Foundation: Why Word Counts Matter
Your source word count isn't a single-lane highway—it will branch into multiple 'lanes,' or pricing paths, depending on your destination language and service level.
Think of it like a plumbing system:
- The basic machine translation pipe might cost you $1 for pumping 100 English (United States) words into Spanish (Latin America)
- But route those same 100 English words through a human Japanese translation pipeline, with translation and review stages, and you may be looking at a $100 investment or more
Different language pairs come with different complexity levels and market rates. Just like how installing copper pipes costs more than PVC, translating into Japanese or Korean typically costs more than Spanish or French.
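To make the lanes concrete, here is a minimal Python sketch that applies a per-word rate to a word count. The two rates are simply the illustrative figures above converted to dollars per word ($1 and $100 per 100 words). The locale codes, service names, and the estimate_cost helper are my own placeholders, not real LSP rates.

```python
# Illustrative per-word rates derived from the two examples above.
# These are NOT real quotes; replace them with the rates on your LSP's rate sheet.
RATES_PER_WORD = {
    ("en-US", "es-419", "machine_translation"): 0.01,       # $1 per 100 words
    ("en-US", "ja-JP", "human_translation_review"): 1.00,   # $100 per 100 words
}


def estimate_cost(word_count, source, target, service):
    """Return the estimated cost in dollars for one pricing lane."""
    rate = RATES_PER_WORD[(source, target, service)]
    return round(word_count * rate, 2)


# The same 100 source words, priced down each lane.
for source, target, service in RATES_PER_WORD:
    cost = estimate_cost(100, source, target, service)
    print(f"{source} -> {target} ({service}): ${cost:.2f}")
```

The point of keeping the rates in one table is that the word count stays fixed while the lane changes, which is exactly how the same content ends up with very different price tags.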
Where to Start
Getting an accurate word count is therefore essential. Here are practical steps to get started.
What NOT to Do (Common Pitfalls)
- Do not hastily rely on website scrapers
They're like giant trawlers that will catch a lot more than what you're fishing for. I recommend scraping only when you have engineering support and enough time for the engineer to do it properly.
- Avoid the copy-paste marathon
Sure, it works for a page or two, but it's error-prone and not scalable. Work with your web developers or SEs to properly externalize content into a well-formed file format. The time spent harvesting data from the database will be well worth it.
- Never share rough estimates with stakeholders
It's tempting to want to show progress to your manager. I've been there. However, rough numbers will start to shape expectations, regardless of how strenuously you caveat them.
Keep reports high level until you've got a qualified set of numbers. Communicate information carefully. Here's a boilerplate script you could use:
"We're currently retrieving content from the database and conducting an initial assessment. The preliminary word count includes everything, even content that is out of scope. As we refine the dataset and apply filtering rules, we will have a clearer picture of the final scope. Right now, I only have early figures that serve as rough directional indicators rather than final counts."
Strategic Communication
Managing expectations goes beyond your immediate supervisor. You'll likely need to ascertain word counts from multiple systems.
Take the time to identify stakeholders in your org who will be impacted by the localization initiative. This paves the way for asking colleagues for 15 to 30 minutes of their time at key junctures.
1. Get Engineering Buy-in Early
Document where your content lives and how to access it programmatically across all systems
2. Create a Content Inventory
Map out all content types, from UI strings to marketing materials
3. Plan for Maintenance and Growth
Factor in future content updates and new features
4. Anticipate Product Releases
Take a good look at the roadmap in light of localization and learn about what new features are planned
5. IT and Authentication
Remember to account for content behind login screens
The Playbook
1. Source Data Management
- Get access to your content management system's backend
- Export content in structured formats (XLIFF, XML, JSON, CSV)
- Be discriminating about metadata. Include it where it adds contextual value for the translation process, such as validation fields and UI elements. Exclude Booleans, system information, etc.
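As a rough illustration of the export-and-filter steps above, here is a small Python sketch that walks a JSON export and counts words only in text values. The field names (id, created_at, is_published, and so on) and the sample content are hypothetical. The whitespace-based count is also an assumption: your LSP's tooling may count differently, so treat the result as directional.

```python
import json

# Hypothetical system fields to skip; adjust to match your own export.
EXCLUDED_KEYS = {"id", "created_at", "updated_at", "is_published"}


def count_words(text):
    """Count whitespace-separated words (a simplification for English source text)."""
    return len(text.split())


def count_translatable_words(node):
    """Walk nested JSON and sum words in string values, skipping excluded keys."""
    if isinstance(node, str):
        return count_words(node)
    if isinstance(node, dict):
        return sum(
            count_translatable_words(value)
            for key, value in node.items()
            if key not in EXCLUDED_KEYS
        )
    if isinstance(node, list):
        return sum(count_translatable_words(item) for item in node)
    # Booleans, numbers, and nulls carry no translatable text.
    return 0


# Made-up sample standing in for a real CMS export.
sample_export = json.loads("""
{
  "id": "page-001",
  "is_published": true,
  "title": "Welcome to the dashboard",
  "fields": [
    {"label": "Email address", "validation_message": "Enter a valid email"}
  ]
}
""")

print(count_translatable_words(sample_export))  # prints 10
```

Here the UI label and validation message are counted because they add context for translators, while the identifier and the Boolean flag are left out.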
2. Content Scoping
- Map out which content needs translation
- Identify content behind authentication. Work with IT to expose that content to the translation process
- Document the identified 'do not translate' content (proper names, branded terms, product names)
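The scoping steps above can be sketched as a simple tally. This Python example is only a sketch under my own assumptions: the content types, the behind_login flag, the do-not-translate terms, and the sample records are all invented for illustration, and your scope rules will differ.

```python
from collections import Counter

# Hypothetical scope rules and do-not-translate list, for illustration only.
IN_SCOPE_TYPES = {"ui", "help", "marketing"}
DO_NOT_TRANSLATE = ["Acme Cloud", "AcmePay"]

# Made-up records standing in for exported content.
records = [
    {"type": "ui", "behind_login": True, "text": "Sign in to Acme Cloud"},
    {"type": "legal", "behind_login": False, "text": "Terms of service archive 2019"},
    {"type": "help", "behind_login": False, "text": "How to reset your password"},
]


def summarize_scope(records):
    """Split the raw word count into in-scope and out-of-scope buckets."""
    summary = Counter()
    for record in records:
        words = len(record["text"].split())
        summary["total_words"] += words
        if record["type"] in IN_SCOPE_TYPES:
            summary["in_scope_words"] += words
            if record["behind_login"]:
                # Content that IT needs to expose to the translation process.
                summary["in_scope_behind_login_words"] += words
        else:
            summary["out_of_scope_words"] += words
        # Count do-not-translate terms so they can be documented for the LSP.
        for term in DO_NOT_TRANSLATE:
            summary["dnt_term_hits"] += record["text"].count(term)
    return dict(summary)


print(summarize_scope(records))
```

A breakdown like this also makes it easier to explain to stakeholders why the preliminary total is larger than the final scope.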
3. Cost Awareness and Impact
- Study the rates shared with you by the LSP. Take time to dig into them and understand what's expensive vs affordable
- Keep in mind that automation, like machine translation, is cheaper but comes with its own tradeoffs, such as the need for human post-editing
- Translation memory is still relevant (see my article here for details). It has been reducing costs for repetitive content for decades. It can also serve as training data for GPT-style models.
- Special content types (marketing vs technical) will have different rates. Learn the differences
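To see why repetitive content matters for cost, here is a minimal Python sketch that separates new words from words in repeated segments. It only catches exact repeats after whitespace normalization, which is my simplification: real translation memory tools also score partial (fuzzy) matches, and how repetitions are discounted depends on your LSP's rate sheet. The sample segments are made up.

```python
def repetition_breakdown(segments):
    """Return word counts for first-seen segments versus exact repeats."""
    seen = set()
    new_words = 0
    repeated_words = 0
    for segment in segments:
        key = " ".join(segment.split())  # normalize whitespace only
        words = len(key.split())
        if key in seen:
            repeated_words += words
        else:
            seen.add(key)
            new_words += words
    return {"new_words": new_words, "repeated_words": repeated_words}


# Made-up UI strings; the first and third are identical.
segments = [
    "Click Save to continue.",
    "Your changes have been saved.",
    "Click Save to continue.",
]

print(repetition_breakdown(segments))  # prints {'new_words': 9, 'repeated_words': 4}
```

Even this crude split gives you a first sense of how much of the raw word count might qualify for a lower rate.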
4. Working with an LSP
You will receive an extensive rate sheet from the LSP. I recommend doing some constructive thinking on your own. Create a checklist covering the salient points. This is a good exercise for positioning all the price points and understanding how they impact your budget.
- Total word count by content type
- Per-word price by target language
- Content update frequency
- Technical requirements, one-off tasks (e.g., authentication), special workarounds
- Team considerations: who will have access to what
- Timeline expectations
The Bottom Line
The word count is the gateway to your localization program. Getting it right is like having a solid foundation for a house – everything else builds on top of it. Take the time to do it properly, and you'll save yourself from surprises down the road.
Remember: A thorough word count analysis might take more time upfront, but it's the difference between smooth sailing and rough waters in your localization journey.
Need more help? Drop me a line!