UnGovr Crawler Information

What is UnGovr?

UnGovr is a nonprofit civic infrastructure platform dedicated to making government information accessible, searchable, and machine-readable. We believe that public information should be easy for residents, journalists, researchers, and civic organizations to find and use.

What We Crawl

Our crawler (UnGovrBot) automatically discovers and indexes:

  • Government websites: City, county, state, and federal agency websites
  • Public records: Meeting agendas, minutes, budgets, contracts, and reports
  • Laws and codes: Municipal codes and ordinances (including those hosted on commercial platforms)
  • Public data: Any information that is legally considered a public record under state and federal law

Our Commitment to Responsible Crawling

We take website performance and server load seriously:

  • Respectful crawling: We follow a conservative crawl delay (typically 1-2 seconds between requests)
  • Off-peak hours: When possible, we crawl during low-traffic periods
  • Bandwidth consideration: We limit concurrent requests to avoid overwhelming servers
  • Error handling: We back off immediately if we detect server issues
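
The delay-and-back-off behavior above can be sketched as follows. This is a minimal illustration, not UnGovrBot's actual implementation; `polite_crawl`, `next_delay`, and the `fetch` callback are hypothetical names:

```python
import time

def next_delay(base_delay, consecutive_errors, cap=60.0):
    """Exponential backoff: double the delay per consecutive server error, capped."""
    return min(base_delay * (2 ** consecutive_errors), cap)

def polite_crawl(urls, fetch, base_delay=1.5):
    """Fetch each URL with a conservative delay, backing off when the server struggles.

    `fetch` is a caller-supplied function returning (status_code, body).
    """
    errors = 0
    results = {}
    for url in urls:
        time.sleep(next_delay(base_delay, errors))  # 1-2 s between requests by default
        status, body = fetch(url)
        if status >= 500:      # server issue detected: back off immediately
            errors += 1
        else:
            errors = 0         # healthy response: reset to the base delay
            results[url] = body
    return results
```

With the default `base_delay=1.5`, three consecutive server errors would stretch the wait to 12 seconds before the next attempt.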

Robots.txt and Public Records

We generally respect robots.txt directives. However, for certain commercial platforms that host legally public government records (municipal codes, meeting agendas, etc.), we may need to access content even when robots.txt contains restrictive directives.

Why? These records are:

  • Required to be publicly accessible under state and federal public records laws
  • Essential for government transparency and civic participation
  • Already paid for by taxpayer dollars

Our approach:

  • We are extra cautious on these sites (slower crawl rates, careful monitoring)
  • We only access government-specific content (not the entire commercial site)
  • We maintain strict boundaries using entity markers and URL patterns
  • We immediately stop if we detect we’re accessing non-public content
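
A URL-pattern boundary of the kind described above might be sketched as a simple per-host allowlist. The host name and path prefixes below are illustrative only, not UnGovrBot's real configuration:

```python
# Hypothetical allowlist: government-content path prefixes on a commercial host.
ALLOWED_PREFIXES = {
    "codelibrary.example.com": ("/codes/", "/agendas/"),
}

def in_scope(host, path):
    """Only crawl URLs under an explicitly allowlisted government-content prefix;
    everything else on the commercial site is out of bounds."""
    prefixes = ALLOWED_PREFIXES.get(host, ())
    return any(path.startswith(p) for p in prefixes)
```

Anything not matching a listed prefix (for example, the platform's own marketing or account pages) is never requested.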

Contact Us

If you have questions, concerns, or need to report an issue:

Email: [email protected]

Common reasons to contact us:

  • Our crawler is causing performance issues on your server
  • We’re accessing content that should not be public
  • You’d like to provide a data feed or API instead of crawling
  • You have questions about what we’re collecting

How to identify our crawler:

User-Agent: Mozilla/5.0 (compatible; UnGovrBot/0.1.XXX; +https://ungovr.org/crawler)

You can use this user-agent string to identify our traffic in analytics and monitoring. The version number (XXX) changes with each release.
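
For log analysis, a pattern like the following can pick out UnGovrBot requests. This is a sketch that assumes the XXX placeholder is a numeric release number; `is_ungovrbot` is our illustrative name, not a provided tool:

```python
import re

# Matches the UnGovrBot token and captures its version, e.g. "UnGovrBot/0.1.42".
UNGOVRBOT_RE = re.compile(r"UnGovrBot/(\d+\.\d+\.\d+)")

def is_ungovrbot(user_agent):
    """True if the given User-Agent header belongs to the UnGovr crawler."""
    return UNGOVRBOT_RE.search(user_agent) is not None
```

Matching on the token rather than the full string keeps the check working across version updates.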

For Webmasters

Preferred: Provide a Data Feed

Rather than crawling your site, we'd prefer to work with you directly! If you manage a government website or a platform hosting government data, we can:

  • Consume structured data feeds (JSON, XML, APIs)
  • Work with you to schedule crawls during off-peak times
  • Receive notifications when content is updated (webhooks)
  • Collaborate on data standards and formats

Please contact us at [email protected] to discuss options.

If You Must Block Us

We understand there may be legitimate reasons to restrict automated access. If you need to block our crawler:

  1. Specify clearly: Use robots.txt to indicate which paths should not be accessed
  2. Provide reasoning: Contact us to explain why specific content should be blocked
  3. Alternative access: Let us know if there’s another way to access the public data
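
For example, a robots.txt group targeting our crawler by name might look like this (the path names are illustrative):

```
User-agent: UnGovrBot
Disallow: /internal/
Disallow: /staff-only/
```

A single `Disallow: /` line under the same `User-agent` group would request that we avoid the entire site.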

Note: We may need to override robots.txt for content that is legally required to be public (municipal codes, meeting records, etc.). If this is a concern, please contact us to discuss alternative solutions.

Our Data Practices

What we do with crawled data:

  • Index and make searchable through ungovr.org
  • Provide free public access to all indexed data
  • Offer APIs for developers and researchers
  • Generate transparency metrics and compliance reports

What we DON’T do:

  • Use data for targeted advertising
  • Share personally identifiable information
  • Republish non-public or sensitive information

Our crawling is conducted under the understanding that:

  • Public records laws (FOIA, California PRA, etc.) require government information to be publicly accessible
  • Courts have generally upheld the right to access publicly available web data
  • We operate as a nonprofit public benefit corporation serving the civic good

We are committed to operating within applicable laws and welcome dialogue with webmasters, government agencies, and platform providers.

Technical Details

Crawl Rate: 1-2 seconds between requests (conservative)
Concurrent Requests: 1-2 per domain
Respect for Cache: We use If-Modified-Since headers to avoid re-downloading unchanged content
Peak Hour Avoidance: Scheduled crawls typically run during low-traffic periods
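
The cache behavior above follows the standard HTTP conditional-request pattern: send the stored Last-Modified value back as If-Modified-Since, and skip the download when the server answers 304 Not Modified. The helper names below are illustrative, not part of UnGovrBot's code:

```python
def conditional_headers(cached_last_modified=None):
    """Build request headers for a conditional GET against a previously crawled page."""
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; UnGovrBot/0.1.XXX; +https://ungovr.org/crawler)",
    }
    if cached_last_modified:
        # Echo the server's earlier Last-Modified value back to it.
        headers["If-Modified-Since"] = cached_last_modified
    return headers

def should_refetch(status_code):
    """304 Not Modified means the cached copy is still current; anything else is re-processed."""
    return status_code != 304
```

On a 304 response the server sends no body, so unchanged pages cost almost no bandwidth on either side.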

Source: Our crawler is based on open-source tools and follows industry best practices for web scraping and data collection.


Last updated: November 14, 2025