
24th February 2025

Activism

Hacktivism

Collectives

  • New: Add Critical Switch.

    Critical Switch: a non-mixed transhackfeminist collective interested in free culture, privacy and digital security. We promote a culture of security to create safer spaces in social and activist movements.

  • New: Add México collectives.

Labor

Domestic workers

  • New: Introduce research on domestic workers.

    Note: I am by no means an expert on this topic; these are the key points I have deduced from talking about it with workers and agency managers. So verify everything before taking it as true!

    Care of dependent elderly people

    Caring for the elderly is a real ordeal, especially once they become dependent. They usually need care for most of the day. Currently there are the following options for providing that care:

    • The close network (usually the women of the family) takes charge of the care.
    • Part or all of the care is outsourced, either to a nursing home, to day centres, or by hiring workers who come to the home.

    In this rotten world where public services are being dismantled, the public supply of day centres and nursing homes is insufficient and generally in the hands of incompetent politicians (let us never forget the 7,291 deaths that weigh on Ayuso's shoulders, the bastard!).

    Add to this that the women of the family now have jobs of their own, plus the precarisation of the domestic-work sector, and the result is that people (above all people with money) resort to hiring live-in workers.

    Live-in domestic work

    In reality this work is slavery concealed under a legal veneer. Taking advantage of the fact that the profession is feminised and mostly performed by migrants, working conditions are imposed that do not comply with the workers' statute.

    With a salary that normally does not exceed the minimum wage, these workers:

    • Work far more than 40 hours a week.
    • Do jobs outside their contract, such as cleaning the house or serving meals.
    • Are confined to their workplace. Even when they "stop working" they remain under their employers' control.
    • Being alone with the people they care for, 24 hours a day, cases of gender violence are common. Many tell of barring their bedroom door at night.
    • The spaces granted to them (bedrooms or bathrooms) are not respected and are generally used by other members of the family whenever they please, robbing them even of a room of their own.
    • They generally get about two hours a day off. But they usually work in neighbourhoods far from their home, with prices and leisure options far beyond their means, so they normally spend those hours going for a walk. In winter that gets harder with the cold and the rain.
    • Those who get weekends off have to pay for a room or flat that they can only enjoy a few days a week.
    • They have to put up with the mistreatment and tyranny typical of elderly people who are beginning to lose their minds. At that age classism and racism get worse while self-control and filters disappear, which creates very unpleasant situations that often end in psychological and physical abuse.

    And although all of this is well known to society, it is a model that is still widely used.

    Working hours

    In some cases the workers get the weekend off, 36 consecutive hours according to the law, which could run from Saturday at 9:00 to Sunday at 21:00, plus 2 (unpaid) hours per day on weekdays. That makes a total of 122 hours worked per week (the week's 168 hours minus 36 hours of weekend rest and 5 × 2 hours of weekday breaks), far more than the established 40 hours.

    And although in theory employers are required to implement a system to register their workers' working hours, in practice this is difficult to enforce.

    The law also establishes that, on top of the 40 weekly hours, there can be 20 extra hours of presence. Presence hours are paid at the normal hourly rate because the domestic workers' regime establishes it that way; it is not like other collective agreements.

    Most workers do not know that they are entitled to these 20 additional hours. Presence hours can be claimed for the last 12 months; anything older lapses. This can amount to some 18,000 euros to claim. To do so you need evidence that the worker is actually doing those hours, and evidence inside a private home is very hard to obtain. One way to fight it is to ask the employer to justify which other people they have hired to care for the dependent person. If no schedule appears in the contract, a 24-hour schedule can be assumed. You can also ask the neighbours, or check whether the worker is registered as living at the address.

    What is clear is that the employer has to define the working hours in the contract, which they do not always do.

    Sleeping on the premises

    The contract has to state whether the worker sleeps at the workplace.

    It is very hard to regulate how many times they are woken up during the night, so fighting that is still complicated, although progress is being made.

    Violence and harassment in domestic employment

    According to Royal Decree 893/2024, as read by noticias.juridicas.com:

    Leaving the home in the face of a situation of violence or harassment suffered by the worker may not be considered a resignation nor be grounds for dismissal, without prejudice to the worker's option to request the termination of the contract under article 50 of the Workers' Statute (ET) and to request interim measures if a claim is filed, in accordance with the LRJS.

    Salary

    The norm is to pay the minimum wage, although four out of ten workers don't even get that. Bear in mind that the minimum wage was raised in January 2025 to 1,383 euros. Many probably won't even see that raise.

    If the 20 hours of presence are counted, the salary would be approximately 2,000 euros a month (2,094 according to Senda de Cuidados' 2024 salary table for a regime of 6 nights a week).

    Fighting for their rights

    Normally, even if you explain to the workers all the rights they have, they don't want to claim or exercise them because they don't dare, for fear of losing their job or of other reprisals.

    Personal data

    Payroll agencies (asesorías) can obtain the social security number with just a name and a DNI. This is done to make the paperwork easier, but if you don't know about it, it can freak you out.

    Payment method

    The employer has to pay the worker by bank transfer as a payroll payment, not as a regular transfer; otherwise the worker does not get the bank benefits of having a salary paid into the account.

    Even when there are agencies involved, they generally act as intermediaries and only do advisory work, so the contract is usually signed with the family of the person being cared for, and it is the family that has to pay the worker. Unless the agency is a temp agency (ETT) for domestic workers, in which case it is the agency that hires them directly.

    Labour inspectorate complaints

    Complaints to the labour inspectorate under this regime run a different course depending on which inspector handles them, because the inspectorate cannot enter private homes unannounced even when they host a business. So some inspectors shelve the complaints about mistreatment of domestic workers; others don't, and issue formal requirements and so on.

    Taxes

    Domestic workers' payslips carry no income tax (IRPF) withholding. Normally the salary and the prorated extra payments are itemised; if they come lumped together, it's a shoddy payslip.

    Paperwork

    Once the contract is signed, the employer has to give the worker the digital fingerprint of the contract as communicated to the public employment service. When you register a worker, two files have to be sent: one to the General Treasury with the registration and another to the state public employment service with the contract. Not doing so is a formal defect, though not a very serious one.

    References

    Decent companies: Not everything is horrendous; there are workers' cooperatives that offer these services under conditions the workers themselves have decided:

    Senda de cuidados publishes its salary table, which gives you an idea of the salary and of the different kinds of working regimes.

    Legal improvements

    Articles on live-in work

Conference organisation

pretalx

Life Management

Time management

Org Mode

  • New: Footnotes.

    A footnote is started by a footnote marker in square brackets in column 0, no indentation allowed. It ends at the next footnote definition, headline, or after two consecutive empty lines. The footnote reference is simply the marker in square brackets, inside text. Markers always start with ‘fn:’. For example:

    The Org website[fn:1] now looks a lot better than it used to.
    ...
    [fn:1] The link is: https://orgmode.org
    

    Nvim-orgmode has some basic support for footnotes.

  • New: Custom agendas.

    You can use custom agenda commands.

    Define custom agenda views that are available through the org_agenda mapping. It is possible to combine multiple agenda types into a single view. An example:

    require('orgmode').setup({
      org_agenda_files = {'~/org/**/*'},
      org_agenda_custom_commands = {
        -- "c" is the shortcut that will be used in the prompt
        c = {
          description = 'Combined view', -- Description shown in the prompt for the shortcut
          types = {
            {
              type = 'tags_todo', -- Type can be agenda | tags | tags_todo
              match = '+PRIORITY="A"', --Same as providing a "Match:" for tags view <leader>oa + m, See: https://orgmode.org/manual/Matching-tags-and-properties.html
              org_agenda_overriding_header = 'High priority todos',
              org_agenda_todo_ignore_deadlines = 'far', -- Ignore all deadlines that are too far in future (over org_deadline_warning_days). Possible values: all | near | far | past | future
            },
            {
              type = 'agenda',
              org_agenda_overriding_header = 'My daily agenda',
              org_agenda_span = 'day' -- can be any value as org_agenda_span
            },
            {
              type = 'tags',
              match = 'WORK', --Same as providing a "Match:" for tags view <leader>oa + m, See: https://orgmode.org/manual/Matching-tags-and-properties.html
              org_agenda_overriding_header = 'My work todos',
              org_agenda_todo_ignore_scheduled = 'all', -- Ignore all headlines that are scheduled. Possible values: past | future | all
            },
            {
              type = 'agenda',
              org_agenda_overriding_header = 'Whole week overview',
              org_agenda_span = 'week', -- 'week' is default, so it's not necessary here, just an example
              org_agenda_start_on_weekday = 1, -- Start on Monday
              org_agenda_remove_tags = true -- Do not show tags only for this view
            },
          }
        },
        p = {
          description = 'Personal agenda',
          types = {
            {
              type = 'tags_todo',
              org_agenda_overriding_header = 'My personal todos',
              org_agenda_category_filter_preset = 'todos', -- Show only headlines from `todos` category. Same value provided as when pressing `/` in the Agenda view
              org_agenda_sorting_strategy = {'todo-state-up', 'priority-down'} -- See all options available on org_agenda_sorting_strategy
            },
            {
              type = 'agenda',
              org_agenda_overriding_header = 'Personal projects agenda',
              org_agenda_files = {'~/my-projects/**/*'}, -- Can define files outside of the default org_agenda_files
            },
            {
              type = 'tags',
              org_agenda_overriding_header = 'Personal projects notes',
              org_agenda_files = {'~/my-projects/**/*'},
              org_agenda_tag_filter_preset = 'NOTES-REFACTOR' -- Show only headlines with the NOTES tag that do not have a REFACTOR tag. Same value provided as when pressing `/` in the Agenda view
            },
          }
        }
      }
    })
    

    You can also define the org_agenda_sorting_strategy. The default value is { agenda = {'time-up', 'priority-down', 'category-keep'}, todo = {'priority-down', 'category-keep'}, tags = {'priority-down', 'category-keep'}}.

    The available list of sorting strategies to apply to a given view are:

    • time-up: Sort entries by time of day. Applicable only in agenda view
    • time-down: Opposite of time-up
    • priority-down: Sort by priority, from highest to lowest
    • priority-up: Sort by priority, from lowest to highest
    • tag-up: Sort by sorted tags string, ascending
    • tag-down: Sort by sorted tags string, descending
    • todo-state-up: Sort by todo keyword by position (example: 'TODO, PROGRESS, DONE' has a sort value of 1, 2 and 3), ascending
    • todo-state-down: Sort by todo keyword, descending
    • clocked-up: Show clocked in headlines first
    • clocked-down: Show clocked in headlines last
    • category-up: Sort by category name, ascending
    • category-down: Sort by category name, descending
    • category-keep: Keep default category sorting, as it appears in org-agenda-files

    You can open the custom agendas with the API too. For example, to open the agenda stored under t:

        keys = {
          {
            "gt",
            function()
              vim.notify("Opening today's agenda", vim.log.levels.INFO)
              require("orgmode.api.agenda").open_by_key("t")
            end,
            desc = "Open orgmode agenda for today",
          },
        },
    

    In that case I'm configuring the keys section of the lazyvim plugin. Through the API you can also configure these options:

    • org_agenda_files
    • org_agenda_sorting_strategy
    • org_agenda_category_filter_preset
    • org_agenda_todo_ignore_deadlines: Ignore all deadlines that are too far in future (over org_deadline_warning_days). Possible values: all | near | far | past | future
    • org_agenda_todo_ignore_scheduled: Ignore all headlines that are scheduled. Possible values: past | future | all
  • New: Load different agendas with the same binding depending on the time.

    I find it useful to bind gt to today's agenda, but what "today" means differs between weekdays. Imagine you want to load a work agenda on Monday to Friday before 17:00, and a personal agenda the rest of the time.

    You could then configure this function:

        keys = {
          {
            "gt",
            function()
              local current_time = os.date("*t")
              local day = current_time.wday -- 1 = Sunday, 2 = Monday, etc.
              local hour = current_time.hour
    
              local agenda_key = "t"
              local agenda_name = "Today's" -- default
    
              -- Monday (2) through Friday (6)
              if day >= 2 and day <= 6 then
                if hour < 17 then
                  agenda_key = "w"
                  agenda_name = "Today + Work"
                end
              end
    
              vim.notify("Opening " .. agenda_name .. " agenda", vim.log.levels.INFO)
              require("orgmode.api.agenda").open_by_key(agenda_key)
            end,
            desc = "Open orgmode agenda for today",
          },
        }
    
  • New: Better handle indentations.

    There is something called virtual indents that will save you from many indentation headaches. To enable them, set the org_startup_indented = true configuration, as in the snippet below.
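
    For example, in your nvim-orgmode setup:

    require('orgmode').setup({
      org_startup_indented = true,
    })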

    If you need to adjust the indentation of your document (for example after enabling the option on existent orgmode code), visually select the lines to correct the indentation (V) and then press =. You can do this with the whole file (╥﹏╥).

  • New: Remove some tags when the state has changed to DONE.

    For example, if you want to remove them for recurrent tasks:

          local function remove_specific_tags(headline)
            local tagsToRemove = { "t", "w", "m", "q", "y" }
            local currentTags = headline:get_tags()
            local newTags = {}
            local needsUpdate = false
    
        -- Build a new tags list excluding the tags in tagsToRemove
            for _, tag in ipairs(currentTags) do
              local shouldKeep = true
              for _, removeTag in ipairs(tagsToRemove) do
                if tag == removeTag then
                  shouldKeep = false
                  needsUpdate = true
                  break
                end
              end
              if shouldKeep then
                table.insert(newTags, tag)
              end
            end
            -- Only update if we actually removed something
            if needsUpdate then
              headline:set_tags(table.concat(newTags, ":"))
              headline:refresh()
            end
          end
    
          local EventManager = require("orgmode.events")
          EventManager.listen(EventManager.event.TodoChanged, function(event)
            ---@cast event OrgTodoChangedEvent
        if event.headline then
          -- get_todo() returns the headline's current todo keyword
          local current_todo, _, _ = event.headline:get_todo()
          if current_todo == "DONE" then
            remove_specific_tags(event.headline)
          end
        end
          end)
    
  • New: Register the todo changes in the logbook.

    You can now register the changes with events. Add this to your plugin config. If you're using lazyvim:

    return {
      {
        "nvim-orgmode/orgmode",
        config = function()
          require("orgmode").setup({...})
    
          local EventManager = require("orgmode.events")
          local Date = require("orgmode.objects.date")
    
          EventManager.listen(EventManager.event.TodoChanged, function(event)
            ---@cast event OrgTodoChangedEvent
            if event.headline then
          local current_todo, _, _ = event.headline:get_todo()
              local now = Date.now()
    
              event.headline:add_note({
                'State "' .. current_todo .. '" from "' .. event.old_todo_state .. '"  [' .. now:to_string() .. "]",
              })
            end
          end)
        end,
      },
    }
    
  • New: API usage.

    Get the headline under the cursor

    Read and write files

    You have information on how to do it in this PR.

    Create custom hyperlink types

    Custom types can trigger functionality such as opening a terminal and pinging the provided URL.

    To add your own custom hyperlink type, provide a custom handler to the hyperlinks.sources setting. Each handler needs a get_name() method that returns a name for the handler. Additionally, the optional follow(link) and autocomplete(link) methods are available to open the link and provide autocompletion.

    Refile a headline to another destination

    You can do this with the API.

    Assuming you are in the file where your TODOs are:

    local api = require('orgmode.api')
    local closest_headline = api.current():get_closest_headline()
    local destination_file = api.load('~/org/journal.org')
    local destination_headline = vim.tbl_filter(function(headline)
      return headline.title == 'My journal'
    end, destination_file.headlines)[1]

    api.refile({ source = closest_headline, destination = destination_headline })

    Use events

Orgzly

Roadmap Adjustment

  • New: Adjust the month review process.

    To record the results of the review, create the section in pages/reviews.org with the following template:

    * winter
    ** january review
    *** work
    *** personal
    **** month review
    ***** mental dump
    ****** What worries you right now?
    ****** What drained your energy or brought you down emotionally this last month?
    ****** What are the little things that burden you or slow you down?
    ****** What do you desire right now?
    ****** Where is your mind these days?
    ****** What did you enjoy most this last month?
    ****** What did help you most this last month?
    ****** What things would you want to finish throughout the month so you can carry them to the next?
    ****** What things do you feel you need to do?
    ****** What are you most proud of this month?
    ***** month checks
    ***** analyze
    ***** decide
    

    I'm assuming it's January's review and that you have two kinds of reviews: one personal and one for work.

    Dump your mind

    The first thing we want to do in the review is to dump all that's in our mind into our system to free up mental load.

    Try not to act on them yet, but if you think of decisions you want to make that address the elements you're discovering, write them down in the Decide section of your review document.

    There are different paths to discover actionable items:

    • Analyze what is in your mind: Take 10 minutes to answer the questions of the template under the "mental dump" section (you don't need to answer them all). Notice that we do not need to review our life logging tools (diary, action manager, ...) to answer these questions. This means that we're doing an analysis of what is in our minds right now, not throughout the month. It's flawed, but as we do this analysis often, it's probably fine. We give more importance to the latest events in our life anyway.

    Clean your notebook

    • Empty the elements you added to the review box. I have them in my inbox with the tag :review: (you have it in the month agenda view gM).
    • Clean your life notebook by:
      • Iterating over the areas of proyects.org, checking only the first level of projects (don't go deeper), and for each element:
        • Moving the done elements either to archive.org or logbook.org.
        • Moving to backlog.org the elements that no longer make sense to be active.
      • Checking if you have any DONE element in calendar.org.
      • Emptying inbox.org.
      • Emptying the DONE elements of talk.org.
      • Cleaning the elements that no longer make sense from think.org.
    • Process your month checks. For each of them:
      • Think about whether you've met the check.
      • If you need to, add action elements in the mental dump section of the review.

    Refresh your idea of how the month went

    • Open your bitácora.org agenda view to see what has been completed in the last month (match = 'CLOSED>"<-30d>"-work-steps-done', ordered by name with org_agenda_sorting_strategy = { "category-keep" }) and change the priority of the elements according to their impact.
    • Open your recurrent.org agenda view to see what has been done in the last month (match = 'LAST_REPEAT>"<-30d>"-work').
    • Check what is left of your month objectives (+m) and refile the elements that no longer make sense.
    • Check the reports of your weekly reviews of the month in the reviews.org document.

    Check your upcoming commitments

    Check all your action management tools (in my case orgmode and ikhal) to identify:

    • Arranged commitments
    • Trips

  • New: Life roadmap adjustment.

    Create next stage's life notebook

    After reading "The Bulletproof Journal", I was drawn to the idea of changing notebooks each year, carrying over only the necessary things.

    I find this to be a powerful concept since you start each stage with a clean canvas. This brings you closer to desire versus duty as it removes the commitments you made to yourself, freeing up significant mental load. From this point, it's much easier to allow yourself to dream about what you want to do in this new stage.

    I want to apply this concept to my digital life notebook as I see the following advantages:

    • It lightens my files, making them easier to manage and faster to process with orgmode
    • It's a very easy way to clean up
    • It's an elegant way to preserve what you've recorded without it becoming a hindrance
    • In each stage, you can start with a different notebook structure, meaning new axes, tools, and structures. This helps avoid falling into the rigidity of a constrained system or artifacts defined by inertia rather than conscious decision
    • It allows you to avoid maintaining files that follow an old scheme or having to migrate them to the new system
    • Additionally, you get rid of all those actions you've been reluctant to delete in one fell swoop

    The notebook change can be done in two phases:

    • Notebook Construction
    • Stage Closure

    Notebook Construction

    This phase spans from when you start making stage adjustments until you finally close the current stage. You can follow these steps:

    • Create a directory with the name of the new stage. In my case, it's the number of my predominant age during the stage
    • Create a directory for the current stage's notebook within "notebooks" in your references. Here we'll move everything that doesn't make sense to maintain. It's important that this directory isn't within your agenda files
    • Quickly review the improvements you've noted that you want to implement in next year's notebook to keep them in mind. You can note the references in the "Create new notebook" action

    As you review the stage, decide if it makes sense for the file you're viewing to exist as-is in the new notebook. Remember that the idea is to migrate minimal structure and data.

    • If it makes sense, create a symbolic link in the new notebook. When closing the stage, we'll replace the link with the file's final state.
    • If the file no longer makes sense, move it to references/notebooks.

Year reviews

Life chores management

himalaya

  • New: Configure GPG.

    Himalaya relies on cargo features to enable gpg. You can see the features enabled by default in the Cargo.toml file. As of 2025-01-27 the pgp-commands feature is enabled.

    You only need to add the following section to your config:

    pgp.type = "commands"
    

    And then you can use both the CLI and the vim plugin with gpg. Super easy.
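
    For instance, reading an encrypted message should then work transparently, since himalaya shells out to your local gpg (the message id is an example, and subcommand names may differ between versions):

    himalaya envelope list
    himalaya message read 42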

Instant Messages Management

Coding

Languages

PDM

  • Correction: Suggest to check uv.

    Maybe use uv instead (although so far I'm still using pdm).

Coding tools

File management configuration

  • New: How to exclude some files from the search.

    If anyone else comes here in the future and has the following setup

    • Using fd as default command: export FZF_DEFAULT_COMMAND='fd --type file --hidden --follow'
    • Using :Rg to grep in files

    And wants to exclude a specific path in a git project, say path/to/exclude (which should not be added to .gitignore), from both fd and rg as used by fzf.vim, then the easiest way I found is to create ignore files for the respective tools and then ignore those files in the local git clone (as they are only used by me):

    cd git_proj/
    echo "path/to/exclude" > .rgignore
    echo "path/to/exclude" > .fdignore
    printf ".rgignore\n.fdignore" >> .git/info/exclude
    

DevSecOps

Infrastructure Solutions

Kubectl Commands

  • New: Get the node architecture of the pods of a deployment.

    Here are a few ways to check the node architecture of pods in a deployment:

    1. Get the nodes where the pods are running:

      kubectl get pods -l app=your-deployment-label -o wide
      
      This will show which nodes are running your pods.

    2. Then check the architecture of those nodes:

      kubectl get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
      

    Or you can combine this into a single command:

    kubectl get pods -l app=your-deployment-label -o json | jq -r '.items[].spec.nodeName' | xargs -I {} kubectl get node {} -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
    

    You can also check if your deployment is explicitly targeting specific architectures through node selectors or affinity rules:

    kubectl get deployment your-deployment-name -o yaml | grep -A 5 nodeSelector
    

Automating Processes

renovate

  • New: Installation in gitea actions.

    • Create a Renovate bot account and generate a token for the Gitea Actions secret.
    • Add the renovate bot account as a collaborator with write permissions to the repository you want to update.
    • Create a repository to store the Renovate bot configuration, assumed here to be called renovate-config.

    In renovate-config, create a file config.js to configure Renovate:

    module.exports = {
        "endpoint": "https://gitea.com/api/v1", // replace it with your actual endpoint
        "gitAuthor": "Renovate Bot <renovate-bot@yourhost.com>",
        "platform": "gitea",
        "onboardingConfigFileName": "renovate.json",
        "autodiscover": true,
        "optimizeForDisabled": true,
    };
    

    If you're using MySQL, or you see errors like .../repository/pulls 500 internal error, you may need to set unicodeEmoji: false.
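
    To actually run the bot on a schedule you also need a workflow. Here is a rough sketch of a Gitea Actions workflow; the image tag, cron expression, and secret name are assumptions to adapt:

    name: renovate
    on:
      schedule:
        - cron: "0 */4 * * *"

    jobs:
      renovate:
        runs-on: ubuntu-latest
        container: ghcr.io/renovatebot/renovate:39
        steps:
          - uses: actions/checkout@v4 # check out renovate-config to pick up config.js
          - run: renovate
            env:
              RENOVATE_TOKEN: ${{ secrets.RENOVATE_TOKEN }}
              RENOVATE_CONFIG_FILE: config.js
              LOG_LEVEL: info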

Storage

NAS

  • New: Suggest to look at the slimbook.

    I built a server pretty much the same as the slimbook.

  • New: Introduce smartctl.

    Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T. or SMART) is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs). Its primary function is to detect and report various indicators of drive reliability, with the aim of anticipating imminent hardware failures.

    When S.M.A.R.T. data indicates a possible imminent drive failure, software running on the host system may notify the user so that action can be taken to prevent data loss and the failing drive can be replaced.

    General information

    Accuracy

    A field study at Google covering over 100,000 consumer-grade drives from December 2005 to August 2006 found correlations between certain S.M.A.R.T. information and annualized failure rates:

    • In the 60 days following the first uncorrectable error on a drive (S.M.A.R.T. attribute 0xC6 or 198) detected as a result of an offline scan, the drive was, on average, 39 times more likely to fail than a similar drive for which no such error occurred.
    • First errors in reallocations, offline reallocations (S.M.A.R.T. attributes 0xC4 and 0x05 or 196 and 5) and probational counts (S.M.A.R.T. attribute 0xC5 or 197) were also strongly correlated to higher probabilities of failure.
    • Conversely, little correlation was found for increased temperature and no correlation for usage level. However, the research showed that a large proportion (56%) of the failed drives failed without recording any count in the "four strong S.M.A.R.T. warnings" identified as scan errors, reallocation count, offline reallocation, and probational count.
    • Further, 36% of failed drives did so without recording any S.M.A.R.T. error at all, except the temperature, meaning that S.M.A.R.T. data alone was of limited usefulness in anticipating failures.

    Installation

    On Debian systems:

    sudo apt-get install smartmontools
    

    By default, once installed, all your drives are checked periodically by the smartd daemon, which runs under the smartmontools systemd service.
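
    For example, to control the schedule yourself you could replace the DEVICESCAN line in /etc/smartd.conf with something like the following sketch, which monitors all drives, mails root on failures, and runs a short self-test daily at 02:00 and a long one on Saturdays at 03:00:

    DEVICESCAN -a -m root -s (S/../.././02|L/../../6/03)

    After editing the file, restart the service with sudo systemctl restart smartmontools.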

    Usage

    Running the tests

    Test types

    S.M.A.R.T. drives may offer a number of self-tests:

    • Short: Checks the electrical and mechanical performance as well as the read performance of the disk. Electrical tests might include a test of buffer RAM, a read/write circuitry test, or a test of the read/write head elements. Mechanical test includes seeking and servo on data tracks. Scans small parts of the drive's surface (area is vendor-specific and there is a time limit on the test). Checks the list of pending sectors that may have read errors, and it usually takes under two minutes.
    • Long/extended: A longer and more thorough version of the short self-test, scanning the entire disk surface with no time limit. This test usually takes several hours, depending on the read/write speed of the drive and its size. It is possible for the long test to pass even if the short test fails.
    • Conveyance: Intended as a quick test to identify damage incurred during transporting of the device from the drive manufacturer to the computer manufacturer. Only available on ATA drives, and it usually takes several minutes.

    Drives remain operable during self-test, unless a "captive" option (ATA only) is requested.
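
    For reference, the short and conveyance variants are started the same way as the long test shown below (/dev/sdd is an example device):

    smartctl -t short /dev/sdd
    smartctl -t conveyance /dev/sdd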

    Long test

    Start with a long self test with smartctl. Assuming the disk to test is /dev/sdd:

    smartctl -t long /dev/sdd
    

    The command will respond with an estimate of how long it thinks the test will take to complete.

    To check progress use:

    smartctl -A /dev/sdd | grep remaining
    smartctl -c /dev/sdd | grep remaining
    

    Don't check too often because it can abort the test with some drives. If you receive an empty output, examine the reported status with:

    smartctl -l selftest /dev/sdd

    If errors are shown, check dmesg, as there are usually useful traces of the error there.

  • New: Understanding the tests.

    The output of a smartctl command is difficult to read:

    smartctl 5.40 2010-03-16 r3077 [x86_64-unknown-linux-gnu] (local build)
    Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
    
    === START OF INFORMATION SECTION ===
    Model Family:     SAMSUNG SpinPoint F2 EG series
    Device Model:     SAMSUNG HD502HI
    Serial Number:    S1VZJ9CS712490
    Firmware Version: 1AG01118
    User Capacity:    500,107,862,016 bytes
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   8
    ATA Standard is:  ATA-8-ACS revision 3b
    Local Time is:    Wed Feb  9 15:30:42 2011 CET
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)    Offline data collection activity
                        was never started.
                        Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)    The previous self-test routine completed
                        without error or no self-test has ever
                        been run.
    Total time to complete Offline
    data collection:          (6312) seconds.
    Offline data collection
    capabilities:              (0x7b) SMART execute Offline immediate.
                        Auto Offline data collection on/off support.
                        Suspend Offline collection upon new
                        command.
                        Offline surface scan supported.
                        Self-test supported.
                        Conveyance Self-test supported.
                        Selective Self-test supported.
    SMART capabilities:            (0x0003)    Saves SMART data before entering
                        power-saving mode.
                        Supports SMART auto save timer.
    Error logging capability:        (0x01)    Error logging supported.
                        General Purpose Logging supported.
    Short self-test routine
    recommended polling time:      (   2) minutes.
    Extended self-test routine
    recommended polling time:      ( 106) minutes.
    Conveyance self-test routine
    recommended polling time:      (  12) minutes.
    SCT capabilities:            (0x003f)    SCT Status supported.
                        SCT Error Recovery Control supported.
                        SCT Feature Control supported.
                        SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   099   099   051    Pre-fail  Always       -       2376
      3 Spin_Up_Time            0x0007   091   091   011    Pre-fail  Always       -       3620
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       405
      5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       717
     10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
     11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       405
     13 Read_Soft_Error_Rate    0x000e   099   099   000    Old_age   Always       -       2375
    183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
    184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       2375
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   084   074   000    Old_age   Always       -       16 (Lifetime Min/Max 16/16)
    194 Temperature_Celsius     0x0022   084   071   000    Old_age   Always       -       16 (Lifetime Min/Max 16/16)
    195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       3558
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0012   098   098   000    Old_age   Always       -       81
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
    200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
    201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    

    Checking overall health

    Somewhere in your report you'll see something like:

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    

    If it doesn't return PASSED, you should immediately back up all your data; your hard drive is probably failing.

    That message can also be shown with smartctl -H /dev/sda.

    Checking the SMART attributes

    Each drive manufacturer defines a set of attributes and sets threshold values beyond which attributes should not pass under normal operation. But since manufacturers do not agree on precise attribute definitions and measurement units, the following list of attributes is a general guide only.

    If one or more attributes have the "prefailure" flag, and the "current value" of such a prefailure attribute is smaller than or equal to its "threshold value" (unless the "threshold value" is 0), that will be reported as a "drive failure". In addition, utility software can send the SMART RETURN STATUS command to an ATA drive, which may report one of three statuses: "drive OK", "drive warning" or "drive failure".

    SMART attributes columns

    Each SMART attribute has several columns, as shown by smartctl -a <device>:

    • ID: The ID number of the attribute, good for comparing with other lists like Wikipedia: S.M.A.R.T.: Known ATA S.M.A.R.T. attributes, because the attribute names sometimes differ.
    • Name: The name of the SMART attribute.
    • Value: The current, normalized value of the attribute. Higher values are always better (except for temperature for hard disks of some manufacturers). The range is normally 0-100, for some attributes 0-255 (so that 100 or 255, respectively, is best and 0 is worst). There is no standard on how manufacturers convert their raw value to this normalized one: when the normalized value approaches the threshold, it can do so linearly, exponentially, logarithmically or any other way, meaning that a doubled normalized value does not necessarily mean "twice as good".
    • Worst: The worst (normalized) value that this attribute had at any point of time where SMART was enabled. There seems to be no mechanism to reset current SMART attribute values, but this still makes sense as some SMART attributes, for some manufacturers, fluctuate over time so that keeping the worst one ever is meaningful.
    • Threshold: The threshold below which the normalized value will be considered “exceeding specifications”. If the attribute type is “Pre-fail”, this means that SMART thinks the hard disk is just before failure. This will “trigger” SMART: setting it from “SMART test passed” to “SMART impending failure” or similar status.
    • Type: The type of the attribute. Either “Pre-fail” for attributes that are said to indicate impending failure, or “Old_age” for attributes that just indicate wear and tear. Note that one and the same attribute can be classified as “Pre-fail” by one manufacturer or for one model and as “Old_age” by another or for another model. This is the case for example for attribute Seek_Error_Rate (ID 7), which is a widespread phenomenon on many disks and not considered critical by some manufacturers, but Seagate has it as “Pre-fail”.
    • Raw value: The current raw value that was converted to the normalized value above. smartctl shows all of them as decimal values, but some attribute values of some manufacturers cannot be reasonably interpreted that way.
  • New: Reacting to SMART Values.

    It is said that a drive that starts getting bad sectors (attribute ID 5) or "pending" bad sectors (attribute ID 197; they most likely are bad, too) will usually be trash in 6 months or less. The only exception is if this does not happen: that is, the bad sector count increases but then stays stable for a long time, like a year or more. For that reason, one normally needs a diagramming / journaling tool for SMART. Many admins will swap out the hard drive if it gets reallocated sectors (ID 5) or sectors "under investigation" (ID 197).
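
    As a minimal sketch of such a journaling tool (the log path and the disk glob are assumptions), a daily cron job can append the raw values of the critical attributes to a CSV that you can plot later:

    # Append a timestamped snapshot of attributes 5, 197 and 198 to a CSV
    for disk in /dev/sd?; do
      smartctl -A "$disk" | awk -v d="$disk" -v t="$(date -Is)" \
        '$1 == "5" || $1 == "197" || $1 == "198" {print t "," d "," $2 "," $10}' \
        >> /var/log/smart_history.csv
    done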

    Critical SMART attributes

    Of all the attributes, I'm going to analyse only the critical ones.

    Read Error Rate

    ID: 01 (0x01) Ideal: Low Correlation with probability of failure: Unclear

    (Vendor specific raw value.) Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors.

    Reallocated Sectors Count

    ID: 05 (0x05) Ideal: Low Correlation with probability of failure: Strong

    Count of reallocated sectors. The raw value represents a count of the bad sectors that have been found and remapped. Thus, the higher the attribute value, the more sectors the drive has had to reallocate. This value is primarily used as a metric of the life expectancy of the drive; a drive which has had any reallocations at all is significantly more likely to fail in the immediate months. If the raw value of the 0x05 attribute is higher than its threshold value, that will be reported as a "drive warning".

    Spin Retry Count

    ID: 10 (0x0A) Ideal: Low Correlation with probability of failure: Strong

    Count of retry of spin start attempts. This attribute stores a total count of the spin start attempts to reach the fully operational speed (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.

    Current Pending Sector Count

    ID: 197 (0xC5) Ideal: Low Correlation with probability of failure: Strong

    Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased. Read errors on a sector will not remap the sector immediately (since the correct value cannot be read and so the value to remap is not known, and also it might become readable later); instead, the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it has been successfully read.[76]

    However, some drives will not immediately remap such sectors when successfully read; instead the drive will first attempt to write to the problem sector, and if the write operation is successful the sector will then be marked as good (in this case, the "Reallocation Event Count" (0xC4) will not be increased). This is a serious shortcoming, for if such a drive contains marginal sectors that consistently fail only after some time has passed following a successful write operation, then the drive will never remap these problem sectors. If the raw value of the 0xC5 attribute is higher than its threshold value, that will be reported as a "drive warning".

    (Offline) Uncorrectable Sector Count

    ID: 198 (0xC6) Ideal: Low Correlation with probability of failure: Strong

    The total count of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem.

    In the 60 days following the first uncorrectable error on a drive (S.M.A.R.T. attribute 0xC6 or 198) detected as a result of an offline scan, the drive was, on average, 39 times more likely to fail than a similar drive for which no such error occurred.

    Non critical SMART attributes

    The following attributes may change in the logs, but that doesn't mean anything is going wrong.

    Hardware ECC Recovered

    ID: 195 (0xC3) Ideal: Varies Correlation with probability of failure: Low

    (Vendor-specific raw value.) The raw value has different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors.

  • New: Monitorization.

    To monitor your drive health you can use Prometheus, with Alertmanager for alerts and Grafana for dashboards.

    Installing the exporter

    The Prometheus community has its own smartctl exporter.

    Using the binary

    You can download the latest binary from the repository releases and configure the systemd service:

    unp smartctl_exporter-0.13.0.linux-amd64.tar.gz
    sudo mv smartctl_exporter-0.13.0.linux-amd64/smartctl_exporter /usr/bin
    

    Add the service to /etc/systemd/system/smartctl-exporter.service

    [Unit]
    Description=smartctl exporter service
    After=network-online.target
    
    [Service]
    Type=simple
    PIDFile=/run/smartctl_exporter.pid
    ExecStart=/usr/bin/smartctl_exporter
    User=root
    Group=root
    SyslogIdentifier=smartctl_exporter
    Restart=on-failure
    RemainAfterExit=no
    RestartSec=100ms
    StandardOutput=journal
    StandardError=journal
    
    [Install]
    WantedBy=multi-user.target
    

    Then enable it:

    sudo systemctl enable smartctl-exporter
    sudo service smartctl-exporter start
    

    Using docker

    ---
    services:
      smartctl-exporter:
        container_name: smartctl-exporter
        image: prometheuscommunity/smartctl-exporter
        privileged: true
        user: root
        ports:
          - "9633:9633"
    

    Configuring prometheus

    Add the following scrape config:

    - job_name: smartctl_exporter
      metrics_path: /metrics
      scrape_timeout: 60s
      static_configs:
        - targets: [smartctl-exporter:9633]
          labels:
            hostname: "your-hostname"
    

    Configuring the alerts

    Taking as a reference the awesome prometheus rules and this wired post, I'm using the following rules:

    ---
    groups:
      - name: smartctl exporter
        rules:
          - alert: SmartDeviceTemperatureWarning
            expr: smartctl_device_temperature > 60
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: Smart device temperature warning (instance {{ $labels.hostname }})
              description: "Device temperature  warning (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: SmartDeviceTemperatureCritical
            expr: smartctl_device_temperature > 80
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: Smart device temperature critical (instance {{ $labels.hostname }})
              description: "Device temperature critical  (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: SmartCriticalWarning
            expr: smartctl_device_critical_warning > 0
            for: 15m
            labels:
              severity: critical
            annotations:
              summary: Smart critical warning (instance {{ $labels.hostname }})
              description: "device has critical warning (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: SmartNvmeWearoutIndicator
            expr: smartctl_device_available_spare{device=~"nvme.*"} < smartctl_device_available_spare_threshold{device=~"nvme.*"}
            for: 15m
            labels:
              severity: critical
            annotations:
              summary: Smart NVME Wearout Indicator (instance {{ $labels.hostname }})
              description: "NVMe device is wearing out (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: SmartNvmeMediaError
            expr: smartctl_device_media_errors > 0
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: Smart NVME Media errors (instance {{ $labels.hostname }})
              description: "Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: SmartSmartStatusError
            expr: smartctl_device_smart_status < 1
            for: 15m
            labels:
              severity: critical
            annotations:
              summary: Smart general status error (instance {{ $labels.hostname }})
              description: " (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: DiskReallocatedSectorsIncreased
            expr: smartctl_device_attribute{attribute_id="5", attribute_value_type="raw"} > min_over_time(smartctl_device_attribute{attribute_id="5", attribute_value_type="raw"}[1h])
            labels:
              severity: warning
            annotations:
              summary: "SMART Attribute Reallocated Sectors Count Increased"
              description: "The SMART attribute 5 (Reallocated Sectors Count) has increased on {{ $labels.device }} (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: DiskSpinRetryCountIncreased
            expr: smartctl_device_attribute{attribute_id="10", attribute_value_type="raw"} > max_over_time(smartctl_device_attribute{attribute_id="10", attribute_value_type="raw"}[1h])
            labels:
              severity: warning
            annotations:
              summary: "SMART Attribute Spin Retry Count Increased"
              description: "The SMART attribute 10 (Spin Retry Count) has increased on {{ $labels.device }} (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: DiskCurrentPendingSectorCountIncreased
            expr: smartctl_device_attribute{attribute_id="197", attribute_value_type="raw"} > max_over_time(smartctl_device_attribute{attribute_id="197", attribute_value_type="raw"}[1h])
            labels:
              severity: warning
            annotations:
              summary: "SMART Attribute Current Pending Sector Count Increased"
              description: "The SMART attribute 197 (Current Pending Sector Count) has increased on {{ $labels.device }} (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
          - alert: DiskUncorrectableSectorCountIncreased
            expr: smartctl_device_attribute{attribute_id="198", attribute_value_type="raw"} > max_over_time(smartctl_device_attribute{attribute_id="198", attribute_value_type="raw"}[1h])
            labels:
              severity: warning
            annotations:
              summary: "SMART Attribute Uncorrectable Sector Count Increased"
              description: "The SMART attribute 198 (Uncorrectable Sector Count) has increased on {{ $labels.device }} (instance {{ $labels.hostname }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    

    Configuring the grafana dashboards

    Of the different grafana dashboards (1, 2, 3) I went for the first one.

    Import it with the Grafana UI, make it work, and then export the JSON to store it in your infra-as-code repository.
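
    If you prefer to script that last step, the dashboard JSON can also be pulled through Grafana's HTTP API (host, dashboard UID, and token are placeholders):

    curl -s -H "Authorization: Bearer $GRAFANA_TOKEN" \
      https://grafana.example.org/api/dashboards/uid/YOUR_DASHBOARD_UID \
      | jq '.dashboard' > smartctl-dashboard.json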


  • New: Thoughts on adding new disks to ZFS.

    When it comes to expanding an existing ZFS storage system, careful consideration is crucial. In my case, I faced a decision point with my storage cluster: after two years of reliable service from my 8TB drives, I needed more capacity. This led me to investigate the best way to integrate newly acquired refurbished 12TB drives into the system. Here's my journey through this decision-making process and the insights gained along the way.

    The Starting Point

    My existing setup consisted of 8TB drives purchased new, which had been running smoothly for two years. The need for expansion led me to consider refurbished 12TB drives as a cost-effective solution. However, mixing new and refurbished drives, especially of different capacities, raised several important considerations that needed careful analysis.

    Initial Drive Assessment

    The first step was to evaluate the reliability of all drives. Using smartctl, I analyzed the SMART data across both the existing and new drives:

    for disk in a b c d e f g h i; do
        echo "/dev/sd$disk: old $(smartctl -a /dev/sd$disk | grep Old | wc -l) pre-fail: $(smartctl -a /dev/sd$disk | grep Pre- | wc -l)"
    done
    

    The results showed similar values across all drives, with "Old_Age" attributes ranging from 14-17 and "Pre-fail" attributes between 3-6. While this indicated all drives were aging, they were still functioning with acceptable parameters. However, raw SMART data doesn't tell the whole story, especially when comparing new versus refurbished drives.

    Drive Reliability Considerations

    After careful evaluation, I found myself trusting the existing 8TB drives more than the newer refurbished 12TB ones. This conclusion was based on several factors:

    • The 8TB drives had a proven track record in my specific environment
    • Their smaller size meant faster resilver times, reducing the window of vulnerability during recovery
    • One of the refurbished 12TB drives was already showing concerning symptoms (8 reallocated sectors, although a badblocks run didn't increase that number), which reduced confidence in the entire batch
    • The existing drives were purchased new, while the 12TB drives were refurbished, adding an extra layer of uncertainty

    Layout Options Analysis

    When expanding a ZFS system, there's always the temptation to simply add more vdevs to the existing pool. However, I investigated two main approaches:

    1. Creating a new separate ZFS pool with the new disks
    2. Adding another vdev to the existing pool

    Resilver time

    Adding the 12TB drives to the pool and redistributing the data across all 8 drives will help reduce the resilver time. Here's a detailed breakdown:

    1. Current situation:

       • 4x 8TB drives at 95% capacity means each drive is heavily packed
       • High data density means longer resilver times
       • Free space for data movement and reconstruction is limited

    2. After adding the 12TB drives:

       • Total pool capacity increases significantly
       • ZFS favours the emptier vdev for new writes, but it does not rebalance existing data on its own: old data is only redistributed (sometimes called "data shuffling" or "rebalancing") when it is rewritten, as shown in the sketch below
       • This redistribution reduces the data density per drive, creates more free space, improves overall pool performance and potentially reduces future resilver times

    3. Resilver time reduction mechanism:

       • With data spread across more drives, each individual drive has less data to resilver
       • Less data per drive means a faster resilver process
       • The redistribution happens gradually, as data is written or rewritten
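
    Since ZFS only redistributes data when it is rewritten, a common trick is to copy each file and move the copy over the original, forcing a fresh allocation across all vdevs. This is a minimal and naive sketch, assuming a hypothetical /main/media dataset; mind snapshots, hardlinks and free space, and have backups. Community scripts such as zfs-inplace-rebalancing do the same with more safety checks:

    # Rewrite every file so ZFS reallocates it across all vdevs
    find /main/media -type f -print0 | while IFS= read -r -d '' file; do
        cp -a "$file" "${file}.rebalance" && mv "${file}.rebalance" "$file"
    done
    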

    Understanding Failure Scenarios

    The key differentiator between these approaches came down to failure scenarios:

    Single Drive Failure

    Both configurations handle single drive failures similarly, though the 12TB drives' longer resilver time creates a longer window of vulnerability in the two-vdev configuration if the data load is evenly shared between the disks. This is particularly concerning with refurbished drives, where the failure probability might be higher.

    However, if you rebalance the data across the vdevs as soon as you add the new one, the 8TB drives will hold less data, so until more data is added their resilver time should drop.

    Double Drive Failure

    This is where the configurations differ significantly:

    • In a two-vdev pool, losing two drives from the same vdev would cause complete pool failure
    • With separate pools, a double drive failure would only affect one pool, allowing the other to continue operating. This way you can store the critical data on the pool you trust more.
    • Given the mixed drive origins (new vs refurbished), isolating potential failures becomes more critical

    Performance Considerations

    While investigating performance implications, I found several interesting points about IOPS and throughput:

    • ZFS stripes data across vdevs, meaning more vdevs generally means better IOPS
    • In RAIDZ configurations, IOPS are limited by the slowest drive in the vdev
    • Multiple mirrored vdevs provide the best combined IOPS performance
    • Streaming speeds scale with the number of data disks in a RAIDZ vdev
    • When mixing drive sizes, ZFS tends to favor larger vdevs, which could lead to uneven wear

    Ease of configuration

    Cache and log

    If your existing zpool keeps its cache and log on NVMe drives, going with two pools means repartitioning those NVMe drives to make room for the partitions the new zpool needs.

    This would allow you to size the cache of each pool independently, but it comes at the cost of a more complex operation.
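
    As a sketch of that extra work, assuming you carved out two new partitions on the NVMe drive for the second pool (all names below are hypothetical):

    sudo zpool add tank2 cache /dev/nvme0n1p3
    sudo zpool add tank2 log /dev/nvme0n1p4
    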

    New pool creation

    Adding a vdev to an existing pool is quicker and easier than creating a new zpool, where you need to make sure you initialise it with the correct configuration.

    Storage management

    Having two pools doubles the operational tasks. One of the pools will fill up soon, so you may need to manually move files and directories around to rebalance them.

    Final Decision

    After weighing all factors: if you favour reliability over convenience, implement two separate ZFS pools. This recommendation is primarily driven by:

    1. Enhanced Reliability: By separating the pools, we can maintain service availability even if one pool fails completely
    2. Data Prioritization: This allows placing critical application data on the more reliable pool (8TB drives), while using the refurbished drives for less critical data like media files
    3. Risk Isolation: Keeping the proven, newly purchased drives separate from the refurbished ones minimizes the impact of potential issues with the refurbished drives
    4. Consistent Performance: Following the best practice of keeping same-sized drives together in pools

    However, I'm currently favouring convenience and trusting my backup solution (I hope not to read this line in the future with regret :P), so I'll go with two vdevs in a single pool.

    Key Takeaways

    Through this investigation, I learned several important lessons about ZFS storage design:

    1. Raw parity drive count isn't the only reliability metric - configuration matters more than simple redundancy numbers
    2. Pool layout significantly impacts both performance and failure scenarios
    3. Sometimes simpler configurations (like separate pools) can provide better overall reliability than more complex ones
    4. Consider the full lifecycle of the storage, including maintenance operations like resilver times
    5. When expanding storage, don't underestimate the value of isolating different generations or sources of hardware
    6. The history and source of drives (new vs refurbished) should influence your pool design decisions

    This investigation reinforced that storage design isn't just about maximizing space or performance - it's about finding the right balance of reliability, performance, and manageability for your specific needs. When dealing with mixed drive sources and different capacities, this balance becomes even more critical.


    diff --git a/mkdocs.yml b/mkdocs.yml
    index 74292dd717..596f1506e8 100644
    --- a/mkdocs.yml
    +++ b/mkdocs.yml
    @@ -103,6 +103,8 @@ nav:
       - Email clients:
           - himalaya: himalaya.md
           - alot: alot.md
    +      - k9: k9.md
    +  - Email protocols:
           - Maildir: maildir.md
       - Instant Messages Management:
    @@ -371,6 +373,7 @@ nav:
       - File management configuration:
           - NeoTree: neotree.md
           - Telescope: telescope.md
    +      - fzf.nvim: fzf_nvim.md
       - Editing specific configuration:
           - vim_editor_plugins.md
           - Vim formatters: vim_formatters.md
    @@ -566,7 +569,10 @@ nav:
       - OpenZFS storage planning: zfs_storage_planning.md
       - Sanoid: sanoid.md
       - ZFS Prometheus exporter: zfs_exporter.md
    -  - Hard drive health: hard_drive_health.md
    +  - Hard drive health:
    +      - hard_drive_health.md
    +      - Smartctl: smartctl.md
    +      - badblocks: badblocks.md
       - Resilience:
           - linux_resilience.md
           - Memtest: memtest.md
    @@ -768,7 +774,8 @@ nav:
       # - Streaming channels: streaming_channels.md
       - Music:
           - Sister Rosetta Tharpe: sister_rosetta_tharpe.md
    -  - Video Gaming:
    +  - Videogames:
    +      - DragonSweeper: dragonsweeper.md
       - King Arthur Gold: kag.md
       - The Battle for Wesnoth:
           - The Battle for Wesnoth: wesnoth.md

badblocks

  • New: Check the health of a disk with badblocks.

    The badblocks command will write and read the disk with different patterns, thus overwriting the whole disk, so you will lose all the data on it.

    This test is good for rotational disks, as massive writes don't degrade them; do not use it on SSDs though.

    WARNING: be sure that you specify the correct disk!!

    # -w destructive write test, -s show progress, -v verbose, -b 4096 byte blocks
    badblocks -wsv -b 4096 /dev/sde | tee disk_analysis_log.txt
    

    If errors are shown, it means that all of the spare sectors of the disk are already in use, so you must not use this disk anymore. Again, check dmesg for traces of disk errors.
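
    After the badblocks run it's also worth re-reading the reallocation counters with smartctl, to see whether the test consumed more spare sectors:

    smartctl -a /dev/sde | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
    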

  • New: Removing a disk from the pool.

    zpool remove tank0 sda
    

    This will trigger the data evacuation from the disk. Check zpool status to see when it finishes.
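
    For example, on recent OpenZFS versions you can block until the evacuation is done, or simply poll the status:

    zpool wait -t remove tank0
    watch -n 60 zpool status tank0
    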

  • New: Encrypting ZFS Drives with LUKS.

    Warning: Proceed with Extreme Caution

    IMPORTANT SAFETY NOTICE:

    • These instructions will COMPLETELY WIPE the target drive
    • Do NOT attempt on production servers
    • Experiment only on drives with no valuable data
    • Seek professional help if anything is unclear

    Prerequisites

    • A drive you want to encrypt (will be referred to as /dev/sdx)
    • Root access
    • Basic understanding of Linux command line
    • Backup of all important data

    Step 1: Create LUKS Encryption Layer

    First, format the drive with LUKS encryption:

    sudo cryptsetup luksFormat /dev/sdx
    
    • You'll be prompted for a sudo password
    • Create a strong encryption password (mix of uppercase, lowercase, numbers, symbols)
    • Note the precise capitalization in commands

    Step 2: Open the Encrypted Disk

    Open the newly encrypted disk:

    sudo cryptsetup luksOpen /dev/sdx sdx_crypt
    

    This creates a mapped device at /dev/mapper/sdx_crypt

    Step 3: Create the ZFS Pool or vdev

    For example to create a ZFS pool on the encrypted device:

    sudo zpool create -f -o ashift=12 \
        -O compression=lz4 \
        zpool /dev/mapper/sdx_crypt
    

    Check the create zpool section to know which configuration flags to use.

    Step 4: Set Up Automatic Unlocking

    Generate a Keyfile

    Create a random binary keyfile:

    sudo mkdir -p /etc/zfs/keys
    sudo dd bs=1024 count=4 if=/dev/urandom of=/etc/zfs/keys/sdx.key
    sudo chmod 0400 /etc/zfs/keys/sdx.key
    

    Add Keyfile to LUKS

    Add the keyfile to the LUKS disk:

    sudo cryptsetup luksAddKey /dev/sdx /etc/zfs/keys/sdx.key
    
    • You'll be asked to enter the original encryption password
    • This adds the binary file to the LUKS disk header
    • Now you can unlock the drive using either the password or the keyfile

    Step 5: Configure Automatic Mounting

    Find Drive UUID

    Get the drive's UUID:

    sudo blkid
    

    Look for the line with TYPE="crypto_LUKS". Copy the UUID.

    Update Crypttab

    Edit the crypttab file:

    sudo vim /etc/crypttab
    

    Add an entry like:

    sdx_crypt UUID=your-uuid-here /etc/zfs/keys/sdx.key luks,discard
    
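
    On Debian-based systems you can test the new entry without rebooting (the command may differ on other distributions):

    sudo cryptdisks_start sdx_crypt
    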

    Final Step: Reboot

    • Reboot your system
    • The drive will be automatically decrypted and imported

    Best Practices

    • Keep your keyfile and encryption password secure
    • Store keyfiles with restricted permissions
    • Consider backing up the LUKS header
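
    For the header backup, cryptsetup has a dedicated subcommand; store the resulting file somewhere safe, off the encrypted drive:

    sudo cryptsetup luksHeaderBackup /dev/sdx --header-backup-file /root/sdx_luks_header.img
    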

    Troubleshooting

    • Double-check UUIDs
    • Verify keyfile permissions
    • Ensure cryptsetup and ZFS are installed

    Security Notes

    • This method provides full-disk encryption at rest
    • Data is inaccessible without the key or password
    • Protects against physical drive theft

    Disclaimer

    While these instructions are comprehensive, they come with inherent risks. Always:

    • Have backups
    • Test in non-critical environments first
    • Understand each step before executing


  • New: Add a disk to an existing vdev.

    To grow an existing mirror vdev, attach the new disk to a device that is already part of it:

    zpool attach tank /dev/sda /dev/sdx
    

    Note that zpool add tank /dev/sdx would instead add the disk as a new single-disk top-level vdev with no redundancy, which is rarely what you want. On OpenZFS 2.3+ the same attach syntax can also expand a raidz vdev, passing the vdev name instead of a device.
  • New: Add a vdev to an existing pool.

    zpool add main raidz1 /dev/disk-1 /dev/disk-2 /dev/disk-3 /dev/disk-4
    

    Note that the vdev type keyword is raidz1; names like raidz1-1 are labels that zpool status assigns, not something you pass to zpool add.

    You don't need to specify the ashift or the autoexpand as they are set on zpool creation.

  • New: Add zfs book.

Authentication

Authentik

Operating Systems

Linux

Linux Snippets

  • New: Record the audio from your computer.

    You can record the audio being played on your computer (for example in a browser) using ffmpeg:

    1. Check your default audio source:
    pactl list sources | grep -E 'Name|Description'
    
    2. Record using ffmpeg:
    ffmpeg -f pulse -i <your_monitor_source> output.wav
    

    Example:

    ffmpeg -f pulse -i alsa_output.pci-0000_00_1b.0.analog-stereo.monitor output.wav
    
    3. Stop recording with Ctrl+C.
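
    If you don't want to look up the monitor source by hand, on recent PulseAudio (or PipeWire's pactl) you can derive it from the default sink:

    ffmpeg -f pulse -i "$(pactl get-default-sink).monitor" output.wav
    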

Relevant content

Videogames

DragonSweeper