24th February 2025
Activism⚑
Hacktivism⚑
Collectives⚑
-
New: Add critical switch.
Critical Switch: a non-mixed1 transhackfeminist collective interested in free culture, privacy and digital security. We promote a security culture to create safer spaces in social and activist movements.
-
New: Add México collectives.
Labor⚑
Domestic workers⚑
-
New: Introduce research on domestic workers.
Note: I am by no means an expert on this topic; these are the key points I've deduced from talking about it with workers and managers. So verify everything before taking it as truth!
Caring for dependent elderly people
Caring for elderly people is a big deal, especially once they start to become dependent. They usually need care for most of the day. Currently there are the following options for providing that care:
- The close support network (usually the women of the family) takes on the care.
- Part or all of the care is outsourced, either to a nursing home, to day centres, or by hiring workers who come to the home.
In this rotten world where public services are being dismantled, the public supply of day centres or nursing homes is insufficient and generally in the hands of incompetent politicians (never forget the 7291 deaths that weigh on Ayuso's shoulders {what a bastard!}).
Add to this that the women of the family now have jobs of their own, plus the precarization of the domestic workers' sector, and the result is that people (above all those with money) end up hiring live-in workers.
Live-in domestic work
In reality this work is slavery hidden under a legal veneer. Taking advantage of the fact that the profession is feminized and mostly performed by migrant women, working conditions are imposed that don't comply with the workers' statute.
With a salary that normally doesn't exceed the minimum wage, these workers:
- Work far more than 40 hours a week.
- Do tasks outside their contract, such as cleaning the house or serving meals.
- Are confined to their workplace. Even when they "stop working" they remain under their employers' control.
- Being alone with the people they care for 24 hours a day, cases of gender violence are common. Many tell of barring their bedroom door at night.
- The spaces granted to them (bedrooms or bathrooms) are not respected and are generally used by other family members whenever they please, taking away even their room of their own.
- They generally have about two hours a day off. But they usually work in neighbourhoods far from their own homes, with prices and leisure options well beyond their means, so they normally spend those hours going for a walk. In winter it gets harder with the cold and the rain.
- Those who have weekends off have to pay for a room or a flat that they can only enjoy a few days a week.
- They have to endure the mistreatment and tyranny typical of elderly people who are starting to lose their faculties. At that age classism and racism become exacerbated while self-control and filters disappear, which creates very unpleasant situations that often end in psychological and physical abuse.
And although this is well known by society, it is a model that is still frequently used.
Working hours
In some cases the workers get the weekend off, 36 consecutive hours according to the law, which could be from Saturday at 9:00 until Sunday at 21:00, plus 2 hours a day (unpaid) on weekdays. That makes a total of 122 hours worked per week, far more than the established 40 hours.
And although in theory employers are obliged to implement a system to register their workers' working hours, in practice this is hard to enforce.
The law also establishes that, on top of the 40 weekly hours, there can be 20 extra hours of presence. Presence hours are paid at the normal hourly rate because the domestic workers' regime establishes it that way; it is not like other collective agreements.
Most workers don't know they are entitled to these 20 additional hours. Those presence hours can be claimed for the last 12 months; anything older expires. This can amount to around 18,000 euros to claim. To do so you need proof that the worker is actually doing those hours. One way to fight it is to ask the employer to justify which other people they have hired to care for the dependent person, because gathering proof inside a private home is very difficult. If no schedule appears in the contract, a 24h schedule can be assumed. You can also ask the neighbours, or check whether the worker is registered (empadronada) at the address.
What is clear is that the employer has to define the working hours in the contract, which they don't always do.
Overnight stay
The contract has to state whether the worker sleeps at the workplace.
It is very hard to regulate how many times they are woken up during the night, so fighting that is still complicated, although progress is being made.
Violence and harassment in domestic employment
According to Real Decreto 893/2024, as summarised by noticias.juridicas.com:
Leaving the home in the face of a situation of violence or harassment suffered by the worker cannot be considered a resignation nor be grounds for dismissal, without prejudice to the worker's right to request the termination of the contract under article 50 ET and to request precautionary measures if a claim is filed, in accordance with the LRJS.
Salary
The norm is to pay the minimum wage, although four out of ten workers don't even reach that. Keep in mind that the minimum wage has been updated since January 2025 to 1383 euros. It's likely that many won't even get that raise.
If the 20 hours of presence are counted, the salary would be approximately 2000 euros per month (2094 according to Senda de Cuidados' 2024 salary table for a regime of 6 nights a week).
Fighting for their rights
Normally, even if you explain all the rights they have, the workers don't want to claim or exercise them because they don't dare, out of fear of losing their job or of other reprisals.
Personal data
Advisory firms can obtain the social security number with just a name and a DNI. This is done to speed up the paperwork, but if you don't know about it, it can be unsettling.
Payment method
The employer is responsible for making the transfer to the worker as a payroll payment, not as a regular transfer; otherwise the worker doesn't get the perks banks give for having a salary paid into the account.
Even when agencies are involved, they generally act as intermediaries and only do advisory work, so the contract is usually signed with the family of the person being cared for, and it is the family that has to pay the worker. Unless the agency is a temp agency (ETT) of domestic workers, in which case it is the agency that hires them directly.
Labour inspectorate complaints
Complaints to the labour inspectorate under this regime run a different course depending on which inspector handles them, because the labour inspectorate cannot enter private homes unannounced even if they host a business. So some inspectors shelve the complaints about mistreatment of domestic workers; others don't, and issue formal requirements and so on.
Taxes
There is no IRPF withholding on domestic workers' payslips. Normally the salary and the pro-rated extra payments are itemized; if they appear lumped together it is a shoddy payslip.
Paperwork
Once the contract is signed, the employer has to give the worker the digital fingerprint (huella) of the contract as communicated to the public employment service. When you register a worker you have to send two files: one to the general treasury with the registration, and another to the state public employment service with the contract. If they don't, it is a formal defect, nothing too serious.
References
Decent companies: Not everything is horrendous; there are workers' cooperatives that offer these services under conditions the workers themselves have decided:
Senda de cuidados publishes its salary table, which gives you an idea of the salary and of the different types of work regime.
Legal improvements
- Real Decreto 893/2024, de 10 de septiembre, por el que se regula la protección de la seguridad y la salud en el ámbito del servicio del hogar familiar.
- Real Decreto-ley 16/2022, de 6 de septiembre, para la mejora de las condiciones de trabajo y de Seguridad Social de las personas trabajadoras al servicio del hogar.
Articles about live-in domestic work
Conference organisation⚑
pretalx⚑
-
New: Import a pretalx calendar in giggity.
Search for the URL, which looks like `https://pretalx.com/<event>/schedule/export/schedule.xml`, and import it in giggity.
Life Management⚑
Time management⚑
Org Mode⚑
-
New: Footnotes.
A footnote is started by a footnote marker in square brackets in column 0, no indentation allowed. It ends at the next footnote definition, headline, or after two consecutive empty lines. The footnote reference is simply the marker in square brackets, inside text. Markers always start with ‘fn:’. For example:
The Org website[fn:1] now looks a lot better than it used to. ... [fn:50] The link is: https://orgmode.org
Nvim-orgmode has some basic support for footnotes.
-
New: Custom agendas.
You can use custom agenda commands to define custom agenda views that are available through the `org_agenda` mapping. It is possible to combine multiple agenda types into a single view. An example:

```lua
require('orgmode').setup({
  org_agenda_files = { '~/org/**/*' },
  org_agenda_custom_commands = {
    -- "c" is the shortcut that will be used in the prompt
    c = {
      description = 'Combined view', -- Description shown in the prompt for the shortcut
      types = {
        {
          type = 'tags_todo', -- Type can be agenda | tags | tags_todo
          match = '+PRIORITY="A"', -- Same as providing a "Match:" for tags view <leader>oa + m, See: https://orgmode.org/manual/Matching-tags-and-properties.html
          org_agenda_overriding_header = 'High priority todos',
          org_agenda_todo_ignore_deadlines = 'far', -- Ignore all deadlines that are too far in the future (over org_deadline_warning_days). Possible values: all | near | far | past | future
        },
        {
          type = 'agenda',
          org_agenda_overriding_header = 'My daily agenda',
          org_agenda_span = 'day', -- can be any value as org_agenda_span
        },
        {
          type = 'tags',
          match = 'WORK', -- Same as providing a "Match:" for tags view <leader>oa + m, See: https://orgmode.org/manual/Matching-tags-and-properties.html
          org_agenda_overriding_header = 'My work todos',
          org_agenda_todo_ignore_scheduled = 'all', -- Ignore all headlines that are scheduled. Possible values: past | future | all
        },
        {
          type = 'agenda',
          org_agenda_overriding_header = 'Whole week overview',
          org_agenda_span = 'week', -- 'week' is default, so it's not necessary here, just an example
          org_agenda_start_on_weekday = 1, -- Start on Monday
          org_agenda_remove_tags = true, -- Do not show tags only for this view
        },
      },
    },
    p = {
      description = 'Personal agenda',
      types = {
        {
          type = 'tags_todo',
          org_agenda_overriding_header = 'My personal todos',
          org_agenda_category_filter_preset = 'todos', -- Show only headlines from `todos` category. Same value provided as when pressing `/` in the Agenda view
          org_agenda_sorting_strategy = { 'todo-state-up', 'priority-down' }, -- See all options available on org_agenda_sorting_strategy
        },
        {
          type = 'agenda',
          org_agenda_overriding_header = 'Personal projects agenda',
          org_agenda_files = { '~/my-projects/**/*' }, -- Can define files outside of the default org_agenda_files
        },
        {
          type = 'tags',
          org_agenda_overriding_header = 'Personal projects notes',
          org_agenda_files = { '~/my-projects/**/*' },
          org_agenda_tag_filter_preset = 'NOTES-REFACTOR', -- Show only headlines with NOTES tag that do not have a REFACTOR tag. Same value provided as when pressing `/` in the Agenda view
        },
      },
    },
  },
})
```
You can also define the `org_agenda_sorting_strategy`. The default value is:

```lua
{
  agenda = { 'time-up', 'priority-down', 'category-keep' },
  todo = { 'priority-down', 'category-keep' },
  tags = { 'priority-down', 'category-keep' },
}
```

The available sorting strategies to apply to a given view are:

- `time-up`: Sort entries by time of day. Applicable only in agenda view
- `time-down`: Opposite of time-up
- `priority-down`: Sort by priority, from highest to lowest
- `priority-up`: Sort by priority, from lowest to highest
- `tag-up`: Sort by sorted tags string, ascending
- `tag-down`: Sort by sorted tags string, descending
- `todo-state-up`: Sort by todo keyword by position (example: 'TODO, PROGRESS, DONE' has a sort value of 1, 2 and 3), ascending
- `todo-state-down`: Sort by todo keyword, descending
- `clocked-up`: Show clocked in headlines first
- `clocked-down`: Show clocked in headlines last
- `category-up`: Sort by category name, ascending
- `category-down`: Sort by category name, descending
- `category-keep`: Keep default category sorting, as it appears in org-agenda-files
You can open the custom agendas with the API too. For example, to open the agenda stored under `t`:

```lua
keys = {
  {
    "gt",
    function()
      vim.notify("Opening today's agenda", vim.log.levels.INFO)
      require("orgmode.api.agenda").open_by_key("t")
    end,
    desc = "Open orgmode agenda for today",
  },
},
```

In that case I'm configuring the `keys` section of the lazyvim plugin. Through the API you can also configure these options:

- `org_agenda_files`
- `org_agenda_sorting_strategy`
- `org_agenda_category_filter_preset`
- `org_agenda_todo_ignore_deadlines`: Ignore all deadlines that are too far in the future (over org_deadline_warning_days). Possible values: all | near | far | past | future
- `org_agenda_todo_ignore_scheduled`: Ignore all headlines that are scheduled. Possible values: past | future | all
-
New: Load different agendas with the same binding depending on the time.
I find it useful to bind `gt` to Today's agenda, but what "today" means differs between week days. Imagine that you want to load a work agenda from Monday to Friday before 17:00, and a personal agenda the rest of the time. You could then configure this function:

```lua
keys = {
  {
    "gt",
    function()
      local current_time = os.date("*t")
      local day = current_time.wday -- 1 = Sunday, 2 = Monday, etc.
      local hour = current_time.hour

      local agenda_key = "t"
      local agenda_name = "Today's" -- default

      -- Monday (2) through Friday (6)
      if day >= 2 and day <= 6 then
        if hour < 17 then
          agenda_key = "w"
          agenda_name = "Today + Work"
        end
      end

      vim.notify("Opening " .. agenda_name .. " agenda", vim.log.levels.INFO)
      require("orgmode.api.agenda").open_by_key(agenda_key)
    end,
    desc = "Open orgmode agenda for today",
  },
}
```
-
New: Better handle indentations.
There is something called virtual indents that will save you from many indentation headaches. To enable them set the `org_startup_indented = true` configuration.

If you need to adjust the indentation of your document (for example after enabling the option on existing orgmode files), visually select the lines whose indentation you want to correct (`V`) and then press `=`. You can do this with the whole file (╥﹏╥).
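A minimal sketch of where this option lives, assuming the usual `require('orgmode').setup` call (the agenda path is just an example):

```lua
require('orgmode').setup({
  org_agenda_files = { '~/org/**/*' }, -- example path, adjust to your setup
  -- Virtual indentation: headline bodies are displayed indented
  -- without real whitespace being written into the file.
  org_startup_indented = true,
})
```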
-
New: Remove some tags when the state has changed to DONE.
For example, if you want to remove them from recurrent tasks:

```lua
local function remove_specific_tags(headline)
  local tagsToRemove = { "t", "w", "m", "q", "y" }
  local currentTags = headline:get_tags()
  local newTags = {}
  local needsUpdate = false

  -- Build the new tags list excluding the recurrence tags
  for _, tag in ipairs(currentTags) do
    local shouldKeep = true
    for _, removeTag in ipairs(tagsToRemove) do
      if tag == removeTag then
        shouldKeep = false
        needsUpdate = true
        break
      end
    end
    if shouldKeep then
      table.insert(newTags, tag)
    end
  end

  -- Only update if we actually removed something
  if needsUpdate then
    headline:set_tags(table.concat(newTags, ":"))
    headline:refresh()
  end
end

local EventManager = require("orgmode.events")

EventManager.listen(EventManager.event.TodoChanged, function(event)
  ---@cast event OrgTodoChangedEvent
  if event.headline then
    -- Only react when the headline's new todo keyword is DONE
    local current_todo, _, _ = event.headline:get_todo()
    if current_todo == "DONE" then
      remove_specific_tags(event.headline)
    end
  end
end)
```
-
New: Register the todo changes in the logbook.
You can now register the changes with events. Add this to your plugin config. If you're using lazyvim:
```lua
return {
  {
    "nvim-orgmode/orgmode",
    config = function()
      require("orgmode").setup({...})

      local EventManager = require("orgmode.events")
      local Date = require("orgmode.objects.date")

      EventManager.listen(EventManager.event.TodoChanged, function(event)
        ---@cast event OrgTodoChangedEvent
        if event.headline then
          local current_todo, _, _ = event.headline:get_todo()
          local now = Date.now()
          event.headline:add_note({
            'State "' .. current_todo .. '" from "' .. event.old_todo_state .. '" [' .. now:to_string() .. "]",
          })
        end
      end)
    end,
  },
}
```
-
New: API usage.
Get the headline under the cursor
You have information on how to do it in this pr
Custom hyperlink types can trigger functionality such as opening a terminal and pinging the provided URL.

To add your own custom hyperlink type, provide a custom handler to the `hyperlinks.sources` setting. Each handler needs to have a `get_name()` method that returns a name for the handler. Additionally, the optional `follow(link)` and `autocomplete(link)` methods are available to open the link and provide the autocompletion.

Refile a headline to another destination
You can do this with the API. Assuming you are in the file where your TODOs are:

```lua
local api = require('orgmode.api')

local closest_headline = api.current():get_closest_headline()
local destination_file = api.load('~/org/journal.org')
local destination_headline = vim.tbl_filter(function(headline)
  return headline.title == 'My journal'
end, destination_file.headlines)[1]

api.refile({ source = closest_headline, destination = destination_headline })
```
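As a usage note, here is a sketch of wiring that refile snippet to a normal-mode mapping so it becomes a one-keystroke action. The `<leader>jr` binding, the journal path and the headline title are assumptions; the API calls are the ones from the snippet above:

```lua
-- Refile the headline under the cursor into the 'My journal' headline
-- of ~/org/journal.org (both are placeholders).
vim.keymap.set('n', '<leader>jr', function()
  local api = require('orgmode.api')
  local closest_headline = api.current():get_closest_headline()
  local destination_file = api.load('~/org/journal.org')
  local destination_headline = vim.tbl_filter(function(headline)
    return headline.title == 'My journal'
  end, destination_file.headlines)[1]
  api.refile({ source = closest_headline, destination = destination_headline })
end, { desc = 'Refile headline under cursor to My journal' })
```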
Orgzly⚑
-
New: Not adding a todo state when creating a new element by default.
The default state `NOTE` doesn't add any state.
Roadmap Adjustment⚑
-
New: Adjust the month review process.
To record the results of the review, create the section in `pages/reviews.org` with the following template:

```org
* winter
** january review
*** work
*** personal
**** month review
***** mental dump
****** What worries you right now?
****** What drained your energy or brought you down emotionally this last month?
****** What are the little things that burden you or slow you down?
****** What do you desire right now?
****** Where is your mind these days?
****** What did you enjoy most this last month?
****** What did help you most this last month?
****** What things would you want to finish throughout the month so you can carry them to the next?
****** What things do you feel you need to do?
****** What are you most proud of this month?
***** month checks
***** analyze
***** decide
```
I'm assuming it's January's review and that you have two kinds of reviews, one personal and one for work.
Dump your mind
The first thing we want to do in the review is to dump all that's in our mind into our system to free up mental load.
Try not to, but if you think of decisions you want to make that address the elements you're discovering, write them down in the `Decide` section of your review document.

There are different paths to discover actionable items:

- Analyze what is in your mind: Take 10 minutes to answer the questions of the template under the "mental dump" section (you don't need to answer them all). Notice that we do not need to review our life logging tools (diary, action manager, ...) to answer these questions. This means we're analyzing what is in our minds right now, not what happened throughout the month. It's flawed, but since we do this analysis often, it's probably fine. We give more weight to the latest events in our life anyway.
Clean your notebook
- Empty the elements you added to the `review box`. I have them in my inbox with the tag `:review:` (you have it in the month agenda view `gM`).
- Clean your life notebook by:
  - Iterating over the areas of `proyects.org`, only checking the first level of projects (don't go deeper), and for each element:
    - Moving the done elements either to `archive.org` or `logbook.org`.
    - Moving to `backlog.org` the elements that don't make sense to be active anymore.
  - Checking if you have any `DONE` element in `calendar.org`.
  - Emptying the `inbox.org`.
  - Emptying the `DONE` elements of `talk.org`.
  - Cleaning the elements that don't make sense anymore from `think.org`.
- Process your `month checks`. For each of them:
  - If you need to, add action elements in the `mental dump` section of the review.
  - Think of whether you've met the check.
Refresh your idea of how the month went

- Open your `bitácora.org` agenda view to see what has been completed in the last month (`match = 'CLOSED>"<-30d>"-work-steps-done'`), ordered by name (`org_agenda_sorting_strategy = { "category-keep" }`), and change the priority of the elements according to their impact (see the custom agenda sketch after this list).
- Open your `recurrent.org` agenda view to see what has been done in the last month (`match = 'LAST_REPEAT>"<-30d>"-work'`).
- Check what is left of your month objectives (`+m`) and refile the elements that don't make sense anymore.
- Check the reports of your weekly reviews of the month in the `reviews.org` document.
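The two agenda views above can be wired up as nvim-orgmode custom agenda commands with the API described earlier. A sketch under assumptions: the `b` and `r` shortcuts and the `~/org/...` paths are placeholders, while the match strings are the ones from the list above:

```lua
require('orgmode').setup({
  org_agenda_custom_commands = {
    b = {
      description = 'Completed in the last month',
      types = {
        {
          type = 'tags',
          match = 'CLOSED>"<-30d>"-work-steps-done',
          org_agenda_files = { '~/org/bitácora.org' }, -- placeholder path
          org_agenda_overriding_header = 'Closed in the last 30 days',
          org_agenda_sorting_strategy = { 'category-keep' },
        },
      },
    },
    r = {
      description = 'Recurrent tasks done in the last month',
      types = {
        {
          type = 'tags',
          match = 'LAST_REPEAT>"<-30d>"-work',
          org_agenda_files = { '~/org/recurrent.org' }, -- placeholder path
          org_agenda_overriding_header = 'Repeated in the last 30 days',
        },
      },
    },
  },
})
```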
Check your upcoming commitments

Check all your action management tools (in my case `orgmode` and `ikhal`) to identify:

- Arranged commitments
- Trips
Create next stage's life notebook
After reading "The Bulletproof Journal", I was drawn to the idea of changing notebooks each year, carrying over only the necessary things.
I find this to be a powerful concept since you start each stage with a clean canvas. This brings you closer to desire versus duty as it removes the commitments you made to yourself, freeing up significant mental load. From this point, it's much easier to allow yourself to dream about what you want to do in this new stage.
I want to apply this concept to my digital life notebook as I see the following advantages:
- It lightens my files making them easier to manage and faster to process with orgmode
- It's a very easy way to clean up
- It's an elegant way to preserve what you've recorded without it becoming a hindrance
- In each stage, you can start with a different notebook structure, meaning new axes, tools, and structures. This helps avoid falling into the rigidity of a constrained system or artifacts defined by inertia rather than conscious decision
- It allows you to avoid maintaining files that follow an old scheme or having to migrate them to the new system
- Additionally, you get rid of all those actions you've been reluctant to delete in one fell swoop
The notebook change can be done in two phases:
- Notebook Construction
- Stage Closure
Notebook Construction
This phase spans from when you start making stage adjustments until you finally close the current stage. You can follow these steps:
- Create a directory with the name of the new stage. In my case, it's the number of my predominant age during the stage
- Create a directory for the current stage's notebook within "notebooks" in your references. Here we'll move everything that doesn't make sense to maintain. It's important that this directory isn't within your agenda files
- Quickly review the improvements you've noted that you want to implement in next year's notebook to keep them in mind. You can note the references in the "Create new notebook" action
As you review the stage, decide if it makes sense for the file you're viewing to exist as-is in the new notebook. Remember that the idea is to migrate minimal structure and data.
- If it makes sense:
- Create a symbolic link in the new notebook. When closing the stage, we'll replace the link with the file's final state
- If the file no longer makes sense, move it to
references/notebooks
Year reviews⚑
-
New: Little things of 2025.
Fascism
At trump's inauguration, elon musk made the nazi salute.
Feminism
Life chores management⚑
himalaya⚑
-
New: Configure GPG.
Himalaya relies on cargo features to enable gpg. You can see the default enabled features in the Cargo.toml file. As of 2025-01-27 the `pgp-commands` feature is enabled.

You only need to add the next section to your config:

```toml
pgp.type = "commands"
```
And then you can use both the cli and the vim plugin with gpg. Super easy.
Instant Messages Management⚑
-
New: Add interesting article to merge all protocols under matrix.
-
New: How to set a master password.
You can't; it's not supported and it doesn't look like it will be (1, 2).
Coding⚑
Languages⚑
PDM⚑
-
Correction: Suggest to check uv.
Maybe use uv instead (although so far I'm still using `pdm`).
Coding tools⚑
File management configuration⚑
-
New: How to exclude some files from the search.
If anyone else comes here in the future and has the following setup:

- Using `fd` as the default command: `export FZF_DEFAULT_COMMAND='fd --type file --hidden --follow'`
- Using `:Rg` to grep in files

And wants to exclude a specific path in a git project, say `path/to/exclude` (but one that should not be included in `.gitignore`), from both `fd` and `rg` as used by `fzf.vim`, then the easiest way I found is to create ignore files for the respective tools and then ignore those files in the local git clone (as they are only used by me):

```bash
cd git_proj/
echo "path/to/exclude" > .rgignore
echo "path/to/exclude" > .fdignore
printf ".rgignore\n.fdignore" >> .git/info/exclude
```
DevSecOps⚑
Infrastructure Solutions⚑
Kubectl Commands⚑
-
New: Get the node architecture of the pods of a deployment.
Here are a few ways to check the node architecture of pods in a deployment:
- Get the nodes where the pods are running:

  ```bash
  kubectl get pods -l app=your-deployment-label -o wide
  ```

  This will show which nodes are running your pods.

- Then check the architecture of those nodes:

  ```bash
  kubectl get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
  ```

Or you can combine this into a single command:

```bash
kubectl get pods -l app=your-deployment-label -o json | jq -r '.items[].spec.nodeName' | xargs -I {} kubectl get node {} -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
```

You can also check if your deployment is explicitly targeting specific architectures through node selectors or affinity rules:

```bash
kubectl get deployment your-deployment-name -o yaml | grep -A 5 nodeSelector
```
Automating Processes⚑
renovate⚑
-
New: Installation in gitea actions.
- Create Renovate Bot Account and generate a token for the Gitea Action secret
- Add the renovate bot account as collaborator with write permissions to the repository you want to update.
- Create a repository to store our Renovate bot configuration, here assumed to be called renovate-config.

In renovate-config, create a file `config.js` to configure Renovate:

```js
module.exports = {
  "endpoint": "https://gitea.com/api/v1", // replace it with your actual endpoint
  "gitAuthor": "Renovate Bot <renovate-bot@yourhost.com>",
  "platform": "gitea",
  "onboardingConfigFileName": "renovate.json",
  "autodiscover": true,
  "optimizeForDisabled": true,
};
```
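To actually run the bot on a schedule you also need a workflow in a repository with Actions enabled. A sketch of what it could look like, assuming the official `renovate/renovate` image, a secret called `RENOVATE_TOKEN` holding the bot account's token, and the workflow living in the renovate-config repository (names and cron are placeholders):

```yaml
# .gitea/workflows/renovate.yaml
name: renovate
on:
  schedule:
    - cron: "0 */4 * * *" # run every four hours
jobs:
  renovate:
    runs-on: ubuntu-latest
    container:
      image: renovate/renovate:latest
    steps:
      - name: Checkout the renovate configuration
        uses: actions/checkout@v4
      - name: Run renovate
        run: renovate
        env:
          RENOVATE_CONFIG_FILE: config.js # the file created above
          RENOVATE_TOKEN: ${{ secrets.RENOVATE_TOKEN }} # token of the bot account
```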
If you're using mysql or you see errors like `.../repository/pulls 500 internal error` you may need to set `unicodeEmoji: false`.
Storage⚑
NAS⚑
-
New: Suggest to look at the slimbook.
I built a server pretty much the same as the slimbook.
-
New: Introduce smartctl.
Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T. or SMART) is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs). Its primary function is to detect and report various indicators of drive reliability, or how long a drive can function while anticipating imminent hardware failures.
When S.M.A.R.T. data indicates a possible imminent drive failure, software running on the host system may notify the user so action can be taken to prevent data loss and the failing drive can be replaced before any data is lost.
General information
A field study at Google covering over 100,000 consumer-grade drives from December 2005 to August 2006 found correlations between certain S.M.A.R.T. information and annualized failure rates:
- In the 60 days following the first uncorrectable error on a drive (S.M.A.R.T. attribute 0xC6 or 198) detected as a result of an offline scan, the drive was, on average, 39 times more likely to fail than a similar drive for which no such error occurred.
- First errors in reallocations, offline reallocations (S.M.A.R.T. attributes 0xC4 and 0x05 or 196 and 5) and probational counts (S.M.A.R.T. attribute 0xC5 or 197) were also strongly correlated to higher probabilities of failure.
- Conversely, little correlation was found for increased temperature and no correlation for usage level. However, the research showed that a large proportion (56%) of the failed drives failed without recording any count in the "four strong S.M.A.R.T. warnings" identified as scan errors, reallocation count, offline reallocation, and probational count.
- Further, 36% of failed drives did so without recording any S.M.A.R.T. error at all, except the temperature, meaning that S.M.A.R.T. data alone was of limited usefulness in anticipating failures.
On Debian systems:
sudo apt-get install smartmontools
By default when you install it, all your drives are checked periodically with the `smartd` daemon under the `smartmontools` systemd service.
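If you want to tune what `smartd` does (scheduled self-tests, mail alerts), the defaults can be overridden in `/etc/smartd.conf`. A sketch, where the schedule and mail address are placeholders:

```
# /etc/smartd.conf
# Monitor all drives (-a), enable automatic offline testing (-o) and attribute
# autosave (-S), run a short self-test every day at 02:00 and a long one on
# Saturdays at 03:00, and mail warnings to the given address.
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.org
```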
Usage
Running the tests
S.M.A.R.T. drives may offer a number of self-tests:
- Short: Checks the electrical and mechanical performance as well as the read performance of the disk. Electrical tests might include a test of buffer RAM, a read/write circuitry test, or a test of the read/write head elements. Mechanical test includes seeking and servo on data tracks. Scans small parts of the drive's surface (area is vendor-specific and there is a time limit on the test). Checks the list of pending sectors that may have read errors, and it usually takes under two minutes.
- Long/extended: A longer and more thorough version of the short self-test, scanning the entire disk surface with no time limit. This test usually takes several hours, depending on the read/write speed of the drive and its size. It is possible for the long test to pass even if the short test fails.
- Conveyance: Intended as a quick test to identify damage incurred during transporting of the device from the drive manufacturer to the computer manufacturer. Only available on ATA drives, and it usually takes several minutes.
Drives remain operable during self-test, unless a "captive" option (ATA only) is requested.
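For reference, the short and conveyance tests are launched the same way as the long test described below, just with a different `-t` argument (`/dev/sdd` is the example disk used in this section):

```bash
smartctl -t short /dev/sdd      # quick electrical/mechanical/read check, ~2 minutes
smartctl -t conveyance /dev/sdd # transport-damage check, ATA drives only
```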
Long test
Start with a long self test with `smartctl`. Assuming the disk to test is `/dev/sdd`:

```bash
smartctl -t long /dev/sdd
```

The command will respond with an estimate of how long it thinks the test will take to complete.

To check progress use:

```bash
smartctl -A /dev/sdd | grep remaining
smartctl -c /dev/sdd | grep remaining
```
Don't check too often because it can abort the test with some drives. If you receive an empty output, examine the reported status with:
```bash
smartctl -l selftest /dev/sdd
```
If errors are shown, check `dmesg` as there are usually useful traces of the error.
-
The output of a `smartctl` command is difficult to read:

```
smartctl 5.40 2010-03-16 r3077 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG series
Device Model:     SAMSUNG HD502HI
Serial Number:    S1VZJ9CS712490
Firmware Version: 1AG01118
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Wed Feb 9 15:30:42 2011 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (6312) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time:      (   2) minutes.
Extended self-test routine recommended polling time:   ( 106) minutes.
Conveyance self-test routine recommended polling time: (  12) minutes.
SCT capabilities:              (0x003f) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   099   099   051    Pre-fail  Always       -       2376
  3 Spin_Up_Time            0x0007   091   091   011    Pre-fail  Always       -       3620
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       405
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       717
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       405
 13 Read_Soft_Error_Rate    0x000e   099   099   000    Old_age   Always       -       2375
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       2375
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   084   074   000    Old_age   Always       -       16 (Lifetime Min/Max 16/16)
194 Temperature_Celsius     0x0022   084   071   000    Old_age   Always       -       16 (Lifetime Min/Max 16/16)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       3558
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   098   098   000    Old_age   Always       -       81
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```
Checking overall health
Somewhere in your report you'll see something like:
```
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
```
If it doesn’t return PASSED, you should immediately backup all your data. Your hard drive is probably failing.
That message can also be shown with `smartctl -H /dev/sda`.
Each drive manufacturer defines a set of attributes and sets threshold values beyond which attributes should not pass under normal operation. But they do not agree on precise attribute definitions and measurement units, so the following list of attributes is a general guide only.

If one or more attributes have the "prefailure" flag, and the "current value" of such a prefailure attribute is smaller than or equal to its "threshold value" (unless the "threshold value" is 0), that will be reported as a "drive failure". In addition, a utility can send the SMART RETURN STATUS command to the ATA drive, which may report three statuses: "drive OK", "drive warning" or "drive failure".

Each SMART attribute has several columns, as shown by `smartctl -a`:

- ID: The ID number of the attribute, good for comparing with other lists like Wikipedia: S.M.A.R.T.: Known ATA S.M.A.R.T. attributes, because the attribute names sometimes differ.
- Name: The name of the SMART attribute.
- Value: The current, normalized value of the attribute. Higher values are always better (except for temperature for hard disks of some manufacturers). The range is normally 0-100, for some attributes 0-255 (so that 100 or 255 respectively is best, 0 is worst). There is no standard on how manufacturers convert their raw value to this normalized one: when the normalized value approaches the threshold, it can do so linearly, exponentially, logarithmically or any other way, meaning that a doubled normalized value does not necessarily mean "twice as good".
- Worst: The worst (normalized) value that this attribute had at any point of time where SMART was enabled. There seems to be no mechanism to reset current SMART attribute values, but this still makes sense as some SMART attributes, for some manufacturers, fluctuate over time so that keeping the worst one ever is meaningful.
- Threshold: The threshold below which the normalized value will be considered “exceeding specifications”. If the attribute type is “Pre-fail”, this means that SMART thinks the hard disk is just before failure. This will “trigger” SMART: setting it from “SMART test passed” to “SMART impending failure” or similar status.
- Type: The type of the attribute. Either “Pre-fail” for attributes that are said to indicate impending failure, or “Old_age” for attributes that just indicate wear and tear. Note that one and the same attribute can be classified as “Pre-fail” by one manufacturer or for one model and as “Old_age” by another or for another model. This is the case for example for attribute Seek_Error_Rate (ID 7), which is a widespread phenomenon on many disks and not considered critical by some manufacturers, but Seagate has it as “Pre-fail”.
- Raw value: The current raw value that was converted to the normalized value above. smartctl shows all of them as decimal values, but some attribute values of some manufacturers cannot be reasonably interpreted that way.
-
New: Reacting to SMART Values.
It is said that a drive that starts getting bad sectors (attribute ID 5) or “pending” bad sectors (attribute ID 197; they most likely are bad, too) will usually be trash in 6 months or less. The only exception would be if this does not happen: that is, bad sector count increases, but then stays stable for a long time, like a year or more. For that reason, one normally needs a diagramming / journaling tool for SMART. Many admins will exchange the hard drive if it gets reallocated sectors (ID 5) or sectors “under investigation” (ID 197)
Of all the attributes I'm going to analyse only the critical ones:
Read Error Rate
ID: 01 (0x01) Ideal: Low Correlation with probability of failure: Not clear
(Vendor specific raw value.) Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors.
Reallocated Sectors Count
ID: 05 (0x05) Ideal: Low Correlation with probability of failure: Strong
Count of reallocated sectors. The raw value represents a count of the bad sectors that have been found and remapped. Thus, the higher the attribute value, the more sectors the drive has had to reallocate. This value is primarily used as a metric of the life expectancy of the drive; a drive which has had any reallocations at all is significantly more likely to fail in the immediate months. If the raw value of the 0x05 attribute is higher than its threshold value, that will be reported as a "drive warning".
Spin Retry Count
ID: 10 (0x0A) Ideal: Low Correlation with probability of failure: Strong
Count of retry of spin start attempts. This attribute stores a total count of the spin start attempts to reach the fully operational speed (under the condition that the first attempt was unsuccessful). An increase of this attribute value is a sign of problems in the hard disk mechanical subsystem.
Current Pending Sector Count
ID: 197 (0xC5) Ideal: Low Correlation with probability of failure: Strong
Count of "unstable" sectors (waiting to be remapped because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased. Read errors on a sector will not remap the sector immediately (since the correct value cannot be read and so the value to remap is not known, and also it might become readable later); instead, the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it has been successfully read.

However, some drives will not immediately remap such sectors when successfully read; instead the drive will first attempt to write to the problem sector, and if the write operation is successful the sector will then be marked as good (in this case, the "Reallocation Event Count" (0xC4) will not be increased). This is a serious shortcoming, for if such a drive contains marginal sectors that consistently fail only after some time has passed following a successful write operation, then the drive will never remap these problem sectors. If the raw value of the 0xC5 attribute is higher than its threshold value, that will be reported as a "drive warning".
(Offline) Uncorrectable Sector Count
ID: 198 (0xC6) Ideal: Low Correlation with probability of failure: Strong
The total count of uncorrectable errors when reading/writing a sector. A rise in the value of this attribute indicates defects of the disk surface and/or problems in the mechanical subsystem.
In the 60 days following the first uncorrectable error on a drive (S.M.A.R.T. attribute 0xC6 or 198) detected as a result of an offline scan, the drive was, on average, 39 times more likely to fail than a similar drive for which no such error occurred.
Non critical SMART attributes
The next attributes appear to change in the logs, but that doesn't mean that there is anything going wrong.
Hardware ECC Recovered
ID: 195 (0xC3) Ideal: Varies Correlation with probability of failure: Low
(Vendor-specific raw value.) The raw value has different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors.
-
New: Monitoring.
To monitor your drive health you can use prometheus with alertmanager for alerts and grafana for dashboards.
Installing the exporter
The prometheus community has its own smartctl exporter.
Using the binary
You can download the latest binary from the repository releases and configure the systemd service:

```bash
unp smartctl_exporter-0.13.0.linux-amd64.tar.gz
sudo mv smartctl_exporter-0.13.0.linux-amd64/smartctl_exporter /usr/bin
```
Add the service to `/etc/systemd/system/smartctl-exporter.service`:

```ini
[Unit]
Description=smartctl exporter service
After=network-online.target

[Service]
Type=simple
PIDFile=/run/smartctl_exporter.pid
ExecStart=/usr/bin/smartctl_exporter
User=root
Group=root
SyslogIdentifier=smartctl_exporter
Restart=on-failure
RemainAfterExit=no
RestartSec=100ms
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
```
Then enable it:

```bash
sudo systemctl enable smartctl-exporter
sudo service smartctl-exporter start
```

Using docker compose

```yaml
---
services:
  smartctl-exporter:
    container_name: smartctl-exporter
    image: prometheuscommunity/smartctl-exporter
    privileged: true
    user: root
    ports:
      - "9633:9633"
```
Configuring prometheus
Add the next scrape configuration:

```yaml
- job_name: smartctl_exporter
  metrics_path: /metrics
  scrape_timeout: 60s
  static_configs:
    - targets: [smartctl-exporter:9633]
      labels:
        hostname: "your-hostname"
```
Configuring the alerts
Taking as a reference the awesome prometheus rules and this wired post I'm using the next rules:
```yaml
---
groups:
  - name: smartctl exporter rules
    rules:
      - alert: SmartDeviceTemperatureWarning
        expr: smartctl_device_temperature > 60
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Smart device temperature warning (instance {{ $labels.hostname }})
          description: "Device temperature warning (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: SmartDeviceTemperatureCritical
        expr: smartctl_device_temperature > 80
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: Smart device temperature critical (instance {{ $labels.hostname }})
          description: "Device temperature critical (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: SmartCriticalWarning
        expr: smartctl_device_critical_warning > 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: Smart critical warning (instance {{ $labels.hostname }})
          description: "device has critical warning (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: SmartNvmeWearoutIndicator
        expr: smartctl_device_available_spare{device=~"nvme.*"} < smartctl_device_available_spare_threshold{device=~"nvme.*"}
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: Smart NVME Wearout Indicator (instance {{ $labels.hostname }})
          description: "NVMe device is wearing out (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: SmartNvmeMediaError
        expr: smartctl_device_media_errors > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Smart NVME Media errors (instance {{ $labels.hostname }})
          description: "Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: SmartSmartStatusError
        expr: smartctl_device_smart_status < 1
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: Smart general status error (instance {{ $labels.hostname }})
          description: " (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: DiskReallocatedSectorsIncreased
        expr: smartctl_device_attribute{attribute_id="5", attribute_value_type="raw"} > max_over_time(smartctl_device_attribute{attribute_id="5", attribute_value_type="raw"}[1h])
        labels:
          severity: warning
        annotations:
          summary: "SMART Attribute Reallocated Sectors Count Increased"
          description: "The SMART attribute 5 (Reallocated Sectors Count) has increased on {{ $labels.device }} (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: DiskSpinRetryCountIncreased
        expr: smartctl_device_attribute{attribute_id="10", attribute_value_type="raw"} > max_over_time(smartctl_device_attribute{attribute_id="10", attribute_value_type="raw"}[1h])
        labels:
          severity: warning
        annotations:
          summary: "SMART Attribute Spin Retry Count Increased"
          description: "The SMART attribute 10 (Spin Retry Count) has increased on {{ $labels.device }} (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: DiskCurrentPendingSectorCountIncreased
        expr: smartctl_device_attribute{attribute_id="197", attribute_value_type="raw"} > max_over_time(smartctl_device_attribute{attribute_id="197", attribute_value_type="raw"}[1h])
        labels:
          severity: warning
        annotations:
          summary: "SMART Attribute Current Pending Sector Count Increased"
          description: "The SMART attribute 197 (Current Pending Sector Count) has increased on {{ $labels.device }} (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: DiskUncorrectableSectorCountIncreased
        expr: smartctl_device_attribute{attribute_id="198", attribute_value_type="raw"} > max_over_time(smartctl_device_attribute{attribute_id="198", attribute_value_type="raw"}[1h])
        labels:
          severity: warning
        annotations:
          summary: "SMART Attribute Uncorrectable Sector Count Increased"
          description: "The SMART attribute 198 (Uncorrectable Sector Count) has increased on {{ $labels.device }} (instance {{ $labels.hostname }})\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
```
Configuring the grafana dashboards
Of the different grafana dashboards (1, 2, 3) I went for the first one.
Import it with the UI of grafana, make it work, and then export the json to store it in your infra as code repository.
References
-
New: Thoughts on adding new disks to ZFS.
When it comes to expanding an existing ZFS storage system, careful consideration is crucial. In my case, I faced a decision point with my storage cluster: after two years of reliable service from my 8TB drives, I needed more capacity. This led me to investigate the best way to integrate newly acquired refurbished 12TB drives into the system. Here's my journey through this decision-making process and the insights gained along the way.
The Starting Point
My existing setup consisted of 8TB drives purchased new, which had been running smoothly for two years. The need for expansion led me to consider refurbished 12TB drives as a cost-effective solution. However, mixing new and refurbished drives, especially of different capacities, raised several important considerations that needed careful analysis.
Initial Drive Assessment
The first step was to evaluate the reliability of all drives. Using `smartctl`, I analyzed the SMART data across both the existing and new drives:

```bash
for disk in a b c d e f g h i; do
  echo "/dev/sd$disk: old $(smartctl -a /dev/sd$disk | grep Old | wc -l) pre-fail: $(smartctl -a /dev/sd$disk | grep Pre- | wc -l)"
done
```
The results showed similar values across all drives, with "Old_Age" attributes ranging from 14-17 and "Pre-fail" attributes between 3-6. While this indicated all drives were aging, they were still functioning with acceptable parameters. However, raw SMART data doesn't tell the whole story, especially when comparing new versus refurbished drives.
Drive Reliability Considerations
After careful evaluation, I found myself trusting the existing 8TB drives more than the newer refurbished 12TB ones. This conclusion was based on several factors:
- The 8TB drives had a proven track record in my specific environment
- Their smaller size meant faster resilver times, reducing the window of vulnerability during recovery
- One of the refurbished 12TB drives was already showing concerning symptoms (8 reallocated sectors, although a badblocks didn't increase that number), which reduced confidence in the entire batch
- The existing drives were purchased new, while the 12TB drives were refurbished, adding an extra layer of uncertainty
Layout Options Analysis
When expanding a ZFS system, there's always the temptation to simply add more vdevs to the existing pool. However, I investigated two main approaches:
- Creating a new separate ZFS pool with the new disks
- Adding another vdev to the existing pool
Resilver time
Adding the 12TB drives to the pool and redistributing the data across all 8 drives will help reduce the resilver time. Here's a detailed breakdown:

- Current Situation:
  - 4x 8TB drives at 95% capacity means each drive is heavily packed
  - High data density means longer resilver times
  - Limited free space for data movement and reconstruction
- After Adding 12TB Drives:
  - Total pool capacity increases significantly
  - ZFS will start spreading data across all 8 drives: new writes favour the emptier vdev, and existing data rebalances as it gets rewritten (you can watch the per-vdev distribution with the command sketched after this list)
  - This process (sometimes called "data shuffling" or "data redistribution") has several benefits:
    - Reduces data density per drive
    - Creates more free space
    - Improves overall pool performance
    - Potentially reduces future resilver times
- Resilver Time Reduction Mechanism:
  - With data spread across more drives, each individual drive has less data to resilver
  - Less data per drive = faster resilver process
  - The redistribution happens gradually and in the background
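To see how full each vdev actually is (and therefore how the data is spreading after the new vdev is added), `zpool list -v` breaks capacity and fragmentation down per vdev. A quick sketch, where `main` is an example pool name:

```bash
# Right after adding the new raidz vdev it should show far more FREE
# space than the old, nearly full one.
zpool list -v main
```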
Understanding Failure Scenarios
The key differentiator between these approaches came down to failure scenarios:
Single Drive Failure
Both configurations handle single drive failures similarly, though the 12TB drives' longer resilver time creates a longer window of vulnerability in the two-vdev configuration if the data load is evenly shared between the disks. This is particularly concerning with refurbished drives, where the failure probability might be higher.
However, if you rebalance the data inside ZFS as soon as you add the other vdev to the pool, the 8TB drives will be less full, so until more data is added you may reduce the resilver time as they hold less data.
Double Drive Failure
This is where the configurations differ significantly:
- In a two-vdev pool, losing two drives from the same vdev would cause complete pool failure
- With separate pools, a double drive failure would only affect one pool, allowing the other to continue operating. This way you can store the critical data on the pool you trust more.
- Given the mixed drive origins (new vs refurbished), isolating potential failures becomes more critical
Performance Considerations
While investigating performance implications, I found several interesting points about IOPS and throughput:
- ZFS stripes data across vdevs, meaning more vdevs generally means better IOPS
- In RAIDZ configurations, IOPS are limited by the slowest drive in the vdev
- Multiple mirrored vdevs provide the best combined IOPS performance
- Streaming speeds scale with the number of data disks in a RAIDZ vdev
- When mixing drive sizes, ZFS tends to favor larger vdevs, which could lead to uneven wear
Ease of configuration
Cache and log
If you already have a zpool with a cache and logs in nvme, then if you were to use two pools, you'd need to reformat your nvme drives to create space for the new partitions needed for the new zpool.
This would allow you to specify different cache sizes for each pool. But it comes at the cost of a more complex operation.
New pool creation
Adding a vdev to an existing pool is quicker and easier than creating a new zpool. You need to make sure that you initialise it with the correct configuration.
Storage management
Having two pools doubles the operation tasks. One of the pools will fill up soon, so you may need to manually move files and directories around to rebalance it.
Final Decision
After weighing all factors, if you favour reliability over convenience, implement two separate ZFS pools. This choice is primarily driven by:
- Enhanced Reliability: By separating the pools, we can maintain service availability even if one pool fails completely
- Data Prioritization: This allows placing critical application data on the more reliable pool (8TB drives), while using the refurbished drives for less critical data like media files
- Risk Isolation: Keeping the proven, new-purchased drives separate from the refurbished ones minimizes the impact of potential issues with the refurbished drives
- Consistent Performance: Following the best practice of keeping same-sized drives together in pools
However I'm currently favouring convenience and trusting my backup solution (I hope not to read this line in the future with regret :P), so I'll go with two vdevs.
Key Takeaways
Through this investigation, I learned several important lessons about ZFS storage design:
- Raw parity drive count isn't the only reliability metric - configuration matters more than simple redundancy numbers
- Pool layout significantly impacts both performance and failure scenarios
- Sometimes simpler configurations (like separate pools) can provide better overall reliability than more complex ones
- Consider the full lifecycle of the storage, including maintenance operations like resilver times
- When expanding storage, don't underestimate the value of isolating different generations or sources of hardware
- The history and source of drives (new vs refurbished) should influence your pool design decisions
This investigation reinforced that storage design isn't just about maximizing space or performance - it's about finding the right balance of reliability, performance, and manageability for your specific needs. When dealing with mixed drive sources and different capacities, this balance becomes even more critical.
References and further reading
badblocks⚑
-
New: Check the health of a disk with badblocks.
The `badblocks` command will write and read the disk with different patterns, thus overwriting the whole disk, so you will lose all the data on the disk.

This test is good for rotational disks, as there is no disk degradation on massive writes; do not use it on SSDs though.
WARNING: be sure that you specify the correct disk!!
badblocks -wsv -b 4096 /dev/sde | tee disk_analysis_log.txt
If errors are shown, it means all of the spare sectors of the disk are in use, so you must not use this disk anymore. Again, check `dmesg` for traces of disk errors.
-
New: Removing a disk from the pool.
zpool remove tank0 sda
This will trigger the data evacuation from the disk. Check `zpool status` to see when it finishes.
-
New: Encrypting ZFS Drives with LUKS.
Warning: Proceed with Extreme Caution
IMPORTANT SAFETY NOTICE:
- These instructions will COMPLETELY WIPE the target drive
- Do NOT attempt on production servers
- Experiment only on drives with no valuable data
- Seek professional help if anything is unclear
Prerequisites
- A drive you want to encrypt (will be referred to as `/dev/sdx`)
- Root access
- Basic understanding of Linux command line
- Backup of all important data
Step 1: Create LUKS Encryption Layer
First, format the drive with LUKS encryption:
sudo cryptsetup luksFormat /dev/sdx
- You'll be prompted for a sudo password
- Create a strong encryption password (mix of uppercase, lowercase, numbers, symbols)
- Note the precise capitalization in commands
Step 2: Open the Encrypted Disk
Open the newly encrypted disk:
sudo cryptsetup luksOpen /dev/sdx sdx_crypt
This creates a mapped device at `/dev/mapper/sdx_crypt`.
Step 3: Create ZFS Pool or the vdev
For example to create a ZFS pool on the encrypted device:
```bash
sudo zpool create -f -o ashift=12 \
  -O compression=lz4 \
  zpool /dev/mapper/sdx_crypt
```
Check the create zpool section to know which configuration flags to use.
Step 4: Set Up Automatic Unlocking
Generate a Keyfile
Create a random binary keyfile:
sudo dd bs=1024 count=4 if=/dev/urandom of=/etc/zfs/keys/sdx.key sudo chmod 0400 /etc/zfs/keys/sdx.key
Add Keyfile to LUKS
Add the keyfile to the LUKS disk:
sudo cryptsetup luksAddKey /dev/sdx /etc/zfs/keys/sdx.key
- You'll be asked to enter the original encryption password
- This adds the binary file to the LUKS disk header
- Now you can unlock the drive using either the password or the keyfile
Step 5: Configure Automatic Mounting
Find Drive UUID
Get the drive's UUID:
sudo blkid
Look for the line with `TYPE="crypto_LUKS"` and copy the UUID.

Update Crypttab
Edit the crypttab file:
sudo vim /etc/crypttab
Add an entry like:
sdx_crypt UUID=your-uuid-here /etc/zfs/keys/sdx.key luks,discard
Final Step: Reboot
- Reboot your system
- The drive will be automatically decrypted and imported
Best Practices
- Keep your keyfile and encryption password secure
- Store keyfiles with restricted permissions
- Consider backing up the LUKS header
Troubleshooting
- Double-check UUIDs
- Verify keyfile permissions
- Ensure cryptsetup and ZFS are installed
Security Notes
- This method provides full-disk encryption at rest
- Data is inaccessible without the key or password
- Protects against physical drive theft
Disclaimer
While these instructions are comprehensive, they come with inherent risks. Always:
- Have backups
- Test in non-critical environments first
- Understand each step before executing
Further reading
-
New: Add a disk to an existing vdev.
zpool add tank /dev/sdx
-
New: Add a vdev to an existing pool.
```bash
zpool add main raidz1 /dev/disk-1 /dev/disk-2 /dev/disk-3 /dev/disk-4
```
You don't need to specify the `ashift` or the `autoexpand` as they are set on zpool creation.
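If you want to double-check the pool-wide settings the new vdev will operate under, you can query them; a small sketch, with `main` being an example pool name:

```bash
zpool get ashift,autoexpand main
```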
-
New: Add zfs book.
Authentication⚑
Authentik⚑
-
New: Add api and library docs.
There is a python library.
Operating Systems⚑
Linux⚑
Linux Snippets⚑
-
New: Record the audio from your computer.
You can record audio being played in a browser using `ffmpeg`:

- Check your default audio source:

  ```bash
  pactl list sources | grep -E 'Name|Description'
  ```

- Record using `ffmpeg`:

  ```bash
  ffmpeg -f pulse -i <your_monitor_source> output.wav
  ```

  Example:

  ```bash
  ffmpeg -f pulse -i alsa_output.pci-0000_00_1b.0.analog-stereo.monitor output.wav
  ```
- Stop recording with Ctrl+C.
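If you want the recording to stop by itself instead of pressing Ctrl+C, ffmpeg's standard `-t` duration flag works here too (a small sketch reusing the example monitor source above):

```bash
# Record 60 seconds from the monitor source, then stop automatically
ffmpeg -f pulse -i alsa_output.pci-0000_00_1b.0.analog-stereo.monitor -t 60 output.wav
```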
Relevant content⚑
Videogames⚑
DragonSweeper⚑
-
New: Introduce dragonsweeper.
DragonSweeper is an addictive, RPG-tinged take on the Minesweeper formula. You can play it for free in your browser.
If you're lost at the beginning, start by reading the ArsTechnica blog post.
Tips
- Use `Shift` to mark numbers you already know.
References